Data Science Intermediate

Addressing Causal Questions using Real World Data: an Introduction
15 - 17 May 2023

Ellie Iob, Andrea Aparicio Castro, Eduardo Fe, and Bianca De Stavola

This introductory course is for anyone wishing to understand how causal questions can be investigated using real world data (RWD), that is, data on the everyday experiences of individuals that are collected through surveys, cohort studies, and administrative and clinical databases, or accrued for reasons other than research. These data are observational, as opposed to experimental. Because of this, using them to address causal questions raises many concerns and difficulties. In this course we will describe the main sources of bias affecting RWD and possible strategies to deal with them.

The course will start by distinguishing between different types of studies (e.g., randomized controlled trials (RCTs), cross-sectional and longitudinal studies) and data sources (e.g., research-based data, administrative databases). It will then describe the sources of bias that are likely to affect observational data, in particular those arising from the non-randomized allocation of exposures (denoted confounding bias in epidemiology and selection bias in the social sciences), from incomplete participation (including missing data), and from measurement error. We will then introduce two main design-based approaches that attempt to deal with (some of) these biases: the framework of target trial emulation and the exploitation of natural experiments.
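As an illustration of the bias arising from the non-randomized allocation of exposures, the following is a minimal simulation sketch and not part of the course materials. All names and numbers are hypothetical assumptions chosen for illustration: a binary confounder `c` makes the exposure `x` more likely and also raises the outcome `y`, while the true effect of `x` on `y` is fixed at 1.0. A naive comparison of exposed and unexposed groups overstates the effect; comparing within levels of `c` and averaging over its distribution recovers a value close to the truth.

```python
import random


def simulate(n=20000, seed=0):
    """Simulate (confounder, exposure, outcome) rows; all parameters are illustrative."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        c = 1 if rng.random() < 0.5 else 0        # hypothetical binary confounder
        p_exposed = 0.8 if c == 1 else 0.2        # exposure depends on c (not randomized)
        x = 1 if rng.random() < p_exposed else 0
        y = 1.0 * x + 2.0 * c + rng.gauss(0, 1)   # assumed true effect of x is 1.0
        rows.append((c, x, y))
    return rows


def mean(values):
    return sum(values) / len(values)


def naive_difference(rows):
    """Difference in mean outcome between exposed and unexposed, ignoring c."""
    exposed = [y for _, x, y in rows if x == 1]
    unexposed = [y for _, x, y in rows if x == 0]
    return mean(exposed) - mean(unexposed)


def stratified_difference(rows):
    """Average the within-stratum differences, weighted by the size of each stratum of c."""
    estimate = 0.0
    for level in (0, 1):
        stratum = [row for row in rows if row[0] == level]
        estimate += naive_difference(stratum) * len(stratum) / len(rows)
    return estimate


rows = simulate()
print("naive difference:     ", round(naive_difference(rows), 2))
print("stratified difference:", round(stratified_difference(rows), 2))
```

Under these assumptions the naive difference is, in expectation, 1.0 + 2.0 × (0.8 − 0.2) = 2.2, while the stratified estimate targets the assumed true effect of 1.0. The adjustment works here only because the single confounder is measured without error, which is rarely guaranteed in real world data.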

The course aims to develop an understanding of the main challenges that arise when using RWD to investigate causal questions, and of the approaches available to mitigate them.

The course will consist of three recorded lectures, accompanied by live online tutorials and a course summary. Participants are expected to have listened to each lecture before attending the relevant live practical. All lectures will be available for download one week before the live component of the course. Participants are also expected to have followed all RADIANCE appetizers.



From Monday 8th May

Lecture 1: Study and data types
Lecture 2: Potential biases
Lecture 3: Emulating target trials

Monday 15th May

Summary of Lecture 1 and Practical 1

Tuesday 16th May

Summary of Lecture 2 and Practical 2

Wednesday 17th May

Summary of Lecture 3 and Practical 3
Practical 4 and Course Overview

Participants should have an understanding of basic statistical concepts (e.g. descriptive statistics such as the mean and standard deviation, and confidence intervals), quantitative data structures, and types of variables.


This is a UKRI-funded project offering rigorous training in longitudinal data science. Please note that this training is NOT available to undergraduate or master's students.