Data Science Intermediate

Addressing Causal Questions using Real World Data: an Introduction

Ellie Iob, Eduardo Fe, and Bianca De Stavola

This introductory course is for anyone wishing to understand how causal questions can be investigated using real world data (RWD), that is, data on the everyday experiences of individuals that are collected through surveys, cohort studies, and administrative and clinical databases, or accrued for reasons other than research. These data are observational, as opposed to experimental. Because of this, using them to address causal questions raises many concerns and difficulties. In this course we will describe the main sources of bias affecting RWD and possible strategies to deal with them.

The course will start by distinguishing between different types of studies (e.g., RCTs, cross-sectional and longitudinal studies) and data sources (e.g., research-based data, administrative databases). It will then describe the sources of bias that are likely to affect observational data, in particular those arising from the non-randomized allocation of exposures (denoted confounding bias in epidemiology and selection bias in the social sciences), from non-participation (including missing data), and from measurement errors. We will then introduce two main design-based approaches that attempt to deal with (some of) these biases: the framework of target trial emulation and the exploitation of natural experiments.

The course aims to develop an understanding of the main challenges arising from using RWD to investigate causal questions and of the approaches to mitigate them.

The course will consist of three pre-recorded lectures and four practical exercises to consolidate your learning (no statistical software required). 

Participants should have an understanding of basic statistical concepts (e.g., descriptive statistics, mean, standard deviation, confidence intervals), quantitative data structures, and types of variables.