Causal Inference to Isolate Cause and Effect from Biasing Factors
Data from observational studies and randomized trials present exciting opportunities to better understand risk factors, to build improved predictors, and to examine the causal relationships between variables. Still, an observed association can arise from many sources, including direct effects, indirect effects, measured confounding, unmeasured confounding, missing data, and selection bias. Methods that distinguish causation from correlation are perhaps more needed now than ever.
Super & Targeted Learning for Superior Prediction & Effect Estimation
Machine learning can improve risk prediction by relaxing the modeling assumptions made by standard approaches. A core strength of our research is the application of Super Learner, an ensemble method, to develop flexible prediction algorithms. Another strength of our research is the incorporation of machine learning to avoid unsubstantiated assumptions when estimating causal effects. We have expertise in the extension and application of targeted minimum loss-based estimation (TMLE), a general approach to semi-parametric efficient estimation that naturally integrates machine learning and formal statistical inference.
- Linked Super Learner papers
- Linked Targeted Learning papers
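To make the ensemble idea concrete, here is a minimal sketch of a Super Learner-style fit in plain Python. It is an illustration under stated assumptions, not our production code: the two candidate learners (`fit_mean`, `fit_line`), the simulated data, and all function names are hypothetical, and real applications use richer libraries of algorithms. The sketch computes out-of-fold predictions for each candidate, then chooses the convex combination that minimizes cross-validated squared error.

```python
import random

def fit_mean(xs, ys):
    """Candidate 1 (hypothetical): ignore x and predict the training mean."""
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Candidate 2 (hypothetical): least squares with a single predictor."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)

def cv_predictions(fit, xs, ys, folds=5):
    """Out-of-fold predictions: each point is predicted by a model that never saw it."""
    n = len(xs)
    preds = [0.0] * n
    for k in range(folds):
        train = [i for i in range(n) if i % folds != k]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in range(n):
            if i % folds == k:
                preds[i] = model(xs[i])
    return preds

def super_learner(xs, ys, folds=5):
    """Weight the two candidates to minimize cross-validated squared error."""
    p_mean = cv_predictions(fit_mean, xs, ys, folds)
    p_line = cv_predictions(fit_line, xs, ys, folds)
    # With two candidates, the best weight on the line has a closed form;
    # clipping to [0, 1] keeps the combination convex.
    num = sum((y - a) * (b - a) for y, a, b in zip(ys, p_mean, p_line))
    den = sum((b - a) ** 2 for a, b in zip(p_mean, p_line))
    w_line = min(1.0, max(0.0, num / den)) if den else 0.5
    # Refit both candidates on all data and combine them with the chosen weight.
    m_mean, m_line = fit_mean(xs, ys), fit_line(xs, ys)
    return (lambda x: (1 - w_line) * m_mean(x) + w_line * m_line(x)), w_line

# Simulated data (an assumption for illustration): a linear trend plus noise.
rng = random.Random(0)
xs = [i / 10 for i in range(50)]
ys = [1 + 2 * x + rng.gauss(0, 0.5) for x in xs]
predict, w_line = super_learner(xs, ys)
```

Because the simulated outcome is close to linear in `x`, nearly all of the weight should go to the line. The point of the design is that the weights are chosen by cross-validation rather than by the analyst.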
Inference with Missing & Dependent Data
In both observational settings and randomized trials, participant outcomes are often subject to missingness. When participants with missing outcomes differ meaningfully from those with measured outcomes, complete-case analyses can yield biased conclusions. Further complications arise from the dependence of outcomes between individuals within a cluster (e.g., community or clinic). In theory, in simulations, and with real data, we have demonstrated the importance of flexibly controlling for baseline and time-varying causes of missingness, while rigorously accounting for the dependence of observations within a cluster.
- SEARCH trial applications
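The bias from a complete-case analysis can be seen in a small worked example. The sketch below uses made-up counts, a single binary cause of missingness `w`, and simple inverse probability weighting; all of these are assumptions for illustration. It ignores clustering and time-varying covariates, and weighting is only one of several ways to adjust for measured causes of missingness.

```python
# Hypothetical counts: (w, y, observed, number of participants).
# Participants with w = 1 have worse outcomes AND are more likely to be missing.
counts = [
    (0, 1, True, 18), (0, 0, True, 72), (0, 1, False, 2), (0, 0, False, 8),
    (1, 1, True, 30), (1, 0, True, 20), (1, 1, False, 30), (1, 0, False, 20),
]
records = []
for w, y, observed, count in counts:
    records += [(w, y, observed)] * count

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# Truth, available only because the data are simulated.
true_mean = mean(y for _, y, _ in records)

# Complete-case analysis: drop everyone with a missing outcome.
complete_case = mean(y for _, y, obs in records if obs)

def ipw_mean(records):
    """Weight each observed outcome by 1 / P(observed | w), estimated per stratum."""
    p_obs = {}
    for level in {w for w, _, _ in records}:
        stratum = [obs for w, _, obs in records if w == level]
        p_obs[level] = sum(stratum) / len(stratum)
    num = sum(y / p_obs[w] for w, y, obs in records if obs)
    den = sum(1 / p_obs[w] for w, _, obs in records if obs)
    return num / den

ipw = ipw_mean(records)
```

In this toy population the true outcome proportion is 80/200. The complete-case estimate is 48/140 because the higher-risk group is under-represented among the observed, whereas weighting by the inverse probability of being observed recovers the full-population value.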
Cluster Randomized Trials to Translate Research into Practice
In cluster randomized trials (CRTs), groups of individuals (e.g., communities or clinics) are randomly assigned to treatment arms. CRTs are often pragmatic studies in that the focus is on assessing comparative effectiveness in real-world settings. We tackle key questions in the design and analysis of these trials. In particular, we have demonstrated the gains in efficiency, power, and interpretability from three choices: pair-matching clusters rather than completely randomizing them, targeting the sample effect rather than a population-average parameter, and adjusting data-adaptively within a pre-specified analysis.
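As a rough illustration of a pair-matched design, the sketch below pairs clusters on one baseline covariate, randomizes treatment within each pair, and summarizes the effect as the mean within-pair difference. The community labels, baseline values, toy outcome model, and function names are all hypothetical. Pairing adjacent clusters after sorting is a simple stand-in for a real matching algorithm, and the unadjusted paired difference stands in for the data-adaptive adjustment described above.

```python
import random
import statistics

def pair_match(baseline):
    """Sort clusters by a baseline covariate and pair adjacent ones.
    With an odd number of clusters, the last one is left unpaired."""
    ordered = sorted(baseline, key=baseline.get)
    return [(ordered[i], ordered[i + 1]) for i in range(0, len(ordered) - 1, 2)]

def randomize_within_pairs(pairs, rng):
    """Flip a fair coin in each pair to decide which cluster is treated."""
    assignment = {}
    for a, b in pairs:
        treated = a if rng.random() < 0.5 else b
        assignment[a] = int(treated == a)
        assignment[b] = int(treated == b)
    return assignment

def paired_effect(pairs, assignment, outcome):
    """Mean of within-pair (treated minus control) differences and its standard error."""
    diffs = []
    for a, b in pairs:
        t, c = (a, b) if assignment[a] == 1 else (b, a)
        diffs.append(outcome[t] - outcome[c])
    return statistics.mean(diffs), statistics.stdev(diffs) / len(diffs) ** 0.5

# Hypothetical baseline outcome proportions for eight communities.
baseline = {"A": 0.10, "B": 0.32, "C": 0.12, "D": 0.30,
            "E": 0.21, "F": 0.19, "G": 0.41, "H": 0.44}
pairs = pair_match(baseline)
assignment = randomize_within_pairs(pairs, random.Random(2024))

# Toy outcome model (an assumption): treatment lowers the proportion by 0.05.
outcome = {c: baseline[c] - 0.05 * assignment[c] for c in baseline}
estimate, std_error = paired_effect(pairs, assignment, outcome)
```

Because matched clusters start from similar baseline values, the within-pair differences vary little around the simulated effect, which is the intuition behind the efficiency gain from pair-matching.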