Causal Inference to Isolate Cause and Effect from Sources of Bias

At increasing velocity, volume, and variety, we are generating, recording, and storing unprecedented amounts of data. Big Data present exciting opportunities to better understand risk factors, to build improved predictors, and to examine the causal relationships between variables. Still, an association between two variables can arise from many sources, including direct effects, indirect effects, measured confounding, unmeasured confounding, and selection bias. The need for methods that distinguish causation from correlation is perhaps more pressing now than ever.

Super & Targeted Learning for Superior Prediction & Effect Estimation

Machine learning can improve risk prediction by relaxing the modeling assumptions made by standard approaches. A core strength of our research is the application of Super Learner, an ensemble method, to develop flexible prediction algorithms. A second strength is the incorporation of machine learning to avoid unsubstantiated assumptions when estimating causal effects. We have expertise in the extension and application of targeted maximum likelihood estimation (TMLE), a general approach to semi-parametric efficient estimation that naturally integrates machine learning and formal statistical inference.
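
The cross-validation idea behind Super Learner can be sketched in a few lines. The example below is only an illustration under stated assumptions: it implements the simpler "discrete" variant, which picks the single candidate with the lowest cross-validated risk, whereas the full Super Learner combines candidates with weights. The three candidate learners, the simulated data, and all function names are hypothetical and chosen for brevity.

```python
import random

def fit_mean(xs, ys):
    """Candidate 1: ignore x and always predict the training mean."""
    mu = sum(ys) / len(ys)
    return lambda x: mu

def fit_line(xs, ys):
    """Candidate 2: least-squares straight line in x."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return lambda x: my + slope * (x - mx)

def fit_knn(xs, ys, k=5):
    """Candidate 3: average the outcomes of the k nearest training points."""
    pairs = list(zip(xs, ys))
    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)
    return predict

def cv_risk(fit, xs, ys, n_folds=5):
    """V-fold cross-validated mean squared error for one candidate."""
    n = len(xs)
    squared_error = 0.0
    for v in range(n_folds):
        train = [i for i in range(n) if i % n_folds != v]
        valid = [i for i in range(n) if i % n_folds == v]
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        squared_error += sum((ys[i] - predict(xs[i])) ** 2 for i in valid)
    return squared_error / n

def discrete_super_learner(library, xs, ys, n_folds=5):
    """Select the candidate with the lowest cross-validated risk, then refit it on all data."""
    risks = {name: cv_risk(fit, xs, ys, n_folds) for name, fit in library.items()}
    best = min(risks, key=risks.get)
    return best, library[best](xs, ys), risks

# Simulated data (an assumption for illustration): a curved relationship
# that neither the mean nor a straight line can capture.
random.seed(0)
xs = [random.uniform(-2, 2) for _ in range(200)]
ys = [x * x + random.gauss(0, 0.3) for x in xs]

library = {"mean": fit_mean, "line": fit_line, "knn": fit_knn}
best, predictor, risks = discrete_super_learner(library, xs, ys)
print(best, {name: round(r, 3) for name, r in risks.items()})
```

Because every candidate is judged on held-out folds, the selection does not depend on the analyst guessing the correct functional form in advance; here the flexible nearest-neighbour candidate is selected because the simulated relationship is quadratic.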

Inference with Missing & Dependent Data

In both observational settings and randomized trials, participant outcomes are often subject to missingness. When participants with missing outcomes differ meaningfully from those with measured outcomes, complete-case analyses can yield biased conclusions. This potential for bias is exacerbated when the exposure of interest influences outcome measurement. Further complications arise when participants are not independent. Both theoretically and with simulations, we have demonstrated the importance of flexibly controlling for baseline and time-varying causes of missingness, while rigorously accounting for the dependence of observations within a cluster (e.g., a community).
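
A toy simulation makes the complete-case problem concrete. This is a sketch under strong simplifying assumptions, not the estimators described above: there is a single binary baseline covariate `w`, participants are independent, and adjustment is done by simple stratification rather than by flexible machine learning. All probabilities are hypothetical. By construction, the true outcome prevalence is 0.40, while the complete-case prevalence is 0.30 in expectation because participants with `w = 1` have both higher risk and a lower chance of having their outcome measured.

```python
import random

random.seed(1)
n = 20000
rows = []
for _ in range(n):
    w = 1 if random.random() < 0.5 else 0               # baseline covariate
    y = 1 if random.random() < 0.2 + 0.4 * w else 0     # outcome risk depends on w
    measured = random.random() < (0.3 if w else 0.9)    # so does outcome measurement
    rows.append((w, y, measured))

true_mean = 0.2 + 0.4 * 0.5  # population prevalence implied by the setup above

# Complete-case analysis: average the outcome among those with a measured outcome.
observed = [(w, y) for w, y, m in rows if m]
complete_case = sum(y for _, y in observed) / len(observed)

# Adjusted analysis: estimate the prevalence within each level of w among the
# measured, then weight by the share of ALL participants at that level.
adjusted = 0.0
for level in (0, 1):
    share = sum(1 for w, _, _ in rows if w == level) / n
    ys = [y for w, y in observed if w == level]
    adjusted += share * sum(ys) / len(ys)

print(f"truth={true_mean:.2f} complete-case={complete_case:.3f} adjusted={adjusted:.3f}")
```

The adjusted estimate recovers the truth only because the sole cause of missingness, `w`, is measured and controlled for; with unmeasured or time-varying causes of missingness, or with clustered participants, this simple stratification would not be enough.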

Cluster Randomized Trials to Translate Research into Practice

In cluster randomized trials (CRTs), groups of individuals (e.g., communities or clinics) are randomly assigned to treatment arms. CRTs are often pragmatic studies in that the focus is on assessing comparative effectiveness in real-world settings. We tackle key questions in the design and analysis of these trials. In particular, we have demonstrated the gains in efficiency, power, and interpretation from three choices: pair-matching rather than completely randomizing, targeting the sample effect rather than a population average parameter, and adjusting data-adaptively through a pre-specified analysis.
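
The efficiency gain from pair-matching can be illustrated with a small simulation. This is a hedged sketch, not the analysis used in our trials: it assumes ten hypothetical pairs of communities that are closely matched on their baseline outcome level, a constant treatment effect of -0.05, and an unadjusted difference in arm means as the estimator. The function name and all numeric settings are assumptions made for illustration.

```python
import random
import statistics

def simulate_trial(pairs, pair_matched, effect, rng):
    """Run one hypothetical trial and return the difference in arm means.

    pairs: list of (b1, b2) community-level baseline outcome levels.
    """
    units = [b for pair in pairs for b in pair]
    if pair_matched:
        # Randomize within each pair: exactly one community per pair is treated.
        treated_idx = {2 * k + rng.randrange(2) for k in range(len(pairs))}
    else:
        # Complete randomization: any half of the communities is treated.
        treated_idx = set(rng.sample(range(len(units)), len(units) // 2))
    treated, control = [], []
    for i, b in enumerate(units):
        noise = rng.gauss(0, 0.01)
        if i in treated_idx:
            treated.append(b + effect + noise)
        else:
            control.append(b + noise)
    return statistics.mean(treated) - statistics.mean(control)

rng = random.Random(7)
effect = -0.05

# Ten pairs; communities within a pair share nearly the same baseline level,
# while levels differ widely between pairs.
pairs = []
for _ in range(10):
    base = rng.uniform(0.1, 0.5)
    pairs.append((base + rng.gauss(0, 0.01), base + rng.gauss(0, 0.01)))

matched = [simulate_trial(pairs, True, effect, rng) for _ in range(2000)]
complete = [simulate_trial(pairs, False, effect, rng) for _ in range(2000)]

var_matched = statistics.variance(matched)
var_complete = statistics.variance(complete)
print(f"variance, pair-matched: {var_matched:.6f}")
print(f"variance, complete randomization: {var_complete:.6f}")
```

Both designs give an estimator centred on the assumed effect, but the pair-matched design removes between-pair variation from the comparison. The size of the gain depends entirely on how well the matching variables predict the outcome; with poorly matched pairs the advantage shrinks.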