Causal Inference to Isolate Cause and Effect from Biasing Factors
With increasing velocity, volume, and variety, we are generating, recording, and storing unprecedented amounts of data. Big Data present exciting opportunities to better understand risk factors, to build improved predictors, and to examine the causal relationships between variables. Still, there are many sources of association between variables, including direct effects, indirect effects, measured confounding, unmeasured confounding, and selection bias. The need for methods that distinguish causation from correlation is perhaps more pressing now than ever.
Linked papers
Upcoming/recent workshops
Super & Targeted Learning for Superior Prediction & Effect Estimation
Machine learning can improve risk prediction by relaxing the modeling assumptions made by standard approaches. A core strength of our research is the application of Super Learner, an ensemble method, to develop flexible prediction algorithms. Another strength of our research is the incorporation of machine learning to avoid unsubstantiated assumptions when estimating causal effects. We have expertise in the extension and application of targeted maximum likelihood estimation (TMLE), a general approach to semiparametric efficient estimation that naturally integrates machine learning and formal statistical inference.
Linked Super Learner papers
Linked Targeted Learning papers
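To make the ensemble idea concrete, here is a minimal sketch of a Super Learner style procedure, not our production implementation. It assumes three toy candidate learners (a mean, a straight line, and k-nearest neighbours), simulated data, squared-error loss, and a coarse grid search for the convex weights; all function names are hypothetical. TMLE is not shown.

```python
import random

# Candidate learners: each takes training data and returns a prediction function.
def fit_mean(xs, ys):
    m = sum(ys) / len(ys)
    return lambda x: m

def fit_linear(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return lambda x: my + slope * (x - mx)

def fit_knn(xs, ys, k=10):
    pairs = list(zip(xs, ys))
    def predict(x):
        nearest = sorted(pairs, key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / len(nearest)
    return predict

LEARNERS = [fit_mean, fit_linear, fit_knn]

def cross_validated_predictions(xs, ys, n_folds=5):
    """Out-of-fold predictions: one row per observation, one column per learner."""
    n = len(xs)
    preds = [[0.0] * len(LEARNERS) for _ in range(n)]
    for fold in range(n_folds):
        train = [i for i in range(n) if i % n_folds != fold]
        held_out = [i for i in range(n) if i % n_folds == fold]
        train_x = [xs[i] for i in train]
        train_y = [ys[i] for i in train]
        for j, fit in enumerate(LEARNERS):
            model = fit(train_x, train_y)
            for i in held_out:
                preds[i][j] = model(xs[i])
    return preds

def risk(weights, preds, ys):
    """Cross-validated mean squared error of a weighted combination."""
    total = 0.0
    for row, y in zip(preds, ys):
        combined = sum(w * p for w, p in zip(weights, row))
        total += (y - combined) ** 2
    return total / len(ys)

def convex_weights(preds, ys, steps=10):
    """Grid search over non-negative weights that sum to one (three learners)."""
    grid = [(a / steps, b / steps, (steps - a - b) / steps)
            for a in range(steps + 1) for b in range(steps + 1 - a)]
    return min(grid, key=lambda w: risk(w, preds, ys))

# Simulated data with a nonlinear truth (an assumption for illustration only).
rng = random.Random(0)
xs = [rng.uniform(-2.0, 2.0) for _ in range(200)]
ys = [x ** 2 + rng.gauss(0.0, 0.3) for x in xs]

preds = cross_validated_predictions(xs, ys)
weights = convex_weights(preds, ys)
print("weights (mean, linear, knn):", weights)
print("cross-validated risk of ensemble:", risk(weights, preds, ys))
```

Because the weight grid includes each single learner as a special case, the combination's cross-validated risk can be no worse than that of the best candidate on the same folds. A real analysis would use a richer library of learners and a proper optimizer for the weights.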
Inference with Missing & Dependent Data
In both observational settings and randomized trials, participant outcomes are often subject to missingness. When participants with missing outcomes differ meaningfully from those with measured outcomes, complete-case analyses can yield biased conclusions. This potential for bias is exacerbated when the exposure of interest influences outcome measurement. Further complications arise when participants are not independent. Both theoretically and with simulations, we have demonstrated the importance of flexibly controlling for baseline and time-varying causes of missingness, while rigorously accounting for the dependence of observations within a cluster (e.g., community).
Linked papers

Methods
SEARCH trial applications
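The bias of a complete-case analysis can be illustrated with a small simulation. The sketch below is a toy example under stated assumptions, not one of our published analyses: a single binary baseline covariate `w` drives both the outcome and the chance that the outcome is measured, participants are independent, and all probabilities are invented for illustration. Adjustment is done by simple stratification on `w`; it does not address time-varying causes of missingness or clustering.

```python
import random

def simulate(n, seed=0):
    """Each row is (w, y, measured); y would be unobserved when measured is False."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        w = 1 if rng.random() < 0.5 else 0
        # Outcome risk is higher when w == 1 (true overall mean is 0.45).
        y = 1 if rng.random() < 0.2 + 0.5 * w else 0
        # Outcomes are measured far less often when w == 1.
        measured = rng.random() < (0.3 if w == 1 else 0.9)
        rows.append((w, y, measured))
    return rows

def complete_case_mean(rows):
    """Average outcome among participants with a measured outcome only."""
    observed = [y for _, y, measured in rows if measured]
    return sum(observed) / len(observed)

def stratified_mean(rows):
    """Average the stratum-specific observed means over the full distribution of w."""
    total = 0.0
    for level in (0, 1):
        stratum = [row for row in rows if row[0] == level]
        observed = [y for _, y, measured in stratum if measured]
        total += (len(stratum) / len(rows)) * (sum(observed) / len(observed))
    return total

rows = simulate(20000)
cc = complete_case_mean(rows)
adj = stratified_mean(rows)
print("true mean: 0.45")
print("complete-case estimate:", round(cc, 3))
print("estimate adjusted for w:", round(adj, 3))
```

In this setup the complete-case estimate is pulled toward the low-risk stratum, where outcomes are measured more often, while the adjusted estimate recovers the true mean up to sampling error.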
Cluster Randomized Trials to Translate Research into Practice
In cluster randomized trials (CRTs), groups of individuals (e.g., communities or clinics) are randomly assigned to treatment arms. CRTs are often pragmatic studies in that the focus is on assessing comparative effectiveness in real-world settings. We tackle key questions in the design and analysis of these trials. In particular, we have demonstrated the gains in efficiency, power, and interpretation from three choices: pair-matching clusters rather than completely randomizing them, targeting the sample effect rather than a population-average parameter, and adjusting data-adaptively within a pre-specified analysis.
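As a rough illustration of why pair-matching can improve efficiency, the sketch below simulates a small pair-matched trial and compares the standard error from an analysis that respects the pairs with one that ignores them. Everything here is an assumption made for illustration: 16 pairs, cluster-level outcomes, a shared baseline within each pair, and a true effect of -0.05. It does not implement sample-effect targeting or data-adaptive adjustment.

```python
import math
import random
import statistics

def simulate_pairs(n_pairs=16, effect=-0.05, seed=1):
    """Return (treated_outcome, control_outcome) for each matched pair of clusters."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_pairs):
        # Matched clusters share a baseline level that varies widely between pairs.
        baseline = rng.gauss(0.30, 0.10)
        noise = [rng.gauss(0.0, 0.02), rng.gauss(0.0, 0.02)]
        # Randomize within the pair: a coin flip picks the treated cluster.
        treated_index = rng.randrange(2)
        treated = baseline + effect + noise[treated_index]
        control = baseline + noise[1 - treated_index]
        pairs.append((treated, control))
    return pairs

pairs = simulate_pairs()
n_pairs = len(pairs)
treated = [t for t, _ in pairs]
control = [c for _, c in pairs]
diffs = [t - c for t, c in pairs]

# Both analyses give the same point estimate: the mean treated-minus-control difference.
estimate = statistics.mean(diffs)

# The paired analysis uses the variability of within-pair differences.
paired_se = statistics.stdev(diffs) / math.sqrt(n_pairs)

# The unpaired analysis ignores the matching and treats the arms as independent samples.
unpaired_se = math.sqrt(statistics.variance(treated) / n_pairs
                        + statistics.variance(control) / n_pairs)

print("effect estimate:", round(estimate, 4))
print("standard error, paired analysis:", round(paired_se, 4))
print("standard error, unpaired analysis:", round(unpaired_se, 4))
```

The gain comes from the between-pair variation in baseline level, which cancels in the within-pair differences; when matching is poor and paired clusters are not similar, the advantage shrinks.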