Causal discovery in the presence of missing data

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Author: Ruibo Tu; [2018]

Keywords: causal discovery; missing data; PC

Abstract: Missing data are ubiquitous in many domains, such as healthcare. Depending on how the data are missing, the (conditional) independence relations in the observed data may differ from those in the complete data generated by the underlying causal process (which are not fully observable). As a consequence, simply applying existing causal discovery methods to the observed data may lead to wrong conclusions. It is therefore essential to extend existing causal discovery approaches so that they recover the true underlying causal structure from such incomplete data. In this thesis, we aim to solve this problem for data that are missing completely at random (MCAR), missing at random (MAR), or missing not at random (MNAR). With missingness mechanisms represented by the Missingness Graph, we present conditions under which additional correction is needed to derive conditional independence/dependence relations in the complete data. Combined with the correction method, which gives closed-form, consistent tests of conditional independence, the proposed causal discovery method, an extension of the PC algorithm, is shown to give asymptotically correct results. Experimental results illustrate that, under further reasonable assumptions, the proposed algorithm can correct the conditional independence tests for values that are MCAR, MAR, and, in rather general cases, MNAR.
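The claim that independence relations in the observed data can differ from those in the complete data can be made concrete with a small simulation. This is not the thesis's correction method. It is a minimal sketch, assuming Gaussian variables and the Fisher z test (a common conditional independence test used with PC), of what goes wrong with a naive baseline that deletes incomplete rows before each test. The helper names `fisher_z_test` and `deletion_test` are hypothetical. X and Y are generated independently. Under MCAR, deletion leaves their correlation near zero. Under an MNAR mechanism where Y is hidden whenever X + Y > 0, the missingness indicator of Y is a common effect (collider) of X and Y. Keeping only the observed rows then conditions on that collider and induces a spurious dependence.

```python
import math

import numpy as np


def fisher_z_test(x, y):
    """Fisher z test of zero Pearson correlation (empty conditioning set).

    Returns (r, two-sided p-value)."""
    n = len(x)
    r = float(np.corrcoef(x, y)[0, 1])
    z = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(n - 3)
    return r, math.erfc(abs(z) / math.sqrt(2))


def deletion_test(x, y):
    """Naive baseline: keep only rows where both x and y are observed
    (non-NaN), then run the Fisher z test. Returns (r, p, rows_used)."""
    keep = ~(np.isnan(x) | np.isnan(y))
    r, p = fisher_z_test(x[keep], y[keep])
    return r, p, int(keep.sum())


rng = np.random.default_rng(0)
n = 20_000
# Complete data: X and Y are generated independently of each other.
x = rng.standard_normal(n)
y = rng.standard_normal(n)

# MCAR: each Y value is hidden with probability 0.3, independent of all values.
y_mcar = np.where(rng.random(n) < 0.3, np.nan, y)

# MNAR: Y is hidden whenever X + Y > 0, so Y's missingness indicator is a
# common effect (collider) of X and Y, and it also depends on Y itself.
y_mnar = np.where(x + y > 0, np.nan, y)

results = {
    "complete": deletion_test(x, y),
    "MCAR": deletion_test(x, y_mcar),
    "MNAR": deletion_test(x, y_mnar),
}
for name, (r, p, used) in results.items():
    print(f"{name:>8}: r = {r:+.3f}, p = {p:.3g}, rows used = {used}")
```

The complete-data and MCAR correlations should be close to zero. The MNAR correlation should be roughly -0.47, which is the population correlation of two independent standard normals conditioned on X + Y <= 0. On this observed data, a PC-style test would wrongly keep an edge between X and Y.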

