Large scale inference under sparse and weak alternatives: non-asymptotic phase diagram for CsCsHM statistics

University essay from KTH/Matematisk statistik

Author: Oskar Stattin; [2017]

Keywords: ;

Abstract: High-throughput measurement technology allows to generate and store huge amounts of features, of which very few can be useful for any one single problem at hand. Examples include genomics, proteomics and astronomy, where massive multiple testing often needs to be per- formed, expecting a few significant effects and essentially a null back- ground. A number of new test procedures have been developed for detecting these, so-called sparse and weak effects, in large scale statistical inference. The most widely used is Higher Criticism, HC (see e.g. Donoho and Jin (2004)). A new class of goodness-of-fit test statistics, called CsCsHM, has recently been derived (see Stepanova and Pavlenko (2017)) for the same type of multiple testing, it is shown to achieve better asymptotic properties than the traditional HC approach.This report empirically investigates the behavior of both test procedures in the neighborhood of the detection boundary, i.e. the threshold for the detectability of sparse and weak effects. This theoretical boundary sharply separates the phase space, spanned by the sparsity and weakness parameters, into two subregions the region of detectability and the region of undetectability. The statistics are also applied and compared for both methodologies for features selection in high dimensional binary classification problems. Besides the study of the methods and simulations, applications of both methods on realistic data are carried out. It is found that the statistics are comparable in performance accuracy. 

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)