Correlation coefficient based feature screening: With applications to microarray data

University essay from Umeå universitet/Statistik

Author: Agnes Holma; [2022]


Abstract: Measuring dependence between variables is central to statistical analysis and can be used, for instance, for feature screening. It is therefore of interest to find measures that can quantify dependencies even when they are complex. The recently proposed correlation coefficient ξn [1] is fast to compute and works particularly well when dependencies show an oscillatory or wiggly pattern. In this thesis, ξn was applied as a feature screening tool, and a comprehensive simulation study investigated how well it could detect dependencies between predictor variables and a response variable. The results showed that ξn was better than two other fairly new and popular correlation coefficients, the Hilbert-Schmidt Independence Criterion and Distance Correlation (DC), at detecting dependencies when variables were connected through sine or cosine functions, and worse when variables were connected through some other functions, such as exponential functions. ξn and DC were also applied as feature screening tools to real microarray data to investigate whether they could give better results than feature screening with a t-test. The results showed that the t-test was more efficient than DC or ξn for this particular data set.
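
To make the screening idea concrete, here is a minimal sketch, not the thesis's own implementation. It assumes that ξn in [1] is the rank-based coefficient computed by sorting the pairs by the predictor and summing absolute differences of consecutive response ranks, and it uses the formula for data without ties. The function names xi_n and screen, and the toy data, are hypothetical and chosen only for illustration.

```python
import math
import random


def xi_n(x, y):
    """Rank-based coefficient xi_n (no-ties formula; an assumption, see lead-in)."""
    n = len(x)
    # Sort the observations by the predictor x.
    order = sorted(range(n), key=lambda i: x[i])
    y_sorted = [y[i] for i in order]
    # Rank the responses (1..n) in that x-sorted order.
    by_y = sorted(range(n), key=lambda i: y_sorted[i])
    ranks = [0] * n
    for rank, i in enumerate(by_y, start=1):
        ranks[i] = rank
    # Sum of absolute differences between consecutive ranks.
    total = sum(abs(ranks[i + 1] - ranks[i]) for i in range(n - 1))
    return 1.0 - 3.0 * total / (n * n - 1)


def screen(columns, y, k):
    """Score every predictor column against y and return the k best indices."""
    scores = [xi_n(col, y) for col in columns]
    top = sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)[:k]
    return top, scores


# Toy data: column 0 drives the response through a sine, column 1 is pure noise.
rng = random.Random(0)
n = 200
t = [-3.0 + 6.0 * i / (n - 1) for i in range(n)]
noise = [rng.random() for _ in range(n)]
y = [math.sin(4.0 * v) for v in t]

top, scores = screen([t, noise], y, k=1)
print("selected column:", top[0])
print("scores:", scores)
```

In this toy setting the oscillating relationship should give column 0 a score close to 1 and the noise column a score close to 0, which is the kind of ranking a screening step would rely on.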
