Classification of microbiome data with structural zeroes and small samples

University essay from Linköpings universitet/Institutionen för datavetenskap

Author: Jun Li; [2021]


Abstract: OTU data are characterized by small sample sizes and a high proportion of zeros, which makes classification very challenging. The thesis addresses the case of structural zeros in OTU counts and tries to establish a new methodology based on Kaul’s assumption and the original model. It is assumed that normal distributions hold only among the non-zero taxa, instead of the conventional MAR (missing at random) assumption, which permits covariances between zero taxa. Modifications to the normalization of OTU counts, a reduction in the number of OTUs, and three variant models for calculating covariances are presented. To validate the effectiveness of the models, a Polish mussel dataset and two simulated datasets are used; the mussel data were collected in three Polish rivers with distinguishable characteristics, and the simulation models are built to mimic the mussel data and to follow Kaul’s assumption. The analysis of the results concludes that the suggested models achieve consistently higher accuracy (without majority rule) than Kaul’s model for the simulation model with a Gaussian distribution. The thesis provides new possibilities for improving classification methods, while also bringing challenges in terms of higher resource demands.
