Effect of ascertainment bias on calculations of sex-biased admixture in Southern Africa

University essay from Uppsala universitet/Institutionen för biologisk grundutbildning

Abstract: Southern African populations harbour great genetic diversity, enhanced by population migration into the area over the last two millennia. Africa is perhaps the least studied continent with regard to population genetics and is often underrepresented in global studies. Studying sex-biased admixture in admixed populations is a useful way to understand a population's demographic history and the sex-specific patterns of past admixture events. Various studies on sex-biased admixture in Southern Africa have shown male-biased admixture from the incoming Bantu-speaking populations. One study, Hollfelder (2018), instead shows female-biased admixture from Bantu-speaking populations. Here I will try to determine whether ascertainment bias is the cause of the unexpected results in Hollfelder (2018). I will do this by comparing the original results, genotyped using the Illumina Omni 2.5M Array, to results from the SNPs that overlap with two other arrays, the Affymetrix Human Origins Array and the Infinium H3Africa Consortium Array. Additionally, I will use whole-genome data containing the same individuals, as well as individuals from similar populations, to form a hypothesis on what the sex-biased admixture should look like without ascertainment bias. I will then extract from the whole-genome data the variants present on two array SNP panels, the Illumina 2.5M Array and the Infinium H3Africa Consortium Array. For both parts of the project, a method by Goldberg and Rosenberg (2015) will be used to calculate the female and male contributions from the admixture proportions of the X chromosome and the autosomes, estimated using the software ADMIXTURE. The results obtained could not determine whether ascertainment bias was the sole factor skewing the results. The overlap with the Affymetrix Human Origins Array gave results closest to those expected from previous studies, suggesting that ascertainment bias likely affects the results.
The results obtained using the whole-genome data indicated that the genotype calls of individuals present in both parts of the study did not fully match, which was confirmed using a principal component analysis. Unfortunately, because of the data used and the analytical limitations of this project, it was not possible to determine how ascertainment bias affects calculations of sex-biased admixture. The X chromosome is difficult to work with, especially when using data from multiple publications, because there is no standard best-practice pipeline for processing the data. As a result, different data sets have been treated differently, which possibly affects downstream analysis when data sets are combined.
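
As a rough illustration of how female and male contributions can be derived from X-chromosomal and autosomal admixture proportions, the sketch below uses the commonly cited constant-admixture approximation, in which the autosomal proportion is the mean of the female and male contributions and the X-chromosomal proportion weights the female contribution by 2/3 and the male contribution by 1/3. This is an assumption made for illustration and not necessarily the exact calculation used in the essay; the function name and the input values are hypothetical.

```python
def sex_specific_contributions(h_auto, h_x):
    """Solve for female (s_f) and male (s_m) contributions from one source.

    Assumed model (constant-admixture approximation, not taken from the essay):
        h_auto = (s_f + s_m) / 2
        h_x    = (2 * s_f + s_m) / 3
    h_auto and h_x are the autosomal and X-chromosomal admixture
    proportions from the same source population, e.g. as estimated
    with ADMIXTURE.
    """
    s_f = 3 * h_x - 2 * h_auto
    s_m = 4 * h_auto - 3 * h_x
    return s_f, s_m


# Hypothetical input values, chosen only to show the arithmetic.
s_f, s_m = sex_specific_contributions(h_auto=0.4, h_x=0.5)
print(f"female contribution: {s_f:.2f}, male contribution: {s_m:.2f}")
```

Under these assumptions, an X-chromosomal proportion higher than the autosomal one implies a female-biased contribution from that source, and a lower one implies a male-biased contribution. Estimates that fall outside the range 0 to 1 are left unclipped here, since they indicate that the input proportions are not compatible with this simple model.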
