Developing an Advanced Method for Kinship from Ancient DNA Data

University essay from Uppsala universitet/Institutionen för biologisk grundutbildning

Author: Erkin Alacamli; [2023]

Keywords: aDNA; ancient DNA; kinship estimation;

Abstract: The analysis of kinship from ancient DNA (aDNA) data has the potential to provide insight into the social structures of prehistoric societies. Kinship analysis is gaining popularity as optimised wet-lab methods allow for studies with sample sizes on the level of whole cemeteries. However, the specifics of ancient DNA require different methods from those used for modern DNA. A common approach is to use sites that are identical-by-descent (IBD); however, detecting these is often challenging because it is not easy to determine whether a locus shared between two individuals was inherited from a common ancestor or whether another factor caused the similarity. Most methods used in the field can identify relatives up to the 2nd or 3rd degree from aDNA data but do not distinguish between different types of relationship of the same degree; for instance, they cannot differentiate between parent-offspring and full-sibling relationships within the first degree. aDNA kinship methods typically use either a window-based or a single-site approach; however, these two approaches have not previously been compared formally in terms of effectiveness and efficiency. In this work, READv2 is presented as a re-implementation of a popular kinship analysis method for aDNA studies, with additional features such as accepting .bed files as input, which take up less space than the previous input type, plain-text .tped files. The new version is shown to be more efficient in terms of runtime, although its memory requirements appear to be higher. Furthermore, a window-based approach with varying window sizes is compared with the single-site approach of READv2, using benchmarked simulation data containing approximately 700 individuals with known 1st-, 2nd- and 3rd-degree relationships. According to the comparison, the sensitivity of the method does not vary between the approaches or between window sizes at high coverages, whereas at lower coverages the single-site approach is superior by a small margin. In addition, the variance of non-shared alleles in windows along the genome is used to implement a method that distinguishes between the two first-degree relationships, parent-offspring and siblings. Testing on an independent dataset from the 1000 Genomes Project shows that the proposed method works with different datasets with varying sets of SNPs. Nevertheless, the first-degree classification method requires further analysis to determine the stress point at which the true positive rates for both categories start to drop. Additionally, some changes and decisions are still required before READv2 becomes a user-friendly method that can be used by other researchers. The preliminary release of READv2, including example data as well as instructions for installing the necessary packages and running the algorithm, can be found at https://github.com/GuntherLab/READv2/releases/tag/READ.
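
The two approaches compared in the abstract can be illustrated with a minimal Python sketch. This is not READv2's code: the function names, the toy genotypes, the use of pseudo-haploid 0/1 calls with None for missing data, and windows defined by a fixed number of sites are all assumptions made for illustration. The single-site approach computes one proportion of non-shared alleles over all overlapping sites, while the window-based approach computes the proportion per window, which also yields the across-window variance that the first-degree classification relies on.

```python
from statistics import mean, pvariance


def mismatch_single_site(a, b):
    """Proportion of non-shared alleles over all sites where both individuals have data."""
    pairs = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not pairs:
        raise ValueError("no overlapping sites between the two individuals")
    return sum(x != y for x, y in pairs) / len(pairs)


def mismatch_per_window(a, b, window_size):
    """Proportion of non-shared alleles in consecutive, non-overlapping windows.

    Windows are defined here by a fixed number of sites (an assumption made
    for simplicity). Windows with no overlapping data are skipped.
    """
    rates = []
    for start in range(0, len(a), window_size):
        window = zip(a[start:start + window_size], b[start:start + window_size])
        pairs = [(x, y) for x, y in window if x is not None and y is not None]
        if pairs:
            rates.append(sum(x != y for x, y in pairs) / len(pairs))
    return rates


# Hypothetical toy genotypes: one allele per site, None marks a missing call.
ind1 = [0, 1, 1, 0, 1, 0, 0, 1]
ind2 = [0, 1, 0, 0, 1, 1, None, 1]

overall = mismatch_single_site(ind1, ind2)                   # single-site estimate
per_window = mismatch_per_window(ind1, ind2, window_size=4)  # window-based estimates
window_mean = mean(per_window)
window_var = pvariance(per_window)  # variance of non-shared alleles across windows

print(overall, per_window, window_mean, window_var)
```

A parent and child share exactly one haplotype IBD along the whole genome, whereas full siblings share zero, one or two haplotypes depending on the region, so sibling pairs would be expected to show the larger across-window variance. Any threshold that separates the two categories would have to be calibrated on real or simulated data and is not given here.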
