Year: 2016 | Volume: 2 | Issue: 4 | Page: 229-232
The Consistencies of Y-Chromosomal and Autosomal Continental Ancestry Varying among Haplogroups
Chuan-Chao Wang1, Lei Shang2, Hui-Yuan Yeh3, Lan-Hai Wei4
1 Department of Genetics, Harvard Medical School, Boston, MA, USA; Department of Archaeogenetics, Max Planck Institute for the Science of Human History, Jena, Germany
2 Key Laboratory of Forensic Genetics, Institute of Forensic Science, Ministry of Public Security, Beijing, China
3 School of Humanities and Social Sciences, Nanyang Technological University, Singapore
4 State Key Laboratory of Genetic Engineering and Ministry of Education Key Laboratory of Contemporary Anthropology, Collaborative Innovation Center for Genetics and Development, School of Life Sciences, Fudan University, Shanghai, China
Date of Web Publication: 9-Jan-2017
Department of Genetics, Harvard Medical School, Boston, MA, USA; Department of Archaeogenetics, Max Planck Institute for the Science of Human History, Jena, Germany
Source of Support: None, Conflict of Interest: None
The Y-chromosome has been widely used in ancestry inference because of its region-specific haplogroup distributions. However, how informative such a single marker is for inferring an individual's genetic ancestry remains debated. Here, we compared continental-level genetic ancestry inferences made from Y-chromosomal haplogroups with those made from autosomal single-nucleotide polymorphisms in 1230 samples of the Affymetrix Human Origins dataset. The highest ancestry proportions of a majority of individuals match the highest average continental-ancestry proportions in haplogroups A, B, D, H, I, K, L, T, O, and M. Such high consistency was not observed in haplogroups E, C, G, J, N, Q, and R as a whole, but it was observed in some of their sublineages, such as E1a, E1b1a1, E1b1b1b1a, E2b1a, J1a2b, Q1a1a1, Q1a2a1a1, R1b1a2a1a, and R2. Although the consistency of Y-chromosomal and autosomal continental ancestry varies among haplogroups, the Y-chromosome can provide valuable clues to an individual's continental ancestry.
Keywords: Ancestry inference, autosomal single-nucleotide polymorphism, Y-chromosome
How to cite this article:
Wang CC, Shang L, Yeh HY, Wei LH. The Consistencies of Y-Chromosomal and Autosomal Continental Ancestry Varying among Haplogroups. J Forensic Sci Med 2016;2:229-32
How to cite this URL:
Wang CC, Shang L, Yeh HY, Wei LH. The Consistencies of Y-Chromosomal and Autosomal Continental Ancestry Varying among Haplogroups. J Forensic Sci Med [serial online] 2016 [cited 2020 Nov 26];2:229-32. Available from: https://www.jfsmonline.com/text.asp?2016/2/4/229/197925
Introduction
Because it lacks recombination, is strictly paternally inherited, and has a small effective population size, a low mutation rate, sufficient markers, and population-specific haplotype distributions, the Y-chromosome has been widely used in anthropology, population genetics, and forensic genetics to study population genetic structure and population history and to support forensic identification. The Y-chromosome has also inspired widespread public interest in tracing paternal ancestors and is used commercially by many companies. A famous example is the Y-chromosomal type attributed to Genghis Khan, which was proposed to belong to the “star cluster” under haplogroup C3*-M217 (xM48); it has gained extensive attention and attracted numerous consumers to get tested. However, because the Y-chromosome is only a single marker and is subject to severe genetic drift, such simple ancestry analyses tend to overlook the contribution of the vast majority of an individual's ancestors to his or her genome.
Many alternative ancestry inference methods exist, such as testing mitochondrial DNA (mtDNA), genome-wide short tandem repeats (STRs) or single-nucleotide polymorphisms (SNPs), and ancestry informative markers (AIMs). The mtDNA is maternally inherited and has been widely used to trace maternal history. Genome-wide STRs, SNPs, and AIMs are usually applied to infer the detailed composition of an individual's ancestry. However, some recent genome-wide studies have revealed frequent discrepancies between ancestry inferences using mtDNA and those using autosomal SNPs. The mtDNA case prompts us to reconsider how much ancestry information the Y-chromosome can provide and how accurate Y-chromosomal ancestry inference is compared with genome-wide ancestry estimation. Here, we present a comprehensive analysis using Y-chromosomal and genome-wide autosomal SNP data of more than 1200 male individuals from the Affymetrix Human Origins dataset to directly and quantitatively assess the consistency of Y-chromosomal and autosomal continental ancestry.
Materials and Methods
The Y-chromosomal and autosomal genotype data for 1230 male individuals were extracted from the Affymetrix Human Origins dataset using EIGENSOFT and PLINK. Y-chromosomal haplogroups were classified according to the International Society of Genetic Genealogy phylogenetic tree as of January 28, 2015 (http://www.isogg.org/). We used ADMIXTURE v. 1.23 to estimate ancestry proportions for the 1230 males from 594,924 autosomal SNPs. Each run involved 100 replicates with different random starting seeds and the default 5-fold cross-validation, with the number of ancestral populations K varied from 2 to 12. At K = 8, the samples were well assigned to eight continental regions: Africa, Middle East, Europe, South Asia, East Asia, Siberia, Oceania, and the Americas. The average continental-ancestry proportions within each Y-chromosomal haplogroup, the standard deviation (SD) of individual continental-ancestry percentages for each continental region in each haplogroup, the mean pairwise Euclidean distance (d) within each haplogroup, and the consistency scores were all calculated according to Emery et al. Ancestry plots were produced in R statistical software v3.0.2.
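As an illustration of the per-haplogroup summary statistics named above, the following minimal Python sketch computes the average ancestry proportions, the per-region SD, and the mean pairwise Euclidean distance (d) for one haplogroup. It is not the authors' code: the function name `haplogroup_summary`, the input layout (one ancestry vector per individual), the use of the sample SD, and the toy numbers are all assumptions made for the example.

```python
from itertools import combinations
from math import dist
from statistics import mean, stdev


def haplogroup_summary(proportions):
    """Summarize ancestry vectors of individuals sharing one haplogroup.

    proportions: list of per-individual ancestry vectors, one value per
    continental region (hypothetical layout; each vector sums to about 1).
    Returns (average per region, SD per region, mean pairwise distance d).
    """
    # Transpose so that each column holds one region across individuals.
    columns = list(zip(*proportions))
    averages = [mean(col) for col in columns]
    # Assumption: sample SD; it is undefined for a single individual, so use 0.0.
    sds = [stdev(col) if len(col) > 1 else 0.0 for col in columns]
    # Euclidean distance between every pair of individuals, then the mean.
    pairs = list(combinations(proportions, 2))
    d = mean(dist(a, b) for a, b in pairs) if pairs else 0.0
    return averages, sds, d


# Toy data (made up): three individuals, two regions.
toy = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7]]
averages, sds, d = haplogroup_summary(toy)
print(averages, sds, d)
```

A low d means that individuals carrying the haplogroup have similar ancestry compositions; a high d means the haplogroup label says little about any one carrier.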
Results
The Human Origins dataset contained male samples of worldwide lineages from Y-chromosomal haplogroups A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, Q, R, S, and T [Table S1, Additional file 1]. All the haplogroups except A, B, K, M, and S were found in more than one continent. Haplogroups A and B were found only in Africa, whereas K, M, and S were present only in Oceania. Likewise, haplogroups A and B had predominantly African ancestry, whereas K, M, and S had predominantly Oceanian ancestry [Table S2, Additional file 2] and [Figure 1]. Haplogroups C, D, and O were frequent in populations from East Asia. The East Asian ancestry proportions in haplogroups D and O were far higher than those of other continents. East Asian and Siberian ancestry seemed to contribute equally to individuals of haplogroup C [Table S2] and [Figure 1]. Haplogroup E reached high frequencies in Africa and the Middle East, and African and Middle Eastern ancestries were also the two main components for individuals of haplogroup E. Haplogroups L, H, and R were frequent in South Asia, and R was also found at very high frequency in Europe. Haplogroups L and H had predominantly South Asian ancestry. The largest ancestry proportion of haplogroup R was from Europe, and the second- and third-largest were from South Asia and the Middle East, respectively. Haplogroups I, Q, and T were enriched in Europe, the Americas, and the Middle East, respectively. Similarly, the ancestry proportions from these three regions were the highest in haplogroups I, Q, and T, respectively. Haplogroups G, J, and N were found in various regions [Table S1], and their genetic ancestries also varied. Collectively, we found a good correlation between haplogroup frequencies and continental-ancestry proportions.
Figure 1: (a) Haplogroup-averaged continental-ancestry proportions; (b) Individual continental-ancestry proportions in the male individuals of Affymetrix Human Origins Dataset
We then estimated the SD of individual continental-ancestry percentages within each haplogroup. The continental-ancestry proportions varied considerably among individuals in the majority of haplogroups, especially in haplogroups E, N, K, Q, and R (SD > 0.3) [Table S3, Additional file 3]. We also calculated the mean pairwise Euclidean distance between continental-ancestry proportions among individuals within each haplogroup, which is a quantitative measure of the inter-individual variability. Consistent with the SD results, the mean pairwise Euclidean distances in haplogroups E, C, K, N, Q, and R were high (>0.5) [Table S4, Additional file 4], suggesting that these haplogroups are not very informative for inferring an individual's continental ancestry. In contrast, the distances in haplogroups A, B, D, K, O, and M were relatively low [Table S4], indicating a strong association between geographic-ancestry composition and a given haplogroup. To directly and quantitatively assess how informative the Y-chromosome is for inferring an individual's genetic ancestry, we calculated the consistency score within each haplogroup. The score is the proportion of individuals in our dataset with >50% ancestry from the continent that matches the haplogroup's highest continental-ancestry component. The consistency ranged from 0.333 to 1.000 with a mean of 0.697 in major haplogroups from A to R, and about 65% of Y-chromosomal haplogroups had a consistency score >0.5 [Table S4]. High consistency was observed in haplogroups A, B, D, H, I, K, L, T, O, and M, meaning that these haplogroups can be regarded as having substantial genetic ancestry from their corresponding continents. The consistency values in haplogroups C, G, J, N, Q, and R were relatively low, which makes it difficult to assign each of these haplogroups to a single continent.
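The consistency score described above can be sketched in a few lines of Python. This is one reading of the definition, not the authors' implementation: the function name `consistency_score`, the input layout (one ancestry vector per individual, regions in a fixed order), and the toy numbers are assumptions, and the top region is taken to be the one with the highest haplogroup-averaged proportion.

```python
from statistics import mean


def consistency_score(proportions, threshold=0.5):
    """Fraction of individuals with majority ancestry from the haplogroup's top region.

    proportions: list of per-individual ancestry vectors for one haplogroup
    (hypothetical layout; same region order in every vector).
    threshold: minimum share of ancestry required, 0.5 as stated in the text.
    """
    # Region with the highest haplogroup-averaged ancestry proportion.
    columns = list(zip(*proportions))
    averages = [mean(col) for col in columns]
    top = max(range(len(averages)), key=averages.__getitem__)
    # Count individuals whose ancestry from that region exceeds the threshold.
    matching = sum(1 for p in proportions if p[top] > threshold)
    return matching / len(proportions)


# Toy data (made up): three individuals, two regions.
toy = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7]]
print(consistency_score(toy))
```

Under this reading, a score of 1 means every carrier of the haplogroup draws most of his ancestry from the haplogroup's top region, while a score near 0.333 (the lowest value reported here) means only about a third of carriers do.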
The haplogroups with high SD, high Euclidean distance, and low consistency are all lineages distributed across several continents. However, some of their sublineages are region-specific. For instance, Q1a2a1a1-M3 is almost exclusively distributed in the Americas. It is very likely that such sublineages also have exclusive continental ancestry. Q1a2a1a1 accounted for 36.2% of all Q individuals in our dataset. This lineage, with more than 90% American ancestry and a consistency value of 0.88, can be confidently assigned to the Americas. Haplogroups E1a, E1b1a1, and E2b1a, which together comprise more than half of all haplogroup E individuals, had exclusively African ancestry with extremely high consistency scores (nearly 1) and low Euclidean distance and SD. In addition, haplogroup E1b1b1b1a, accounting for 12% of all haplogroup E individuals, could reasonably be assigned to the Middle East with a consistency value of 0.87. Haplogroup R1b1a2a1a, making up 32.6% of all R individuals, had a strong association with Europe, while R2 samples had substantial South Asian genetic ancestry. Similarly, haplogroup J1a2b was associated with the Middle East, and Q1a1a1 could be assigned to East Asia.
Discussion
We directly compared the genetic ancestry revealed by Y-chromosomal haplogroups with that inferred from genome-wide autosomal SNPs in a worldwide dataset. Judging from the high SDs, the continental-ancestry compositions varied among individuals of the same Y-chromosomal haplogroup. About 70% of the Y-chromosomal haplogroups could be associated with particular continents because of their high continent-specific ancestry proportions. The highest ancestry proportions of a majority of individuals match the highest average continental-ancestry proportions in haplogroups A, B, D, H, I, K, L, T, O, and M. Although such high consistency was not observed in haplogroups E, C, G, J, N, Q, and R, some of their sublineages, such as E1a, E1b1a1, E1b1b1b1a, E2b1a, J1a2b, Q1a1a1, Q1a2a1a1, R1b1a2a1a, and R2, corresponded well with particular continents. The Y-chromosome appears to give higher prediction accuracy for individual ancestry than the mtDNA. This might be caused by sex-biased migration, that is, a higher female migration rate in human populations. A series of studies have revealed that the among-population components of genetic variation are higher for the Y-chromosome than for the mtDNA, indicating that Y-chromosomes tend to be more localized geographically.
The Y-chromosome can provide valuable clues to an individual's continental ancestry, but it probably misses much of the detailed ancestry information. One, or at most two, top ancestry components are well represented by the majority of Y-chromosomal haplogroups, whereas other ancestry information is lost. For instance, the highest South Asian ancestry proportions were detected in individuals of haplogroup H. Meanwhile, East Asia, Europe, and the Middle East have each contributed more than 10% of genetic ancestry to many individuals of haplogroup H, which cannot be reflected by such a single Y-chromosomal marker. In addition, the Y-chromosomal haplogroup classifications in this study were of limited resolution. Such coarse assignment might lose information about a given lineage and may have biased the conclusions. For example, sublineages of haplogroup C have distinct geographic distributions. However, this dataset does not contain enough markers to resolve the detailed phylogeny of haplogroup C individuals, which leaves the ancestry inference for this haplogroup inconclusive.
CCW is supported by the Max Planck Society and Harvard Medical School.
Financial support and sponsorship
Conflicts of interest
There are no conflicts of interest.
References
Jobling MA, Tyler-Smith C. The human Y chromosome: An evolutionary marker comes of age. Nat Rev Genet 2003;4:598-612.
Zerjal T, Xue Y, Bertorelle G, Wells RS, Bao W, Zhu S, et al. The genetic legacy of the Mongols. Am J Hum Genet 2003;72:717-21.
Shriver MD, Kittles RA. Genetic ancestry and the search for personalized genetic histories. Nat Rev Genet 2004;5:611-8.
Emery LS, Magnaye KM, Bigham AW, Akey JM, Bamshad MJ. Estimates of continental ancestry vary widely among individuals with the same mtDNA haplogroup. Am J Hum Genet 2015;96:183-93.
Rosenberg NA, Pritchard JK, Weber JL, Cann HM, Kidd KK, Zhivotovsky LA, et al. Genetic structure of human populations. Science 2002;298:2381-5.
Lazaridis I, Patterson N, Mittnik A, Renaud G, Mallick S, Kirsanow K, et al. Ancient human genomes suggest three ancestral populations for present-day Europeans. Nature 2014;513:409-13.
Shriver MD, Smith MW, Jin L, Marcini A, Akey JM, Deka R, et al. Ethnic-affiliation estimation by use of population-specific DNA markers. Am J Hum Genet 1997;60:957-64.
Poetsch M, Wiegand A, Harder M, Blöhm R, Rakotomavo N, Freitag-Wolf S, et al. Determination of population origin: A comparison of autosomal SNPs, Y-chromosomal and mtDNA haplogroups using a Malagasy population as example. Eur J Hum Genet 2013;21:1423-8.
Price AL, Zaitlen NA, Reich D, Patterson N. New approaches to population stratification in genome-wide association studies. Nat Rev Genet 2010;11:459-63.
Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, et al. PLINK: A tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 2007;81:559-75.
Alexander DH, Novembre J, Lange K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res 2009;19:1655-64.
R Core Team. R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing; 2013.
Seielstad MT, Minch E, Cavalli-Sforza LL. Genetic evidence for a higher female migration rate in humans. Nat Genet 1998;20:278-80.
Lippold S, Xu H, Ko A, Li M, Renaud G, Butthof A, et al. Human paternal and maternal demographic histories: Insights from high-resolution Y chromosome and mtDNA sequences. Investig Genet 2014;5:13.
Zhong H, Shi H, Qi XB, Xiao CJ, Jin L, Ma RZ, et al. Global distribution of Y-chromosome haplogroup C reveals the prehistoric migration routes of African exodus and early settlement in East Asia. J Hum Genet 2010;55:428-35.