Year: 2018 | Volume: 4 | Issue: 3 | Page: 115-121
The evaluation of insertion and deletion polymorphism in population and personal identification amidst Chinese populations
Hui Sun1, Caiyong Yin2, Lei Shang1, Chong Wang1, Kaiyuan Su3, Wanshui Li1, Feng Chen4, Shilin Li3
1 Institute of Forensic Science, Ministry of Public Security, Beijing 100038, China
2 Department of Forensic Medicine, Nanjing Medical University, Nanjing, 210029; MOE Key Laboratory of Contemporary Anthropology, Department of Anthropology and Human Genetics, School of Life Sciences, Fudan University, Shanghai, 200438; Fudan-Taizhou Institute of Health Sciences, Jiangsu, 225300, China
3 MOE Key Laboratory of Contemporary Anthropology, Department of Anthropology and Human Genetics, School of Life Sciences, Fudan University, Shanghai, 200438; Fudan-Taizhou Institute of Health Sciences, Jiangsu, 225300, China
4 Department of Forensic Medicine, Nanjing Medical University, Nanjing, 210029, China
Date of Web Publication: 28-Sep-2018
Dr. Feng Chen
Department of Forensic Medicine, Nanjing Medical University, Nanjing 210029
Dr. Shilin Li
MOE Key Laboratory of Contemporary Anthropology, Department of Anthropology and Human Genetics, School of Life Sciences, Fudan University, Shanghai, 200438, China; Fudan-Taizhou Institute of Health Sciences, Taizhou 225300
Dr. Wanshui Li
Institute of Forensic Science, Ministry of Public Security, Beijing 100038
Source of Support: None, Conflict of Interest: None
To comprehensively evaluate the practical application and power of 30 commonly used InDels (Qiagen Investigator DIPplex® kit), we collected population data from 25 Chinese populations and applied F-statistics for population genetic analysis. The results indicated that allelic frequency distributions differed among populations. Furthermore, the phylogeny based on pairwise FST distances showed that the differentiation of most populations was consistent with their geographic locations and historic dispersals. We conducted a comprehensive correlation analysis between FST and heterozygosity of the 30 InDel loci and provided strong evidence for ongoing InDel locus selection. The FST values of the 30 InDels were calculated within 25 Chinese populations, and the loci were then characterized by their roles in population genetics or individual identification. The data indicated that 17 InDels with FST <0.01 could be used for Chinese individual identification (total discrimination power = 0.999985 and cumulative matching probability = 0.00000009). We comprehensively reconstructed the population structure and filled the gap in evaluating the ability of InDels in personal as well as population identification. The application of InDel loci in the forensic field should promote the development of forensic population identification and personal discrimination.
Keywords: Chinese populations, insertion and deletion, personal identification, population identification, Qiagen Investigator DIPplex® kit
How to cite this article:
Sun H, Yin C, Shang L, Wang C, Su K, Li W, Chen F, Li S. The evaluation of insertion and deletion polymorphism in population and personal identification amidst Chinese populations. J Forensic Sci Med 2018;4:115-21
How to cite this URL:
Sun H, Yin C, Shang L, Wang C, Su K, Li W, Chen F, Li S. The evaluation of insertion and deletion polymorphism in population and personal identification amidst Chinese populations. J Forensic Sci Med [serial online] 2018 [cited 2019 Feb 17];4:115-21. Available from: http://www.jfsmonline.com/text.asp?2018/4/3/115/242513
Hui Sun and Caiyong Yin contributed equally to this work
Introduction
The Qiagen Investigator DIPplex® kit (Qiagen, Hilden, Germany), based on a biallelic InDel assay, was the first commercial forensic kit designed for analyzing challenging forensic biology samples. It has attracted great attention from forensic DNA laboratories worldwide because of its unique advantages: it can be run by capillary electrophoresis and interpreted according to product size. Despite this technical convenience, the role of the 30-InDel panel in population differentiation has been questioned. In a population study involving seven Chinese subpopulations, Wang et al. pointed out that the panel could not differentiate these groups explicitly. Yang et al. found that two Uyghur groups deviated from the East Asian groups. These anomalous observations call for a fundamental evaluation of these 30 InDels in Chinese population differentiation.
Wright's F-statistics, revised by Weir and Cockerham in 1984, play cardinal roles in estimating the genetic differentiation coefficient (FST), the overall inbreeding coefficient (FIT), and the within-population inbreeding coefficient (FIS). In this research, considering the large size of the Chinese population, we examined the ability of the 30 InDels to identify Chinese populations and individuals. As previously suggested, InDels with extremely low FST are not suitable for ancestry inference, but they can be screened for personal identification. In this study, the FST values of the 30 InDels were calculated within 25 Chinese populations and then used as evidence of the differing suitability of the 30 InDels for population or individual identification. In this way, we obtained explicit profiles of the 30 InDels for applications in population genetics or forensic practice.
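The three coefficients named above are linked by Wright's identity (1 - FIT) = (1 - FIS)(1 - FST). The following is a minimal sketch of the textbook heterozygosity-based definitions, not the Weir and Cockerham estimator and not necessarily what the software used in this study computes. The function name and the input values are hypothetical and chosen only for illustration.

```python
def f_statistics(h_i, h_s, h_t):
    """Return (FIS, FIT, FST) from three heterozygosity levels.

    h_i: mean observed heterozygosity within subpopulations
    h_s: mean expected heterozygosity within subpopulations
    h_t: expected heterozygosity of the pooled (total) population
    """
    f_is = 1.0 - h_i / h_s  # inbreeding within subpopulations
    f_it = 1.0 - h_i / h_t  # overall inbreeding
    f_st = 1.0 - h_s / h_t  # differentiation among subpopulations
    return f_is, f_it, f_st


# Hypothetical heterozygosities for a single biallelic locus.
f_is, f_it, f_st = f_statistics(h_i=0.42, h_s=0.45, h_t=0.46)

# Wright's identity holds by construction for these definitions.
assert abs((1 - f_it) - (1 - f_is) * (1 - f_st)) < 1e-12
```

Under these definitions, a locus whose within-population heterozygosity is almost as large as the pooled heterozygosity has an FST close to zero, which is the property the study uses to flag loci for personal identification.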
Materials and Methods
All population data at the level of individual genotypes were collected [Table 1], together with the geographic details of the 25 Chinese populations (4021 samples). All data used are published public data, for which consent was exempted. Seven Chinese Han populations and 18 populations from other Chinese ethnic groups were included. The geographic locations of these 25 groups are marked in [Figure 1]. The genotyping information (presence or absence) was collected for further data processing. The study was approved by the Institutional Ethics Committee.
Figure 1: The geographic locations of the 25 collected Chinese populations. The populations listed on the right side correspond to those in Table 1
The pairwise FST genetic distances and their significance levels were calculated using Arlequin (version 3.0; Computational and Molecular Population Genetics Lab (CMPG), Institute of Ecology and Evolution, University of Berne, Baltzerstrasse 6, 3012 Bern, Switzerland). The Pearson correlation analysis was performed in R version 3.3.1. The R package “adegenet” was used to perform discriminant analysis of principal components (DAPC), as described by Jombart et al., and to generate the expected heterozygosity (He) of the 30 InDels. The FIS, FIT, and FST of the 30 InDels were estimated with “pegas.” To demonstrate the genetic affinities among these populations, the STRUCTURE program (version 2.2) was used to estimate the membership coefficients with 60,000 iterations and 120,000 burn-in cycles under the admixture model. In addition, the K value was determined from the generated Ln Pr(X|K) and Delta K values. The neighbor-joining (NJ) tree was reconstructed in MEGA 6.0. The “pheatmap” package was used to visualize the allelic frequencies of the 30 InDels and the calculated pairwise FST distances, while “easyGgplot2” generated the boxplot.
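For a biallelic marker such as an InDel, the expected heterozygosity has a simple closed form. The sketch below uses the plain formula He = 1 - (p^2 + q^2) with no small-sample correction, so it may differ slightly from the value an R package reports. The function name is hypothetical.

```python
def expected_heterozygosity(p_insertion):
    """Expected heterozygosity of a biallelic locus.

    p_insertion: frequency of the insertion allele (0..1); the deletion
    allele then has frequency 1 - p_insertion.
    """
    if not 0.0 <= p_insertion <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    q_deletion = 1.0 - p_insertion
    return 1.0 - (p_insertion ** 2 + q_deletion ** 2)


# He peaks at 0.5 when both alleles are equally frequent, so a biallelic
# locus can never exceed 0.5.
assert abs(expected_heterozygosity(0.5) - 0.5) < 1e-12
```

Because 0.5 is the ceiling for a biallelic locus, a threshold such as He >0.4 selects loci whose two alleles are close to balanced.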
Results and Discussions
Population structure of selected groups
The population structure is illustrated in [Figure 2]. The allelic frequencies, together with hierarchical clustering based on Euclidean distances, revealed high diversity among Chinese populations [Figure 2]a. For example, allelic frequencies at locus D118 stayed at an extremely high level (all >0.8), with only small deviations, in Chinese Sino-Tibetan populations but were generally at average levels in Chinese Altaic groups. Similar notable differences were also observed at other loci (D6, D40, and D6).
Figure 2: The population structure of 25 populations. (a) Heat map based on allelic frequencies of 30 InDels, indicated by the numbers in the grids, together with hierarchical clustering based on Euclidean distances; (b) The pairwise FST values [Supplementary Table 1] visualized by a heat map; and (c) The neighbor-joining tree reconstructed from the pairwise FST values
Then, the pairwise comparisons and NJ tree based on FST genetic distances ([Supplementary Table 1], [Figure 2]b, and [Figure 2]c) were generated, indicating that explicit relationships among the 25 Chinese groups could be dissected using the 30 InDel loci. All seven Han subpopulations clustered together at the terminal of the phylogenetic tree, and then with the Yi1, Bai, and Tujia groups, which were sampled in China's Yunnan Province. In addition, the four Tibetan groups and the three Uyghur subpopulations each clustered on a single branch. Nonetheless, all groups belonging to the Altaic linguistic family, excluding the Xibe ethnic group, had large FST genetic distances from groups of Sino-Tibetan language affinity [Figure 2]b. Some interrelationships contradicted common understanding in population genetics: for example, the Xibe group, which belongs to the Altaic language family, was close to the Sino-Tibetan groups and distant from the other four Altaic groups; in addition, the Yi1 and Yi2 groups fell on divergent branches, and the interrelationships among the seven Han groups were not consistent with their relative geographic locations. These preliminary observations on population discrimination supported that the 30 InDels could reconstruct the relationships of the majority of Chinese populations.
Divergence between FST and heterozygosity
The population substructure effects were evaluated using Wright's FST. [Table 2] lists the F-statistics and heterozygosity parameters of the 30 InDels among the 25 Chinese populations. The correlation analysis between FST and He [Figure 3] demonstrated that, whether at the scale of worldwide populations or of populations within China, FST decreased significantly as He increased. This provided robust evidence that, across different levels of heterozygosity, the substructure differentiation observed with the 30 InDels changed in a regular manner. In the 25 Chinese populations included in the present study, we found that 17 of the 30 InDels with He >0.4 had low FST values (<0.01). Interestingly, D40 harbored both high FST and high He values, which might be caused by the extremely low frequencies of this locus in the Yi1, Tujia, and Tibetan3 groups [Figure 2]a.
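The correlation described above can be sketched with a hand-written Pearson coefficient. The per-locus values below are hypothetical and are not taken from Table 2; they only mimic the reported pattern of lower FST at higher He. The function name is likewise an assumption for illustration.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical per-locus values (NOT the study's data).
he_values = [0.30, 0.38, 0.44, 0.48, 0.50]
fst_values = [0.040, 0.022, 0.012, 0.006, 0.004]

r = pearson_r(he_values, fst_values)
assert r < 0  # FST falls as He rises in this toy example
```

A significance test on r, as reported in the study, would need an additional step that this sketch leaves out.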
Table 2: The FIT, FIS, and FST values and the observed and expected heterozygosities of 30 InDels among 25 Chinese groups
Figure 3: The correlation analysis between FST and He. Each red dot represents one InDel locus. FST was strongly correlated with He in Chinese groups
We built this model to provide a reference for screening specialized sets of InDels for different forensic purposes, namely personal or population identification. The model based on FST and He indicated that most InDels with high He values (>0.4) were suitable for personal identification, while the rest qualified as ancestry informative markers. Of course, InDels with extreme values deserve particular attention, because their frequencies differ markedly among populations. Hence, when performing personal identification with InDels possessing high He, a correct reference dataset for the population in question should be determined in advance. In the following section, we further discuss the performance of InDels with different features in subpopulation differentiation and personal identification within Chinese populations.
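The screening rule described above can be written as a small classifier. This is a sketch under stated assumptions: the cutoffs FST = 0.01 and He = 0.4 come from the text, but the function name, the category labels, and the "needs review" bucket for loci that meet neither rule are hypothetical additions, and the example values are not the study's data.

```python
FST_CUTOFF = 0.01  # cutoff used in the text to split the 30 InDels
HE_CUTOFF = 0.4    # He level the text associates with personal identification


def classify_locus(fst, he):
    """Assign a locus to a forensic use based on its FST and He."""
    if fst < FST_CUTOFF and he > HE_CUTOFF:
        # Little differentiation among populations, high diversity within.
        return "personal identification"
    if fst > FST_CUTOFF:
        # Frequencies differ among populations: ancestry informative.
        return "population differentiation"
    # Assumption: the text does not say how to treat the remaining cases.
    return "needs review"


# Hypothetical loci (NOT values from Table 2).
assert classify_locus(fst=0.005, he=0.45) == "personal identification"
assert classify_locus(fst=0.030, he=0.48) == "population differentiation"
```

A locus with both high FST and high He, as reported for D40, lands in the population differentiation group under this rule, which matches the caution in the text about loci with extreme values.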
Chinese subpopulation differentiation
A total of 25 Chinese populations were included in this study. The discriminability of the 30 InDels at the population level and the individual level among Chinese groups was determined [Figure 4]b and [Figure 4]c. The Sino-Tibetan and Altaic linguistic families harbor the major Chinese ethnic groups and were therefore used as the grouping criterion. The DAPC based on the 30 loci illustrated complex genetic affinities among populations, indicating frequent gene exchange among Chinese populations [Figure 4]a. Among the five Altaic groups, only the Xibe group was genetically close to the Sino-Tibetan groups, while the Kazakh group and the three Uyghur groups were separated from the Sino-Tibetans. This observation was consistent with the NJ tree [Figure 2]c. Using the cutoff FST = 0.01 according to Li et al., we selected 17 InDel loci with FST <0.01 [Table 2]. The most appropriate K was determined as K = 2 [Supplementary Figure 1]. The combination of DAPC and STRUCTURE [Figure 4] and [Figure 5] supported that these 17 InDel loci were eligible for identifying Chinese individuals. The nuanced differences among Chinese populations were not easy to detect; however, the 13 InDels with FST >0.01 helped to uncover the genetic affinities and deviations among these 25 Chinese populations. In addition, the forensic parameters of the 17 InDels (total discrimination power [TDP] = 0.999985 and cumulative matching probability [CMP] = 0.00000009) indicated that these loci could be used as a supplementary tool for forensic applications. The Qiagen Investigator DIPplex® kit therefore qualifies as a forensic biological tool. Furthermore, more InDels should be screened and their population traits characterized to establish their potential value in both population and personal identification in Chinese populations.
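Forensic parameters of this kind are commonly built up from per-locus match probabilities. The sketch below assumes Hardy-Weinberg genotype proportions and independence between loci, and uses the common definitions in which the combined match probability is the product of per-locus values and the combined discrimination power is its complement. The text does not state which formulas were used, so the reported TDP and CMP may have been derived differently. All names and allele frequencies here are hypothetical.

```python
from math import prod


def genotype_frequencies(p_insertion):
    """Hardy-Weinberg genotype frequencies for a biallelic locus."""
    q_deletion = 1.0 - p_insertion
    return (p_insertion ** 2,
            2.0 * p_insertion * q_deletion,
            q_deletion ** 2)


def match_probability(p_insertion):
    """Chance that two unrelated individuals share a genotype at one locus."""
    return sum(g ** 2 for g in genotype_frequencies(p_insertion))


def combined_statistics(allele_frequencies):
    """Return (combined match probability, combined discrimination power)."""
    combined_mp = prod(match_probability(p) for p in allele_frequencies)
    return combined_mp, 1.0 - combined_mp


# Hypothetical panel: 17 loci with perfectly balanced alleles (p = 0.5),
# which is the most informative case for a biallelic marker.
combined_mp, combined_dp = combined_statistics([0.5] * 17)
assert 0.0 < combined_mp < 1e-6
```

Even in this best case each biallelic locus contributes a match probability of 0.375, which is why InDel panels need many loci and why the text describes them as a supplementary tool.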
Figure 4: Discriminant analysis of principal components for 4021 individuals from 25 Chinese populations. The numerical tags (1-25) indicate the selected Chinese populations (1: ChengduHan; 2: HenanHan; 3: ZhejiangHan; 4: BeijingHan1; 5: BeijingHan2; 6: GuangdongHan; 7: ShanghaiHan; 8: Dong; 9: Miao; 10: Bai; 11: Tibet Tibetan; 12: Qinghai Tibetan; 13: Tibetan1; 14: Tibetan2; 15: Tibetan3; 16: She; 17: Zhuang; 18: Tujia; 19: Yi1; 20: Yi2; 21: Kazakh; 22: Uyghur1; 23: Uyghur2; 24: Uyghur3; 25: Xibe). All analyses were based on the first two components. Divided by FST = 0.01, the 13 and 17 InDels were examined for Chinese population identification (indicated by numbers) or individual identification (indicated by spots). (a) When all 30 InDel loci were included, the population structure was heterogeneous: the Tibetan3, She, and Zhuang populations clustered together; the Kazakh, Uyghur1, Uyghur2, and Uyghur3 groups made up another cluster; and the other 18 groups aggregated. (b) When the 13 InDel loci with FST >0.01 were included, the population structure resembled that in Figure 4a symmetrically. (c) When the 17 InDel loci with FST <0.01 were included, a homogeneous population structure was observed
Figure 5: STRUCTURE analysis for 4021 individuals from 25 Chinese populations. K = 2 was selected according to the mean Ln probability tests in Supplementary Figure 1, demonstrating two inferred clusters. White indicates the ancestry proportion of Altaic linguistic groups, while blue indicates that of Sino-Tibetan linguistic groups. (a) The analysis based on 30 InDel loci demonstrated disproportionate compositions of the two inferred clusters between Sino-Tibetan and Altaic linguistic groups; however, some populations, such as Tibetan, Yi, and Xibe, showed more uniform compositions. (b) The structure based on the 13 InDels with FST >0.01 showed patterns similar to those in Figure 5a. (c) The structure based on the 17 InDels with FST <0.01 indicated uniform composition among all 25 Chinese populations
Conclusion
In this study, we found that the Chinese subpopulation structure explained by 30 InDels was homogeneous. The combination of loci with high FST exerted similar effects in population identity. Within the 25 Chinese populations, the cutoff of FST = 0.01 split the 30 InDels into 13 loci (FST >0.01) and 17 loci (FST <0.01). The 13 InDels (FST >0.01) were suitable for Chinese population discrimination, and the 17 InDels (FST <0.01) reflected a high level of population homogeneity and thus could be used for intercontinental personal identification. Our efforts comprehensively reconstructed the Chinese population structure and filled the gap in evaluating the ability of the Investigator DIPplex kit in population identification. For the first time, we conducted a comprehensive correlation analysis between FST and He of the 30 InDel loci in the Investigator DIPplex kit and provided strong evidence for ongoing InDel locus screening.
Financial support and sponsorship
This study was financially supported by The Fundamental Research Funds for the Central Research Institutes with project number “2017JB004.”
Conflicts of interest
There are no conflicts of interest.
References
LaRue BL, Ge J, King JL, Budowle B. A validation study of the Qiagen investigator DIPplex® kit; an INDEL-based assay for human identification. Int J Legal Med 2012;126:533-40.
Pepinski W, Abreu-Glowacka M, Koralewska-Kordel M, Michalak E, Kordel K, Niemcunowicz-Janica A, et al. Population genetics of 30 INDELs in populations of Poland and Taiwan. Mol Biol Rep 2013;40:4333-8.
Shen C, Zhu B, Yao T, Li Z, Zhang Y, Yan J, et al. A 30-InDel assay for genetic variation and population structure analysis of Chinese Tujia group. Sci Rep 2016;6:36842.
Wang L, Lv M, Zaumsegel D, Zhang L, Liu F, Xiang J, et al. A comparative study of insertion/deletion polymorphisms applied among Southwest, South and Northwest Chinese populations using Investigator® DIPplex. Forensic Sci Int Genet 2016;21:10-4.
Wei YL, Qin CJ, Dong H, Jia J, Li CX. A validation study of a multiplex INDEL assay for forensic use in four Chinese populations. Forensic Sci Int Genet 2014;9:e22-5.
Saiz M, André F, Pisano N, Sandberg N, Bertoni B, Pagano S, et al. Allelic frequencies and statistical data from 30 INDEL loci in Uruguayan population. Forensic Sci Int Genet 2014;9:e27-9.
Zhang YD, Shen CM, Jin R, Li YN, Wang B, Ma LX, et al. Forensic evaluation and population genetic study of 30 insertion/deletion polymorphisms in a Chinese Yi group. Electrophoresis 2015;36:1196-201.
Martínez-Cortés G, García-Aceves M, Favela-Mendoza AF, Muñoz-Valle JF, Velarde-Felix JS, Rangel-Villalobos H, et al. Forensic parameters of the investigator DIPplex kit (Qiagen) in six Mexican populations. Int J Legal Med 2016;130:683-5.
Hefke G, Davison S, D'Amato ME. Forensic performance of investigator DIPplex indels genotyping kit in native, immigrant, and admixed populations in South Africa. Electrophoresis 2015;36:3018-25.
Kis Z, Zalán A, Völgyi A, Kozma Z, Domján L, Pamjav H, et al. Genome deletion and insertion polymorphisms (DIPs) in the Hungarian population. Forensic Sci Int Genet 2012;6:e125-6.
Friis SL, Børsting C, Rockenbauer E, Poulsen L, Fredslund SF, Tomas C, et al. Typing of 30 insertion/deletions in Danes using the first commercial indel kit – Mentype® DIPplex. Forensic Sci Int Genet 2012;6:e72-4.
Seong KM, Park JH, Hyun YS, Kang PW, Choi DH, Han MS, et al. Population genetics of insertion-deletion polymorphisms in South Koreans using investigator DIPplex kit. Forensic Sci Int Genet 2014;8:80-3.
Yang CH, Yin CY, Shen CM, Guo YX, Dong Q, Yan JW, et al. Genetic variation and forensic efficiency of autosomal insertion/deletion polymorphisms in Chinese Bai ethnic group: Phylogenetic analysis to other populations. Oncotarget 2017;8:39582-91.
Weir BS, Cockerham CC. Estimating F-Statistics for the analysis of population structure. Evolution 1984;38:1358-70.
Shi M, Liu Y, Bai R, Jiang L, Lv X, Ma S, et al. Population data of 30 insertion-deletion markers in four Chinese populations. Int J Legal Med 2015;129:53-6.
Liu X, Chen F, Niu Y, Bian Y, Zhang S, Zhu R, et al. Population genetics of 30 insertion/deletion polymorphisms in Han Chinese population from Zhejiang province. Forensic Sci Int Genet 2017;28:e33-e35.
Li H, Xiaoguang W, Sujuan L, Yinming Z, Xueling O, Yong C, et al. Genetic polymorphisms of 30 indel loci in Guangdong Han population. J Sun Yat Sen Univ (Med Sci) 2013;34:6.
Wang Z, Zhang S, Zhao S, Hu Z, Sun K, Li C, et al. Population genetics of 30 insertion-deletion polymorphisms in two Chinese populations using Qiagen investigator® DIPplex kit. Forensic Sci Int Genet 2014;11:e12-4.
Guo Y, Shen C, Meng H, Dong Q, Kong T, Yang C, et al. Population differentiations and phylogenetic analysis of Tibet and Qinghai Tibetan groups based on 30 InDel loci. DNA Cell Biol 2016;35:787-94.
Meng HT, Zhang YD, Shen CM, Yuan GL, Yang CH, Jin R, et al. Genetic polymorphism analyses of 30 InDels in Chinese Xibe ethnic group and its population genetic differentiations with other groups. Sci Rep 2015;5:8260.
Excoffier L, Laval G, Schneider S. Arlequin (version 3.0): An integrated software package for population genetics data analysis. Evol Bioinform Online 2007;1:47-50.
Jombart T, Devillard S, Balloux F. Discriminant analysis of principal components: A new method for the analysis of genetically structured populations. BMC Genet 2010;11:94.
Pritchard JK, Stephens M, Donnelly P. Inference of population structure using multilocus genotype data. Genetics 2000;155:945-59.
Earl DA, vonHoldt BM. Structure harvester: A website and program for visualizing structure output and implementing the Evanno method. Conserv Genet Resour 2012;4:3.
Evanno G, Regnaut S, Goudet J. Detecting the number of clusters of individuals using the software STRUCTURE: A simulation study. Mol Ecol 2005;14:2611-20.
Tamura K, Stecher G, Peterson D, Filipski A, Kumar S. MEGA6: Molecular evolutionary genetics analysis version 6.0. Mol Biol Evol 2013;30:2725-9.
Kane D. The Chinese language: Its history and current usage. North Clarendon, VT: Tuttle Publishing; 2006.
Li L, Wang Y, Yang S, Xia M, Yang Y, Wang J, et al.
Genome-wide screening for highly discriminative SNPs for personal identification and their assessment in world populations. Forensic Sci Int Genet 2017;28:118-27.