Year : 2018  |  Volume : 4  |  Issue : 3  |  Page : 115-121

The evaluation of insertion and deletion polymorphism in population and personal identification amidst Chinese populations

1 Institute of Forensic Science, Ministry of Public Security, Beijing 100038, China
2 Department of Forensic Medicine, Nanjing Medical University, Nanjing, 210029; MOE Key Laboratory of Contemporary Anthropology, Department of Anthropology and Human Genetics, School of Life Sciences, Fudan University, Shanghai, 200438; Fudan-Taizhou Institute of Health Sciences, Jiangsu, 225300, China
3 MOE Key Laboratory of Contemporary Anthropology, Department of Anthropology and Human Genetics, School of Life Sciences, Fudan University, Shanghai, 200438; Fudan-Taizhou Institute of Health Sciences, Jiangsu, 225300, China
4 Department of Forensic Medicine, Nanjing Medical University, Nanjing, 210029, China

Date of Web Publication: 28-Sep-2018

Correspondence Address:
Dr. Feng Chen
Department of Forensic Medicine, Nanjing Medical University, Nanjing 210029
Dr. Shilin Li
MOE Key Laboratory of Contemporary Anthropology, Department of Anthropology and Human Genetics, School of Life Sciences, Fudan University, Shanghai, 200438, China; Fudan-Taizhou Institute of Health Sciences, Taizhou 225300
Dr. Wanshui Li
Institute of Forensic Science, Ministry of Public Security, Beijing 100038

Source of Support: None, Conflict of Interest: None

DOI: 10.4103/jfsm.jfsm_24_18


To comprehensively evaluate the practical application and power of 30 commonly used InDels (Qiagen Investigator DIPplex® kit), we collected population data from 25 Chinese populations and applied F-statistics for population genetic analysis. The results showed that allelic frequency distributions differed to varying degrees among populations. Furthermore, the phylogeny based on pairwise FST distances showed that the differentiation of most populations was consistent with their geographic locations and historical dispersals. We conducted a comprehensive correlation analysis between FST and heterozygosity for the 30 InDel loci, which provides strong evidence to guide the ongoing screening of InDel loci. The FST values of the 30 InDels were calculated within the 25 Chinese populations, and each locus was then characterized by its role in population genetics or individual identification. The data indicated that 17 InDels with FST < 0.01 could be used for individual identification in Chinese populations (total discrimination power = 0.999985 and cumulative matching probability = 0.00000009). We comprehensively reconstructed the population structure and filled a gap in evaluating the ability of InDels in both personal and population identification. The application of InDel loci in forensics should promote the development of forensic population identification and personal discrimination.

Keywords: Chinese populations, insertion and deletion, personal identification, population identification, Qiagen Investigator DIPplex® kit

How to cite this article:
Sun H, Yin C, Shang L, Wang C, Su K, Li W, Chen F, Li S. The evaluation of insertion and deletion polymorphism in population and personal identification amidst Chinese populations. J Forensic Sci Med 2018;4:115-21

How to cite this URL:
Sun H, Yin C, Shang L, Wang C, Su K, Li W, Chen F, Li S. The evaluation of insertion and deletion polymorphism in population and personal identification amidst Chinese populations. J Forensic Sci Med [serial online] 2018 [cited 2022 Oct 2];4:115-21. Available from: https://www.jfsmonline.com/text.asp?2018/4/3/115/242513

Hui Sun and Caiyong Yin contributed equally to this work

Introduction

The Qiagen Investigator DIPplex® kit (Qiagen, Hilden, Germany), based on a biallelic InDel assay, was the first commercial forensic kit designed for analyzing challenging forensic biological samples.[1] It has attracted considerable attention from forensic DNA laboratories worldwide[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12] because it can be analyzed by capillary electrophoresis and interpreted according to amplicon size. Despite this technical convenience, the role of the 30-InDel panel in population differentiation has been questioned. In a study of seven Chinese subpopulations, Wang et al.[4] found that the panel could not clearly differentiate these groups. Yang et al.[13] observed that two Uyghur groups deviated from the East Asian groups. These anomalous observations call for a fundamental evaluation of the 30 InDels in Chinese population differentiation.

Wright's F-statistics, for which Weir and Cockerham provided estimators in 1984,[14] play a central role in estimating the genetic differentiation coefficient (FST), the overall inbreeding coefficient (FIT), and the within-population inbreeding coefficient (FIS). Given the large size of the Chinese population, we examined the ability of the 30 InDels to identify Chinese populations and individuals. Although InDels with extremely low FST have been considered unsuitable for ancestry inference,[9] such loci can instead be selected for personal identification. In this study, the FST values of the 30 InDels were calculated within 25 Chinese populations and then used to determine whether each InDel is better suited to population or individual identification. In this way, we obtained explicit profiles of the 30 InDels for population genetics and forensic applications.
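As a minimal illustration of how the three coefficients relate, the sketch below computes them for a single biallelic InDel from Nei's heterozygosity-based definitions: FIS = 1 - Ho/Hs, FST = 1 - Hs/Ht, and FIT = 1 - Ho/Ht, where Ho is the mean observed heterozygosity, Hs the mean expected heterozygosity within subpopulations, and Ht the expected heterozygosity of the pooled population. This is a simplification and not the Weir and Cockerham estimator implemented in pegas: it ignores sample-size corrections and weights all subpopulations equally, and the function name `f_statistics` is hypothetical.

```python
from statistics import fmean


def f_statistics(p, ho):
    """Heterozygosity-based F-statistics for one biallelic InDel locus.

    p  -- insertion-allele frequency in each subpopulation
    ho -- observed heterozygosity in each subpopulation
    Subpopulations are weighted equally (a simplifying assumption).
    """
    hs = fmean(2 * pk * (1 - pk) for pk in p)  # mean within-subpopulation expected heterozygosity
    p_bar = fmean(p)                           # allele frequency of the pooled population
    ht = 2 * p_bar * (1 - p_bar)               # expected heterozygosity of the pooled population
    h_obs = fmean(ho)                          # mean observed heterozygosity
    f_is = 1 - h_obs / hs                      # inbreeding within subpopulations
    f_st = 1 - hs / ht                         # differentiation among subpopulations
    f_it = 1 - h_obs / ht                      # inbreeding relative to the pooled population
    return f_is, f_it, f_st


if __name__ == "__main__":
    # Toy values: two subpopulations at opposite frequencies, each in Hardy-Weinberg proportions
    print(f_statistics([0.2, 0.8], [0.32, 0.32]))
```

For the toy input, FIS is essentially 0 (each subpopulation is in Hardy-Weinberg proportions) while FST is about 0.36, because the within-group heterozygosity of 0.32 falls well below the pooled value of 0.5. By construction the identity (1 - FIT) = (1 - FIS)(1 - FST) holds.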

Materials and Methods

Data capture

Individual-level genotype data for 25 Chinese populations (4021 samples) were collected from published studies [Table 1], together with the geographic details of each population. Because all data were previously published and publicly available, consent was exempted. Seven Han Chinese populations and 18 populations of other Chinese ethnic groups were included. The geographic locations of these 25 groups are marked in [Figure 1]. The genotype information (presence or absence of the inserted sequence) was collected for further data processing. The study was approved by the Institutional Ethics Committee.
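Because each DIPplex marker is biallelic, an individual's genotype at a locus can be stored as the number of insertion alleles it carries (0, 1, or 2). The sketch below assumes a hypothetical flat record layout of (population, locus, insertion count), not the format of any of the published datasets, and tallies per-population insertion-allele frequencies and observed heterozygosities, the quantities from which the F-statistics and allele-frequency heat maps are derived. The function name `summarize_genotypes` and the population labels are illustrative.

```python
from collections import defaultdict


def summarize_genotypes(records):
    """Per-population insertion-allele frequency and observed heterozygosity.

    records -- iterable of (population, locus, n_insertions), where
               n_insertions is 0 (D/D), 1 (I/D), or 2 (I/I).
    Returns {population: {locus: (insertion_frequency, observed_heterozygosity)}}.
    """
    # tally = [insertion alleles, total alleles, heterozygotes, individuals]
    tallies = defaultdict(lambda: defaultdict(lambda: [0, 0, 0, 0]))
    for pop, locus, n_ins in records:
        if n_ins not in (0, 1, 2):
            raise ValueError(f"invalid genotype {n_ins!r} at {locus} in {pop}")
        t = tallies[pop][locus]
        t[0] += n_ins
        t[1] += 2
        t[2] += int(n_ins == 1)
        t[3] += 1
    return {
        pop: {locus: (t[0] / t[1], t[2] / t[3]) for locus, t in loci.items()}
        for pop, loci in tallies.items()
    }


if __name__ == "__main__":
    toy = [("PopA", "D118", 2), ("PopA", "D118", 1), ("PopB", "D118", 0), ("PopB", "D118", 1)]
    print(summarize_genotypes(toy))
```

Keeping genotypes as insertion counts makes allele frequencies a simple ratio of sums and lets invalid calls be rejected early.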
Table 1: Summary and source citations of the 25 selected Chinese populations

Figure 1: Geographic locations of the 25 collected Chinese populations. The populations listed on the right correspond to those in Table 1


Statistical analyses

The pairwise FST genetic distances and their significance were calculated using Arlequin version 3.0 (Swiss Institute of Bioinformatics; Computational and Molecular Population Genetics Lab [CMPG], Institute of Ecology and Evolution, University of Berne, Baltzerstrasse 6, 3012 Bern, Switzerland).[21] Pearson correlation analysis was performed in R version 3.3.1. The R package adegenet was used to perform discriminant analysis of principal components (DAPC) as described by Jombart et al.[22] and to compute the expected heterozygosity (He) of the 30 InDels. The FIS, FIT, and FST[14] of the 30 InDels were estimated with the R package pegas. To examine the genetic affinities among these populations, STRUCTURE (version 2.2)[23] was used to estimate membership coefficients under the admixture model with 60,000 iterations and 120,000 burn-in cycles. The optimal K value was determined from the resulting Ln Pr(X|K) and ΔK values.[24],[25] The neighbor-joining (NJ) tree was reconstructed with MEGA 6.0.[26] The R package pheatmap was used to visualize the allelic frequencies of the 30 InDels and the pairwise FST distances as heat maps, and easyGgplot2 was used to generate the boxplot.
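The ΔK statistic of Evanno et al.[25] picks the K at which the likelihood curve bends most sharply relative to its run-to-run noise: ΔK = |L(K+1) - 2L(K) + L(K-1)| / sd[L(K)], where L(K) is Ln Pr(X|K). The sketch below takes the second difference of the mean L(K) across replicate runs, as Structure Harvester does; we assume, but the text does not state, that replicate runs were aggregated this way. It needs at least two replicates per K, and the likelihood values in the example are invented for illustration.

```python
from statistics import fmean, stdev


def evanno_delta_k(ln_prob):
    """Delta K for each K that has both neighbours (K - 1 and K + 1).

    ln_prob -- {K: [Ln Pr(X|K) from each replicate STRUCTURE run]}
    """
    mean_l = {k: fmean(runs) for k, runs in ln_prob.items()}
    delta = {}
    for k in sorted(ln_prob):
        if k - 1 in mean_l and k + 1 in mean_l:
            # |L''(K)|: absolute second difference of the mean likelihoods
            curvature = abs(mean_l[k + 1] - 2 * mean_l[k] + mean_l[k - 1])
            # scale by the spread of replicate runs at this K (needs >= 2 runs)
            delta[k] = curvature / stdev(ln_prob[k])
    return delta


if __name__ == "__main__":
    toy = {1: [-5000, -5002], 2: [-4500, -4502], 3: [-4480, -4484], 4: [-4470, -4478]}
    scores = evanno_delta_k(toy)
    print(max(scores, key=scores.get))  # prints 2
```

The endpoints of the tested K range never receive a ΔK, which is why the tested range has to extend at least one step beyond the candidate values.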

Results and Discussion

Population structure of selected groups

[Figure 2] illustrates the population structure. The allelic frequencies and the hierarchical clustering based on Euclidean distances revealed high diversity among Chinese populations [Figure 2]a. For example, the allelic frequency at locus D118 was extremely high (all > 0.8) in Chinese Sino-Tibetan populations but generally moderate in Chinese Altaic groups. Similarly notable differences were observed at loci D6 and D40.
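The clustering in [Figure 2]a groups populations whose allele-frequency profiles lie close together in Euclidean space. As a rough sketch of that idea, the code below implements agglomerative clustering with complete linkage, where the distance between two clusters is the largest distance between any of their members. Complete linkage is our assumption (it is a common heat-map default, but the linkage used is not stated), and the population names and frequency vectors are hypothetical.

```python
from itertools import combinations
from math import dist


def complete_linkage(profiles):
    """Agglomerative clustering with complete linkage on Euclidean distances.

    profiles -- {population: [allele frequency at each locus, same locus order]}
    Returns the merge history as (cluster_a, cluster_b, height) tuples.
    """
    clusters = {name: (name,) for name in profiles}  # cluster label -> member populations
    merges = []
    while len(clusters) > 1:
        # complete linkage: cluster distance = largest distance between any two members
        a, b, height = min(
            ((x, y, max(dist(profiles[m], profiles[n]) for m in clusters[x] for n in clusters[y]))
             for x, y in combinations(clusters, 2)),
            key=lambda t: t[2],
        )
        clusters[f"({a},{b})"] = clusters.pop(a) + clusters.pop(b)
        merges.append((a, b, height))
    return merges


if __name__ == "__main__":
    toy = {"PopA": [0.9, 0.1], "PopB": [0.85, 0.15], "PopC": [0.4, 0.6]}
    for step in complete_linkage(toy):
        print(step)
```

The naive search over all cluster pairs is cubic in the number of populations, which is harmless for 25 groups; dedicated libraries use faster update rules.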
Figure 2: The population structure of 25 populations. (a) Heat map of the allelic frequencies of 30 InDels, with values indicated by the numbers in the grid cells, and hierarchical clustering based on Euclidean distances; (b) heat map of the pairwise FST values [Supplementary Table 1]; and (c) neighbor-joining tree reconstructed from the pairwise FST values


Pairwise comparisons and an NJ tree based on FST genetic distances [Supplementary Table 1], [Figure 2]b and [Figure 2]c showed that clear relationships among the 25 Chinese groups could be resolved using the 30 InDel loci. All seven Han subpopulations clustered together at the terminal branches of the phylogenetic tree and then joined the Yi1, Bai, and Tujia groups, which were sampled in Yunnan Province, China. In addition, the four Tibetan groups and the three Uyghur subpopulations each formed their own branch. Notably, all groups of the Altaic language family except the Xibe showed large FST genetic distances from groups of the Sino-Tibetan language family [Figure 2]b. Some relationships contradicted common understanding in population genetics. For example, the Xibe group, which belongs to the Altaic language family, was close to the Sino-Tibetan groups and departed from the other four Altaic groups; the Yi1 and Yi2 groups fell on divergent branches; and the relationships among the seven Han groups were not consistent with their relative geographic locations. These preliminary observations suggest that the 30 InDels can reconstruct the relationships among most Chinese populations.
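The NJ tree in [Figure 2]c was built in MEGA from the pairwise FST matrix. To show the joining criterion itself, the sketch below implements the textbook Saitou-Nei neighbor-joining algorithm in plain Python. It is not MEGA's implementation; the internal node labels (node1, node2, and so on) are hypothetical, and the toy matrix is an exactly additive four-taxon example rather than real FST values. With real FST matrices, which are not additive, very short or even negative branch lengths can appear.

```python
from itertools import combinations


def neighbor_joining(labels, matrix):
    """Saitou-Nei neighbor joining on a symmetric distance matrix.

    labels -- unique population names, in matrix order
    matrix -- list of lists of pairwise distances (e.g. FST)
    Returns (node, neighbour, branch_length) edges of the unrooted tree.
    """
    d = {a: {b: matrix[i][j] for j, b in enumerate(labels)} for i, a in enumerate(labels)}
    nodes = list(labels)
    edges = []
    next_id = 1
    while len(nodes) > 2:
        n = len(nodes)
        # net divergence of each node from all other current nodes
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # join the pair minimising Q(i, j) = (n - 2) d(i, j) - r(i) - r(j)
        i, j = min(combinations(nodes, 2),
                   key=lambda pair: (n - 2) * d[pair[0]][pair[1]] - r[pair[0]] - r[pair[1]])
        # branch lengths from i and j to their new parent node
        li = d[i][j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
        lj = d[i][j] - li
        u = f"node{next_id}"  # hypothetical label for the new internal node
        next_id += 1
        d[u] = {u: 0.0}
        for k in nodes:
            if k not in (i, j):
                # distance from the new node to every remaining node
                d[u][k] = d[k][u] = (d[i][k] + d[j][k] - d[i][j]) / 2
        edges += [(i, u, li), (j, u, lj)]
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    edges.append((a, b, d[a][b]))  # final edge joins the last two nodes
    return edges


if __name__ == "__main__":
    # exactly additive toy distances for the tree ((A:2,B:3):1,(C:4,D:5))
    names = ["A", "B", "C", "D"]
    m = [[0, 5, 7, 8], [5, 0, 8, 9], [7, 8, 0, 9], [8, 9, 9, 0]]
    for edge in neighbor_joining(names, m):
        print(edge)
```

On the additive toy matrix the algorithm recovers the leaf branches of 2, 3, 4, and 5 and an internal branch of 1, which is the property that makes NJ a sensible choice for distance data.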

Divergence between FST and heterozygosity

Population substructure effects were evaluated using Wright's FST. [Table 2] lists the F-statistics and heterozygosity parameters of the 30 InDels among the 25 Chinese populations. The correlation analysis between FST and He [Figure 3] showed that, among populations worldwide or within China, FST decreased significantly as He increased. This provides robust evidence that, across loci with different levels of heterozygosity, the substructure differentiation detected by the 30 InDels changes in a regular way. Across the 25 Chinese populations included in the present study, we found that 17 of the 30 InDels had He > 0.4 and low FST values (< 0.01). Interestingly, D40 had both high FST and high He values, which might be caused by the extremely low frequencies at this locus in the Yi1, Tujia, and Tibetan3 groups [Figure 2]a.
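The FST-He relationship in [Figure 3] was assessed with Pearson's correlation coefficient. A minimal sketch of that statistic is given below, together with the expected heterozygosity of a biallelic locus, He = 2p(1 - p), without a small-sample correction. The per-locus values are invented so that the negative trend is easy to see, the function names are hypothetical, and the significance test is omitted.

```python
from math import sqrt


def expected_heterozygosity(p):
    """He = 2p(1 - p) for a biallelic locus (no small-sample correction)."""
    return 2 * p * (1 - p)


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


if __name__ == "__main__":
    # Toy pooled insertion-allele frequencies and per-locus FST values (not the study's data)
    pooled_p = [0.50, 0.45, 0.30, 0.15]
    fst = [0.004, 0.006, 0.020, 0.045]
    he = [expected_heterozygosity(p) for p in pooled_p]
    print(round(pearson_r(he, fst), 3))
```

Because He of a biallelic locus peaks at p = 0.5, loci whose pooled frequency sits near 0.5 tend to have little room for between-population divergence, which is one intuition for the negative correlation.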
Table 2: The FIT, FIS, FST values, and observed and expected heterozygosities of 30 InDels among 25 Chinese groups

Figure 3: Correlation analysis between FST and He. Each red dot represents one InDel locus. FST was strongly and negatively correlated with He in Chinese groups


We built this model as a reference for screening specialized InDel sets for forensic personal or population identification. The model based on FST and He indicated that most InDels with high He values (> 0.4) were suitable for personal identification, whereas the rest could serve as ancestry-informative markers. However, InDels with extreme values warrant attention, because their frequencies differ markedly among populations. Hence, when using InDels with high He for personal identification, an appropriate population-specific reference dataset should be selected in advance. Below, we further discuss the performance of InDels with different features in subpopulation differentiation and personal identification within Chinese populations.
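The screening rule can be written down directly. The sketch below applies the FST = 0.01 cutoff used in this study and the He > 0.4 threshold quoted above: loci below the FST cutoff go to the personal-identification panel and the rest to the ancestry-informative panel, while loci that are both above the FST cutoff and highly heterozygous (the situation described for D40) are additionally flagged because their frequencies shift between populations. The function name, the extra "flagged" category, and the toy loci and values are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class LocusStats:
    name: str
    fst: float  # FST across the reference populations
    he: float   # expected heterozygosity


def assign_panels(loci, fst_cutoff=0.01, he_cutoff=0.4):
    """Split loci into personal-identification and ancestry-informative panels.

    Loci with FST >= fst_cutoff and He > he_cutoff are also flagged: they are
    informative but population-dependent, so a matching reference dataset is needed.
    """
    panels = {"personal_id": [], "ancestry": [], "flagged": []}
    for locus in loci:
        if locus.fst < fst_cutoff:
            panels["personal_id"].append(locus.name)
        else:
            panels["ancestry"].append(locus.name)
            if locus.he > he_cutoff:
                panels["flagged"].append(locus.name)
    return panels


if __name__ == "__main__":
    toy = [LocusStats("LocusA", 0.004, 0.49), LocusStats("LocusB", 0.030, 0.30),
           LocusStats("LocusC", 0.025, 0.46)]
    print(assign_panels(toy))
```

Keeping the cutoffs as parameters makes it easy to rerun the split with a different threshold or a different set of reference populations.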

Chinese subpopulation differentiation

In this research, a total of 25 Chinese populations were included. The discriminability of the 30 InDels at the population and individual levels among Chinese groups was determined [Figure 4]b and [Figure 4]c. The Sino-Tibetan and Altaic language families include the major Chinese ethnic groups and were therefore used as the grouping criterion.[27] DAPC based on all 30 loci revealed complex genetic affinities among the populations, suggesting frequent gene flow among Chinese populations [Figure 4]a. Among the five Altaic groups, only the Xibe group was genetically close to the Sino-Tibetan groups, whereas the Kazakh and the three Uyghur groups departed from the Sino-Tibetans. This observation was consistent with the NJ tree [Figure 2]c. Using the FST = 0.01 cutoff of Li et al.,[28] we selected 17 InDel loci with FST < 0.01 [Table 2]. The most appropriate K was determined to be K = 2 [Supplementary Figure 1]. Together, DAPC and STRUCTURE [Figure 4] and [Figure 5] indicated that these 17 InDel loci are suitable for identifying Chinese individuals. Subtle differences among Chinese populations were not easy to detect; however, the 13 InDels with FST > 0.01 helped to uncover the genetic affinities and deviations among these 25 Chinese populations. In addition, the forensic parameters of the 17 InDels (total discrimination power [TDP] = 0.999985 and cumulative matching probability [CMP] = 0.00000009) indicated that these loci could be used as a supplementary tool in forensic applications. The Qiagen Investigator DIPplex® kit thus qualifies as a forensic biological tool. Furthermore, more InDels should be screened and their population characteristics described to assess their potential value for both population and personal identification in Chinese populations.
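The two forensic parameters combine per-locus values multiplicatively. With the usual definitions, the matching probability of a locus is PM = Σ g² summed over its genotype frequencies g, its discrimination power is DP = 1 - PM, and over independent loci CMP = ∏ PM and TDP = 1 - ∏(1 - DP). The sketch below computes both from Hardy-Weinberg genotype frequencies of biallelic InDels. It assumes independence among loci and Hardy-Weinberg equilibrium, whereas the study may have used observed genotype counts, so it is not expected to reproduce the reported values; the function names and example frequencies are hypothetical.

```python
from math import prod


def hwe_genotype_freqs(p):
    """Genotype frequencies (I/I, I/D, D/D) under Hardy-Weinberg equilibrium."""
    q = 1 - p
    return (p * p, 2 * p * q, q * q)


def match_probability(genotype_freqs):
    """PM: chance that two unrelated individuals share a genotype at one locus."""
    return sum(g * g for g in genotype_freqs)


def combined_parameters(allele_freqs):
    """Cumulative matching probability and total discrimination power over loci."""
    pms = [match_probability(hwe_genotype_freqs(p)) for p in allele_freqs]
    dps = [1 - pm for pm in pms]               # per-locus discrimination power
    cmp_value = prod(pms)                      # CMP = product of per-locus PM
    tdp = 1 - prod(1 - dp for dp in dps)       # TDP = 1 - product of (1 - DP)
    return cmp_value, tdp


if __name__ == "__main__":
    print(combined_parameters([0.5, 0.5]))  # two maximally informative biallelic loci
```

For two loci at p = 0.5, each PM is 0.375, giving CMP = 0.140625 and TDP = 0.859375; every additional locus multiplies CMP by its PM, which is why a 17-locus panel reaches very small matching probabilities.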
Figure 4: Discriminant analysis of principal components for 4021 individuals from 25 Chinese populations. Numerical tags (1-25) indicate the selected Chinese populations (1: ChengduHan; 2: HenanHan; 3: ZhejiangHan; 4: BeijingHan1; 5: BeijingHan2; 6: GuangdongHan; 7: ShanghaiHan; 8: Dong; 9: Miao; 10: Bai; 11: Tibet Tibetan; 12: Qinghai Tibetan; 13: Tibetan1; 14: Tibetan2; 15: Tibetan3; 16: She; 17: Zhuang; 18: Tujia; 19: Yi1; 20: Yi2; 21: Kazakh; 22: Uyghur1; 23: Uyghur2; 24: Uyghur3; 25: Xibe). All analyses are based on the first two components. Using the FST = 0.01 cutoff, the 30 InDels were divided into 13 loci for population identification and 17 loci for individual identification; populations are indicated by numbers and individuals by dots. (a) When all 30 InDel loci were included, the population structure was heterogeneous: Tibetan3, She, and Zhuang clustered together, Kazakh, Uyghur1, Uyghur2, and Uyghur3 formed another cluster, and the other 18 groups aggregated. (b) When only the 13 InDel loci with FST > 0.01 were included, the population structure resembled that in Figure 4a. (c) When only the 17 InDel loci with FST < 0.01 were included, a homogeneous population structure was observed


Figure 5: STRUCTURE analysis for 4021 individuals from 25 Chinese populations. K = 2 was selected according to the mean Ln probability tests in Supplementary Figure 1, giving two inferred clusters. White indicates the ancestry proportion of Altaic linguistic groups and blue that of Sino-Tibetan linguistic groups. (a) The analysis based on 30 InDel loci showed unequal proportions of the two inferred clusters between Sino-Tibetan and Altaic linguistic groups; however, some populations such as Tibetan, Yi, and Xibe showed more uniform compositions. (b) The structure based on the 13 InDels with FST > 0.01 showed patterns similar to those in Figure 5a. (c) The structure based on the 17 InDels with FST < 0.01 indicated a uniform composition across all 25 Chinese populations


Conclusion

In this study, we found that the Chinese subpopulation structure revealed by the 30 InDels was heterogeneous. The loci with high FST, taken together, showed a similar capacity for population identification. Within the 25 Chinese populations, the FST = 0.01 cutoff split the 30 InDels into 13 loci (FST > 0.01) and 17 loci (FST < 0.01). The 13 InDels with FST > 0.01 were suitable for Chinese population discrimination, whereas the 17 InDels with FST < 0.01 reflected a high level of population homogeneity and thus could be used for personal identification across Chinese populations. Our efforts comprehensively reconstructed the Chinese population structure and filled the gap in evaluating the ability of the Investigator DIPplex kit in population identification. For the first time, we conducted a comprehensive correlation analysis between FST and He for the 30 InDel loci in the Investigator DIPplex kit, which provides strong evidence to guide the ongoing search for informative InDel loci.

Financial support and sponsorship

This study was financially supported by The Fundamental Research Funds for the Central Research Institutes with project number “2017JB004.”

Conflicts of interest

There are no conflicts of interest.

References

1. LaRue BL, Ge J, King JL, Budowle B. A validation study of the Qiagen Investigator DIPplex® kit; an INDEL-based assay for human identification. Int J Legal Med 2012;126:533-40.
2. Pepinski W, Abreu-Glowacka M, Koralewska-Kordel M, Michalak E, Kordel K, Niemcunowicz-Janica A, et al. Population genetics of 30 INDELs in populations of Poland and Taiwan. Mol Biol Rep 2013;40:4333-8.
3. Shen C, Zhu B, Yao T, Li Z, Zhang Y, Yan J, et al. A 30-InDel assay for genetic variation and population structure analysis of Chinese Tujia group. Sci Rep 2016;6:36842.
4. Wang L, Lv M, Zaumsegel D, Zhang L, Liu F, Xiang J, et al. A comparative study of insertion/deletion polymorphisms applied among Southwest, South and Northwest Chinese populations using Investigator® DIPplex. Forensic Sci Int Genet 2016;21:10-4.
5. Wei YL, Qin CJ, Dong H, Jia J, Li CX. A validation study of a multiplex INDEL assay for forensic use in four Chinese populations. Forensic Sci Int Genet 2014;9:e22-5.
6. Saiz M, André F, Pisano N, Sandberg N, Bertoni B, Pagano S, et al. Allelic frequencies and statistical data from 30 INDEL loci in Uruguayan population. Forensic Sci Int Genet 2014;9:e27-9.
7. Zhang YD, Shen CM, Jin R, Li YN, Wang B, Ma LX, et al. Forensic evaluation and population genetic study of 30 insertion/deletion polymorphisms in a Chinese Yi group. Electrophoresis 2015;36:1196-201.
8. Martínez-Cortés G, García-Aceves M, Favela-Mendoza AF, Muñoz-Valle JF, Velarde-Felix JS, Rangel-Villalobos H, et al. Forensic parameters of the Investigator DIPplex kit (Qiagen) in six Mexican populations. Int J Legal Med 2016;130:683-5.
9. Hefke G, Davison S, D'Amato ME. Forensic performance of Investigator DIPplex indels genotyping kit in native, immigrant, and admixed populations in South Africa. Electrophoresis 2015;36:3018-25.
10. Kis Z, Zalán A, Völgyi A, Kozma Z, Domján L, Pamjav H, et al. Genome deletion and insertion polymorphisms (DIPs) in the Hungarian population. Forensic Sci Int Genet 2012;6:e125-6.
11. Friis SL, Børsting C, Rockenbauer E, Poulsen L, Fredslund SF, Tomas C, et al. Typing of 30 insertion/deletions in Danes using the first commercial indel kit: Mentype® DIPplex. Forensic Sci Int Genet 2012;6:e72-4.
12. Seong KM, Park JH, Hyun YS, Kang PW, Choi DH, Han MS, et al. Population genetics of insertion-deletion polymorphisms in South Koreans using Investigator DIPplex kit. Forensic Sci Int Genet 2014;8:80-3.
13. Yang CH, Yin CY, Shen CM, Guo YX, Dong Q, Yan JW, et al. Genetic variation and forensic efficiency of autosomal insertion/deletion polymorphisms in Chinese Bai ethnic group: Phylogenetic analysis to other populations. Oncotarget 2017;8:39582-91.
14. Weir BS, Cockerham CC. Estimating F-statistics for the analysis of population structure. Evolution 1984;38:1358-70.
15. Shi M, Liu Y, Bai R, Jiang L, Lv X, Ma S, et al. Population data of 30 insertion-deletion markers in four Chinese populations. Int J Legal Med 2015;129:53-6.
16. Liu X, Chen F, Niu Y, Bian Y, Zhang S, Zhu R, et al. Population genetics of 30 insertion/deletion polymorphisms in Han Chinese population from Zhejiang province. Forensic Sci Int Genet 2017;28:e33-e35.
17. Li H, Xiaoguang W, Sujuan L, Yinming Z, Xueling O, Yong C, et al. Genetic polymorphisms of 30 indel loci in Guangdong Han population. J Sun Yat Sen Univ (Med Sci) 2013;34:6.
18. Wang Z, Zhang S, Zhao S, Hu Z, Sun K, Li C, et al. Population genetics of 30 insertion-deletion polymorphisms in two Chinese populations using Qiagen Investigator® DIPplex kit. Forensic Sci Int Genet 2014;11:e12-4.
19. Guo Y, Shen C, Meng H, Dong Q, Kong T, Yang C, et al. Population differentiations and phylogenetic analysis of Tibet and Qinghai Tibetan groups based on 30 InDel loci. DNA Cell Biol 2016;35:787-94.
20. Meng HT, Zhang YD, Shen CM, Yuan GL, Yang CH, Jin R, et al. Genetic polymorphism analyses of 30 InDels in Chinese Xibe ethnic group and its population genetic differentiations with other groups. Sci Rep 2015;5:8260.
21. Excoffier L, Laval G, Schneider S. Arlequin (version 3.0): An integrated software package for population genetics data analysis. Evol Bioinform Online 2007;1:47-50.
22. Jombart T, Devillard S, Balloux F. Discriminant analysis of principal components: A new method for the analysis of genetically structured populations. BMC Genet 2010;11:94.
23. Pritchard JK, Stephens M, Donnelly P. Inference of population structure using multilocus genotype data. Genetics 2000;155:945-59.
24. Earl DA, vonHoldt BM. Structure Harvester: A website and program for visualizing STRUCTURE output and implementing the Evanno method. Conserv Genet Resour 2012;4:3.
25. Evanno G, Regnaut S, Goudet J. Detecting the number of clusters of individuals using the software STRUCTURE: A simulation study. Mol Ecol 2005;14:2611-20.
26. Tamura K, Stecher G, Peterson D, Filipski A, Kumar S. MEGA6: Molecular evolutionary genetics analysis version 6.0. Mol Biol Evol 2013;30:2725-9.
27. Kane D. The Chinese Language: Its History and Current Usage. North Clarendon, VT: Tuttle Publishing; 2006.
28. Li L, Wang Y, Yang S, Xia M, Yang Y, Wang J, et al. Genome-wide screening for highly discriminative SNPs for personal identification and their assessment in world populations. Forensic Sci Int Genet 2017;28:118-27.

