|Year : 2018 | Volume
| Issue : 3 | Page : 142-149
An empirical study of exploring nonphonetic forensic speaker recognition features
School of Foreign Languages, Zhaoqing University, Zhaoqing, 526061, China
|Date of Web Publication||28-Sep-2018|
Dr. Xin Guan
School of Foreign Languages of Zhaoqing University, Zhaoqing, Guangdong 526061
Source of Support: None, Conflict of Interest: None
So far, phonetic features have been the main type of forensic speaker recognition features studied and used in practice. One problem with phonetic features is that they are strongly affected by real-world conditions, which produces within-speaker variation and consequently reduces the reliability of forensic speaker recognition results. In this context, drawing on Sapir's description of the structure of speech behavior and on discourse information theory, this study uses natural conversations as experimental materials to explore nonphonetic features that are expected to be less affected by real-world conditions. The experimental results show, first, that nonphonetic features exist alongside phonetic features and, second, that these nonphonetic features are less affected by real-world conditions, as expected.
Keywords: Forensic speaker recognition, natural conversations, nonphonetic, real-world conditions
|How to cite this article:|
Guan X. An empirical study of exploring nonphonetic forensic speaker recognition features. J Forensic Sci Med 2018;4:142-9
| Introduction|| |
Forensic speaker recognition (FSR) technology is employed in legal practice to decide whether or not the audio recordings involved in criminal activities have been produced by a known suspect. Phonetic features have been the main type of features used to compare voices, both at home and abroad, although it is agreed that they are strongly affected by real-world conditions, which leads to within-speaker variability and consequently reduces the reliability of FSR results.
To improve the reliability of FSR results, researchers and practitioners have suggested that different FSR methods should be combined, that researchers and practitioners from different academic backgrounds should cooperate, or that natural conversations should be used as experimental materials when the efficacy of phonetic features is tested in research.
In such contexts, natural conversations have been adopted as experiment materials in this study to explore nonphonetic features that are supposed to be less affected by real-world conditions.
| Theoretical Basis|| |
The existence of nonphonetic features is supported by the five levels of speech behavior described by Sapir. Discourse information theory (DIT) provides the theory and the tools to explore such nonphonetic features.
Five levels of speech behavior
Sapir holds that speech, as human behavior, consists of five levels of behavior, and that each level is the product of the interaction between a speaker's social and individual identities.
The five levels are the voice as such, speech dynamics, the pronunciation, the vocabulary, and the style of connected utterance. Voice, the lowest level, refers to the quality of voice. The next level, speech dynamics, involves intonation, rhythm, fluency, speed, and so on. The fourth level, vocabulary, refers to the choice of words. The highest level, the style of connected utterance, is defined as “an individual method of arranging words into groups and of working these up into larger units.” Sapir emphasizes that everyone has an individual style in both conversation and considered address, and that this style is never arbitrary or casual. Moreover, Sapir believes that it is theoretically possible to disentangle the social and individual determinants of style.
So far, speaker-specific features at every level except the highest have been observed and examined in FSR research and practice, especially the phonetic features lying at the first three levels of speech. Three approaches to FSR have been applied to Chinese languages: the aural-spectrographic approach, the aural-acoustic-phonetic approach, and the acoustic-phonetic and automatic approaches. All of these approaches examine aural-phonetic or phonetic-acoustic features, either qualitatively or quantitatively.
Discourse information theory
DIT grew out of the tree model of DI, which represents the information structure of discourse and can be exploited to analyze written or oral discourse of any type.
According to DIT, the surface layer of discourse is language, the underlying layer is cognition, and information lies in between. In contrast to the flexible language forms at the surface layer, DI is relatively stable and does not correspond to language forms one to one. DI represents people's cognitive structure to a considerable degree, and language overtly represents DI. Therefore, both the cognition at the underlying layer and the language at the surface layer of discourse can be readily understood through DI.
Furthermore, the kernel idea of “reality-cognition-language” from embodied philosophy and cognitive linguistics implies that speakers' linguistic creativity is the output produced after real-world information is processed by their cognitive mechanisms. In other words, the individuality in speech represents a speaker's personalized cognition of the world. Hence, individual features with speaker-discriminating power at the highest level of speech behavior, which represent a speaker's individual style of speech, are more likely to be found by analyzing DI than by analyzing language forms.
According to DIT, the information structure of discourse can be analyzed at two levels, macro and micro. Macro information structure concerns the relationships between information units, and micro information structure concerns the information elements within an information unit and the relationships between them. An information unit is a proposition, which is the minimal complete unit of communicative meaning with a relatively independent structure. Each proposition is a process that centers on a predicate and proceeds under certain conditions, and the objects involved in the process are entities. Process, entity, and condition are therefore defined as the three major types of information elements, which indicate the concrete content and properties of information elements and have their own subtypes. At the surface level of language, information units take the form of clauses, and information elements take the form of the words or phrases that compose a clause. In short, information elements constitute an information unit, and information units constitute discourse. To explore the individual pattern or characteristics of a speaker's DI structure is thus to find out his/her “individual method of arranging words into groups and of working these up into larger units.” That is to say, it is theoretically and practically possible to apply the DI analysis approach to the exploration of nonphonetic FSR features at the highest level of speech. Hence, the exact aim of this study is to explore nonphonetic FSR features that represent a speaker's individual speaking style by using the DI analysis approach to analyze natural conversations.
| Materials and Methods|| |
To explore nonphonetic features that are less affected by real-world conditions, natural conversations are used as experimental materials, and four experiments are designed. The efficacy of the explored nonphonetic features is tested in the likelihood ratio (LR) framework.
The database of this study is composed of a total of 233 conversations from 81 students of Guangdong University of Foreign Studies, all of whom speak standard Chinese and 4 of whom are male. The 58 speakers aged between 21 and 25 are postgraduates, who contribute 170 conversations; the 17 speakers aged between 19 and 21 are undergraduates, who contribute 34 conversations; and the 6 speakers aged between 27 and 39 are PhD candidates, who contribute 29 conversations.
All 233 conversations are sampled from the corpus of Chinese Natural Conversations, which is a subcorpus of the Corpus for the Legal Information Processing System. All participants in the conversations in the subcorpus were volunteers and signed a consent form for this study. This study was approved by the Internal Review Board of Guangdong University.
The data of this study are telephone conversations and face-to-face conversations. The telephone conversations were recorded automatically with the mobile phone's built-in recording software, which had been set up in advance. The face-to-face conversations were recorded with digital voice recorders and other recording tools, such as MP3 players, by friends of the contributors who were not the addressees. The contributors have affirmed that they and their addressees were unaware of the recording while the contributed conversations were being recorded. Hence, all conversations in the database are natural conversations that are likely to be unplanned and that occurred and were recorded in real-world conditions.
Likelihood ratio framework
The likelihood ratio (LR) framework reports the probability of the evidence under two rival assumptions, the prosecution hypothesis and the defense hypothesis. Specifically, it reports not only the magnitude of the differences between voice samples but also the typicality of those differences in a relevant population.
The strength of the evidence in the LR framework is expressed by the ratio between the probability of the evidence under the prosecution hypothesis (Hso), that the questioned and the known voice samples come from the same speaker, and the probability of the evidence under the defense hypothesis (Hdo), that the questioned and the known voice samples come from different speakers. The calculated ratio is an LR value, which denotes the strength of forensic evidence, as expressed in the following equation:
$$LR = \frac{P(E \mid H_{so})}{P(E \mid H_{do})}$$
where LR is the likelihood ratio, “P” is the probability, “|” means “conditional upon,” and “E” is the evidence, i.e., the measured differences between the questioned and the known voice samples in the numerator, and between the questioned voice sample and the voice samples of the relevant population in the denominator. The relevant population is the population to which the questioned speaker belongs. The numerator denotes the degree of similarity between the known and questioned voice samples, and the denominator denotes the degree of typicality of the questioned speaker with respect to the relevant population.
Within the LR framework, the reliability of FSR features or an FSR system can be assessed by a so-called cross-validation procedure. In a cross-validation procedure, the reliability is tested on a large number of pairs of voice samples for which it is known whether each pair has the same origin or different origins.
Since the LR values for same-speaker pairs should be greater than one, but for different-speaker pairs less than one, the known same-speaker and different-speaker pairs can be tested to see to what degree they are correctly discriminated. The results of the LRs calculated from the same-speaker and different-speaker comparisons can be demonstrated in a Tippett plot, which is also called a reliability plot. The Tippett plot can be used to evaluate the general performance of FSR features or an FSR system. Besides, the validity of FSR features or an FSR system can be assessed by calculating log-likelihood-ratio costs (Cllr) with LRs calculated from the same-speaker and different-speaker comparisons.
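As an illustration of the validity measure mentioned above, the following is a minimal sketch of the standard definition of Cllr, computed from LRs of known same-speaker and different-speaker pairs. The function name `cllr` and the LR values are hypothetical and are not taken from this study; the study itself computed its values with other software.

```python
import math


def cllr(same_speaker_lrs, different_speaker_lrs):
    """Log-likelihood-ratio cost for two lists of (non-log) LR values."""
    # Same-speaker pairs are penalized more heavily as their LR falls below one.
    ss_cost = sum(math.log2(1 + 1 / lr) for lr in same_speaker_lrs)
    ss_cost /= len(same_speaker_lrs)
    # Different-speaker pairs are penalized more heavily as their LR rises above one.
    ds_cost = sum(math.log2(1 + lr) for lr in different_speaker_lrs)
    ds_cost /= len(different_speaker_lrs)
    return 0.5 * (ss_cost + ds_cost)


# Hypothetical LR values, for illustration only.
uninformative = cllr([1.0, 1.0], [1.0, 1.0])            # every LR equals one
well_behaved = cllr([20.0, 8.0, 50.0], [0.05, 0.2, 0.01])
misleading = cllr([0.1], [10.0])                         # LRs point the wrong way
```

Under this definition, a system that always outputs an LR of one yields a Cllr of exactly one, which is why a Cllr below one is usually read as the minimum requirement for a useful feature.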
Discourse information features to be observed
The types of relationship between a subordinate information unit and its superordinate information unit are termed information knots and are represented by 15 interrogative words: WT (what thing), WB (what basis), WF (what fact), WI (what inference), WP (what disposal), WO (who), WN (when), WR (where), HW (how), WY (why), WE (what effect), WC (what condition), WA (what attitude), WG (what change), and WJ (what judgment). The distribution of the 15 types of DI knots in discourse is closely related to the length of the discourse, and as a result, some types of information knots will be absent in short discourse. That is to say, the different types of DI knots do not occur with equal frequency, and even if the discourse is long enough, some types of DI knots, such as WP and WG, may still be absent (ibid.). Therefore, to guarantee the availability of the potential FSR features in relevant materials, which is one criterion for FSR features, it must first be settled which types of DI knots should be observed.
To decide which types of DI knots to observe, the types of DI knots present in each conversation are recorded, and the percentage of conversations in the database that contain each type of DI knot is calculated. Then, the number of occurrences of each type of DI knot in every conversation is counted and summed to calculate the share of each type of DI knot in all conversations as a whole. The results show that among the 15 types of DI knots, the WT knot is present in all 233 conversations across medium, speech situation, and time, no matter how long the conversation is. In addition, the share of the WT knot is much larger than that of any other type of DI knot: it accounts for 51% of all the 2887 knots in the 233 sampled conversations in the database. Hence, WT is chosen as the DI knot to be observed. Regarding DI elements, the three major types, entity, process, and condition, are present in all 233 conversations. Hence, at the micro level, the three main types of information elements are examined.
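The two counts described above can be sketched as follows. This is only an illustration under the assumption that each conversation has already been annotated as a list of knot labels; the function name `knot_coverage_and_share` and the toy annotations are hypothetical and do not come from the study's corpus.

```python
from collections import Counter


def knot_coverage_and_share(conversations):
    """Return (coverage, share) for a list of conversations.

    coverage[k]: proportion of conversations that contain knot type k.
    share[k]:    proportion of all knot occurrences that are of type k.
    """
    containing = Counter()  # number of conversations containing each knot type
    totals = Counter()      # total occurrences of each knot type
    for knots in conversations:
        containing.update(set(knots))
        totals.update(knots)
    n_conversations = len(conversations)
    n_knots = sum(totals.values())
    coverage = {k: containing[k] / n_conversations for k in totals}
    share = {k: totals[k] / n_knots for k in totals}
    return coverage, share


# Hypothetical annotations of three short conversations.
toy_conversations = [
    ["WT", "WF", "WT"],
    ["WT", "WY"],
    ["WT", "WT", "HW", "WF"],
]
coverage, share = knot_coverage_and_share(toy_conversations)
```

In this toy example, WT occurs in every conversation and makes up five of the nine knots, which mirrors the kind of pattern that led to WT being selected.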
Both Johnstone and Biber have found that absolute frequencies reflect a speaker's individuality, which shows in the consistent co-occurrence pattern of the frequency counts of particular linguistic features. Aitken, Roberts, and Jackson describe absolute frequencies as “counts of observed events, characteristics or other phenomena of interest to any inquiry.”
Compared with absolute frequencies, relative frequencies, which are the frequencies “relative to a repeated number of observations,” are more useful as far as statistical evidence is concerned. Hence, in the first place, the frequencies, both absolute and relative, of the WT knot and of DI elements are examined to explore all DI features that may show high between-speaker variability and low within-speaker variability, which is the most important criterion that FSR features have to meet.
In addition, Hollien comments that a discourse type-token ratio (TTR) cannot be “easily, or consciously manipulated,” so it may reflect the discourse producer's individuality. A discourse TTR is the ratio between the number of different words in a passage and the total number of words in that same passage. Likewise, lexical density is another type of TTR that is considered a useful measure of the amount of information in a particular text; it is the percentage of lexical word types in the total number of word tokens in the analyzed discourse. Analogously, a DI TTR is either the TTR of DI knots or the TTR of DI elements, and it is expected to reflect a speaker's individual habits of constructing DI. Hence, in the second place, DI TTRs are examined to explore all DI features that may show high between-speaker variability and low within-speaker variability.
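The relative frequency and TTR measures described above can be illustrated with a short sketch. It assumes that a stretch of discourse has been annotated as a sequence of labels (DI knots here, but DI elements would work the same way). The function names and the label sequence are hypothetical, and the exact formulas of the features selected in the study are those listed in its Table 1, which this sketch does not reproduce.

```python
from collections import Counter


def type_token_ratio(tokens):
    """Number of distinct types divided by the total number of tokens."""
    if not tokens:
        raise ValueError("type-token ratio is undefined for an empty sequence")
    return len(set(tokens)) / len(tokens)


def relative_frequencies(tokens):
    """Absolute count of each type divided by the total number of tokens."""
    counts = Counter(tokens)
    return {label: n / len(tokens) for label, n in counts.items()}


# Hypothetical sequence of DI knot labels from one conversation.
knots = ["WT", "WT", "WF", "WY", "WT", "HW", "WT", "WF"]
ttr = type_token_ratio(knots)      # 4 types over 8 tokens
rel = relative_frequencies(knots)  # e.g. rel["WT"] is 4 out of 8
```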
Since the most important criterion for any FSR feature is to show high between-speaker variability and low within-speaker variability, four serial experiments are designed to achieve the research objective of this study. The first experiment aims to select features showing high between-speaker variability from all extracted features. The second experiment aims to select features showing low within-speaker variability from the features selected in Experiment 1. The third experiment tests the efficacy of the features selected in Experiment 2. The fourth experiment further tests the reliability of selected nonphonetic features through testing the efficacy of an FSR system based on the selected features.
| Experiments|| |
This part describes the procedure and data for each experiment.
A common way of selecting features showing high between-speaker variability has been to inspect the ratio of between-speaker to within-speaker variation. The ratio is called the F-ratio, which is usually a by-product of the analysis of variance (ANOVA). Therefore, ANOVA is used in Experiment 1, first to train the extracted features and then to test the trained features.
Three datasets of speakers have been selected from the database as the data for Experiment 1.
The first dataset works as the training data, which are used to extract and train features that may show high between-speaker variability and low within-speaker variability. The training dataset is composed of five undergraduate female speakers. Every speaker contributes one conversation, and there are altogether five conversations in the training dataset.
The next two datasets work as the testing data, which are used to test the DI features that have been trained with the training data. One testing dataset is composed of 22 postgraduate female speakers. Another testing dataset is composed of four male speakers. Every female speaker in the first testing dataset contributes one conversation, and thus, the testing dataset for the female speakers consists of 22 conversations. Every male speaker in the second testing dataset contributes two conversations due to the small dataset, and thus, the testing dataset for the male speakers consists of eight conversations.
The one-way ANOVA procedures in this experiment are conducted in SPSS 21 after all assumptions of the statistical procedure have been checked and met and all data have been cleaned.
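For readers unfamiliar with the F-ratio, the following is a minimal sketch of how it is obtained in a one-way ANOVA. Each inner list stands for the repeated measurements of one feature for one speaker. How the study obtained repeated measurements per speaker is not specified here, so the grouping, the function name `f_ratio`, and the values are assumptions made for illustration only; the study itself used SPSS 21.

```python
def f_ratio(groups):
    """One-way ANOVA F-ratio: between-group mean square over within-group mean square."""
    k = len(groups)                      # number of groups (e.g. speakers)
    n = sum(len(g) for g in groups)      # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical measurements of one feature for three speakers.
speakers = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_value = f_ratio(speakers)
```

An F-ratio well above one indicates that the feature varies more between groups than within them. The same function applies when the groups are the conversations of a single speaker, in which case a low, nonsignificant F-ratio is the desired outcome.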
To select the features with low within-speaker variability, one-way ANOVA is used. If the F-ratio of the ANOVA of the same speaker's conversations is not significant for a feature, there is no significant difference among that speaker's conversations across speech situations and time regarding the tested feature, which indicates low within-speaker variability.
To test the extent to which the features selected in Experiment 1 show low within-speaker variability, 149 conversations from 24 speakers, both male and female, are selected from the database. Each speaker's sampled conversations compose a dataset, which contains at least four and at most 11 conversations, and there are 24 datasets in total. The 149 conversations last from 10 s to 2 min 50 s, and the time interval between each speaker's sampled conversations is over 1 week. Moreover, no two conversations produced by the same speaker share a topic or an addressee.
In this study, ANOVA procedures are conducted in SPSS 21. First, the significance value (P value) of the ANOVA test is reported and compared with the alpha level determined a priori. If the reported P value is larger than the level of significance (P > α), the null hypothesis that all conversations in the dataset are produced by the same speaker is retained. If the reported P value is less than or equal to the level of significance (P ≤ α), the null hypothesis tends to be rejected. When the null hypothesis is rejected, post hoc comparisons need to be conducted to find out whether there are significant differences between any two of the conversations in the dataset. If the post hoc test shows significant differences between any two of the conversations in the dataset, the null hypothesis that all conversations in the dataset are produced by the same speaker is finally and completely rejected. If the post hoc test shows no significant differences between any two of the conversations in the dataset, the significant P value reported by ANOVA may be due to chance, and the null hypothesis that all conversations in the dataset are produced by the same speaker still stands.
Experiment 3 aims to assess the general performance and validity of the features that are selected from Experiments 1 and 2 and show high between-speaker variability and low within-speaker variability.
One test dataset and one background dataset are used to design a cross-validation procedure. The test dataset is composed of 24 speakers, each of whom contributes two conversations; the background dataset is composed of 30 speakers, each of whom contributes one conversation. In total, 78 conversations from 54 sociolinguistically homogeneous speakers have been sampled from the database to assess the general performance and validity of the selected features.
In a cross-validation procedure, first, the 24 speakers in the test dataset are paired according to the rule that each speaker's first conversation is compared with his/her own second conversation and with every other speaker's second conversation. The first conversation from each speaker in the test dataset is used to create the suspect model, and their second conversation is used as offender data. Each cross-validation procedure results in a total of 24 same-speaker comparison pairs and 552 different-speaker comparison pairs for each feature to be assessed.
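The pairing rule described above can be checked with a short sketch. The speaker identifiers and the function name `build_comparison_pairs` are hypothetical; only the rule itself (each speaker's first conversation against every speaker's second conversation) is taken from the text.

```python
def build_comparison_pairs(speakers):
    """Pair each speaker's first conversation with every speaker's second one."""
    same, different = [], []
    for suspect in speakers:
        for offender in speakers:
            # (speaker, 1) is the suspect's first conversation,
            # (speaker, 2) is the offender's second conversation.
            pair = ((suspect, 1), (offender, 2))
            if suspect == offender:
                same.append(pair)
            else:
                different.append(pair)
    return same, different


# 24 hypothetical speaker identifiers, S01 to S24.
speakers = [f"S{i:02d}" for i in range(1, 25)]
same_pairs, different_pairs = build_comparison_pairs(speakers)
```

With 24 speakers, this yields 24 same-speaker pairs and 24 × 23 = 552 different-speaker pairs, in line with the counts reported above.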
Second, Aitken and Lucy's MVKD procedure, as implemented in Matlab by Morrison, is applied to the test dataset in Matlab2012a to obtain LRs. This procedure is conducted for each feature.
Third, to evaluate the general performance of each feature, a Tippett plot is drawn for each of them on the basis of the LRs generated by the designed cross-validation procedure. At the same time, Cllr is calculated to assess the validity of every feature on the basis of the LRs generated for all cross-validated same-speaker and different-speaker comparisons.
In Experiment 4, the more promising features tested in Experiment 3 are picked out and used to develop FSR systems.
In an FSR system, the features work together to discriminate speakers instead of each playing a role individually, so they “should be maximally independent” of each other, which is one of the six criteria that FSR features should meet. In addition, mutual independence among the FSR features in an FSR system makes it much easier to combine the LRs of different features (ibid.: 57). The combined LR is the product of their associated LRs (ibid.). Put simply, the overall LR of an FSR system can be obtained by multiplying the LRs of all member parameters that are independent of each other. Therefore, the correlation among the selected more promising DI features is tested in SPSS 21 to ensure mutual independence among the member features used to build FSR systems. Then, the general performance and validity of the FSR systems are tested.
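The combination rule just described, multiplying the LRs of mutually independent features, can be sketched as below. The function name `combine_lrs` and the LR values are hypothetical. Summing base-10 logarithms is used here only as a numerically convenient way to compute the product; it is an implementation choice of this sketch, not something the study specifies.

```python
import math


def combine_lrs(feature_lrs):
    """Overall LR of a system whose member features are assumed independent."""
    # The product of LRs equals ten raised to the sum of their log10 values.
    log_sum = sum(math.log10(lr) for lr in feature_lrs)
    return 10 ** log_sum


# Hypothetical LRs from three independent features for one comparison.
overall_lr = combine_lrs([2.0, 5.0, 0.5])  # 2 * 5 * 0.5
```

The product is valid only if the member features are independent; multiplying the LRs of correlated features would count the same evidence more than once, which is why the correlation check precedes system building.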
| Results of Experiments and Discussion|| |
In Experiment 1, altogether 26 DI features of absolute and relative frequencies and TTRs are extracted, trained, and then tested twice. The F-ratios for 18 features are larger than one. These include 14 relative frequency features, which reflect a speaker's habits of accommodating certain types of information elements in the WT information unit or in a conversation, and four TTR features, which reflect a speaker's habits of organizing DI elements into a DI unit. That is to say, 18 DI features have been extracted and tested to show high between-speaker variability.
The within-speaker consistency of the 18 features selected in Experiment 1 is tested with one-way ANOVA in SPSS 21, and 12 of them have been found to stay consistent in all 24 datasets. This means that the 12 features have been tested to show high between-speaker variability and low within-speaker variability and may be effective FSR features. [Table 1] lists the calculation formulas of the 12 features.
|Table 1: Calculation formulas of the discourse information features selected from Experiments 1 and 2|
Based on the LRs generated in Experiment 3, the Tippett plots for all 12 selected DI features are drawn in Matlab2012a (see [Figure 1]). The solid lines rising to the right in these figures record the 24 log-ten-LR values generated by the same-speaker comparisons for each feature, and the dotted lines rising to the left record the 552 log-ten-LR values generated by the different-speaker comparisons for each feature. The overall shape of the solid and dotted lines reflects the general performance of a feature as an FSR feature. The intersection point of the two types of lines denotes the equal error rate (EER). Usually, the further apart the two lines are above the intersection point and, at the same time, the closer together they are below it, the better a feature discriminates speakers. In addition, the lower the EER is, the more valid the feature is.
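The intersection point described above can also be approximated numerically. The sketch below sweeps a decision threshold over the observed log LRs and returns the error rate at the threshold where the proportion of same-speaker values below the threshold is closest to the proportion of different-speaker values at or above it. This threshold sweep is one common convention and an assumption of this sketch; the study read the value from its Tippett plots. The function name and the log-LR values are hypothetical.

```python
def equal_error_rate(ss_log_lrs, ds_log_lrs):
    """Approximate the EER from same-speaker and different-speaker log LRs."""
    best = None
    for t in sorted(set(ss_log_lrs) | set(ds_log_lrs)):
        # Same-speaker pairs scoring below the threshold are misses.
        miss = sum(x < t for x in ss_log_lrs) / len(ss_log_lrs)
        # Different-speaker pairs scoring at or above it are false alarms.
        false_alarm = sum(x >= t for x in ds_log_lrs) / len(ds_log_lrs)
        candidate = (abs(miss - false_alarm), (miss + false_alarm) / 2)
        # Keep the threshold where the two error rates are closest.
        if best is None or candidate < best:
            best = candidate
    return best[1]


# Hypothetical log10 LR values, for illustration only.
eer = equal_error_rate([1.0, 2.0, 0.5, -0.2], [-1.0, -2.0, 0.3, -0.5])
separated = equal_error_rate([1.0, 2.0], [-1.0, -2.0])
```

With so few values the estimate is coarse; with 24 same-speaker and 552 different-speaker comparisons, the two error curves are sampled much more finely.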
In general, the Tippett plots for the 12 features demonstrate that P12, P4, P3, P6, and P9 work better to support the consistent-with-fact hypotheses that the questioned and the known samples are of the same origin or of different origins, as they indeed are, while P2, P5, P10, P8, P1, and P11 work better to support the consistent-with-fact hypotheses with minimal support for contrary-to-fact hypotheses. Based on the overall shape of the Tippett plots, P4, P8, P10, and P12 perform better than the others: in their Tippett plots, the solid and dotted lines are further apart above the intersection point and closer together below it than in the Tippett plots of the other features. Among the four better-performing features, P12 may be the best-performing one, followed by P10. P3 and P1 perform worst: in their Tippett plots, the solid and dotted lines below the intersection point are further apart than in the Tippett plots of the other features.
As far as EER is concerned, P8, P10, and P12 have an EER below 35%, while P1, P2, P3, and P6 have an EER above 40%. P4, P5, P7, P9, and P11 have an EER between 35% and 40%.
In short, the Tippett plots for all 12 features illustrate that P1 and P3 perform worse than the others and that P8, P10, and P12 perform better than the others.
Cllr for each potential DI feature is given in [Table 2].
|Table 2: Log-likelihood-ratio costs for twelve discourse information features|
[Table 2] shows that the Cllr for P12 is the lowest and that the Cllr for P3 is the highest and is larger than one. The Cllr for P1 is a little lower than one and higher than those for the other 10 features, and the Cllrs for P8, P10, P11, and P4 are a little higher than that for P12 but lower than 0.9. This means that, in terms of validity, P12 is the most valid of the 12 features, and P1 and P3 are the least valid. Since a qualified feature usually produces a Cllr lower than one, P3 is filtered out. The Cllr for P1 is only a little lower than one, and thus P1 is also filtered out. That is to say, 10 features have been found to be more promising and can be used to develop FSR systems.
[Table 3] records the Pearson correlation coefficients between the 10 more promising features tested in Experiment 3. A correlation that is significant at the 0.01 significance level is marked with one asterisk (*) at the top of the correlation coefficient. The table shows that P2 is significantly correlated with P9, and P8 is significantly correlated with P5 and P12.
|Table 3: Correlation among the ten promising discourse information features|
Next, according to the result of the correlation analysis, candidate FSR systems are developed based on mutually independent features that center on the most promising ones, P4, P8, P10, P11, and P12.
Since P8 is significantly correlated with P12, the two cannot be present in the same system, so the most promising parameters are put into two groups. The first group includes P4, P10, P11, and P12, and the second group includes P4, P8, P10, and P11. Centering on these two groups, respectively, 11 candidate FSR systems have been built, as shown in [Table 4].
|Table 4: Member features of candidate forensic speaker recognition systems|
The serial number of each candidate system is given in the first column of each part in [Table 4]. In the column of member parameters, the features composing each candidate system are identified. The central member features in the first six candidate systems are P4, P10, P11, and P12; the central member features in the last five candidate systems are P4, P8, P10, and P11. P2 and P7 are put into the two groups of the most promising potential DI features one by one, and then the other features are put into the new combinations. Because of the lower validity of P9, P5, and P6, they are not put into the combinations separately.
The Tippett plots for all 11 candidate FSR systems are given in [Figure 2]. In general, these Tippett plots show that all 11 candidate systems perform better than any of the 10 promising features does individually: their lines are further apart above the intersection point and closer together below it than those in the Tippett plots for the individual features. All 11 candidate systems have an EER below 28%, which is much better than the lowest EER of about 31%, that of the best-performing feature, P12.
|Figure 2: Tippett plots for the 11 candidate forensic speaker recognition systems|
The Cllrs for all 11 candidate systems, which are all lower than one, are given in [Table 5]. In addition, among the first six candidate systems, candidate systems No. 3 and No. 4 are more valid than the others and than the most valid single feature, P12.
|Table 5: Log-likelihood-ratio costs for candidate forensic speaker recognition systems|
The results of Experiment 4 illustrate not only that nonphonetic features exist at the highest level of speech but also that there exist mutually independent nonphonetic features that can be used to build FSR systems that perform better than a single nonphonetic feature does.
| Conclusion|| |
Based on Sapir's description of the five levels of speech behavior and supported by DIT, four serial experiments are designed to analyze natural conversations and to explore nonphonetic FSR features that are expected to be less affected by real-world conditions.
The experimental results provide evidence that there are nonphonetic features representing the individual style of a speaker's speech and that such nonphonetic features can be extracted and measured both in theory and in practice.
Furthermore, because the experimental materials in this study are real, natural conversations, the results provide evidence that neither the telephone transmission channel nor the real-world conditions in which a conversation is recorded reduce the number of available nonphonetic DI features or distort them, as they do with phonetic features. That is to say, nonphonetic DI features tend to be less affected by real-world conditions.
To sum up, the conclusion drawn from this study is that the nonphonetic features, along with the FSR systems based on them, are of good efficacy and tend to be as valid and reliable in practice as they have been found to be with natural conversations in the experiments. It is hoped that this conclusion not only rewards the efforts made in this study but also encourages further research into nonphonetic FSR features and attempts to use them in practice.
This paper is one of the outcomes of the “13th Five-Year Plan” Philosophy and Social Science Research Program (GD16CWW02), the Study of Identification of We-Media Language in Big Data Era, which is directed by Guan Xin and was approved by the Guangdong Planning Office of Philosophy and Social Science in 2016.
Financial support and sponsorship
Conflicts of interest
There are no conflicts of interest.
| References|| |
Bijhold J, Ruifrok A, Jessen M, Geradts Z, Ehrhardt S, Alberink I. Forensic Audio and Visual Evidence 2004-2007: A Review. 15th INTERPOL Forensic Science Symposium. Lyon, France: IIFSS; 2007.
Cambier-Langeveld T. Current methods in forensic speaker identification: Results of a collaborative exercise. Int J Speech Lang Law 2007;14:223-43.
Morrison GS. Making Demonstrably Valid and Reliable Forensic Voice Comparison a Practical Reality in Australia 2010. ARC Linkage Project 2010-2013. Available from: http://www.geoff-morrison.net/
Gold E, French P. An International Investigation of Forensic Speaker Comparison Practices. Proceedings of the 17th International Congress of Phonetic Sciences. Hong Kong: ICPHS; 2011. p. 751-4.
Sapir E. Speech as a personality trait. Am J Sociol 1927;32:892-905.
Zhang C, Morrison GS. Forensic voice comparison. In: Sybesma R, editor. Chinese Language and Linguistics. Vol. 2 De-Med. Leiden, Boston: Brill; 2017.
Du J. A study of the tree information structure of legal discourse. Mod Foreign Lang 2007;30:40-50.
Du J. Discourse information analysis: A new research perspective in forensic linguistics. Chinese Social Sciences Weekly 2011;5.24:015.
Wang Y. The Interpretation for Cognitive Process of Language Formation. J Sichuan Int Stud Univ 2006;22:53-9.
Morrison GS. The Place of Forensic Voice Comparison in the Ongoing Paradigm Shift. The 2nd International Conference on Evidence Law and Forensic Science Conference Thesis. The Key Laboratory of Evidence Science of the Ministry of Education (The Institute of Evidence Law and Forensic Science, China University of Political Science and Law). Vol. 1. Beijing, China; 2009. p. 20-34.
Morrison GS, Enzinger E, Zhang C. Forensic speech science. In: Freckelton I, Selby H, editors. Expert Evidence. Ch. 99. Sydney, Australia: Thomson Reuters; 2018.
Du J. On Legal Discourse Information. Beijing: People's Publishing House; 2014.
Johnstone B. The Linguistic Individual: Self-Expression in Language and Linguistics. New York and Oxford: Oxford University Press; 1996.
Biber D. Variation Across Speech and Writing. Cambridge: Cambridge University Press; 1998.
Aitken C, Roberts P, Jackson G. Fundamentals of Probability and Statistical Evidence in Criminal Proceedings: Guidance for Judges, Lawyers, Forensic Scientists and Expert Witnesses. London: Royal Statistical Society; 2010.
Rose P. Forensic Speaker Identification. London & New York: Taylor & Francis; 2002.
Hollien H. The Acoustics of Crime: The New Science of Forensic Phonetics. New York: Plenum Press; 1990.
Nolan F. The Phonetic Bases of Speaker Recognition. Cambridge: Cambridge University Press; 1983.
Pruzansky S, Mathews MV. Talker-recognition procedure based on analysis of variance. JASA 1964;36:2041-7.
Wolf JJ. Efficient acoustic parameters for speaker recognition. JASA 1972;51:2044-56.
Aitken C, Lucy D. Evaluation of trace evidence in the form of multivariate data. Journal of The Royal Statistical Society Series C-Applied Statistics 2004;53:109-22.
Morrison GS. Matlab Implementation of Aitken & Lucy's 2004 Forensic Likelihood-Ratio Software Using Multivariate-Kernel-Density Estimation; 2007. Available from: http://www.geoff-morrison.net