Year: 2017 | Volume: 3 | Issue: 3 | Page: 139-143
Analysis of errors in forensic science
Institute of Evidence Law and Forensic Science, China University of Political Science and Law, Beijing 100088, China
Reliability of expert testimony is one of the foundations of judicial justice. Both expert bias and scientific errors affect the reliability of expert opinion, which in turn affects the trustworthiness of the findings of fact in legal proceedings. Expert bias can be eliminated by replacing experts; however, it may be more difficult to eliminate scientific errors. From the perspective of statistics, errors in the operation of forensic science include systematic errors, random errors, and gross errors. In general, repeating the process and abiding by ISO/IEC 17025:2005 (General requirements for the competence of testing and calibration laboratories) during operation are common measures used to reduce errors that originate from experts and equipment, respectively. For example, to reduce gross errors, the laboratory can ensure that a test is repeated several times by different experts. In applying forensic principles and methods, Federal Rule of Evidence 702 requires judges to consider factors such as peer review to ensure the reliability of expert testimony. As scientific principles and methods may not have undergone professional review by specialists in a certain field, peer review serves as an exclusionary standard. This study also examines two types of statistical errors. As false-positive errors carry a higher risk of unfair decisions, they should receive more attention than false-negative errors.
|How to cite this article:|
Du M. Analysis of errors in forensic science. J Forensic Sci Med 2017;3:139-143
|How to cite this URL:|
Du M. Analysis of errors in forensic science. J Forensic Sci Med [serial online] 2017 [cited 2019 Sep 15 ];3:139-143
Available from: http://www.jfsmonline.com/text.asp?2017/3/3/139/215815
Rules on the admission of expert testimony focus on excluding scientific principles based on junk science or invalid science. Accordingly, scientific principles or methods that are generally accepted in the relevant scientific community are typically admissible. Federal Rule of Evidence 702 made judges gatekeepers in deciding whether a scientific principle could be admitted as a basis for forensic evidence. This change has facilitated the acceptance of more scientific principles and methods, especially new theories and methods, helping triers of fact to reach the truth. According to Daubert v. Merrell Dow Pharmaceuticals, Inc., judges should consider the following factors before accepting a theory or technique: whether it is testable (and has been tested), whether it has been peer-reviewed and published, its known or potential rate of error, whether standards controlling its technical operation exist, and whether the theory or technique is accepted by the relevant scientific community.
These rules pay less attention to other factors and may lead to unreliable forensic testimony due to expert bias and instrument errors. Systematic errors, such as those arising from invalid science, are inevitable in operating a forensic analytical system. From a statistical perspective, besides systematic errors, random errors and gross errors can also lead to unjust sentences. Furthermore, methods, instruments, and techniques are only means to assist experts' reasoning and decision-making. From the perspective of the system as a whole, both systematic and random errors are inevitable. The questions that arise are which types of errors should be recognized and how they can be controlled.
Judges and jurors need experts' help to reach the truth mainly in two situations. In the first, the problem at issue affects the trier of fact's understanding of the evidence or determination of a fact in the case. In the second, the function of the expert is to help the trier of fact recognize evidence or facts, thereby ensuring that proceedings go smoothly and end in a fair judgment. In a lawsuit, a major duty of experts is to provide their opinions based on the materials they have obtained. As experts may not obtain materials from both parties, they may not be as impartial as judges. Both errors and bias on the part of experts undermine their helpful function. As a result, experts should avoid errors and bias to prevent the trier of fact from making a wrong decision. Although there may be errors in expert opinions, their testimony tends to be accepted by judges and jurors, thereby helping the trier of fact to understand the evidence or to determine a fact in issue. As judges and jurors lack the special knowledge, experience, or skill of the expert's field, the reliability of expert testimony is essential to ensure right decisions in court and to fulfill judicial justice.
Considering a single case, the expert is both the designer and the operator of a forensic scientific analysis system. However, considering the entire forensic inspection as a system, the system consists of samples, equipment, test methods, and operators. Errors can originate from the equipment; for example, an instrument may give a readout that is higher or lower than the true value every time. Similarly, because of the limitations of human vision or operational habits, scientific errors also originate from experts. Although expert errors and expert bias may equally lead to unreliable conclusions, they are different. Bias refers to an (expert) witness's partiality toward one party or against another for reasons related to finance, emotion, etc. Errors, by contrast, are made unconsciously by experts and are inevitable. The difference between expert bias and expert errors is whether there is an emotional factor. According to Wigmore, three different kinds of emotion constituting untrustworthy partiality may be broadly distinguished – bias, interest, and corruption: bias, in common acceptance, covers all varieties of hostility or prejudice against the opponent personally or of favor to the proponent personally. Bias is a kind of partial emotion, which affects expert behavior in the process of forming an opinion. Expert bias tends to give rise to unreliable testimony, increasing the risk of unjust judgment. Expert errors, on the contrary, do not result from emotional factors. For example, there are inevitable parallax errors caused by the limitations of human vision in laboratory operations, and avoidable human mistakes such as recording wrong readouts. The former are systematic errors, while the latter are gross errors; both are quite common in forensic laboratory operations. Consequently, bias can be controlled by replacing an expert, whereas errors made by experts cannot be controlled through this measure. This study therefore focuses on the statistical errors that reduce the reliability of expert testimony.
Overview of Errors
Error refers to the degree to which a measurement result deviates from the true value. Metrology teaches that there is an element of uncertainty in every measurement; we can never be certain that the measurement captures the true value. No measurement of a physical quantity can yield an absolutely accurate value; even with the best applicable measurement techniques, the measured value and the true value differ. What applies to physics and chemistry applies to forensic science: “A key task… for the analyst applying a scientific method is to conduct a particular analysis to identify as many sources of error as possible, to control or eliminate as many as possible, and to estimate the magnitude of remaining errors so that the conclusions drawn from the study are valid.” In other words, errors should, to the extent possible, be identified and quantified. The difference between the measured value and the true value is called the error, and errors can be divided into systematic errors, random errors, and gross errors.
Systematic errors include instrumental errors, method errors, individual errors, and environmental errors. For example, if the temperature in the room where a scientific instrument is located is not properly set for the instrument, the instrument may consistently produce readings that underestimate or overestimate the true value. Theoretically, systematic errors can be controlled by measures such as instrument calibration. The failure of a technician to calibrate an instrument for measuring breath alcohol concentration as frequently as regulations prescribe, for example, is a procedural error. It increases the risk of an error in the actual measurement. Systematic errors are characterized by a unidirectional tendency. In other words, readouts or results coming from the same system tend to be consistently higher or lower than the true value. As a result, systematic errors are readily found through cross-examination, as the opposing party usually uses a different system throughout the Analysis, Comparison, Evaluation, and Verification process. Any analysis that uses different equipment or follows different principles or methods can expose systematic errors.
Another measure to control systematic errors is to abide by the Accreditation Criteria for the Competence of Testing and Calibration Laboratories (ISO/IEC 17025:2005), which standardizes results from different laboratories to minimize errors from operational processes such as equipment calibration. This is a more effective way of limiting systematic errors, since whatever applies to physics and chemistry applies to all of forensic science. As some experts may not appear before the judge, following the Accreditation Criteria for the Competence of Testing and Calibration Laboratories is a more effective and general measure than cross-examination.
Cross-examination can help judges and jurors find systematic errors in an analysis, for example by asking the expert or operator why an instrument or piece of equipment was selected, how the instrument was calibrated, and how often expert testimony based on the same instrument or equipment and test method has been admitted. Another main way of finding systematic errors through cross-examination is to ask the expert or operator how the instrument used in the case met the Accreditation Criteria for the Competence of Testing and Calibration Laboratories, including the error rate of the instrument and its recalibration period. With these details, judges and jurors can decide whether the systematic errors of the instrument or equipment are within an acceptable standard, and thereby whether expert testimony based on that instrument is credible.
Random errors, caused by uncontrollable factors, distribute results around the true value following a Gaussian distribution; they tend to be bidirectional and unpredictable. Random errors can be gauged by taking multiple measurements. Systematic errors and random errors differ in their tendency and distribution: systematic errors push all results in one direction, while random errors scatter them irregularly on both sides of the true value. Because of the limitations of human vision, for example, readings are not the precise truth but estimates. If an operator repeats the work following the operating instructions and the standard method of taking readouts, errors stemming from that stage of the process can be minimized to an acceptable range. Specifically, when two examiners work independently, their results can be compared, thereby exposing unreliable results caused by random errors.
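The contrast between a one-directional systematic error and a two-directional random error can be illustrated with a minimal sketch. The readings, the reference value of 0.080, and the function name `describe_errors` are hypothetical and chosen only for illustration; the sketch assumes that a reference sample with a known value is available.

```python
from statistics import mean, stdev

def describe_errors(readings, reference_value):
    """Split measurement error into a systematic part (bias) and a random part (spread)."""
    bias = mean(readings) - reference_value  # consistent offset: systematic component
    spread = stdev(readings)                 # scatter around the mean: random component
    return bias, spread

# Hypothetical readings of a reference sample whose known value is 0.080.
reference = 0.080
instrument_a = [0.085, 0.086, 0.084, 0.085, 0.085]  # tightly grouped but shifted upward
instrument_b = [0.078, 0.083, 0.079, 0.082, 0.078]  # centred on the reference but scattered

bias_a, spread_a = describe_errors(instrument_a, reference)
bias_b, spread_b = describe_errors(instrument_b, reference)

print(f"A: bias={bias_a:+.4f}, spread={spread_a:.4f}")
print(f"B: bias={bias_b:+.4f}, spread={spread_b:.4f}")
```

In this sketch, instrument A shows a stable upward offset with little scatter, which calibration could correct, whereas instrument B shows almost no offset but a wider scatter, which only repeated measurement can gauge.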
Gross errors are mainly due to negligence in measurement, that is, avoidable human mistakes such as reading or recording a wrong number during an operation. Consequently, discarding the erroneous data or repeating the operation can prevent such errors. Like expert bias, gross errors are mainly caused by operators. The difference is that gross errors are caused by the negligence of experts, while bias stems from financial reasons, emotional attachments, or attitudes. Therefore, repeating operations can also detect and reduce unreliable expert testimony caused by gross errors.
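One simple way in which repeated operations can reveal a gross error is sketched below. The readings, the tolerance of 0.01, and the function name `flag_gross_errors` are assumptions made for illustration, and comparing each reading with the median is only one of several possible screening rules, not a method prescribed by the sources cited here.

```python
from statistics import median

def flag_gross_errors(readings, tolerance):
    """Return the readings that differ from the median by more than `tolerance`."""
    centre = median(readings)
    return [r for r in readings if abs(r - centre) > tolerance]

# Hypothetical repeated readings; 0.850 looks like a misrecorded 0.085.
readings = [0.085, 0.084, 0.850, 0.086, 0.085]
suspects = flag_gross_errors(readings, tolerance=0.01)

print(suspects)
```

A flagged reading would then be discarded or the operation repeated, as described above.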
Agreement between two experts on the identified problem, referred to as concordance in the judgments of two experts, is the cornerstone of the quantification problem. However, agreement between two examiners does not prove that they are correct, and their disagreement leaves open the question of which one is correct. Consequently, statistics on verifications cannot estimate false-positive or false-negative probabilities in practice.
Practical Error Evaluation Mechanism
There are two categories of errors: practical and theoretical errors. Practical errors arise in the rendering of forensic testimony, while theoretical errors are caused by invalid scientific principles and methods and by errors in applying them.
Two terminologies of relevant errors: Confidence intervals and confidence levels
What is the difference between gross errors and individual systematic errors, such as parallax error, considering that both are caused by human factors? Gross errors result from negligence or unintended faults for which someone should take responsibility. Individual systematic errors, however, are inevitable, no matter how careful the operator is. In forensic science, gross errors should be avoided in decision-making as they affect both reaching the truth and judicial credibility. On the other hand, forensic science and expert testimony can tolerate systematic errors and random errors to a certain level. Therefore, the issue is how to keep them within a tolerable range.
In a bid to reduce systematic errors, the holding of Daubert v. Merrell Dow Pharmaceuticals, Inc., suggested peer review, publication, and error rates as factors that influence the credibility of expert testimony. Statistically, however, errors need to be estimated using a confidence interval and a confidence level. A confidence interval is an interval, based on a sample statistic, which contains a parameter value with a specified probability. In other words, it is an estimated range for the population parameter constructed from the sample statistic. The confidence level is the proportion of such intervals, constructed over repeated sampling, that contain the true value of the population parameter.
According to the definitions of confidence intervals and confidence levels, increasing the sample size in scientific research narrows the confidence interval at a given confidence level, thereby reducing the error rate. The larger the sample size, the more closely the range reflects the true value and the less influence individual values have. However, in forensic science, samples cannot grow indefinitely because of objective conditions and constraints; they are limited to the number of repetitions that can be performed.
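The relationship between sample size, confidence level, and interval width can be made concrete with a short sketch. It uses a normal approximation for the mean of repeated readings, which is an assumption: for the small samples typical of casework, a Student t multiplier would give a somewhat wider interval. The readings and the function name `confidence_interval` are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def confidence_interval(readings, level=0.95):
    """Normal-approximation confidence interval for the mean of repeated readings."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% level
    half_width = z * stdev(readings) / sqrt(len(readings))
    centre = mean(readings)
    return centre - half_width, centre + half_width

# Hypothetical readings; the second set repeats the same pattern four times over.
few = [10.1, 9.8, 10.3, 9.9, 10.0, 9.9]
many = few * 4

low_few, high_few = confidence_interval(few)
low_many, high_many = confidence_interval(many)

print(f"n={len(few)}:  {low_few:.3f} to {high_few:.3f}")
print(f"n={len(many)}: {low_many:.3f} to {high_many:.3f}")
```

With more readings of the same variability, the interval at the same confidence level becomes narrower; asking for a higher confidence level from the same readings makes it wider.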
When the sample size cannot be increased, the tolerated rate of error should be reduced accordingly. Most scientists routinely require that this rate of error be very small, usually between one and five percent, while errors in the laboratory (often called “experimental differences,” which sounds better than “errors”) are preferred to be within three percent (either side) of the real number.
There is a strong relationship between the hypothesis test and the error rate; therefore, it is very rare for a scientist to reject a hypothesis without stating the level of confidence at which it was rejected. A hypothesis test is a statistical term and describes an ideal situation in theory. Before a general requirement for employing statistical test procedures evolves out of practice or pronouncement, the nature of hypothesis testing and its limitations and possible disadvantages in forensic applications should be clearly understood.
In conclusion, the rate of errors depends on the confidence level, which reflects the precision and accuracy of a hypothesis test.
Two types of errors: False-positive errors and false-negative errors
According to their source, errors can be divided into three categories: systematic errors, random errors, and gross errors. From another perspective, depending on how they obscure the true value, errors can be divided into two types: false-positive errors and false-negative errors. As they are tied to the confidence interval and the confidence level, false positives are of more practical significance to control.
In hypothesis testing, a null hypothesis is presented for testing, together with an alternative hypothesis that is mutually exclusive with it. If the null hypothesis cannot be supported, the alternative hypothesis is accepted. In this process, however, there may be false negatives and false positives. As the terms are used here, a false negative occurs when a true null hypothesis is wrongly rejected, while a false positive occurs when a false null hypothesis is wrongly not rejected. Statistical theory suggests that, for a given sample size, the probabilities of these two types of error cannot be reduced at the same time; if the probability of false negatives is reduced, the probability of false positives increases, and vice versa. The only way to reduce the probabilities of both types of error simultaneously is to increase the sample size. In forensic science, however, this measure is not practical, as the specimens obtainable from a case are limited. Therefore, we can only hold one type of error to a set standard while minimizing the other as far as possible.
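The trade-off between the two error probabilities at a fixed sample size can be sketched numerically. The example assumes a one-sided test on a sample mean with normally distributed measurements; the means, the standard deviation, the cutoffs, and the function name `error_probabilities` are hypothetical. The two probabilities are named neutrally, since which of them is called a false positive or a false negative depends on how the null hypothesis is framed.

```python
from math import sqrt
from statistics import NormalDist

def error_probabilities(cutoff, n, mean_null=0.0, mean_alt=1.0, sigma=2.0):
    """Error probabilities of a test that rejects the null when the sample mean exceeds `cutoff`."""
    se = sigma / sqrt(n)  # standard error of the sample mean
    reject_true_null = 1 - NormalDist(mean_null, se).cdf(cutoff)
    keep_false_null = NormalDist(mean_alt, se).cdf(cutoff)
    return reject_true_null, keep_false_null

# Same sample size, two cutoffs: lowering one error probability raises the other.
strict = error_probabilities(cutoff=0.9, n=16)
lenient = error_probabilities(cutoff=0.6, n=16)

# Same cutoff, larger sample: both error probabilities fall together.
small_sample = error_probabilities(cutoff=0.5, n=16)
large_sample = error_probabilities(cutoff=0.5, n=64)

for label, (a, b) in [("strict", strict), ("lenient", lenient),
                      ("n=16", small_sample), ("n=64", large_sample)]:
    print(f"{label:8s} reject true null: {a:.3f}   keep false null: {b:.3f}")
```

Moving the cutoff only shifts error from one type to the other, whereas quadrupling the sample size reduces both, which is the option that casework with limited specimens often lacks.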
In laboratory research, 95% is a generally acceptable confidence coefficient. Some studies strictly require a confidence coefficient of 99%. For samples collected in casework, and for the purpose of helping the trier of fact to establish the truth, a confidence level of 95% is adequate for ensuring the reliability of conclusions drawn from an operation. Further, a 95% confidence coefficient pertains only to statistical procedures that generate interval estimates.
False-positive errors should be controlled preferentially
Which type of error, false negatives or false positives, should be controlled as a priority? Statistically, priority is typically placed on negative errors. In forensic science, hypothesis testing has to be considered according to the actual situation.
Each standard of proof (whether beyond a reasonable doubt in criminal procedure or preponderance of the evidence in civil procedure) tends to require evidence validating the truth. Under this legal principle, the null hypothesis would be that the evidence leads to establishing the truth. Accordingly, false-negative errors exclude the truth, while false-positive errors adopt a falsehood as truth, thereby diverting lawsuits from the truth. False-positive errors, rather than false-negative errors, should be the priority for control in forensic science. In criminal proceedings, for example, adopting expert testimony with false-positive errors is more likely to lead to innocent people being judged guilty. On the other hand, false-negative errors in expert testimony result in the guilty being acquitted or their penalties being reduced. Of these two faults in judicial judgment, the latter does less harm than the former to human life and freedom and to the credibility of the justice system in the community. Similarly, the United States criminal justice system prefers letting offenders go free to convicting the innocent. This choice reflects that the American legal system places human life and freedom above finding the truth, as life and freedom are unique and irreplaceable. To protect human life and freedom, false-positive errors are the type of error that needs to be controlled first.
The threshold model: Difference between forensic operation and laboratory operation
The threshold refers to the smallest stimulus intensity needed to produce a reaction. Forensic science pays particular attention to the threshold, as the threshold value determines whether a test result is positive or negative. For example, a positive gunshot residue test result for a suspect is taken to mean that this person has fired a gun, and a negative result the opposite. In this situation, the effect of errors around the threshold value is critical for the accuracy of test results.
If a result is under the threshold, the forensic finding may lose its probative value. For example, a bone age test is one of the forensic measures used to determine whether a criminal defendant has reached 16 years, which is the minimum age of criminal responsibility. If the bone age result is 14 years and its margin of error is 1 year, the suspect is at most 15 years old. Consequently, he/she cannot bear criminal responsibility. On the other hand, if the result is 15 years, the forensic test fails to prove anything, because whether he/she has reached the age of criminal responsibility remains uncertain. This example also shows that the probative value of forensic evidence depends on the case. However, physical and chemical testing processes contain many unknown variables, such as contaminated or degraded samples. Because of the uncertainty of environmental factors, the potential rate of error in forensic operation is higher than in the same experiment performed in the laboratory using the scientific method.
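The bone age example can be restated as a small decision rule. The sketch below assumes a symmetric margin of error around the estimate; the function name `classify_against_threshold` and the wording of the three outcomes are hypothetical and serve only to illustrate the reasoning.

```python
def classify_against_threshold(estimate, margin, threshold):
    """Compare an estimate with a symmetric margin of error against a legal threshold."""
    lower, upper = estimate - margin, estimate + margin
    if upper < threshold:
        return "below threshold"
    if lower >= threshold:
        return "at or above threshold"
    return "inconclusive"  # the interval reaches or straddles the threshold

AGE_OF_RESPONSIBILITY = 16  # the threshold used in the bone age example

result_14 = classify_against_threshold(14, 1, AGE_OF_RESPONSIBILITY)
result_15 = classify_against_threshold(15, 1, AGE_OF_RESPONSIBILITY)
result_17 = classify_against_threshold(17, 1, AGE_OF_RESPONSIBILITY)

print(result_14, result_15, result_17, sep=" | ")
```

An estimate of 14 years with a 1-year margin stays entirely below 16, whereas an estimate of 15 years spans 14 to 16 and therefore cannot settle the question either way.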
Theoretical Error Evaluation Mechanisms: Peer Review
Peer review connotes the evaluation of scientific or other scholarly work by others presumed to have expertise in the relevant field. Specifically, peer review in terms of Daubert v. Merrell Dow Pharmaceuticals, Inc., refers to the evaluation of submitted manuscripts to determine which works are published in professional journals and which books are published by academic presses (a context in which it is also called “refereeing,” “editorial peer review,” or “prepublication peer review”). The phrase is also used in a much broader sense, however, to cover the history of the scrutiny of a scientist's work within the scientific community and of others' efforts to build on it.
Peer review can help judges make decisions on the reliability of scientific principles, methods, and their applications. However, Chubin, an amicus curiae in Daubert v. Merrell Dow Pharmaceuticals, Inc., noted that “the peer review system is designed to provide a common and convenient starting point for scientific debate, not the final summation of existing scientific knowledge,” and that “contrary to the generally accepted myth, publication of an article in a peer review journal is no assurance that the research, data, methodologies, (or) analyses… are true, accurate,… reliable, or certain or that they represent ‘good science.’” Chubin et al. claimed that although peer review and publication cannot absolutely ensure the reliability of a scientific principle, judges, as nonprofessionals in science, still have to rely on this measure to judge the admissibility of forensic evidence. After all, a scientific principle published after peer review has been certified by experts in a certain field before being used in the case. As far as judges are concerned, adopting expert testimony based on whether the scientific principle involved has been published requires less responsibility from them in deciding whether that principle or method is reliable, thereby reducing the risk of misjudging cases.
Efficiency is another reason that leads judges to rely on peer review. The aim of peer review is usually to decide whether an academic study is sound enough to be published in a journal. At the review stage, predicting whether the academic principle will be used in a lawsuit is far beyond the specialists' abilities. As a result, peer review and publication are generally reliable.
This study discusses errors in terms of forensic science. There are three categories of errors (systematic errors, random errors, and gross errors) and two types of errors (false-positive errors and false-negative errors). The three categories of errors affect the reliability of expert testimony in three ways. Systematic errors cause all results obtained from a given method or piece of equipment to tend in one direction; therefore, experts depending on these data would always give inaccurate opinions. Random errors are inevitable, but the risk of such errors can be minimized by replication or contrast tests. Gross errors, caused by human factors, should be eliminated in forensic science, especially in admissible expert testimony. For forensic operators, abiding by the Accreditation Criteria for the Competence of Testing and Calibration Laboratories and having two examiners conduct the operation independently could reduce all three categories of errors.
Cross-examination is a main measure to reduce all errors in forensic science, including the three categories and the two types of errors. In a cross-examination process, attorneys play a pivotal role in finding and minimizing errors. Through cross-examination and expert testimony, the attorney can help the trier understand that it might have been feasible for the burdened party to have presented a better point estimate and a narrower confidence interval. As a confidence level of 95% is a common standard in the majority of experimental subjects, forensic science also abides by this standard.
Scientific principles and methods that have undergone peer review may still not be reliable because of the stance of the reviewing specialists. Therefore, peer review can be considered an exclusionary standard: if a scientific principle or method fails to pass peer review, it tends to be inadmissible. If it passes peer review and is published, judges still need to consider the knowledge and professional ethics of the specialists and the confidence level of the forensic testimony's findings, especially when the results are around the thresholds. The results of replication need to be presented in the expert report.
Financial support and sponsorship
Conflicts of interest
There are no conflicts of interest.
|1||National Institute of Justice, National Institute of Standards and Technology, U.S. Department of Commerce. Latent Print Examination and Human Factors: Improving the Practice through a Systems Approach. Charleston: CreateSpace Independent Publishing Platform; 2012. p. 9.|
|2||Wigmore JH. Evidence in Trials at Common Law, revised edition. New York: Little, Brown and Company; 1970. p.782.|
|3||Imwinkelried EJ. The end of the era of proxies. Evid Sci 2011;19:461.|
|4||Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 U.S. 579 (1993).|
|5||Wang J. The Federal Rules of Evidence. Beijing: China Legal Publishing House; 2012. p. 215.|
|6||National Academy of Science, National Research Council & Committee. On the Needs of the Forensic Science in the United States: A Path Forward. Washington D.C.: National Academy of Science, National Research Council & Committee; 2009. p. 111.|
|7||National Institute of Justice, National Institute of Standards and Technology, U.S. Department of Commerce, supra note 1, at 21.|
|8||See Jia J, He X, Jin Y. Statistics, 4th ed. Beijing: China Renmin University Press; 2009. p. 33.|
|9||Wang N, Guo J. Modern General Surveying. 2nd ed. Beijing: Tsinghua University Press; 2001. p. 101.|
|10||National Institute of Justice, National Institute of Standards and Technology, U.S. Department of Commerce, supra note 1, at 12.|
|11||See Flaherty MP. 400 drunken-driving convictions in D.C. based on flawed test, official says. The Washington Post [Internet]. 2010. Available from: http://www.washingtonpost.com/wp-dyn/content/article/2010/06/09/AR2010060906257.html?wprss=rss_metro. [Last accessed on 2010 Jun 10].|
|12||China National Accreditation Service for Conformity Assessment. Accreditation Criteria for the Competence of Testing and Calibration Laboratories (People's Republic of China), 2006.|
|13||Wang & Guo, supra note 9, at 101.|
|14||Id. at 101-2.|
|15||National Institute of Justice, National Institute of Standards and Technology, U.S. Department of Commerce, supra note 1, at 34.|
|16||Wang, supra note 5, at 220-2.|
|17||Matlack WF. Statistic for Public Policy and Management. Belmont: Duxbury Press; 1980. p. 222.|
|18||Speight JG. The Scientist or Engineer as an Expert Witness. Boca Raton: CRC Press, Taylor & Francis Group; 2009. p. 88.|
|19||Id. at 88.|
|20||Kaye DH. Is proof of statistical significance relevant? Wash Law Rev 1986;61:1337.|
|21||Id. at 1337.|
|22||Liu X. Standards of forensic science operation: Focusing on ways of controlling. In: He J, editor. Evidence Forum on Evidence. Beijing: China Procuratorate Press; 2007. p. 243-4.|
|23||Haack S. Peer review and publication: Lessons for lawyers. Stetson Law Rev 2007;36:789.|
|24||Id. at 789.|
|25||Kronick DA. Peer review in 18th-century scientific journalism. J Am Med Assoc 1990;263:1321.|
|26||Chubin DE. Amici Curiae in Support of Petrs. At 8, 13, Daubert, 509 U.S. 579 (1993).|
|27||Land DP, Imwinkelried EJ. Confidence intervals: How much confidence should the courts have in testimony about a sample statistic? Crim Law Bull 2008;44:273.|