ORIGINAL ARTICLE
Year: 2015, Volume: 1, Issue: 2, Page: 124-132

Bayesian Networks for the Age Classification of Living Individuals: A Study on Transition Analysis

Emanuele Sironi, Franco Taroni
School of Criminal Justice, Faculty of Law, Criminal Justice and Public Administration, University of Lausanne, Lausanne, Switzerland

Over the past few decades, age estimation of living persons has represented a challenging task for many forensic services worldwide. In general, the process of age estimation includes the observation of the degree of maturity reached by some physical attributes, such as dentition or several ossification centers. The estimated chronological age, or the probability that an individual belongs to a meaningful class of ages, is then obtained from the observed degree of maturity by means of various statistical methods. Among these methods, those developed in a Bayesian framework offer users the possibility of coherently dealing with the uncertainty associated with age estimation and of assessing, in a transparent and logical way, the probability that an examined individual is younger or older than a given age threshold. Recently, a Bayesian network for age estimation has been presented in the scientific literature; this kind of probabilistic graphical tool may facilitate the use of the probabilistic approach. Probabilities of interest in the network are assigned by means of transition analysis, a parametric statistical model that links the chronological age and the degree of maturity by means of specific regression models, such as logit or probit models. Since different regression models can be employed in transition analysis, the aim of this paper is to study the influence of the model on the classification of individuals.
The analysis was performed using a dataset related to the ossification status of the medial clavicular epiphysis, and the results indicate that the classification of individuals does not depend on the choice of the regression model.
Introduction

In modern society, chronological age is relevant information that defines the legal rights or obligations of an individual. This extends to determining the legal capacity of a person or even his or her status as a refugee.[1],[2] However, determining the age of an individual is not always possible in a formal way because of the lack of valid identification documents. This is largely due to the increase in border-crossing movements from countries where birth dates are not formally registered or where such documents are not regularly provided.[1] Therefore, in recent times, age assessment of living individuals has become an extremely relevant practice for a large number of forensic and medico-legal services worldwide.[3],[4],[5],[6],[7] Usually, an expert age assessment is requested by a judicial authority to provide an estimate of the chronological age of an individual and/or to evaluate whether this person is older or younger than a threshold age of legal relevance, such as the age of majority or the age of criminal responsibility. These thresholds are country-specific and generally fall between 14 and 22 years of age.[3] Thus, it is essential that methods for age estimation be able to discriminate precisely within this range. The assessment of the degree of maturity and development of some physical attributes is extremely useful for this purpose.[8],[9] For example, following the recommendations of the Study Group on Forensic Age Diagnostics, the examination should include physical attributes such as the secondary sexual characteristics, the dentition and the ossification centers of the left hand and of the collar bones.[10] The degree of maturity of a given physical attribute is generally assessed in categorical stages by means of specific classifications or using a comparative atlas.
It is then converted into the information of interest, that is, the chronological age or the probability that the person is older or younger than an age threshold, by means of a wide range of statistical methods. Among these methods, those developed in a Bayesian framework are particularly interesting because they allow the user to coherently handle the uncertainty associated with age estimation for forensic and legal purposes.[11],[12] A considerable number of Bayesian methods classify the examined individual within a meaningful class of ages according to the degree of maturity observed in a given physical attribute.[13],[14],[15],[16] Others provide an estimate of the chronological age in the form of a posterior probability distribution.[12],[17] This distribution can then be used for computing the probabilities that an individual is older or younger than an age threshold of interest.[11],[12],[17] Practical application of Bayesian methods can be arduous and time-consuming, depending on the number of variables involved and on the complexity of their relationships. Thus, the use of specific probabilistic tools, such as Bayesian networks, may be very helpful. Bayesian networks are probabilistic graphical models that allow one to reason about the structure of an inferential problem and to provide a quantitative assignment of the belief associated with each variable of interest for the problem at hand.[18],[19] Bayesian networks have already been widely used in forensic and medical frameworks,[19],[20] and recently a Bayesian network for forensic age assessment has also been presented in the scientific literature.[12] One of the main concerns in the Bayesian approach, and thus in Bayesian networks, is the assessment of an initial degree of belief on each variable in the model.
This belief is generally expressed as a probability, which can be assigned in a personal way based on previous knowledge or experience, or by means of some meaningful data analysis.[21],[22] In the context of Bayesian age estimation, one of the main interests is the assignment of the probability of observing a given degree of maturity according to the age of an individual. Different methodologies have been discussed in the current literature,[13],[14],[17],[23] and in the Bayesian network for age estimation these probabilities are assigned by means of transition analysis.[12] To assess the probabilities of interest, this parametric statistical method employs regression models for categorical data (such as logit and probit models), and the results differ slightly according to the chosen model.

Aims and structure of the paper

This paper focuses on the regression model employed in transition analysis and on how it influences the results obtained with the Bayesian network for age estimation presented by Sironi et al.[12] The investigation covers both the age distribution and the probabilistic classification of individuals. The Bayesian approach is explained in the "A Bayesian approach for age estimation" section, whereas the network is introduced in the "A Bayesian network for age estimation" section. The "Materials and Methods" section presents the analyses performed and the data sample, which has been published by Kreitner et al.[24] The data relate to the ossification status of the medial clavicular epiphysis, assessed by means of a four-stage classification:[25] the low number of developmental stages needed for describing the degree of maturity makes this physical attribute quite suitable for the purpose of the study performed in this work.
Moreover, although the recommendations of the study group on forensic age diagnostics state that this physical attribute should be examined only when the development of other attributes (in particular of the hand) is complete,[10] it may also be of interest for discriminating ages around a threshold of 18 years.[26],[27] The data sample as published by Kreitner et al.[24] is not suitable for analyses aiming at a practical application of a given method. However, it is well suited to preliminary studies, such as those presented in this work. Furthermore, its availability in the scientific literature guarantees the total transparency and reproducibility of the analyses. Results are presented in the "Results" section and discussed in the "Discussion" section; a general conclusion is given in the final "Conclusion" section.

A Bayesian approach for age estimation

In simple terms, the Bayesian approach formally describes the process that allows one to update an initial belief on a given variable (e.g., the chronological age) in the light of some observed piece of evidence.[28] In this context, all available information or evidence is used for reducing the uncertainty related to the inferential process.[29] This updating process is formalized by means of Bayes' theorem, which in the current context can be defined as follows, considering the chronological age as a continuous variable:[12],[17]

[Inline:1]

where f(age | S = i) is the posterior distribution on the chronological age given the observation of the i-th developmental stage S, f(age) is the prior distribution on the chronological age and Pr(S = i | age) is the probability of observing a given degree of maturity given the age of the individual; m(age) is the normalizing constant:[17],[22]

[Inline:2]

where Θ is the whole parameter space of the age variable.
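The two equations referenced above did not survive extraction. As a hedged reconstruction, assuming only the standard form of Bayes' theorem for a continuous parameter and the symbols defined in the text (the original typesetting and notation may differ), they can be written as:

```latex
% Equation 1 (reconstruction): posterior distribution of the chronological age
f(\mathrm{age} \mid S = i) =
  \frac{\Pr(S = i \mid \mathrm{age}) \, f(\mathrm{age})}{m(\mathrm{age})}

% Equation 2 (reconstruction): normalizing constant over the age space \Theta
m(\mathrm{age}) =
  \int_{\Theta} \Pr(S = i \mid \mathrm{age}) \, f(\mathrm{age}) \, d\,\mathrm{age}
```

The denominator does not depend on any particular age value once the integral is taken; the notation m(age) simply follows the text.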
The posterior probability distribution is a representation of all the uncertainty related to the variable of interest because it encapsulates all available knowledge on the considered case. This distribution can further be used to compute an interval estimate of the age, namely a credible interval. This kind of interval is a suitable estimate for forensic purposes because it represents the range of values in which the true value of the parameter lies with a defined probability, for example, 0.95.[22] The posterior distribution can also be used for computing the probabilities that the examined individual is older or younger than a given threshold.[11],[12],[17] For example, the probability that the individual is older than 18 years of age is:

[Inline:3]

where θj, with j = 1, 2, is the subspace representing the age range of interest (i.e., the ages greater or lower than 18 years). To compute the probabilities for another age threshold, it suffices to adapt the domain of integration to the threshold needed. These posterior probabilities can be employed for classifying individuals, for example, by means of the posterior odds, which are the ratio of the posterior probabilities of two exhaustive and mutually exclusive propositions,[22] such as:

P1: The examined person is older than 18 years of age (i.e., θ1)
P2: The examined person is younger than 18 years of age (i.e., θ2).

The posterior odds on P1 against P2 can then be computed from equation 3 as follows:

[Inline:4]

Thus, the first (or alternatively, the second) proposition is corroborated if the posterior odds are greater (or alternatively, lower) than unity. The strength of the corroboration is defined by the value of this ratio: the further the odds are above (or below) unity, the stronger the corroboration of the first (or second) proposition.
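The computation described above can be illustrated on a discretized age grid, where the integral over θ1 or θ2 becomes a sum over grid cells and the posterior odds are the ratio of the two sums. The sketch below is only an illustration: the function names, the 1-year grid and the logistic likelihood values are hypothetical placeholders, not the probabilities used in the paper.

```python
import numpy as np

def posterior_on_grid(prior, likelihood):
    """Discrete analogue of Bayes' theorem: posterior is proportional to
    likelihood times prior, normalized so that it sums to one."""
    unnormalized = np.asarray(prior, dtype=float) * np.asarray(likelihood, dtype=float)
    return unnormalized / unnormalized.sum()

def posterior_odds(ages, posterior, threshold=18.0):
    """Posterior odds on P1 (older than threshold) against P2 (younger)."""
    ages = np.asarray(ages, dtype=float)
    posterior = np.asarray(posterior, dtype=float)
    pr_older = posterior[ages >= threshold].sum()
    pr_younger = posterior[ages < threshold].sum()
    return pr_older / pr_younger

# Midpoints of 1-year intervals from 0 to 30 years (assumed grid).
ages = np.arange(0.5, 30.0, 1.0)
# Uniform prior over the grid.
prior = np.full(ages.size, 1.0 / ages.size)
# Hypothetical Pr(S = i | age): a stage that becomes likely around 20 years.
likelihood = 1.0 / (1.0 + np.exp(-(ages - 20.0)))

posterior = posterior_on_grid(prior, likelihood)
odds = posterior_odds(ages, posterior)
print("posterior odds on P1 against P2:", odds)
```

With these made-up values the odds exceed one, so P1 would be the corroborated proposition; with a flat likelihood the odds simply reproduce the prior odds implied by the grid.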
The posterior odds are an extremely interesting metric for classification in age scenarios because they quantify the direction and the strength of the support for a set of propositions in an easily interpretable form. Moreover, they can be naturally integrated in a Bayesian decision-making process, which may be extremely suitable in the forensic and legal context.[22]

A Bayesian network for age estimation

Structure of the network

Based on elements of both graph and probability theories, Bayesian networks are probabilistic graphical models that allow the user to describe in a qualitative form the inferential structure of a complex problem by defining the variables involved in the problem and their respective probabilistic relationships. Probability rules are then used to quantitatively specify the nature and the strength of these relationships.[18],[19] The elements which compose a Bayesian network are:[18]

A finite collection of nodes, which represent the variables considered in the problem. Each node is characterized by a set of mutually exclusive states, which define the possible outcomes of the variable.
A set of directed arcs (or edges), which connect the nodes in the network and represent the probabilistic relationships existing between the variables. The arcs must be organized in order to avoid cycles (i.e., Bayesian networks are directed acyclic graphs).
Conditional probability tables (CPT), which are associated with each node in the network. The CPT contains the prior (or initial) probabilities assigned to each variable. These probabilities are conditional if the node depends on parental nodes, or unconditional if the node does not have any dependencies (i.e., no arcs from parental nodes).

The variables considered in the structure of the network are those expressed in equation 1,[12] namely the chronological age (node Age in [Figure 1]) and the observed developmental stage of a given physical attribute (node Stage in [Figure 1]).
The two nodes are linked by an arc from Age to Stage, which models the logical observation that the developmental stage in which the physical attribute of an individual is found depends on his or her age. A third node, Class, which accounts for the two propositions of interest (i.e., the examined person is older or younger than 18 years of age), is also included in the structure. Since the probabilities of the propositions depend on the probability assigned to the chronological age (equation 3), the two variables are linked by an arc from Age to Class [Figure 1].[12]

{Figure 1}

Given that the observation relates to the ossification status of the medial clavicular epiphysis of an individual, as described in section 2, the states of the network's nodes can be defined as shown in [Table 1].[12]

{Table 1}

Note that the continuous variable age was discretized into disjoint intervals only for practical reasons.[12],[30]

Probability assignment

The assignment of the probabilities in the CPT of each node in the network should reflect the initial (or prior) belief that the user may have. The prior probability on the chronological age may be assigned by the examiner based on his or her previous knowledge or past experience,[11],[31] or it may be taken as the distribution of the ages of the persons potentially under examination.[12] As shown in equation 3, the prior probabilities of the propositions, defined in the CPT of the node Class, are a logical subdivision of the probability distribution assigned to the node Age according to the age threshold of interest. The probabilities associated with the node Stage (i.e.
the probability of observing a given developmental stage according to age) are assigned by means of transition analysis, which is a parametric statistical tool for modeling the transition between ordered developmental stages in a given population.[32] In practice, transition analysis provides a set of distributions of the "age-at-transition", that is, the ages at which individuals of the population make the transition from one developmental stage to the next in an ordered sequence. Transition analysis was developed for anthropological purposes and is based on two main assumptions: the independence of the sequenced developmental stages given the age, and the unidirectionality of the morphological changes with no overlapping allowed.[32] These assumptions are respected by a large number of the developmental classifications used in age estimation of living persons (such as the four-stage classification for collarbone development); thus, the application of transition analysis in this particular context seems to be generally admissible.[12] Transition analysis employs specific regression models for ordinal data and allows one to compute the probability of the variable stage given the age.[32] A large number of regression models have been presented in the scientific literature related to transition analysis [Table 2].

{Table 2}

Amongst these models, the proportional odds (logit and probit) models are the least suitable for the purpose of age estimation.
They rest on the assumption that all regression curves have the same slope, so that in the transition analysis paradigm all the "age-at-transition" distributions have the same age variance.[32] Although this assumption is attractive because it facilitates statistical analyses,[42] it does not fit the biological principles of the maturation process, since it seems natural that the dispersion of the age-at-transition increases with increasing stages.[32] In some publications, the age variable has been used on the logarithmic scale: in this way, negative ages are excluded from the analysis, which seems logical. However, the use of such a transformation may sometimes create difficulties in the regression analysis, depending on the structure of the available sample. Different regression models seem to produce similar (but not identical) results;[43] therefore, a slight influence on age estimation or on the probabilistic classification may be observed. Excluding the models based on the assumption of a common slope for all regression curves, it is of interest to investigate the potential influence of some regression models on age assessment, for example the continuation ratio logit and the unrestricted cumulative probit [Table 2]. In the transition analysis framework, the probabilities of the stage, given the age of the person, are computed in different ways according to the regression model used. With the continuation ratio logit, the probabilities can be computed as follows:[12],[42]

[Inline:5]

where αi and βi are the estimated intercept and slope of the regression curves and F(.) is the cumulative distribution function (CDF) of the logistic distribution. With the unrestricted cumulative probit, the probabilities of interest can be obtained in this way:[42]

[Inline:6]

where ϕ(.) is the CDF of the standard normal distribution.

Materials and Methods

For the analyses performed in this work the data sample presented by Kreitner et al.
was used.[24] The sample contains 380 subjects defined by their age (between 0 and 30 years) and the ossification status of their medial clavicular epiphysis, determined by means of a computed tomography scan examination. The ossification status was assessed with a four-stage classification by an experienced expert: the stages of the classification correspond to a non-union without ossification of the epiphysis (Stage 1), a non-union with detectable ossification of the epiphysis (Stage 2), a partial union (Stage 3) and finally a complete union of the ossified epiphysis (Stage 4).[24],[25] The ages in the data sample are expressed only in years, and subjects counted in an interval of ages (i.e. 0–4 and 5–9) were considered as being at the maximal age of the range (i.e. 4 and 9 years old). To allow further investigation, 1000 subsamples of 250 individuals were produced from the original data sample. All analyses were performed with the Bayesian network for age estimation. Details on the network's nodes are shown in [Table 1] above. In order to focus on the effects of transition analysis, the CPT of the node Age was compiled with a uniform distribution between 0 and 120 years of age (i.e. a reasonable interval for a human lifetime) as a prior distribution on the chronological age. Prior probabilities on the propositions in the node Class were obtained accordingly. The probabilities associated with the node Stage were assigned by means of transition analysis, applied with the two regression models mentioned above. Transition analysis was applied to the 1000 subsamples produced from the original data sample, and the probabilities entered in the CPT of the node were assigned by means of equation 5 or equation 6, respectively.
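To make the role of the two regression models concrete, the sketch below computes stage probabilities for a four-stage classification under one common parameterization of each model. This parameterization is an assumption, since equations 5 and 6 are not legible in this copy: for the continuation ratio logit, Pr(S > i | S ≥ i, age) = F(αi + βi·age); for the cumulative probit, Pr(S > i | age) = Φ(αi + βi·age). The intercepts and slopes are hypothetical values chosen for illustration, not estimates from the Kreitner et al. sample.

```python
import math
import numpy as np

def logistic_cdf(x):
    """CDF of the standard logistic distribution."""
    return 1.0 / (1.0 + np.exp(-x))

_erf = np.vectorize(math.erf)

def normal_cdf(x):
    """CDF of the standard normal distribution."""
    return 0.5 * (1.0 + _erf(np.asarray(x, dtype=float) / math.sqrt(2.0)))

def _linear_predictor(age, alpha, beta):
    age = np.atleast_1d(np.asarray(age, dtype=float))
    alpha = np.asarray(alpha, dtype=float)[:, None]
    beta = np.asarray(beta, dtype=float)[:, None]
    return age, alpha + beta * age  # shape: (n_transitions, n_ages)

def stage_probs_cr_logit(age, alpha, beta):
    """Assumed form: p_i = Pr(S > i | S >= i, age) = F(alpha_i + beta_i * age)."""
    age, eta = _linear_predictor(age, alpha, beta)
    p = logistic_cdf(eta)
    probs = np.empty((p.shape[0] + 1, age.size))
    reached = np.ones(age.size)  # Pr(S >= current stage | age)
    for i in range(p.shape[0]):
        probs[i] = reached * (1.0 - p[i])  # reached stage i but did not pass it
        reached = reached * p[i]
    probs[-1] = reached  # passed every transition
    return probs

def stage_probs_cum_probit(age, alpha, beta):
    """Assumed form: Pr(S > i | age) = Phi(alpha_i + beta_i * age)."""
    age, eta = _linear_predictor(age, alpha, beta)
    exceed = normal_cdf(eta)
    upper = np.vstack([np.ones(age.size), exceed])   # Pr(S >= i | age)
    lower = np.vstack([exceed, np.zeros(age.size)])  # Pr(S > i | age)
    return upper - lower

# Hypothetical parameters for the three transitions (1->2, 2->3, 3->4).
alpha = (-8.0, -9.5, -11.0)
beta = (0.60, 0.55, 0.50)
ages = np.array([5.0, 18.0, 30.0])

cr = stage_probs_cr_logit(ages, alpha, beta)
cp = stage_probs_cum_probit(ages, alpha, beta)
print(np.round(cr, 3))
print(np.round(cp, 3))
```

Each column of the returned arrays is a candidate column of the CPT of the node Stage for one age. Note that with unrestricted slopes the cumulative model only yields valid probabilities where the fitted curves do not cross; the values above were chosen so that no crossing occurs for non-negative ages.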
The Bayesian networks were used to provide the posterior probability distribution on the chronological age and the posterior probabilities of the two propositions of interest, both according to the observed developmental stage [Figure 2].

{Figure 2}

The posterior distribution of the variable age was then used for computing 95% highest posterior density (HPD) credible intervals:[22] the values of the lower and upper boundaries presented in this paper are the means of the 1000 values obtained from the subsamples. The posterior probabilities of the propositions provided by the network were used for computing the 1000 posterior odds on P1 against P2. The distributions of the posterior odds according to the regression models were investigated, and the difference between the distributions was computed by comparing posterior odds computed with the same subsample. Differences are always computed by subtracting the data related to the unrestricted cumulative probit from the data related to the continuation ratio logit, and they are normalized according to the value of the posterior odds. Results related to the fourth stage of the classification for the ossification status of the medial clavicular epiphysis have not been studied, since there is a high probability that all individuals are in the fourth developmental stage after 26 years of age.[44] For the analyses, the statistical software R was employed:[45] the Bayesian networks for age assessment were run with the RHugin package,[46] whereas the regression analysis was carried out with the vector generalized linear and additive models (VGAM) package.[47] Other analyses were performed with routines written by the authors in the R language.
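As an illustration of the interval computation described above, a 95% HPD region on a discretized posterior can be obtained by keeping the most probable age cells until their cumulative mass reaches 0.95. The sketch below is written in Python rather than the authors' R routines, uses a hypothetical function name, and reports the smallest and largest retained ages, which is an interval only under the assumption that the posterior is unimodal.

```python
import numpy as np

def hpd_interval(ages, posterior, mass=0.95):
    """Bounds of the highest posterior density region on a discrete grid.

    Cells are added in order of decreasing posterior probability until
    their cumulative probability reaches `mass`.
    """
    ages = np.asarray(ages, dtype=float)
    posterior = np.asarray(posterior, dtype=float)
    order = np.argsort(posterior)[::-1]          # most probable cells first
    cumulative = np.cumsum(posterior[order])
    n_keep = min(int(np.searchsorted(cumulative, mass)) + 1, ages.size)
    kept = ages[order[:n_keep]]
    return kept.min(), kept.max()

# Small worked example: two cells already hold 0.97 of the mass.
small_bounds = hpd_interval([10, 11, 12, 13], [0.01, 0.50, 0.47, 0.02])

# Hypothetical bell-shaped posterior on a 1-year grid centred near 20 years.
grid = np.arange(0.5, 30.0, 1.0)
bell = np.exp(-0.5 * ((grid - 20.0) / 2.0) ** 2)
bell = bell / bell.sum()
lower, upper = hpd_interval(grid, bell)
print(small_bounds, (lower, upper))
```

Repeating this for each subsample and averaging the lower and upper bounds would mirror the summary described in the text.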
Results

[Table 3] presents the boundaries of the 95% HPD intervals computed from the posterior probability distributions obtained according to the different regression models employed for transition analysis.

{Table 3}

The estimates provided with the logit and the probit models are very similar: the 95% HPD intervals cover about the same ranges of ages. The value of the lower boundary of the intervals produced when the first stage is observed is due to the assignment of the prior on the chronological age. Since all subjects in the data sample with an age close to zero were logically found in the first developmental stage, the probability of observing this stage given low ages is extremely high (close to one). Due to the uniform prior, this is reflected in the posterior probability distribution of the age, which is extremely high (close to one) in the region of low ages when the first stage is observed. Since the HPD intervals include the region of the variable (i.e., the chronological age) for which the posterior probability distribution is highest, in this case this region corresponds to the lowest ages, down to the zero value. A similar analysis was performed for the computed posterior odds: the distributions of the posterior odds obtained from the 1000 subsamples are shown in [Figure 3].

{Figure 3}

The histograms in [Figure 3] show that, for the two models, the distributions of the posterior odds given the second and the third stage are visually similar and cover the same ranges of values. However, a slight difference may be observed between the two distributions, as illustrated in [Figure 4] and [Table 4].

{Figure 4}

{Table 4}

Concerning the first developmental stage, the histograms in [Figure 3] show a marked difference between the distributions of the posterior odds. The odds obtained with the probit model are closer to zero than those obtained with the logit model.
The difference observed between the posterior odds computed according to the two regression models does not have an impact on the classification of individuals. In fact, considering the first and second stages, for both the logit and probit models, the posterior odds are always lower than one and thus always classify the individuals as younger than 18 years of age. The same reasoning applies to the third stage, and in this case individuals would always be classified as older than 18 years of age, since the posterior odds are greater than one (see the "A Bayesian approach for age estimation" section). However, the strength of the corroboration of one or the other proposition may change according to the regression model employed. [Figure 4] and [Table 4] show that for the first and the third stages, the differences computed tend to be positive. This means that the continuation ratio logit tends to produce greater posterior odds for each subsample. For the first stage, this is also easily observable in [Figure 3]. However, the interpretation of this observation changes according to the developmental stage. For the first stage, the stronger discrimination between the propositions is provided by the unrestricted cumulative probit [Figure 3], since the smaller the posterior odds, the more strongly they corroborate the second proposition. Conversely, for the third stage the strongest corroboration is given by higher posterior odds, thus by the continuation ratio logit. For the second stage, the difference tends to be negative [Figure 4] and [Table 4]; hence, the probit model seems to provide higher posterior odds. As for the first stage, all the posterior odds given the second stage support the second proposition, and thus better discrimination is provided by the lowest value of the posterior odds. Hence, in this particular case, the continuation ratio model provides the strongest corroboration.
Based on this observation, it is therefore not possible to determine which regression model generally provides better discrimination between the propositions of interest. However, it is possible to observe that the differences computed are close to zero, especially for the second and the third developmental stages [Figure 4]. Thus, the support provided by the two models for one or the other proposition is extremely similar.

Discussion

One of the difficulties in the use of Bayesian networks is the assignment of the initial probabilities for each variable of relevance in the model. In the Bayesian network for age estimation, the probabilities associated with the node Stage are assigned by means of transition analysis. Since this parametric method may be implemented by using different kinds of regression models, the aim of this paper was to investigate the effect of the model on the final results provided by the network in the form of the posterior probability distribution on the chronological age and of the posterior odds. The analyses were performed with two regression models, namely the continuation ratio logit and the unrestricted cumulative probit. Other models presented in the scientific literature related to transition analysis are not relevant for this work because they do not fit the biological developmental process (see the "A Bayesian network for age estimation" section). The results show that the difference between the posterior odds produced by adopting the logit model and the probit model does not seem to affect the age assessment per se: in fact, the 95% HPD intervals produced according to the two regression models include the same ranges of ages. Moreover, both the logit and the probit models always provide posterior odds which support the same proposition for a given developmental stage; thus, individuals would be classified in the same class, independently of the regression model employed in transition analysis.
Changes are only related to the strength of the corroboration of one or the other of the propositions. Overall, the regression model employed in transition analysis does not seem to have a relevant impact on age assessment. However, the use of the cumulative model presents some practical advantages. In fact, due to its nature, it is possible to collapse some consecutive stages into a new one without affecting the estimation of the parameters related to the other transitions, which can be a useful feature.[36] The elicitation of the prior probabilities of the chronological age is also a matter of debate. For some, this assignment is a reason for not applying the Bayesian approach. In reality, this should not be viewed as a problem, because the definition of the prior distribution is a natural process:[31] every examiner has an initial belief about the age of a given person, which is based on previous knowledge or past experience.[11],[12] The user can easily vary the prior in the node Age in order to assign the probabilities which best fit his or her belief. A sensitivity analysis on the prior distribution may be performed to investigate the robustness of the results. In this work, a uniform prior has been used in order to highlight the effects of transition analysis on the final results. However, for practical applications, this kind of prior should be avoided. Although some claim that a uniform prior is an acceptable solution to avoid a strong effect on the chronological age, it seems illogical to assign equal probabilities to all the ages in a given range. This is because it is highly unlikely that persons in some categories of age (such as between 0 and 10 years of age, or older than 50 years of age) would be asked to be examined for age assessment purposes.
Therefore, the posterior odds on the chronological age are strongly biased by the uniform distribution because individuals in specific extreme age ranges would generally not be asked to be examined for forensic age assessment purposes. Thus, it seems suitable to assign a priori low probabilities to these age ranges.

Conclusion

The use of the Bayesian approach for age assessment presents a large number of advantages for forensic and legal purposes. Results are provided in a form that is directly useful for forensic age estimation or for the classification of an individual according to an age threshold. In this context, the use of Bayesian networks would facilitate the practical application of the probabilistic approach. Results provided by Bayesian methods are logically influenced by the assignment of the probabilities of interest. In the age estimation framework, for example, the probabilities of observing a developmental stage given the age can be assigned by means of transition analysis. This statistical method employs regression models adapted to deal with ordered categorical data, such as probit or logit models. The aim of this work was to investigate the influence of the regression model on the results provided by the Bayesian network for age estimation. The analyses showed that the choice of the regression model does not have a relevant impact on the classification of individuals as younger or older than a given age threshold.

Acknowledgments

The authors would like to thank the organizers of the first International Symposium on Sino-Swiss Evidence Science, the Collaborative Innovation Center of Judicial Civilization (China) and the School of Criminal Justice of the University of Lausanne (Switzerland) for their support of the authors' participation in the symposium.

Financial support and sponsorship

Nil.

Conflicts of interest

There are no conflicts of interest.

References


