Morrison & Enzinger
Independent Forensic Consultants

http://forensic-evaluation.net/

last update
2016-04-29

Forensic Consulting, Casework, and Training

  • evaluation of forensic evidence

    • forensic inference and statistics

    • logical reasoning

    • validity and reliability

  • forensic speech science

    • forensic voice comparison

    • disputed utterance analysis



Some portions of this website use US spelling, other portions use UK spelling.




Problems with Forensic Science
  • There has been substantial criticism of current practice across many branches of forensic science.

  • Problems include:

    • Forensic scientists making strength of evidence statements which are inconsistent with logic.

    • Lawyers and judges misinterpreting logically appropriate strength of evidence statements made by forensic scientists.

    • Lack of transparency in forensic scientists’ practice and reporting.

    • Lack of procedures to safeguard forensic scientists from the influence of cognitive bias due to factors such as exposure to other evidence in the case.

    • Lack of empirical demonstration by forensic scientists of the degree of validity and reliability of analytical approaches and procedures which they employ.

    • Lack of understanding by lawyers and judges (and forensic scientists) as to what constitutes scientifically appropriate testing of validity and reliability.



Paradigm for the Evaluation of Forensic Evidence
  • There is a paradigm for the evaluation and interpretation of forensic evidence which, if properly implemented, provides solutions for many of the problems outlined above.


  • The paradigm consists of:


Logically correct framework for the evaluation and interpretation of evidence
  • The likelihood-ratio framework has been standard in forensic DNA analysis since the mid-1990s, and its use is spreading across other branches of forensic science.


  • It is considered by leading experts in the field to be the logically correct framework for the evaluation and interpretation of forensic evidence, and applicable across all branches of forensic science. For example:

    • In 2010 the England & Wales Forensic Science Regulator stated that it is not logical to adopt the likelihood-ratio framework for forensic DNA analysis but not adopt it for other branches of forensic science.

    • In 2011 a statement that the likelihood-ratio framework is the most appropriate framework for the evaluation of forensic evidence was signed by 31 forensic scientists, forensic statisticians, and legal scholars and endorsed by the board of the European Network of Forensic Science Institutes (ENFSI) representing 58 laboratories in 33 countries.

    • The 2012 US National Institute of Standards and Technology / National Institute of Justice (NIST/NIJ) report on fingerprints also supported the adoption of the likelihood-ratio framework.

    • In 2015 ENFSI released its Guideline for Evaluative Reporting in Forensic Science, recommending the use of the likelihood ratio framework for assessing the strength of forensic evidence.


  • The likelihood-ratio framework describes the necessary logic for evaluating the strength of forensic evidence, and this logic holds irrespective of how exactly it is expressed, what the framework is called, and any practical problems in implementing the framework.


  • The framework makes explicit the issues which the forensic scientist must logically address and make the trier of fact aware of.


  • Properly applied, it requires the forensic scientist to make clear the specific hypotheses for which they are providing an answer. This is necessary so that the trier of fact can understand the question. The trier of fact cannot decide whether the forensic scientist is answering an appropriate question, and cannot understand the forensic scientist’s answer, unless the trier of fact understands the question.


  • It requires the forensic scientist to evaluate the strength of evidence only for the particular materials they have been asked to evaluate, and not to take into consideration other evidence presented in the trial or other information extraneous to their particular task.


  • It requires the forensic scientist to consider not only the probability of the evidence given the prosecution hypothesis, but also the probability of the evidence given an appropriate defence hypothesis. For example, in a source-level comparison in which one sample is of known origin and another of questioned origin (the samples could be fingerprints, glass fragments, textile fibres, voice recordings, etc.), the forensic scientist must consider not only the similarity of the questioned sample to the known sample but also its typicality with respect to some relevant population.


  • The forensic scientist must evaluate the relative probabilities of the evidence if the prosecution hypothesis were true versus if the defence hypothesis were true. The defence hypothesis they adopt must specify the relevant population for the particular case, and they must clearly explain the specific hypothesis adopted to the trier of fact.


  • For example, if a blond hair is found at the crime scene and a suspect has blond hair, the probability of finding a blond hair at the crime scene is very high had it come from the suspect (the crime scene hair and the suspect’s hair are similar), but this by itself is of little probative value.


  • If the crime was committed in Stockholm and the relevant population adopted is the general population of Stockholm, the probability of finding a blond hair at the crime scene had it come from someone selected at random from this population is also high (blond hair is very typical in this population). The probability of the evidence (finding a blond hair at the crime scene) if the prosecution hypothesis were true (that it came from the suspect) will only be a few times higher than if the defence hypothesis were true (that it came from someone else in Stockholm) – the strength of the evidence with respect to these hypotheses will not be very high.


  • In contrast, if the crime was committed in Beijing and the relevant population adopted is the general population of Beijing, the probability of finding a blond hair at the crime scene had it come from someone selected at random from this population is very low (blond hair is atypical in this population). The probability of the evidence (finding a blond hair at the crime scene) if the prosecution hypothesis were true (that it came from the suspect) will be many times higher than if the defence hypothesis were true (that it came from someone else in Beijing) – the strength of the evidence with respect to these hypotheses will be very high.


  • If a different defence hypothesis were adopted, such as that the hair came from the victim, the question asked and the strength-of-evidence answer would again be different.


  • The forensic scientist must make explicit the hypotheses which they have adopted so that the trier of fact can consider whether the forensic scientist is answering an appropriate question and so that the trier of fact can understand the answer.


  • The probability of the evidence given the prosecution hypothesis divided by the probability of the evidence given the defence hypothesis is known as a likelihood ratio and provides a quantification of the strength of the evidence – how many times more likely is the evidence if the prosecution hypothesis were true than if the defence hypothesis were true. The logo at the top of this page is a symbolic expression of a forensic likelihood ratio.

    • LR = p(E|Hp)/p(E|Hd)
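The blond-hair example above can be worked through numerically. In the sketch below every probability value is invented purely for illustration; the point is how the same similarity yields different strengths of evidence under different defence hypotheses:

```python
# Likelihood-ratio calculation for the blond-hair example.
# All probability values below are invented, for illustration only.

def likelihood_ratio(p_e_given_hp, p_e_given_hd):
    """LR = p(E|Hp) / p(E|Hd)."""
    return p_e_given_hp / p_e_given_hd

# Probability of finding a blond hair at the scene had it come from
# the (blond) suspect -- high under either scenario.
p_e_hp = 0.9

# Stockholm: blond hair is typical of the relevant population, so the
# probability of the evidence given the defence hypothesis is also high.
lr_stockholm = likelihood_ratio(p_e_hp, 0.6)    # weak evidence, LR = 1.5

# Beijing: blond hair is atypical of the relevant population, so the
# probability of the evidence given the defence hypothesis is very low.
lr_beijing = likelihood_ratio(p_e_hp, 0.003)    # strong evidence, LR = 300
```

The same hypothetical p(E|Hp) yields very different likelihood ratios depending on the relevant population specified by the defence hypothesis.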




Approaches based on relevant data, quantitative measurements, and statistical models
  • Approaches based on relevant data, quantitative measurements, and statistical models are more transparent and more easily replicated than approaches in which the output is directly a forensic scientist’s experience-based subjective judgement. A forensic scientist can even provide copies of their data and software to a second forensic scientist charged with scrutinising their work. Any errors in implementation can potentially be traced in a way which is not possible for an approach in which the output is justified solely by experience-based judgement.


  • Subjective decisions have to be made with respect to selecting relevant hypotheses, and relevant data and statistical models to evaluate the relative probabilities of the evidence given those hypotheses, but these decisions are remote from the ultimate output of the forensic-evaluation system compared to approaches in which the output is directly a forensic scientist’s experience-based subjective judgement. Approaches based on relevant data, quantitative measurements, and statistical models are therefore more robust to the potential effects of cognitive bias.


  • In order to calculate the probability of the evidence given the defence hypothesis, the forensic scientist must select data which are representative of the relevant population and which reflect the conditions of the sample of known origin.


  • For example, if the known sample is a rolled fingerprint and the questioned sample is a smeared partial finger mark, the probability of the quantitatively measured properties of the smeared-partial-finger-mark questioned sample is calculated given a model based on the rolled-fingerprint known sample. The probability of the quantitatively measured properties of the smeared-partial-finger-mark questioned sample should therefore also be calculated given a model based on rolled-fingerprint data representative of the relevant population. Only if both probabilities are calculated under the same mismatched conditions can their relative values logically be compared to quantify the strength of the evidence (statistical models may include mechanisms specifically designed to compensate for such mismatches).


  • The subjective decisions made by the forensic scientist should be explained to the trier of fact so that the trier of fact can consider their appropriateness. The forensic scientist should explain to the trier of fact how they selected the data representative of the relevant population and how they made it reflect the conditions of the case under investigation so that the trier of fact can decide whether the forensic scientist has selected data which are sufficiently representative of the relevant population and sufficiently reflective of the conditions of the case under investigation.


  • As a practical matter, approaches based on relevant data, quantitative measurements, and statistical models are also more easily tested than approaches in which the output is directly a forensic scientist’s experience-based subjective judgement.




Empirical testing of the validity and reliability under conditions reflecting those of the case under investigation
  • Empirical testing of the validity and reliability is the only way to ascertain how well a forensic-evaluation system does what it is claimed to be able to do and how consistently it does it. Appeals to experience or to prior admission do not suffice.

    • The US Supreme Court 1993 Daubert standard, and Rule 702 of the Federal Rules of Evidence (revised 2011) regarding admissibility of forensic evidence, require the judge to consider the scientific validity of the methods employed by the forensic scientist, including the forensic scientist’s implementation of those methods, and including consideration of the known or potential rate of error.

    • The 2009 National Research Council report to Congress on Strengthening Forensic Science in the United States was highly critical of current practice in many branches of forensic science and urged that procedures be adopted which include “quantifiable measures of the reliability and accuracy of forensic analyses” (p. 23), “the reporting of a measurement with an interval that has a high probability of containing the true value” (p. 121), and “the conducting of validation studies of the performance of a forensic procedure” (p. 121).

    • The England & Wales Forensic Science Regulator’s 2014 Codes of Practice and Conduct require all technical methods and procedures used by the forensic scientist to be validated and the validation to be demonstrated prior to use in casework (see also the 2014 Draft guidance: Digital forensics method validation). Such validation normally includes assessment of both validity and reliability. Validation includes testing of all components which make up the method, including the competence of any human who performs part of the process. Testing should be conducted on known data which simulate casework conditions, and real casework data where appropriate.


  • In order to provide relevant information as to how the system is expected to perform under the conditions of the case under investigation, it must be tested using relevant data (e.g., data sampled from the relevant population specified by the defence hypothesis) under conditions which reflect the conditions of the samples from the case under investigation.


  • For example, if the known sample is a rolled fingerprint and the questioned sample is a smeared partial finger mark, the system should be tested using pairs of samples in which one member of each pair is a rolled fingerprint and the other a smeared partial finger mark. Testing under other conditions, e.g., rolled fingerprint versus rolled fingerprint, will not be particularly informative as to how well the system is expected to perform on the actual known and questioned samples from the case under investigation.


  • The forensic scientist should explain to the trier of fact how they selected the test data and how they made it reflect the conditions of the case under investigation so that the trier of fact can decide whether the forensic scientist has selected data which are sufficiently representative of the relevant population and sufficiently reflective of the conditions of the case under investigation.


  • The tester must know whether each test pair is, for example, a same-origin pair or a different-origin pair, but the system being tested must not have access to this information. The system must be presented with a large number of test pairs and provide a strength-of-evidence output for each pair. The tester then compares the output of the system for each pair with the tester’s knowledge as to whether it was a same-origin or different-origin pair, assigns a score to each output on the basis of this comparison, and finally averages over all the scores to provide a measure of the system’s performance (in this case its validity; measuring reliability requires additional steps).
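For systems whose outputs are likelihood ratios, one commonly used scoring rule of this kind is the log-likelihood-ratio cost (Cllr). The sketch below uses invented test outputs; a real validation would use many test pairs under conditions reflecting those of the case:

```python
import math

def cllr(same_origin_lrs, different_origin_lrs):
    """Log-likelihood-ratio cost (Cllr): each same-origin output is
    penalised by log2(1 + 1/LR) and each different-origin output by
    log2(1 + LR); the two mean penalties are averaged. Lower is better;
    a system which always outputs LR = 1 (uninformative) scores 1.0."""
    penalty_same = sum(math.log2(1 + 1 / lr) for lr in same_origin_lrs)
    penalty_diff = sum(math.log2(1 + lr) for lr in different_origin_lrs)
    return 0.5 * (penalty_same / len(same_origin_lrs)
                  + penalty_diff / len(different_origin_lrs))

# A well-performing system: large LRs for same-origin test pairs,
# small LRs for different-origin test pairs (invented values).
good = cllr([100, 50, 200], [0.01, 0.02, 0.005])

# An uninformative system: LR = 1 for every test pair.
uninformative = cllr([1, 1, 1], [1, 1, 1])   # = 1.0
```

Note that Cllr penalises not just wrong-direction outputs but also overstated strength of evidence, which is why it is favoured over simple error rates for likelihood-ratio systems.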


  • Protocols for testing validity and reliability can be applied to different forensic-evaluation systems (including systems based on relevant data, quantitative measurements, and statistical models, and systems for which the output is directly a forensic scientist’s experience-based subjective judgement) treating each system as a black box and therefore treating each system equally.


  • Irrespective of a system’s internal architecture (e.g., whether it be a statistical-model or experience-based system) the degree of validity and reliability of the system should be empirically demonstrated so that a judge at an admissibility hearing (e.g., a Rule 702 / Daubert hearing) can decide whether system performance is sufficient for it to be admitted, and so that a trier of fact can decide to what extent they can trust the output of the system.





Forensic Voice Comparison
  • Forensic speech science is a cover term which includes two things that we do:

    • forensic voice comparison

    • disputed utterance analysis


  • In forensic voice comparison the forensic scientist compares the properties of the speech on a recording of a speaker of questioned identity with the properties of the speech on a recording of a speaker of known identity in order to evaluate the strength of evidence so as to assist the court in determining whether the speaker of questioned identity and the speaker of known identity are the same speaker or not.


  • Forensic voice comparison is our preferred term. Other terms which have been used include:

    • forensic speaker comparison

    • forensic speaker recognition

    • forensic speaker identification

    • forensic voice identification

    • voiceprint identification

    • voicegram identification


  • In a disputed utterance analysis the forensic scientist analyses a portion of a speech recording in order to evaluate the strength of evidence so as to help the court to decide whether the speaker said one thing or another, for example: Did a Conservative Canadian politician in 2015 say “NDP whore” or “NDP horde”?




Introduction to Forensic Voice Comparison
  • A common scenario in a forensic voice comparison case is that the recording of the speaker of questioned identity is a recording of the offender speaking on a telephone, and the recording of the speaker of known identity is a recording of a suspect during a police interview.

    • The prosecution hypothesis will be that the speaker on the offender recording is the suspect.

    • The defence hypothesis will be that the speaker on the offender recording is not the suspect but some other speaker selected at random from the relevant population.

    • Determining what constitutes the relevant population may not be trivial.


  • There will usually be a mismatch between the recording conditions of the suspect and offender recordings:

    • The offender recording may be of a thirty-second-long casual conversation, recorded from a mobile-telephone intercept (mobile-telephone systems distort and lose acoustic information), with traffic noise in the background.

    • The suspect recording may be of an hour-long police interview, recorded in a small room with substantial reverberation (echoes), with air-conditioner noise, and saved in a compressed format (compressed formats such as MP3 save disc space but distort and lose acoustic information).

    • Poor recording conditions and mismatched recording conditions result in poorer performance from forensic voice comparison systems, and trying to get good performance despite these challenges has been a major focus of research.


  • Recommended publication – A relatively detailed introduction to forensic voice comparison, intended to be accessible to a broad audience including lawyers and judges, is:


Testing the Validity and Reliability of Forensic Voice Comparison
  • Recording conditions in forensic voice comparison cases can be very challenging.

    • Systems which work well under studio-quality recording conditions may work very poorly under the conditions found in forensic casework.

    • Case conditions are highly variable from case to case, and the performance of a system under the conditions of one case may be very different from its performance under the conditions of another case.


  • Calls for the validity and reliability of implementations of approaches to forensic voice comparison to be empirically tested under conditions reflecting casework conditions date back to the 1960s.


  • The system which is used is the system which must be tested:

    • If a practitioner uses multiple approaches and combines the results, then it is the combined system which must be tested.

    • If a practitioner directly reports the output of a statistical model as their strength of evidence statement, then it is the output of the statistical model which must be tested.

    • If a practitioner makes a subjective judgment based on the output of a statistical model, and reports that subjective judgment as their strength of evidence statement, then it is the system including that subjective judgment which must be tested.


  • Unfortunately, most people representing themselves as experts in forensic voice comparison still do not empirically test the validity and reliability of the system they use for casework using data which represent the relevant population and which reflect the conditions of the case under investigation.


  • Recommended publication – A review of calls for the testing of validity and reliability of forensic voice comparison under casework conditions appears in:


Approaches to Forensic Voice Comparison
  • Historically, and still in current practice, there are four basic approaches to forensic voice comparison, which we denominate auditory, spectrographic, acoustic-phonetic (which we further divide into non-statistical and statistical acoustic-phonetic approaches), and automatic. Practitioners may use a mixture of different approaches (e.g., auditory-spectrographic, auditory-acoustic-phonetic), but for simplicity we will describe each one separately:

  • The strength of evidence conclusions of the auditory, spectrographic, and acoustic-phonetic non-statistical approaches are intrinsically based primarily and directly on subjective judgment. The strength of evidence conclusions of the acoustic-phonetic statistical and the automatic approaches may be directly the output of statistical models, or the practitioner may use the statistical model’s output as input to them making a subjective judgment.


  • A recent INTERPOL survey found that the auditory-acoustic-phonetic non-statistical, and the spectrographic / auditory-spectrographic approaches were the most popular among law-enforcement agencies. The human-supervised automatic approach was third most popular.

  • The same survey found that by far the most popular framework for the evaluation of the strength of evidence and the reporting of results was identification / exclusion / inconclusive, despite this framework not being logically sustainable. Verbal expression of likelihood ratios and numeric likelihood ratios were, however, the second and third most popular.



    The brief descriptions of approaches to forensic voice comparison below, written by Dr Morrison, are extracted from a primer on forensic voice comparison intended as part of a potential law review article.




Auditory Approach
  • In an auditory approach (aka aural approach) the practitioner listens to the suspect and offender recordings. They listen in search of similarities which they would expect to hear if the two recordings consisted of speech from the same speaker, but not if they consisted of speech from different speakers. They also listen in search of differences which they would expect to hear if the two recordings consisted of speech from different speakers, but not if they consisted of speech from the same speaker. They may listen to the pronunciation of particular vowel sounds or the particular consonant sounds, the pronunciation of particular words or phrases, and other more global properties such as intonation patterns and the auditory effects of vocal-fold properties and settings. Practitioners will typically have training in auditory phonetics, including training in transcribing the speech sounds they hear using a phonetic alphabet. Thus the practitioner will have a means of documenting what they hear and highlighting the similarities and differences that they consider to be pertinent. Practitioners may have tools which allow them to listen to short sections of speech from each recording, one immediately after the other. They may also listen to sections of speech from other speakers who act as foils, i.e., speakers who sound broadly similar to the speaker of interest but who are known to be different speakers (similar to people in an eye-witness lineup looking similar to the person of interest). The practitioner may be presented with multiple recordings of each of a number of speakers, without being told which are the suspect, offender, and foils, and be asked to group the recordings by speaker.


  • The conclusion emerging from an auditory approach is directly the practitioner’s subjective judgment based on listening to the speech recordings.





Spectrographic Approach
  • In a spectrographic approach the practitioner takes parts of the audio recordings (typically words or phrases) and converts them into pictures. These pictures are called spectrograms. In the context of forensic voice comparison, spectrograms have also been called voiceprints and voicegrams. The practitioner looks at spectrograms derived from the suspect recording and spectrograms derived from the offender recording, and may also look at spectrograms derived from recordings of foil speakers. Typically the practitioner will look at multiple words or phrases that occur in both the suspect and offender recordings. They may look at particular details in the pictures in search of similarities which they would expect to see if the two recordings were of the same speaker but not if they were of different speakers, and also in search of differences they would expect to see if the two recordings were of different speakers but not if they were of the same speaker. In contrast to other approaches, there has been a tradition for the practitioner of the spectrographic approach to make new recordings of the suspect in which the suspect is required to say the same words as on the offender recording and in the same manner as they were said on the offender recording. This has even been enshrined as a requirement in published standards.


  • The term “voiceprint” dates back to at least the 1960s. “Voiceprinting” or “voiceprint identification” referred to a particular approach, and was even a registered trademark. The term quickly fell into disrepute among forensic practitioners, even among practitioners of the spectrographic approach. One objection was that the term implied a false analogy with “fingerprint”. Unfortunately, the term is still widely used by the general public and legal professionals, where it is often incorrectly used to refer to forensic voice comparison in general.


  • The conclusion emerging from a spectrographic approach is directly the practitioner’s subjective judgment based on looking at pictures of parts of the speech recordings.


  • Spectrograms were initially produced using specialized hardware which was first developed in the 1940s. Measurements of acoustic properties of speech could be made off the spectrogram. Since at least the early 1990s, it has been possible to produce spectrograms using ordinary computers running signal processing software. Such software calculates numbers which describe the acoustic properties of the speech on the recording, then converts those numbers into pictures. Continued reliance on spectrograms as a basis for subjective judgments is anachronistic given that measurements of acoustic properties can be directly extracted using software and those numbers can be immediately entered into statistical models.





Acoustic-Phonetic Approach
  • In an acoustic-phonetic approach the practitioner uses computer software to make quantitative measurements of acoustic properties of parts of the voice recordings. Measurements may be made on particular speech sounds that occur in both the suspect and offender recordings. The types of measurements made are generally the same as the types of measurements which are made in acoustic phonetics, an area of research which studies the transmission of human speech between the speaker’s vocal tract and the listener’s ear. An example of a commonly measured property is formants. Formants are the resonances of the vocal tract. In the same way that larger drums or longer tubes of wind instruments have lower resonances than smaller drums and shorter tubes, longer human vocal tracts have lower resonances than shorter human vocal tracts. The length of the vocal tract can vary from person to person, but when speaking a person changes the length and shape of their vocal tract to produce a range of different resonance frequencies. The differences between vowel sounds such as “ee”, “oo”, and “ah” are the result of different resonances resulting from the speaker moving their tongue, jaw, lips, etc. to make different vocal tract shapes. Another commonly made measurement is fundamental frequency, which is the acoustic correlate of what listeners perceive as the pitch of someone’s voice, e.g., a deep voice or a high-pitched voice. Whereas formants are related to the length and shape of the vocal tract, fundamental frequency is related to the size of the speaker’s vocal folds and the configuration in which they hold and put tension on them. To return to the analogy of a wind instrument, the vocal folds are like the vibrating reed of a woodwind instrument or the vibrating lips of the musician playing a brass instrument. In the same way that the musician can alter the frequency of vibration of the reed or their lips, a speaker alters the frequency of vibration of their vocal folds. Many types of acoustic measurements are the quantitative acoustic parallels of the subjective auditory properties that practitioners of the auditory approach listen for, and many are quantitative parallels of properties which are represented graphically in spectrograms.
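The drum/tube analogy can be made concrete with the standard textbook first approximation of the vocal tract as a uniform tube closed at the glottis and open at the lips, whose resonances fall at odd multiples of c/4L. The numbers below follow from that idealised model only, not from any particular speaker or case:

```python
def tube_formants(tract_length_m, n_formants=3, speed_of_sound=350.0):
    """Resonances (Hz) of a uniform tube closed at one end, a standard
    first approximation to the vocal tract for a neutral vowel:
    F_n = (2n - 1) * c / (4 * L)."""
    return [(2 * n - 1) * speed_of_sound / (4 * tract_length_m)
            for n in range(1, n_formants + 1)]

# A ~17.5 cm vocal tract (typical adult male) -> ~500, 1500, 2500 Hz.
long_tract = tube_formants(0.175)

# A shorter (~14.5 cm) vocal tract has higher resonances throughout,
# just as the drum/tube analogy predicts.
short_tract = tube_formants(0.145)
```

Real vocal tracts are not uniform tubes, which is precisely why formant frequencies vary with tongue, jaw, and lip configuration and carry speaker- and vowel-dependent information.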


  • A practitioner may manually search for all occurrences of a particular speech sound, or word, or phrase which occur in both the suspect and offender recordings. They will then make measurements of the acoustic properties of those units. The numbers resulting from the measurements can then be compared. The practitioner may also make the same types of measurements on the same units in voice recordings from other speakers. The latter could be foil speakers, or could be intended to be a sample of speakers representative of the relevant population in the case. The practitioner will usually make measurements on several different speech sounds, words, and/or phrases, not just one.


  • There are both statistical and non-statistical versions of the acoustic-phonetic approach. In the non-statistical version the conclusion is the practitioner’s subjective judgment based on looking at the raw numbers from the measurements or based on looking at graphical plots of the numbers. In the statistical version the conclusion is based on a statistical model applied to the numbers. Some practitioners directly report the numeric output of the statistical model as their conclusion; other practitioners report a subjective judgment based on consideration of the output of the statistical model.





Automatic Approach
  • In an automatic approach the practitioner uses computer software to make measurements of the acoustic properties of the suspect and offender recordings, and of voice recordings from other speakers who are intended to represent the relevant population for the case. Generally the acoustic measurements are made on the whole of the speakers’ speech in the recordings, and there is no focus on individual speech sounds, words, or phrases. The types of measurements made are the same as those used in speech processing (a branch of signal processing, in turn a branch of electrical engineering). These types of measurements are also applied to other tasks such as automatic speech recognition. An example of a common type of measurement is mel frequency cepstral coefficients (MFCCs). MFCCs are a set of numbers, e.g., 14 numbers, which describe the frequency components (the spectrum) of the speech during a short interval of time, e.g., 20 milliseconds. MFCC measurements are usually made once every 10 milliseconds, i.e., 100 times per second. A set of 14 MFCCs provides a more detailed measurement of the speech spectrum than do traditional acoustic-phonetic measurements, e.g., fundamental frequency plus two or three formants, but the value of an individual cepstral coefficient is not usually interpretable in terms of acoustic-phonetic theory.
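The windowing scheme described above (a 20 ms analysis window advanced every 10 ms) determines how many MFCC vectors a recording yields. A minimal sketch, working in integer milliseconds; it illustrates the frame schedule only, not the MFCC computation itself:

```python
def frame_count(duration_ms, frame_len_ms=20, hop_ms=10):
    """Number of complete analysis frames for a recording, with a
    frame_len_ms window advanced every hop_ms (i.e., ~100 frames/s)."""
    if duration_ms < frame_len_ms:
        return 0
    return 1 + (duration_ms - frame_len_ms) // hop_ms

# A 30-second offender recording yields ~3000 frames; with 14 MFCCs
# per frame, that is ~42,000 numbers describing the spectrum over time.
n_frames = frame_count(30_000)
n_numbers = n_frames * 14
```

Even a short forensic recording therefore provides a large quantity of measurements for the statistical models described below.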


  • In an automatic system, the numbers from the measurements are always used as input to statistical models. The practitioner may be involved in selecting what they consider to be appropriate statistical models, appropriate types of measurements, appropriate data for training the statistical models, and in cleaning up the audio recordings before inputting them, but the measurements and statistical models themselves run automatically without additional human intervention.


  • The conclusion emerging from an automatic approach will be based on the output of the statistical model. Some practitioners directly report the numeric output of the statistical model as their conclusion; other practitioners report a subjective judgment based on consideration of the output of the statistical model.





Our Forensic Voice Comparison Analyses
    • We use the likelihood-ratio framework.

    • We calculate likelihood ratios using relevant data, quantitative measurements, and statistical models.

    • We empirically test the validity and reliability of our system using data representative of the relevant population and reflective of the conditions of the case under investigation.


  • Our approach to forensic voice comparison is generally a human-supervised automatic approach. In some circumstances, we may use an acoustic-phonetic statistical approach or a combination of automatic and acoustic-phonetic statistical approaches.


  • Our analyses include the following steps:

    • Specify the hypotheses we have set out to test, including specifying the relevant population we have adopted given the context of the case.

    • Obtain data representative of that relevant population.

    • Use data which are either recorded in or made to simulate the conditions of the suspect and offender recordings in the case.

    • Measure acoustic properties of the suspect and offender recordings, of the recordings used for training, and of the recordings used for testing.

    • Use training data to build statistical models tailored to the relevant population and the conditions of the recordings in the case under investigation.

    • Use test data to empirically assess the validity and reliability of our system under conditions reflecting those of the case under investigation.

    • Calculate a likelihood ratio for the comparison of the suspect and offender recordings.

    • In casework, once we have trained a system we do not change it – we do not look at the performance results, make changes to the system, then retest the system. We test the system before we compare the actual suspect and offender recordings. We calculate a likelihood ratio value for the comparison of the suspect and offender recordings only once – we do not calculate the likelihood ratio value, then make changes to the system or to the test data, and then recalculate the likelihood ratio value.

    • We communicate our choices and procedures to the judge at an admissibility hearing and/or to the trier of fact at trial so that they can understand the question we have set out to answer; decide whether we have made appropriate choices as to the relevant population; decide whether the data we obtained are sufficiently representative of that population and sufficiently reflective of the conditions of the suspect and offender recordings, and hence whether the test results are applicable to the case; and understand the answer we provide to the question we set out to answer.

    • We report the results from the tests of the validity and reliability of our system.

    • We report the results from the comparison of the suspect and offender recordings. We directly report the numeric likelihood ratio output of the statistical model as our expression of the strength of the evidence in the case.
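  • The likelihood-ratio calculation in the steps above can be illustrated with a deliberately simplified univariate sketch. All numbers are invented for illustration; actual casework uses multivariate statistical models trained on data representative of the relevant population and reflective of the conditions of the case.

```python
import math

def gaussian_pdf(x, mean, sd):
    """Probability density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

# Invented toy numbers: a single acoustic measurement (e.g., mean
# fundamental frequency in Hz) from the offender recording.
offender_value = 118.0

# Model of the suspect's voice, estimated from the suspect recording.
suspect_mean, suspect_sd = 120.0, 6.0
# Model of the relevant population, estimated from reference recordings.
population_mean, population_sd = 130.0, 20.0

# Likelihood ratio = similarity / typicality:
# p(evidence | same speaker) over
# p(evidence | different speaker from the relevant population)
lr = gaussian_pdf(offender_value, suspect_mean, suspect_sd) / \
     gaussian_pdf(offender_value, population_mean, population_sd)
print(f"likelihood ratio = {lr:.1f}")
```

    Here the evidence is a few times more probable under the same-speaker hypothesis than under the different-speaker hypothesis. Note that both the suspect model and the population model enter the calculation: similarity alone is not enough.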




Forensic Scientists
  • Dr Morrison and Dr Enzinger have worked together since 2011.


  • They have a mixture of overlapping and complementary skills and knowledge.


  • They collaborate closely on research and on casework.


Dr Geoffrey Stewart Morrison

http://geoff-morrison.net/


Curriculum Vitae



Dr Geoffrey Stewart Morrison is one of the leading thinkers in the world about problems of forensic inference.

No one has thought more carefully about how to identify relevant comparison populations.

Few have his ability to understand and explain forensic statistics.

Prof William C Thompson

School of Law, and Department of Criminology, Law & Society, University of California Irvine

Co-counsel for OJ Simpson in his criminal trial in Los Angeles, 1994–1995

Originator of the terms “prosecutor’s fallacy” and “defense attorney’s fallacy”



  • Dr Morrison received his PhD from the Department of Linguistics, University of Alberta in 2006.

    • His research focussed on statistical modelling of speech data.


  • He began working in forensic science in 2007 when he was recruited as a Research Associate on a forensic voice comparison research project directed by Dr Philip Rose at the Australian National University.


  • 2010–2013 he was Director of the Forensic Voice Comparison Laboratory at the School of Electrical Engineering & Telecommunications, University of New South Wales.

    • He brought in almost one million dollars in external research funding from US-government-sponsored research and from an Australian Research Council research project in collaboration with several partner organisations including the Australian Federal Police, New South Wales Police, Queensland Police, and the National Institute of Forensic Science.


  • 2010–2013 he was also Chair of the Forensic Acoustics Subcommittee of the Acoustical Society of America.


  • 2012–2014 he was a Subject Editor for the academic journal Speech Communication.


  • In 2015 he had a six-month post as Scientific Counsel for INTERPOL’s Office of Legal Affairs.

    • His work included collaborating with law enforcement agencies from several countries to develop end user requirements for an investigative speaker identification system, and assisting technical partners in converting those requirements into a functioning system.


  • Since 2015 Dr Morrison has also been an Adjunct Professor at the Department of Linguistics, University of Alberta; from 2010 to 2015 he was an Adjunct Associate Professor there.


  • He currently works as an Independent Forensic Consultant.


  • Dr Morrison has provided training in the form of tutorials, workshops, and courses (lasting from hours to weeks) at universities, academic conferences, operational forensic laboratories, and lawyers’ offices around the world (in Europe, Asia, Australasia, North America, and South America).


  • Audience members have included researchers and students, forensic practitioners, police officers, and lawyers. He specialises in presenting material which may appear to be technically challenging in a way which makes the underlying concepts easy to understand.


  • Dr Morrison speaks fluent English and Spanish, and can provide, and has provided, training in both these languages.


  • Dr Morrison has conducted forensic casework and been involved in legal cases in Australia and in the United States, and has been involved in a journalistic case in Canada.


  • His work involves both conducting forensic voice comparison analyses and critiquing reports submitted by others. He has submitted written reports and appeared in court as an expert witness.


  • In 2015 he advised the defence in relation to a Daubert hearing on the admissibility of a forensic voice comparison analysis proffered by the prosecution in US Federal Court.


  • Dr Morrison is an active researcher and has published numerous refereed and invited journal articles, book chapters, and conference proceeding including:

    • Enzinger, E., Morrison, G. S., & Ochoa, F. (2016). A demonstration of the application of the new paradigm for the evaluation of forensic evidence under conditions reflecting those of a real forensic-voice-comparison case. Science & Justice, 56, 42–57. doi:10.1016/j.scijus.2015.06.005

    • Enzinger, E., & Morrison, G. S. (2015). Mismatched distances from speakers to telephone in a forensic-voice-comparison case. Speech Communication, 70, 28–41. doi:10.1016/j.specom.2015.03.001

    • Morrison, G. S. (2014). Distinguishing between forensic science and forensic pseudoscience: Testing of validity and reliability, and approaches to forensic voice comparison. Science & Justice, 54, 245–256. doi:10.1016/j.scijus.2013.07.004

    • Morrison, G. S., & Stoel, R. D. (2014). Forensic strength of evidence statements should preferably be likelihood ratios calculated using relevant data, quantitative measurements, and statistical models – a response to Lennard (2013) Fingerprint identification: How far have we come? Australian Journal of Forensic Sciences, 46, 282–292. doi:10.1080/00450618.2013.833648

    • Morrison, G. S., Lindh, J., & Curran, J. M. (2014). Likelihood ratio calculation for a disputed-utterance analysis with limited available data. Speech Communication, 58, 81–90. doi:10.1016/j.specom.2013.11.004

    • Morrison, G. S. (2013). Tutorial on logistic-regression calibration and fusion: Converting a score to a likelihood ratio. Australian Journal of Forensic Sciences, 45, 173–197. doi:10.1080/00450618.2012.733025

    • Grigoras, C., Smith, J. M., Morrison, G. S., & Enzinger, E. (2013). Forensic audio analysis – Review: 2010–2013. In: NicDaéid, N. (Ed.), Proceedings of the 17th International Forensic Science Managers’ Symposium (pp. 612–637). Lyon, France: Interpol.

    • Zhang, C., Morrison, G. S., Enzinger, E., & Ochoa, F. (2013). Effects of telephone transmission on the performance of formant-trajectory-based forensic voice comparison – female voices. Speech Communication, 55, 796–813. doi:10.1016/j.specom.2013.01.011

    • Zhang, C., Morrison, G. S., Ochoa, F., & Enzinger, E. (2013). Reliability of human-supervised formant-trajectory measurement for forensic voice comparison. Journal of the Acoustical Society of America, 133, EL54–EL60. doi:10.1121/1.4773223

    • Morrison, G. S. (2012). The likelihood-ratio framework and forensic evidence in court: A response to R v T. International Journal of Evidence and Proof, 16, 1–29. doi:10.1350/ijep.2012.16.1.390

    • Morrison, G. S., & Hoy, M. (2012). What did Bain really say? A preliminary forensic analysis of the disputed utterance based on data, acoustic analysis, statistical models, calculation of likelihood ratios, and testing of validity. In Proceedings of the 46th Audio Engineering Society Conference on Audio Forensics: Recording, Recovery, Analysis, and Interpretation, Denver, Colorado (pp.203–207). Audio Engineering Society.

    • Morrison, G. S., Ochoa, F., & Thiruvaran, T. (2012). Database selection for forensic voice comparison. In Proceedings of Odyssey 2012: The Speaker and Language Recognition Workshop, Singapore (pp. 62–77). International Speech Communication Association.

    • Morrison, G. S. (2011). Measuring the validity and reliability of forensic likelihood-ratio systems. Science & Justice, 51, 91–98. doi:10.1016/j.scijus.2011.03.002

    • Morrison, G. S. (2010). Forensic voice comparison. In I. Freckelton, & H. Selby (Eds.), Expert Evidence (Ch. 99). Sydney, Australia: Thomson Reuters.

    • Morrison, G. S. (2009). Forensic voice comparison and the paradigm shift. Science & Justice, 49, 298–308. doi:10.1016/j.scijus.2009.09.002


  • Links to Dr Morrison’s publications can be found at http://geoff-morrison.net/


  • Dr Morrison has lived and worked in four Canadian provinces and in six other countries: United Kingdom, Japan, Spain, United States, Australia, and France. His current home base is Vancouver, British Columbia, Canada.




Unless explicitly stated otherwise, any opinions expressed by Dr Morrison are his own and do not necessarily represent the policies or opinions of any of the organisations or individuals with which he is currently or has previously been affiliated. Such opinions include the content of this website.





Dr Ewald Enzinger

http://ewaldenzinger.entn.at/


Curriculum Vitae


[Enzinger’s doctoral] dissertation, in my assessment, is a remarkable and significant step forward in the field. In addition to the formal paradigm structure developed, the candidate demonstrates the viability of the approach through implementation using specific case studies based on three actual legal cases.

This thesis makes a remarkable and significant step towards bridging these two diverse communities [phoneticians and electrical engineers] and represents one of the strongest research advancements in this domain to date.

Prof John H L Hansen

University of Texas at Dallas

Associate Dean for Research & Professor of Electrical Engineering, Erik Jonsson School of Engineering and Computer Science

Professor in School of Behavioral and Brain Sciences (Speech & Hearing)

Distinguished University Endowed Chair in Telecommunications Engineering



  • Dr Ewald Enzinger recently completed a PhD at the School of Electrical Engineering & Telecommunications, University of New South Wales.

    • His research focusses on forensic voice comparison.

    • He was the recipient of a prestigious International Postgraduate Research Scholarship and a NICTA Project Award, and is currently the recipient of a UNSW Faculty of Engineering Postdoctoral Writing Fellowship.

    • He submitted his PhD dissertation at the end of August 2015.


  • 2008–2015 he held positions at the Acoustics Research Institute of the Austrian Academy of Sciences.


  • He began working in forensic science in 2009 while he was a Master’s student at the University of Vienna.


  • He has worked on an Australian Research Council Linkage Project in collaboration with a number of organisations, including the Australian Federal Police, New South Wales Police, Queensland Police, and the National Institute of Forensic Science.


  • He has also collaborated on forensic voice comparison research with the German Federal Criminal Police Office (Bundeskriminalamt, BKA).


  • Dr Enzinger speaks fluent English and German.


  • Dr Enzinger is an active researcher and has published a number of refereed journal articles and conference proceedings papers including:

      • Enzinger, E., Morrison, G. S., & Ochoa, F. (2016). A demonstration of the application of the new paradigm for the evaluation of forensic evidence under conditions reflecting those of a real forensic-voice-comparison case. Science & Justice, 56, 42–57. doi:10.1016/j.scijus.2015.06.005

      • Enzinger, E., & Morrison, G. S. (2015). Mismatched distances from speakers to telephone in a forensic-voice-comparison case. Speech Communication, 70, 28–41. doi:10.1016/j.specom.2015.03.001

      • Enzinger, E. (2014). A first attempt at compensating for effects due to recording-condition mismatch in formant-trajectory-based forensic voice comparison. Proceedings of the 15th Australasian International Conference on Speech Science and Technology, Christchurch, New Zealand (pp. 133–136). Australasian Speech Science & Technology Association.

      • Enzinger, E. (2013). Testing the validity and reliability of forensic voice comparison based on reassigned time-frequency representations of Chinese /iau/. Proceedings of the IEEE International Workshop on Information Forensics and Security, Guangzhou, China (pp. 13–18).

      • Grigoras, C., Smith, J. M., Morrison, G. S., & Enzinger, E. (2013). Forensic audio analysis – Review: 2010–2013. In: NicDaéid, N. (Ed.), Proceedings of the 17th International Forensic Science Managers’ Symposium (pp. 612–637). Lyon, France: Interpol.

      • Zhang, C., Morrison, G. S., Enzinger, E., & Ochoa, F. (2013). Effects of telephone transmission on the performance of formant-trajectory-based forensic voice comparison – female voices. Speech Communication, 55, 796–813. doi:10.1016/j.specom.2013.01.011

      • Zhang, C., Morrison, G. S., Ochoa, F., & Enzinger, E. (2013). Reliability of human-supervised formant-trajectory measurement for forensic voice comparison. Journal of the Acoustical Society of America, 133, EL54–EL60. doi:10.1121/1.4773223

      • Enzinger, E., & Morrison, G. S. (2012). The importance of using between-session test data in evaluating the performance of forensic-voice-comparison systems. Proceedings of the 14th Australasian International Conference on Speech Science and Technology, Sydney, Australia (pp. 137–140). Australasian Speech Science & Technology Association.

      • Enzinger, E., Zhang, C., & Morrison, G. S. (2012). Voice source features for forensic voice comparison – an evaluation of the GLOTTEX software package. Proceedings of Odyssey 2012: The Speaker and Language Recognition Workshop, Singapore (pp. 78–85).


    • Links to Dr Enzinger’s publications can be found at http://ewaldenzinger.entn.at/


    • Dr Enzinger’s current home base is Corvallis, Oregon, USA.



    Unless explicitly stated otherwise, any opinions expressed by Dr Enzinger are his own and do not necessarily represent the policies or opinions of any of the organisations or individuals with which he is currently or has previously been affiliated.





    Training

    Dr Morrison can provide training for forensic scientists, lawyers, judges, and others.

    Training can be specific to forensic speech science, but for most audiences it is not: it covers evaluation of forensic evidence in general.

    Below are example materials from some of his workshops.

    Training will be tailored depending on the needs of the client.

    Training can be requested on any topic within his expertise.


    • Introduction to logical reasoning for the evaluation of forensic evidence

      • Slides:

      • Audience: forensic scientists and/or legal professionals


      • Abstract:

        In response to the 2009 US National Research Council (NRC) Report on Strengthening Forensic Science in the United States, the England & Wales Forensic Science Regulator’s 2014 Codes of Practice and Conduct, and the European Network of Forensic Science Institutes’ 2015 ENFSI Guideline for Evaluative Reporting, there is increasing pressure across all branches of forensic science to adopt a paradigm including the following key elements:

        • a logically correct framework for the evaluation and interpretation of forensic evidence

        • approaches based on relevant data, quantitative measurements, and statistical models

        • empirical testing of the degree of validity and reliability of forensic-evaluation systems under conditions reflecting those of the case under investigation

        • transparency as to decisions taken and procedures employed as part of forensic analyses

    This half-day workshop provides an introduction to the likelihood-ratio framework for the evaluation and interpretation of forensic evidence.

    There is a great deal of misunderstanding and confusion about the likelihood-ratio framework among lawyers, judges, and forensic scientists. The likelihood-ratio framework is about logic, not mathematics or databases, and it makes explicit the questions which must logically be addressed by the forensic scientist and considered by the trier of fact in assessing the work of the forensic scientist.

    This workshop explains the logic of the likelihood-ratio framework in a way which is accessible to a broad audience and which does not require any prior knowledge about the framework. It uses intuitive examples and audience-participation exercises to gradually build a fuller understanding of the likelihood-ratio framework.

    The workshop also includes discussion of common logical fallacies.
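    The core of the likelihood-ratio framework’s logic can be shown with a few lines of arithmetic (all numbers invented for illustration):

```python
# Toy numbers, invented for illustration. The likelihood-ratio framework
# separates the forensic scientist's task (assessing the likelihood ratio)
# from the trier of fact's task (combining it with the prior odds).
likelihood_ratio = 100.0   # evidence 100x more probable if same speaker
prior_odds = 1.0 / 1000.0  # trier of fact's odds before the voice evidence

posterior_odds = likelihood_ratio * prior_odds
posterior_prob = posterior_odds / (1.0 + posterior_odds)
print(f"posterior odds {posterior_odds:.2f}, probability {posterior_prob:.3f}")

# Reporting LR = 100 as "it is 100x more likely that it is the same
# speaker" conflates the likelihood ratio with a posterior probability
# (the prosecutor's fallacy): with these priors the posterior
# probability is only about 9%.
```

    The forensic scientist can assess the likelihood ratio; the prior odds depend on the other evidence in the case and are the province of the trier of fact.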

    Other workshops Dr Morrison presents generally assume familiarity with the material presented in this workshop.



    • Calculating the strength of forensic evidence, and testing the validity and reliability of forensic-evaluation systems

      • Audience: forensic scientists or legal professionals (different depth of coverage and focus depending on the audience)

      • Abstract:

        This half-day workshop provides an introduction to the calculation of forensic likelihood ratios on the basis of relevant data, quantitative measurements, and statistical models, and an introduction to empirically assessing the validity and reliability of forensic-evaluation systems.

        Some of the topics listed below can also be presented as stand-alone tutorials.

        Audience members are assumed to already have a basic understanding of the logic of the likelihood-ratio framework, e.g., by having participated in my workshop “Introduction to logical reasoning for the evaluation of forensic evidence”.

        Topics covered include:

        • selection of data which represent the relevant population and the conditions of the case under investigation

        • basic statistical models for calculating likelihood ratios

        • calibrating forensic-evaluation systems

          Slides:

          • TBA


        • empirically testing the degree of validity and reliability of forensic-evaluation systems

          Slides:


      • The workshop also includes discussion of elements of forensic analysis which must be made explicit and transparent to the trier of fact so that the trier of fact can understand the analysis and make decisions accordingly.
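      • As a small illustration of the validity-testing topic above, the log-likelihood-ratio cost (Cllr) is a standard validity metric for likelihood-ratio systems. The sketch below uses toy values invented for illustration; in practice the metric is computed over the likelihood ratios from many same-speaker and different-speaker test pairs.

```python
import math

def cllr(same_speaker_lrs, diff_speaker_lrs):
    """Log-likelihood-ratio cost: 0 for a perfect system, 1 for an
    uninformative system that always outputs LR = 1. Same-speaker pairs
    are penalised for LR < 1, different-speaker pairs for LR > 1, with
    the penalty growing with the size of the error."""
    p1 = sum(math.log2(1 + 1 / lr) for lr in same_speaker_lrs) / len(same_speaker_lrs)
    p2 = sum(math.log2(1 + lr) for lr in diff_speaker_lrs) / len(diff_speaker_lrs)
    return 0.5 * (p1 + p2)

# Hypothetical test results, invented for illustration.
ss = [30.0, 8.0, 120.0, 0.7]  # same-speaker pairs: mostly LR > 1, as hoped
ds = [0.05, 0.2, 0.01, 3.0]   # different-speaker pairs: mostly LR < 1
print(f"Cllr = {cllr(ss, ds):.3f}")

# A system that always says LR = 1 carries no information: Cllr = 1.
assert cllr([1.0], [1.0]) == 1.0
```

      Because Cllr is computed from the likelihood-ratio values themselves rather than from hard same/different decisions, it rewards well-calibrated systems and not just systems that rank pairs correctly.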




    Casework
    • We can conduct forensic analyses in forensic speech science:

      • forensic voice comparison

        • where the court wants to determine whether the voice of a speaker on an audio recording was produced by a particular known speaker or by some other speaker

        • We have conducted analyses, submitted reports, and testified in court on a number of occasions in the past.


      • disputed-utterance analysis

        • where the court wants to determine what a speaker said at some point on an audio recording

        • We have conducted research in this area, but have not yet been involved in a court case.


    • Our ability to perform forensic voice comparison is constrained by the availability of appropriate databases of speech recordings:

      • In some instances it may be necessary to collect data for the specific case. This may or may not be practical depending on the circumstances of the case.

      • In some instances we may be able to use preexisting databases. For example, we have a forensic database of recordings of 500+ Australian English speakers.


    • Our aim is to perform an analysis which would be deemed admissible in your jurisdiction, including under Rule 702 / Daubert in US Federal Court.


    • Dr Morrison can also provide critiques of reports submitted by others.

      • Forensic speech science is an area which has suffered from, and continues to suffer from, practitioners who are ignorant, deluded, or naïve, and perhaps even from outright charlatans and fraudsters.

      • Even relatively good forensic speech science reports and testimony may suffer from shortcomings.

      • Lawyers who are concerned about a forensic speech science report submitted by another expert should definitely contact us.

      • Example of a critique written for a journalistic case

      • In the past Dr Morrison has written critiques and given courtroom testimony in relation to written reports and courtroom testimony provided by a number of individuals, including (in alphabetical order):

        • Dr Helen Fraser
        • Dr Yuko Kinoshita
        • Mr Jonas Lindh
        • Mr Edward John Primeau
        • Dr Catriona Jean (aka Kate) Storey-Whyte

      • Dr Morrison’s experience in this area includes advising the defence in relation to a Daubert hearing on the admissibility of forensic voice comparison testimony tendered by the prosecution in a US Federal Court case.


    • Dr Morrison can provide testimony related to a listener’s abilities to recognise a speaker by the sound of their voice.

      • Sometimes, instead of commissioning a forensic voice comparison report, a party in a court case may attempt to rely on a non-expert, such as a police officer, saying that they recognised a speaker’s voice.

      • Research has identified a number of factors that may make listeners better or worse at identifying speakers.

      • One research finding is that people think that they and others are better at identifying speakers than they really are.

      • On several occasions in the past Dr Morrison has provided written reports and courtroom testimony on research findings and their potential relevance given the circumstances of the case.


    • Dr Morrison can also provide informational reports designed to educate the court about speaker recognition in general.

    • Whether commissioned by the prosecution or by the defence, and whether commissioned as potential testifying experts or as non-testifying experts, we will endeavour to provide an impartial assessment based on scientific criteria.




    Links


    Recommended publications

    The following publications should be accessible to a broad audience including lawyers and judges.



    Contact
    • An initial half-hour consultation is free.


    • Send us an e-mail with your contact information and we will call you via Skype or telephone as soon as we can.

      • e-mail address:




    • A note on location:

      • Sometimes we get enquiries asking if we can recommend anyone in the enquirer’s home town. Forensic speech science is a highly specialised field and there is not a competent expert in every town, or even in every country. Your goal should be to find the best expert to assist with your legal case (or training needs), and your search should be worldwide. We are based in Vancouver, British Columbia, Canada, and Corvallis, Oregon, United States of America, ideally located to serve North American clients. We also travel frequently to Europe and can provide service there as well as in other parts of the world. We have worked in Asia, Australasia, Europe, North America, and South America.


    • A note on value:

      • Sometimes we get enquiries where the first question the enquirer asks is “How much will it cost?” Again, forensic speech science is a highly specialised field, and done properly it is neither cheap nor particularly fast. If the resources are not available to do the analysis properly, then we will not do the analysis. We understand that there are limits on budgets, and this effectively means that we only end up conducting analyses for legal cases where the charges are serious and the speech evidence is an important aspect of the case. Critiques require fewer resources, and can therefore be provided for a wider selection of cases. There are others out there who will offer cheaper and faster service. Always ask whether what they are offering is logically correct and whether they will empirically demonstrate the degree of validity and reliability of their analysis under conditions reflecting those of your case. If you have read the material above, and especially if you have read the recommended publications, you should have a reasonable idea of what this means. Understand what you are being offered, and then consider its value.