Acoustic and perceptual characteristics of English fricatives

The broad objective of this project is to gain a more thorough understanding of the production and perception of English fricatives. A first detailed report on the acoustic properties of fricatives was published as Jongman, Wayland, and Wong (2000). That study investigated two recent metrics for classifying place of articulation in fricatives: spectral moments and locus equations. While these metrics had been applied successfully to stop consonants, virtually no work had extended them to fricatives. With appropriate modifications, these metrics seemed particularly promising for fricative classification. The performance of these global metrics was compared to that of a local metric, spectral peak location. Other properties, such as noise duration, noise amplitude, and relative amplitude, were also evaluated for their utility in classifying fricative place of articulation.
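
As an illustration of the spectral moments metric mentioned above, the following minimal sketch treats a power spectrum as a probability distribution over frequency and computes its first four moments. It assumes the common formulation (mean, variance, skewness, and excess kurtosis); the function name spectral_moments and the toy input values are hypothetical, and the analysis settings used in Jongman, Wayland, and Wong (2000) are not specified in this summary.

```python
import math


def spectral_moments(freqs_hz, power):
    """Return (mean, variance, skewness, excess kurtosis) of a power spectrum.

    Assumption: the spectrum is normalized to sum to 1 and treated as a
    probability distribution over frequency; kurtosis is reported as
    excess kurtosis (normalized fourth moment minus 3).
    """
    if len(freqs_hz) != len(power):
        raise ValueError("freqs_hz and power must have the same length")
    total = sum(power)
    if total <= 0:
        raise ValueError("power spectrum must have positive total energy")

    # Normalize so the power values act as probabilities.
    p = [value / total for value in power]

    # First moment: spectral mean (center of gravity), in Hz.
    mean = sum(f * w for f, w in zip(freqs_hz, p))

    # Second central moment: variance, in Hz squared.
    variance = sum((f - mean) ** 2 * w for f, w in zip(freqs_hz, p))
    sd = math.sqrt(variance)
    if sd == 0:
        # All energy in one frequency bin: higher moments are undefined,
        # so this sketch returns zeros for them.
        return mean, 0.0, 0.0, 0.0

    # Third and fourth moments, normalized by the standard deviation.
    skewness = sum((f - mean) ** 3 * w for f, w in zip(freqs_hz, p)) / sd ** 3
    kurtosis = sum((f - mean) ** 4 * w for f, w in zip(freqs_hz, p)) / sd ** 4 - 3.0

    return mean, variance, skewness, kurtosis


# Toy, hypothetical spectrum: three frequency bins with symmetric energy.
moments = spectral_moments([1000.0, 2000.0, 3000.0], [1.0, 2.0, 1.0])
```

In practice the input spectrum would come from a Fourier transform of a windowed portion of the frication noise; the window length and location are analysis choices that this sketch leaves open.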

In addition, perception experiments using computer-edited natural speech were conducted to help evaluate the accuracy of classification metrics and to investigate the 'psychological reality' of these metrics. A first report on the perceptual results, as well as attempts to systematically relate listeners’ judgments to acoustic measurements, can be found in Jongman (2001), which was presented at a workshop on Speech Recognition as Pattern Classification at the Max Planck Institute for Psycholinguistics in July 2001. More recently, in collaboration with Dr. Bob McMurray (University of Iowa), the corpus of acoustic measurements and their perceptual evaluations has been used to assess three competing theories of speech perception: invariance, exemplar, and parsing theories. Using multinomial logistic regression, we ask how useful the information in the signal could be, given the assumptions each theoretical approach makes about how that information should be treated. Initial findings were first presented at the Acoustical Society of America (ASA) meeting in Miami (2008) and at the Computational Modeling of Sound Pattern Acquisition workshop at the University of Alberta, Edmonton, Canada (2010). A full manuscript is currently under revision.

We have also used fricatives as a case study for investigating the acoustics and perception of clear speech. Most work on clear speech has involved vowels, and there are very few detailed studies of clearly produced consonants. The acoustics of clearly (and conversationally) produced fricatives are described in Maniwa, Jongman, and Wade (2009). In addition to documenting a number of consistent modifications in spectrum, amplitude, and duration, this study also shows that speakers make specific changes in their productions depending on the nature of the recognition errors (e.g., voicing or place of articulation) that prompted the productions. Perception of clear fricatives by normal-hearing and simulated hearing-impaired listeners is described in Maniwa, Jongman, and Wade (2008). Results indicate that clear speech helped both groups overall. However, for impaired listeners, reliable clear speech intelligibility advantages were not found for non-sibilant pairs.

Finally, we have also explored the separate contributions of auditory, visual, audio-visual, and contextual linguistic information to the perception of /f, v, th, dh/, sounds which, it is often claimed, rely primarily on non-acoustic properties. Results indicated that these fricatives are perceived as accurately from visual information alone as from auditory and visual information combined, and more accurately than from auditory information alone. These findings suggest that accurate perception of non-sibilant fricatives derives from a combination of acoustic, linguistic, and visual information (Jongman, Wang, and Kim, 2003).

In sum, by relating acoustic and perceptual data, and by comparing the roles of auditory, contextual, and visual information, this research aims at a comprehensive account of the acoustic and perceptual properties of English fricatives.

[Research partially supported by NIH]

