Acquisition and processing of lexical tone in Mandarin Chinese

This project investigates Mandarin tone from a variety of perspectives. First, we have established that non-native listeners' perception of Mandarin tone can be greatly improved with a brief training procedure following the high-variability paradigm. Specifically, American listeners' perception improved by 21% after 8 training sessions of 40 minutes each. Moreover, this improvement generalized to new speakers and contexts (Wang, Spence, Jongman, and Sereno, 1999, JASA) and was retained when probed 6 months after training. Further studies have investigated whether these perceptual gains in tone transfer to production. Before their perception pretest and after their posttest, the same trainees were recorded producing a list of Mandarin words. Their productions were first judged by native Mandarin listeners in an identification task. Trainees' posttest tone productions were identified 18% more accurately than their pretest productions, indicating significant improvement in tone production after perceptual training. Acoustic analyses of the pre- and post-training productions further reveal the nature of the improvement: post-training tone contours approximate native norms more closely than pre-training tone contours (Wang, Jongman, and Sereno, 2003, JASA).

Second, we (in collaboration with Dr. Joy Hirsch, Columbia Medical School) have used functional magnetic resonance imaging (fMRI) to investigate cortical changes resulting from tone training. For the native Mandarin speaker, significant activation was found only in the language-specific regions, Broca's and Wernicke's areas. For the non-native listeners, bilateral activation was generally observed in both pre- and post-training images, and additional areas were activated after training, especially in the right-hemisphere region homologous to Broca's area (Wang, Sereno, Jongman, and Hirsch, 2003, J. Cogn. Neurosci.).

We have also studied hemispheric lateralization of tone using a dichotic listening paradigm. The processing of linguistic tone is of particular interest because of its dual nature: on the one hand, tones signal linguistic distinctions, purportedly the domain of the left hemisphere; on the other hand, tones involve a local modulation of F0, generally found to be the domain of the right hemisphere. Our results suggest that for Chinese listeners, tone perception is a function of the left hemisphere, while for American listeners, both hemispheres are equally engaged (Wang, Jongman, and Sereno, 2001, Brain and Language). Follow-up experiments with native Mandarin listeners, English-Mandarin bilinguals, Norwegian listeners familiar with Norwegian lexical tone, and American listeners with no tone experience indicate left-hemisphere dominance for Mandarin tone in native and proficient bilingual listeners, whereas non-native listeners show no evidence of lateralization, regardless of their familiarity with lexical tone (Wang, Behne, Jongman, and Sereno, 2004, Appl. Psycholing.).

More recently, we used a direct priming task to investigate the nature and processing of tonal information in spoken word recognition in Chinese. Two experiments were conducted. In Experiment 1, prime-target pairs contrasted in tonal and segmental overlap. Experiment 1 replicated the first experiment of C.-Y. Lee's (2007) study, with one significant modification: tonal information was balanced across prime-target pairs. Forty-eight monosyllabic Mandarin target words were paired with four types of primes, in which prime and target were identical (e.g., bo1–bo1), shared only segmental information (e.g., bo1–bo2), shared only tonal information (e.g., bo1–zhua1), or were unrelated (e.g., bo1–man3). Experiment 2 extended the paradigm to include minimal segmental overlap in either the onset or the offset portion of the syllable. Forty-eight monosyllabic Mandarin target words were paired with four types of primes, in which prime and target were identical (e.g., bo1–bo1), shared tone and only onset segmental information (e.g., bo1–bin1), shared tone and only offset segmental information (e.g., bo1–po1), or were unrelated (e.g., bo1–man3).

In Experiment 1, facilitation was found when prime and target were identical or overlapped segmentally, relative to pairs that overlapped only in tone or were unrelated. Effects of tonal similarity across the segmentally related prime-target pairs were also analyzed. In Experiment 2, facilitation was found only when prime and target were identical; partial segmental overlap combined with shared tone produced inhibition relative to the unrelated control. Together, these data indicate that segmental information can facilitate word recognition and carries more weight than tonal information in the processing of spoken Chinese (J.Y. Lee, Sereno, Zhang, and Jongman, poster presented at the 12th International Conference on the Processing of East Asia Related Languages (PEARL2007), Taiwan).
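The facilitation and inhibition effects above are defined relative to the unrelated control. A minimal sketch of that comparison follows, assuming each priming effect is the mean response time in the unrelated condition minus the mean response time for a given prime type (positive values meaning facilitation, negative values meaning inhibition). The function name `priming_effects`, the condition labels, and all response times are hypothetical illustrations, not data or analysis code from the study.

```python
from statistics import mean


def priming_effects(rts_by_condition, baseline="unrelated"):
    """Return baseline-minus-condition mean RT (ms) for each prime type.

    Hypothetical helper: positive values indicate facilitation,
    negative values indicate inhibition relative to the baseline.
    """
    base = mean(rts_by_condition[baseline])
    return {
        cond: base - mean(rts)
        for cond, rts in rts_by_condition.items()
        if cond != baseline
    }


# Hypothetical response times in ms (illustrative only, not study data).
example = {
    "identical": [610, 620, 600],
    "onset_plus_tone": [700, 690, 710],
    "offset_plus_tone": [705, 695, 700],
    "unrelated": [680, 670, 690],
}

for cond, effect in priming_effects(example).items():
    if effect > 0:
        label = "facilitation"
    elif effect < 0:
        label = "inhibition"
    else:
        label = "no effect"
    print(f"{cond}: {effect:+.1f} ms ({label})")
```

With these made-up numbers, the identical condition comes out 70 ms faster than the unrelated control and both partial-overlap conditions 20 ms slower, mirroring the pattern described for Experiment 2. A real analysis would also test whether such differences are statistically reliable.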

We have also explored the perturbation effect of voiceless aspirated and unaspirated prevocalic obstruents on vowel fundamental frequency (F0) in a tone language, Taiwanese (Lai, Huff, Sereno, and Jongman, 2009; a longer version is in preparation). It is well known that F0 is significantly higher after voiceless than after voiced stops; however, the perturbation effect caused by aspiration has received much less attention. The recorded syllables were 30 CV(N) word pairs contrasting in the aspiration (voiceless aspirated vs. voiceless unaspirated) of the initial stop consonant. The syllables included tokens from three places of articulation (bilabial, alveolar, and velar) carrying the five full tones of Taiwanese: Tone 1, High Level (HL); Tone 2, High Falling (HF); Tone 3, Low Level (LL); Tone 5, Low Rising (LR); and Tone 7, Low Falling (LF). Ten adult native speakers of Taiwanese (5 female, 5 male) were recruited in Taipei, Taiwan.

Results demonstrate that F0 is higher on tones following aspirated stops than on tones following unaspirated stops, consistent with the findings of Lai (2004). The mean tone duration across all tokens was 465 ms, and the raising effect generally tapered off at around 140 ms, about 30% of the way through the tonal contour. A significant interaction was found between aspiration and gender: although both genders tended to show higher F0 following aspirated stops, the difference was significant only for female speakers. Based on previous research (Hombert and Ladefoged, 1977), we expected the magnitude of the aspiration perturbation effect to differ by tone height; however, no significant differences were observed.
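The "about 30%" figure follows directly from the two durations reported above, expressing the point where the raising effect tapers off as a proportion of the mean tone duration:

```latex
\frac{140\ \text{ms}}{465\ \text{ms}} \approx 0.30
```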

VOT measurements show a clear, significant effect of aspiration: aspirated stops had a mean VOT of 90 ms, compared with 19 ms for unaspirated stops. In addition, velar stops had significantly longer VOTs (68 ms) than either bilabials (47 ms) or alveolars (48 ms). Tone also had a significant effect on VOT: Tone 2 had a significantly shorter VOT (48 ms) than Tone 3 (56 ms), Tone 5 (56 ms), and Tone 7 (58 ms), but did not differ significantly from Tone 1 (52 ms).

Dr. Sereno is currently P.I. on an NSF-funded project “The contribution of speaking rate, talker and coarticulation to lexical tone processing: Effects of language background and training.”
