The noisy encoding of disparity model predicts perception of the McGurk effect in native Japanese speakers

John F Magnotti; Anastasia Lado; Michael S Beauchamp

doi:10.3389/fnins.2024.1421713

The noisy encoding of disparity model predicts perception of the McGurk effect in native Japanese speakers

Front Neurosci. 2024 Jun 26:18:1421713. doi: 10.3389/fnins.2024.1421713. eCollection 2024.

Authors

John F Magnotti¹, Anastasia Lado¹, Michael S Beauchamp¹

Affiliation

¹ Department of Neurosurgery, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, United States.

Abstract

In the McGurk effect, visual speech from the face of the talker alters the perception of auditory speech. The diversity of human languages has prompted many intercultural studies of the effect in both Western and non-Western cultures, including native Japanese speakers. Studies of large samples of native English speakers have shown that the McGurk effect is characterized by high variability in the susceptibility of different individuals to the illusion and in the strength of different experimental stimuli to induce the illusion. The noisy encoding of disparity (NED) model of the McGurk effect uses principles from Bayesian causal inference to account for this variability, separately estimating the susceptibility and sensory noise for each individual and the strength of each stimulus. To determine whether variation in McGurk perception is similar between Western and non-Western cultures, we applied the NED model to data collected from 80 native Japanese-speaking participants. Fifteen different McGurk stimuli that varied in syllable content (unvoiced auditory "pa" + visual "ka" or voiced auditory "ba" + visual "ga") were presented interleaved with audiovisual congruent stimuli. The McGurk effect was highly variable across stimuli and participants, with the percentage of illusory fusion responses ranging from 3 to 78% across stimuli and from 0 to 91% across participants. Despite this variability, the NED model accurately predicted perception, predicting fusion rates for individual stimuli with 2.1% error and for individual participants with 2.4% error. Stimuli containing the unvoiced pa/ka pairing evoked more fusion responses than the voiced ba/ga pairing. Model estimates of sensory noise were correlated with participant age, with greater sensory noise in older participants. The NED model of the McGurk effect offers a principled way to account for individual and stimulus differences when examining the McGurk effect in different cultures.

Keywords: audiovisual; cross-cultural; illusion; multisensory; speech.

Grants and funding

The author(s) declare financial support was received for the research, authorship, and/or publication of this article. This research was supported by NIH R01NS065395 and U01NS113339. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.