Deriving gradient measures of child speech from crowdsourced ratings

Tara McAllister Byun; Daphna Harel; Peter F Halpin; Daniel Szeredi

doi:10.1016/j.jcomdis.2016.07.001

J Commun Disord. 2016 Nov-Dec:64:91-102. doi: 10.1016/j.jcomdis.2016.07.001. Epub 2016 Jul 6.

Authors

Tara McAllister Byun¹, Daphna Harel², Peter F Halpin², Daniel Szeredi²

Affiliations

¹ New York University, New York, NY, USA. Electronic address: tara.byun@nyu.edu.
² New York University, New York, NY, USA.

Abstract

Recent research has demonstrated that perceptual ratings aggregated across multiple non-expert listeners can reveal gradient degrees of contrast between sounds that listeners might transcribe identically. Aggregated ratings have been found to correlate strongly with acoustic gold standard measures both when individual raters use a continuous rating scale such as visual analog scaling (Munson et al., 2012) and when individual raters provide binary ratings (McAllister Byun, Halpin, & Szeredi, 2015). In light of evidence that inexperienced listeners use continuous scales less consistently than experienced listeners, this study investigated the relative merits of binary versus continuous rating scales when aggregating responses over large numbers of naive listeners recruited through online crowdsourcing. Stimuli were words produced by children in treatment for misarticulation of North American English /r/. Each listener rated the same 40 tokens twice: once using Visual Analog Scaling (VAS) and once using a binary rating scale. The gradient rhoticity of each item was then estimated using two measures: (a) VAS click location, averaged across raters; and (b) the proportion of raters who assigned the "correct /r/" label to the item in the binary rating task (p̂). First, we validate these two measures of rhoticity against each other and against an acoustic gold standard. Second, we explore the range of variability in individual response patterns that underlie these group-level data. Third, we integrate statistical, theoretical, and practical considerations to offer guidelines for determining which measure to use in a given situation.
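
As an illustration of the two item-level measures described in the abstract, the following is a minimal sketch rather than the authors' analysis code. The ratings are invented toy values, the VAS click location is assumed to be rescaled to the range 0 to 1, and the names `ratings`, `aggregate`, and `pearson` are hypothetical. It computes the mean VAS click location and the proportion of "correct /r/" labels per item, then correlates the two measures across items.

```python
from collections import defaultdict
from math import sqrt

# Toy data (invented for illustration, not from the study).
# Each tuple: (item_id, rater_id, vas_click, binary_label)
# Assumption: vas_click is rescaled to [0, 1]; binary_label 1 = "correct /r/".
ratings = [
    ("item1", "r1", 0.9, 1),
    ("item1", "r2", 0.8, 1),
    ("item1", "r3", 0.7, 1),
    ("item2", "r1", 0.2, 0),
    ("item2", "r2", 0.4, 1),
    ("item2", "r3", 0.3, 0),
    ("item3", "r1", 0.1, 0),
    ("item3", "r2", 0.2, 0),
    ("item3", "r3", 0.0, 0),
]


def aggregate(rows):
    """Return {item: (mean VAS click, proportion of 'correct' labels)}."""
    vas = defaultdict(list)
    binary = defaultdict(list)
    for item, _rater, click, label in rows:
        vas[item].append(click)
        binary[item].append(label)
    return {
        item: (sum(vas[item]) / len(vas[item]),
               sum(binary[item]) / len(binary[item]))
        for item in vas
    }


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


measures = aggregate(ratings)
mean_vas = [measures[item][0] for item in measures]
p_hat = [measures[item][1] for item in measures]
r = pearson(mean_vas, p_hat)

for item, (vas_mean, prop) in measures.items():
    print(f"{item}: mean VAS = {vas_mean:.2f}, p-hat = {prop:.2f}")
print(f"Pearson r between the two measures: {r:.3f}")
```

In a real data set, the same per-item aggregation would be applied to all 40 tokens, and each measure could additionally be compared against an acoustic measure for the same items.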

Keywords: Covert contrast; Crowdsourcing; Research methods; Speech perception; Speech rating; Speech sound disorders; Visual analog scaling.

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Adult
Auditory Perception*
Crowdsourcing / methods*
Female
Humans
Male
Speech Acoustics*
Speech Perception*
Voice Quality

Grants and funding

R03 DC012883/DC/NIDCD NIH HHS/United States