Blinded listener ratings are essential for valid assessment of interventions for speech disorders, but collecting these ratings can be time-intensive and costly. This study evaluated the validity of speech ratings obtained through online crowdsourcing, a potentially more efficient approach. One hundred words from children with /r/ misarticulation were presented electronically for binary rating by 35 phonetically trained listeners and 205 naïve listeners recruited through the Amazon Mechanical Turk (AMT) crowdsourcing platform. Bootstrapping was used to compare samples of AMT listeners of different sizes against a "gold standard" (the mode across all trained listeners) and an "industry standard" (the mode across bootstrapped samples of three trained listeners). There was strong overall agreement between trained and AMT listeners. The "industry standard" level of performance was matched by bootstrapped samples of n = 9 AMT listeners. These results support the hypothesis that valid ratings of speech data can be obtained efficiently through AMT. Researchers in communication disorders could benefit from increased awareness of this method.
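The bootstrap comparison described above can be illustrated with a minimal sketch. It is not the study's analysis code. The ratings are simulated, and the listener accuracies (0.90 for trained, 0.75 for naïve), the number of bootstrap replicates, the tie-breaking rule, and all function names are assumptions made for illustration only. Only the counts of items and listeners (100, 35, 205) and the sample size of three for the "industry standard" come from the abstract.

```python
import random


def mode_rating(ratings):
    """Majority vote over binary (0/1) ratings.

    Assumption: ties are resolved as 1. Odd sample sizes avoid ties entirely.
    """
    return 1 if 2 * sum(ratings) >= len(ratings) else 0


def bootstrap_agreement(listener_ratings, gold, n, n_boot=500, rng=None):
    """Mean proportion of items on which the modal rating of n listeners,
    resampled with replacement, matches the gold-standard rating."""
    if rng is None:
        rng = random.Random(0)
    n_items = len(gold)
    total = 0.0
    for _ in range(n_boot):
        # Draw n listeners with replacement from the pool.
        sample = [rng.choice(listener_ratings) for _ in range(n)]
        matches = sum(
            mode_rating([listener[i] for listener in sample]) == gold[i]
            for i in range(n_items)
        )
        total += matches / n_items
    return total / n_boot


def simulate_listeners(truth, n_listeners, accuracy, rng):
    """Hypothetical listeners who rate each item correctly with a fixed probability."""
    return [
        [t if rng.random() < accuracy else 1 - t for t in truth]
        for _ in range(n_listeners)
    ]


rng = random.Random(42)
truth = [rng.randint(0, 1) for _ in range(100)]      # simulated item labels
trained = simulate_listeners(truth, 35, 0.90, rng)   # accuracy is an assumption
naive = simulate_listeners(truth, 205, 0.75, rng)    # accuracy is an assumption

# "Gold standard": mode across all trained listeners, item by item.
gold = [mode_rating([listener[i] for listener in trained]) for i in range(100)]

# "Industry standard": bootstrapped samples of three trained listeners.
industry = bootstrap_agreement(trained, gold, n=3, rng=rng)
print("trained, n=3:", round(industry, 3))

# Agreement of naive-listener samples of increasing (odd) size.
for n in (1, 3, 5, 7, 9, 11, 15):
    print("naive, n=%d:" % n, round(bootstrap_agreement(naive, gold, n=n, rng=rng), 3))
```

Because the data are simulated, the sample size at which naïve listeners reach the trained-listener benchmark in this sketch depends entirely on the assumed accuracies and says nothing about the study's result.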
Learning outcomes: Readers will be able to (a) discuss advantages and disadvantages of data collection through the crowdsourcing platform Amazon Mechanical Turk (AMT), and (b) describe the results of a validity study comparing samples of AMT listeners with phonetically trained listeners in a speech-rating task.
Keywords: Crowdsourcing; Research methods; Speech perception; Speech rating; Speech sound disorders.
Copyright © 2014 Elsevier Inc. All rights reserved.