Brain-to-text: decoding spoken phrases from phone representations in the brain

Christian Herff; Dominic Heger; Adriana de Pesters; Dominic Telaar; Peter Brunner; Gerwin Schalk; Tanja Schultz

doi:10.3389/fnins.2015.00217

Front Neurosci. 2015 Jun 12;9:217. doi: 10.3389/fnins.2015.00217. eCollection 2015.

Authors

Christian Herff¹, Dominic Heger¹, Adriana de Pesters², Dominic Telaar¹, Peter Brunner³, Gerwin Schalk⁴, Tanja Schultz¹

Affiliations

¹ Cognitive Systems Lab, Institute for Anthropomatics and Robotics, Karlsruhe Institute of Technology, Karlsruhe, Germany.
² New York State Department of Health, National Center for Adaptive Neurotechnologies, Wadsworth Center, Albany, NY, USA; Department of Biomedical Sciences, State University of New York at Albany, Albany, NY, USA.
³ New York State Department of Health, National Center for Adaptive Neurotechnologies, Wadsworth Center, Albany, NY, USA; Department of Neurology, Albany Medical College, Albany, NY, USA.
⁴ New York State Department of Health, National Center for Adaptive Neurotechnologies, Wadsworth Center, Albany, NY, USA; Department of Biomedical Sciences, State University of New York at Albany, Albany, NY, USA; Department of Neurology, Albany Medical College, Albany, NY, USA.

Abstract

It has long been speculated whether communication between humans and machines based on natural speech-related cortical activity is possible. Over the past decade, studies have suggested that it is feasible to recognize isolated aspects of speech from neural signals, such as auditory features, phones, or one of a few isolated words. However, until now it has remained an unsolved challenge to decode continuously spoken speech from the neural substrate associated with speech and language processing. Here, we show for the first time that continuously spoken speech can be decoded into the expressed words from intracranial electrocorticographic (ECoG) recordings. Specifically, we implemented a system, which we call Brain-To-Text, that models single phones, employs techniques from automatic speech recognition (ASR), and thereby transforms brain activity while speaking into the corresponding textual representation. Our results demonstrate that our system can achieve word error rates as low as 25% and phone error rates below 50%. Additionally, our approach contributes to the current understanding of the neural basis of continuous speech production by identifying those cortical regions that hold substantial information about individual phones. In conclusion, the Brain-To-Text system described in this paper represents an important step toward human-machine communication based on imagined speech.
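
For readers unfamiliar with the metrics quoted in the abstract, the following is a minimal sketch of how a word or phone error rate is conventionally computed in ASR: the edit distance (substitutions, deletions, insertions) between the reference and the decoded token sequence, divided by the reference length. This is the standard textbook definition and is assumed here; the abstract does not state the exact scoring procedure, and the function names and example sequences below are hypothetical illustrations, not the authors' code or data.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference tokens
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def error_rate(ref, hyp):
    """Edit distance normalized by reference length (WER or PER)."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical word-level example: one substitution in four words.
ref_words = "the quick brown fox".split()
hyp_words = "the quick brown dog".split()
wer = error_rate(ref_words, hyp_words)

# Hypothetical phone-level example: one deleted phone out of six.
ref_phones = ["DH", "AH", "K", "W", "IH", "K"]
hyp_phones = ["DH", "AH", "K", "IH", "K"]
per = error_rate(ref_phones, hyp_phones)
```

The same routine serves both metrics because only the token unit changes: words for the word error rate, phones for the phone error rate. Note that an error rate defined this way can exceed 100% when the decoded sequence contains many insertions.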

Keywords: ECoG; automatic speech recognition; brain-computer interface; broadband gamma; electrocorticography; pattern recognition; speech decoding; speech production.

Grants and funding

P41 EB018783/EB/NIBIB NIH HHS/United States