Background: Detecting Alzheimer's disease (AD) in its early stages is a crucial challenge. Speech data from AD patients can aid in diagnosing AD because speech features show common patterns independent of race and spoken language. However, previous models for diagnosing AD from speech have typically focused on the characteristics of a single language, with no guarantee of scalability to other languages. In this study, we applied the same acoustic feature extraction method to datasets in two languages to diagnose AD.
Methods: Using Korean and English speech datasets, we evaluated ten models capable of real-time classification of AD patients and healthy controls, regardless of language. Four were machine learning models based on hand-crafted features, while the remaining six were deep learning models that utilized non-explainable features learned directly from the audio.
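For illustration, a minimal sketch of the hand-crafted-feature pipeline is given below; it assumes MFCC features extracted with librosa and an SVM classifier from scikit-learn, none of which are specified in this abstract, so the study's actual feature set, models, and hyperparameters may differ.

import numpy as np
import librosa
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def extract_features(wav_path: str, sr: int = 16000) -> np.ndarray:
    """Hypothetical fixed-length hand-crafted feature vector from a 30-s voice sample."""
    y, sr = librosa.load(wav_path, sr=sr, duration=30.0)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # spectral-envelope features
    # Summarize the time-varying MFCCs with per-coefficient mean and std,
    # so clips in any language map to the same feature space.
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])

def train_classifier(wav_paths: list[str], labels: list[int]):
    """Fit a binary classifier; labels are 1 for AD and 0 for healthy control."""
    X = np.stack([extract_features(p) for p in wav_paths])
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    clf.fit(X, labels)
    return clf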
Results: The highest accuracy achieved by the machine learning models was 0.73 and 0.69 for the Korean and English speech datasets, respectively. The deep learning models reached maximum accuracies of 0.75 and 0.78, with minimum classification times of 0.01 s and 0.02 s. These findings demonstrate the models' robustness across Korean and English and their ability to diagnose AD in real time from a 30-s voice sample.
Conclusion: Non-explainable deep learning models that directly acquire voice representations surpassed machine learning models based on hand-crafted features in AD diagnosis. In addition, these AI models demonstrate the potential for language-agnostic AD diagnosis.
Keywords: Alzheimer's disease; Deep learning; Hand-crafted features; Language-agnostic; Non-explainable features.
Copyright © 2024 The Authors. Published by Elsevier Ltd. All rights reserved.