Machine learning-based COVID-19 diagnosis by demographic characteristics and clinical data

Adv Respir Med. 2022 Feb 1. doi: 10.5603/ARM.a2022.0021. Online ahead of print.

Abstract

Introduction: Rapid and effective screening for COVID-19 can alleviate the challenges facing healthcare systems. We aimed to develop a machine learning-based model for predicting COVID-19 diagnosis and to design a graphical user interface (GUI) that diagnoses COVID-19 cases from their recorded symptoms and demographic features.

Methods: We implemented several classification models, including support vector machine (SVM), decision tree (DT), Naïve Bayes (NB) and K-nearest neighbor (KNN), to predict individuals' COVID-19 test results. We trained these models on data from 16973 individuals (90% of all individuals included in data gathering) and tested them on 1885 individuals (10% of all individuals). The maximum relevance minimum redundancy (MRMR) algorithm was used to score features for predicting the COVID-19 test result. A user-friendly GUI was designed to predict COVID-19 test results in individuals.
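
The feature-scoring step can be illustrated with a minimal sketch of greedy MRMR ranking. The abstract does not state which MRMR variant or software was used, so this sketch assumes discrete features, mutual information as the measure of both relevance and redundancy, and the "difference" form of the criterion (relevance minus mean redundancy). The function names `mutual_information` and `mrmr_rank` and the toy data are hypothetical and are not taken from the study.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    # Sum p(x, y) * log(p(x, y) / (p(x) * p(y))) over observed value pairs.
    return sum(
        (c / n) * log(c * n / (count_x[x] * count_y[y]))
        for (x, y), c in count_xy.items()
    )


def mrmr_rank(features, target):
    """Rank feature names greedily by relevance minus mean redundancy.

    `features` maps a feature name to its column of discrete values;
    `target` is the column of class labels (assumed layout, not the study's).
    """
    relevance = {
        name: mutual_information(column, target)
        for name, column in features.items()
    }
    selected = []
    remaining = set(features)

    def score(name):
        if not selected:
            return relevance[name]
        redundancy = sum(
            mutual_information(features[name], features[chosen])
            for chosen in selected
        ) / len(selected)
        return relevance[name] - redundancy

    while remaining:
        # Sorting makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy, invented data: 1 = present/positive, 0 = absent/negative.
toy_target = [1, 1, 1, 1, 0, 0, 0, 0]
toy_features = {
    "cough": [1, 1, 1, 1, 0, 0, 0, 1],
    "cough_copy": [1, 1, 1, 1, 0, 0, 0, 1],  # fully redundant with "cough"
    "fever": [1, 1, 0, 1, 0, 1, 0, 0],
}
ranking = mrmr_rank(toy_features, toy_target)
```

In this toy example the duplicated feature is ranked last even though it is as relevant as the original, because the redundancy penalty outweighs its relevance once the original has been selected.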

Results: Coughing had the highest positive correlation with a positive COVID-19 test result, followed by the duration of COVID-19 signs and symptoms, exposure to infected individuals, age, muscle pain, recent infection with the COVID-19 virus, fever, respiratory distress, loss of smell or taste, nausea, anorexia, headache, vertigo, CT symptoms in lung scans, diabetes and hypertension. The accuracy, precision, recall, F1-score, specificity and area under the receiver operating characteristic curve (AUROC) of the different classification models were computed for different sets of features scored by the MRMR algorithm. Finally, our GUI receives each of the 42 features and symptoms from the user and, using whichever of the SVM, KNN, Naïve Bayes and decision tree models is selected, predicts the result of the COVID-19 test. The accuracy, AUROC and F1-score of the SVM model, the best model for predicting the COVID-19 test result, were 0.7048 (95% CI: 0.6998, 0.7094), 0.7045 (95% CI: 0.7003, 0.7104) and 0.7157 (95% CI: 0.7043, 0.7194), respectively.
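
For readers unfamiliar with the reported metrics, the following minimal sketch shows how accuracy, precision, recall, F1-score, specificity and AUROC are conventionally defined for a binary classifier. It assumes labels coded as 1 (positive test) and 0 (negative test); the function names and the example values are hypothetical, and the abstract does not describe how the study itself computed these quantities or their confidence intervals.

```python
def classification_metrics(y_true, y_pred):
    """Standard binary metrics from true labels and predicted labels (1/0)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def ratio(numerator, denominator):
        # Return 0.0 instead of dividing by zero when a class is absent.
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)  # also called sensitivity
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "specificity": ratio(tn, tn + fp),
    }


def auroc(y_true, scores):
    """AUROC as the probability that a random positive case receives a higher
    score than a random negative case (ties count as one half)."""
    positives = [s for t, s in zip(y_true, scores) if t == 1]
    negatives = [s for t, s in zip(y_true, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


# Invented example values, not study data.
example_true = [1, 1, 1, 0, 0, 0, 1, 0]
example_pred = [1, 1, 0, 0, 0, 1, 1, 0]
example_scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]
example_metrics = classification_metrics(example_true, example_pred)
example_auroc = auroc(example_true, example_scores)
```

The pairwise AUROC computation is quadratic in the number of cases, which is acceptable for illustration; a rank-based formulation would be preferable for a test set of 1885 individuals.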

Conclusion: In this study, we implemented a machine learning approach to facilitate early clinical decision making during the COVID-19 outbreak and provided a predictive model of COVID-19 diagnosis capable of categorizing populations into infected and non-infected individuals, much as an efficient screening tool would.

Keywords: COVID-19; clinical features; demographic characteristics; diagnosis; machine learning.