Background: Coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), emerged in Wuhan, China, in late 2019 and caused a global pandemic that overwhelmed healthcare systems. As of July 3, 2021, the World Health Organization had reported 182 million confirmed cases and 3.9 million deaths globally. Several patients who were initially diagnosed with mild or moderate COVID-19 later deteriorated and were reclassified as having severe disease.
Objective: To create a model that predicts, early in the disease course, the need for ventilatory support and mortality in COVID-19 patients from baseline (time of diagnosis), routinely collected data for each patient: chest X-ray (CXR), complete blood count (CBC), demographics, and patient history.
Methods: Four common machine learning algorithms, three data balancing techniques, and feature selection were used to build and validate predictive models for COVID-19 mechanical ventilation requirement and mortality. Baseline CXR, CBC, demographic, and clinical data were retrospectively collected from April 2, 2020, to June 18, 2020, for 5739 patients with PCR-confirmed COVID-19 at King Abdulaziz Medical City in Riyadh. Of those patients, only 1508 and 1513 met the inclusion criteria for the ventilatory support and mortality endpoints, respectively.
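Random undersampling, one of the data balancing techniques named in the Results, can be illustrated with a minimal sketch. The function name `random_undersample`, the seed, and the toy data are assumptions made for illustration; this is not the study's code or data.

```python
import random
from collections import Counter


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows from larger classes until every class
    has as many rows as the smallest class."""
    rng = random.Random(seed)

    # Group rows by their class label.
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)

    # Size of the smallest (minority) class.
    n_min = min(len(members) for members in by_class.values())

    # Sample n_min rows without replacement from each class.
    balanced = []
    for label, members in by_class.items():
        for row in rng.sample(members, n_min):
            balanced.append((row, label))

    rng.shuffle(balanced)
    out_rows = [row for row, _ in balanced]
    out_labels = [label for _, label in balanced]
    return out_rows, out_labels


# Hypothetical toy data: 8 negative and 2 positive cases.
rows = [[i] for i in range(10)]
labels = [0] * 8 + [1] * 2

X_bal, y_bal = random_undersample(rows, labels)
class_counts = Counter(y_bal)
print(class_counts)
```

In practice, undersampling of this kind would be applied to the training split only, so that the test set keeps the original class distribution.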
Results: On an independent test set, the ventilation requirement model, a support vector machine trained with random undersampling on the top 20 features selected by the ReliefF algorithm from baseline radiological, laboratory, and clinical data, attained an AUC of 0.87 and a balanced accuracy of 0.81. For the mortality endpoint, the top model, a balanced random forest using all features, yielded an AUC of 0.83 and a balanced accuracy of 0.80. These results indicate that models using only routinely collected data can predict both outcomes with good performance. Combined data consistently outperformed each data set used individually for both intubation and mortality. For ventilatory support, chest X-ray severity annotations alone performed better than comorbidity, complete blood count, age, or gender alone, with an AUC of 0.85 and a balanced accuracy of 0.79. For mortality, comorbidity alone achieved an AUC of 0.80 and a balanced accuracy of 0.72, higher than models using only chest radiograph, laboratory, or demographic features.
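The two metrics reported above can be made concrete with a short sketch. Balanced accuracy is the mean of sensitivity and specificity, and AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case (ties counted as one half). The labels, scores, and 0.5 threshold below are hypothetical and are not taken from the study.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (recall on positives) and specificity
    (recall on negatives) for binary labels coded 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case has the higher score; ties count as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical imbalanced toy example: 2 positives, 6 negatives.
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.2, 0.1, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

ba = balanced_accuracy(y_true, y_pred)
auc = roc_auc(y_true, scores)
plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(ba, auc, plain_accuracy)
```

On imbalanced data such as this toy example, plain accuracy is dominated by the majority class, whereas balanced accuracy weights both classes equally, which is one reason it is a common choice for endpoints with few positive cases.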
Conclusion: The experimental results demonstrate the practicality of the proposed COVID-19 predictive tool for hospital resource planning and patient prioritization in the current COVID-19 pandemic crisis.
Keywords: CBC; COVID-19; NIV; SMOTE; machine learning; X-rays; mortality; random forest.
© 2021 Aljouie et al.