Smoking Classification Using Novel Plasma Cytokines by Implementing Machine Learning and Statistical Methods

Proc (Int Conf Comput Sci Comput Intell). 2023 Dec:2023:686-694. doi: 10.1109/csci62032.2023.00118. Epub 2024 Jul 19.

Abstract

Smoking is a major cause of premature and preventable death. Tobacco exposure damages many organs and contributes to multiple diseases, including chronic obstructive pulmonary disease (COPD), cardiovascular disease, cancer, and diabetes. Cytokines are inflammatory biomarkers that are mechanistically associated with smoking. Machine learning algorithms allow quantitative assessment of the contributions of individual cytokines to tobacco-related diseases, and mapping cytokines to disease can facilitate and direct treatment modalities. By applying the k-Nearest Neighbor (k-NN) and Random Forest machine learning algorithms to 63 plasma cytokines, we demonstrate the classification of smoking. To improve performance, k-fold cross-validation and hyperparameter tuning are employed. The separability achieved by the models is evaluated using the Area Under the Receiver Operating Characteristic (AUROC) metric. The most significant cytokines that enabled the classification are identified and presented. The difference between the AUROC scores of k-NN and Random Forest is tested for statistical significance using the two-sample independent t-test. The k-NN algorithm achieved reasonably good classification performance, with an AUROC of 0.87 and a 95% CI of (0.823, 0.917). Random Forest exceeded k-NN, with a perfect AUROC of 1 and a 95% CI of (1, 1). Among the ten most prominent cytokines that contributed to the classification, those common to both algorithms are LIF, IL22, G-CSF/CSF-3, and TRAIL. The AUROC scores of k-NN and Random Forest are significantly different (p-value = 5.105e-16). The discovery of biomarkers such as cytokines, and their transfer from molecular investigation to clinical practice, can facilitate precision medicine-based therapeutic interventions.
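As an illustration of the two evaluation quantities named in the abstract, the following is a minimal sketch, not the authors' code. It computes AUROC as the probability that a randomly chosen positive sample is scored above a randomly chosen negative one, and computes the statistic of a two-sample independent t-test. The function names `auroc` and `pooled_t`, the pooled-variance form of the test, and all numbers are assumptions chosen for illustration; none of the values are study data.

```python
from math import sqrt
from statistics import mean, variance


def auroc(labels, scores):
    """AUROC via pairwise comparison of positive and negative scores.

    labels: 1 for the positive class (e.g. smoker), 0 for the negative class.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def pooled_t(a, b):
    """Two-sample independent t statistic (pooled variance, an assumption).

    Returns (t, degrees_of_freedom). Each sample needs at least two values,
    and the pooled variance must be non-zero.
    """
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    t = (mean(a) - mean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))
    return t, df


# Toy example: hypothetical classifier scores for six samples.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
toy_auc = auroc(toy_labels, toy_scores)

# Toy example: hypothetical per-fold AUROC values for two models.
model_a_folds = [0.85, 0.87, 0.89]
model_b_folds = [0.98, 0.99, 1.00]
t_stat, dof = pooled_t(model_a_folds, model_b_folds)
```

Converting the t statistic to a p-value requires the t distribution, which the Python standard library does not provide; `scipy.stats.ttest_ind` returns both the statistic and the p-value.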

Keywords: AUROC; Classification; Plasma cytokines; Random Forest; k-NN.