Background: Colonoscopy remains the predominant diagnostic modality for colorectal cancer (CRC), as the diagnostic performance of tumor markers alone, particularly in the early stages of the disease, is limited. This study sought to develop a diagnostic model for CRC that integrates multiple laboratory parameters.
Methods: One hundred patients with CRC were assigned to the experimental group, while 114 patients with benign colorectal diseases and 101 healthy individuals were assigned to the control group. Clinical and laboratory data, including tumor markers such as carcinoembryonic antigen (CEA), carbohydrate antigen 19-9 (CA19-9), and carbohydrate antigen 242 (CA242), as well as blood count parameters, blood biochemical parameters, and coagulation parameters, were collected for each participant. Three machine-learning algorithms [multilayer perceptron (MLP), eXtreme Gradient Boosting (XGBoost), and random forest (RF)] were used to construct CRC diagnostic models. The performance of each model was evaluated based on its area under the receiver operating characteristic curve (AUC), sensitivity, and specificity.
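The three evaluation metrics named above can be illustrated with a minimal sketch. It assumes binary labels (1 = CRC, 0 = control) and a model score per participant; the function names, the 0.5 decision threshold, and the toy data are hypothetical and are not taken from the study.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels, where 1 = CRC and 0 = control."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data for illustration only; these values are NOT from the study.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical 0.5 threshold

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"AUC={auc(y_true, scores):.4f}, sensitivity={sens:.2f}, specificity={spec:.2f}")
```

The pairwise AUC used here is equivalent to the area under the ROC curve but scales quadratically with sample size; it is chosen only because it needs no external library.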
Results: Twelve parameters, namely CEA, CA19-9, CA242, absolute neutrophil count (NEUT), hemoglobin, the neutrophil/lymphocyte ratio, the platelet/lymphocyte ratio, alanine aminotransferase, alkaline phosphatase, aspartate aminotransferase, albumin, and prothrombin time, were selected to build the diagnostic model. In the validation set, the RF machine-learning model achieved the highest performance in identifying CRC [AUC: 0.902 (95% confidence interval: 0.812-0.989), accuracy: 0.803, sensitivity: 0.908, specificity: 0.772, positive predictive value: 0.664, negative predictive value: 0.890, and F1 score: 0.763]. The AUC, sensitivity, specificity, and Youden's index for the combined diagnosis using the tumor markers CEA, CA19-9, and CA242 were 0.761, 0.486, 0.983, and 0.469, respectively. The RF diagnostic model showed better overall diagnostic performance than the combined tumor-marker model (AUC: 0.902 vs. 0.761).
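Youden's index is defined as sensitivity + specificity - 1, so the value reported for the tumor-marker panel can be checked directly from the sensitivity and specificity given above. The sketch below does this; the function name is hypothetical, and the RF value it prints is derived here from the reported RF sensitivity and specificity rather than stated in the abstract.

```python
def youden_index(sensitivity: float, specificity: float) -> float:
    """Youden's J statistic: sensitivity + specificity - 1."""
    return sensitivity + specificity - 1.0


# Combined tumor-marker panel (CEA, CA19-9, CA242), values as reported in the abstract.
marker_panel_j = youden_index(0.486, 0.983)

# RF model, validation set; J is derived here and is not reported in the abstract.
rf_j = youden_index(0.908, 0.772)

print(round(marker_panel_j, 3))  # 0.469, matching the reported value
print(round(rf_j, 3))            # 0.68
```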
Conclusions: Machine learning combined with multiple laboratory parameters effectively improved the diagnostic performance for CRC and provided more accurate results for clinical diagnosis.
Keywords: Colorectal cancer (CRC); diagnostic model; machine learning; tumor markers.
2024 AME Publishing Company. All rights reserved.