BreCML: identifying breast cancer cell state in scRNA-seq via machine learning

Front Med (Lausanne). 2024 Nov 6:11:1482726. doi: 10.3389/fmed.2024.1482726. eCollection 2024.

Abstract

Breast cancer is a prevalent malignancy and one of the leading causes of cancer-related mortality among women worldwide. The disease is characterized by the abnormal proliferation and dissemination of malignant cells within breast tissue. Current diagnostic and therapeutic strategies face significant challenges in accurately identifying and localizing specific breast cancer subtypes. In this study, we developed BreCML, a novel machine learning-based predictor designed to classify breast cancer cell subpopulations and identify their associated marker genes. BreCML achieved an accuracy of 98.92% on the training dataset. Using the XGBoost algorithm, BreCML achieved an accuracy of 98.67%, a precision of 99.15%, a recall of 99.49%, and an F1-score of 99.79% on the test dataset. By combining machine learning with feature selection, BreCML also identified new key genes. BreCML therefore serves both as a tool for assessing breast cancer cell states and as a rapid, efficient means of uncovering potential biomarkers, providing critical insights for precision medicine and therapeutic strategies.
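The performance figures above are standard classification metrics. The following is a minimal sketch of how they are defined, assuming a two-class setting with hypothetical confusion-matrix counts; the `Confusion` class and `metrics` function are illustrative names, not part of BreCML or its evaluation code. Note that F1 is the harmonic mean of precision and recall.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Binary confusion-matrix counts (hypothetical; the class labels are assumed)."""
    tp: int  # positive-class cells predicted as positive
    fp: int  # negative-class cells predicted as positive
    fn: int  # positive-class cells predicted as negative
    tn: int  # negative-class cells predicted as negative


def metrics(c: Confusion) -> dict:
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = c.tp + c.fp + c.fn + c.tn
    accuracy = (c.tp + c.tn) / total
    precision = c.tp / (c.tp + c.fp) if (c.tp + c.fp) else 0.0
    recall = c.tp / (c.tp + c.fn) if (c.tp + c.fn) else 0.0
    # F1 is the harmonic mean of precision and recall, so it always lies
    # between min(precision, recall) and max(precision, recall).
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical counts for illustration only; not BreCML results.
    print(metrics(Confusion(tp=90, fp=10, fn=5, tn=95)))
```

For multi-class cell-state labels, per-class precision, recall and F1 are usually averaged (for example, macro or weighted averaging); the abstract does not state which averaging scheme BreCML uses.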

Keywords: breast cancer; cell subpopulations; feature selection; machine learning; scRNA-seq.

Grants and funding

The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. This work was supported by the Key Discipline Construction Project of Pudong Health Bureau of Shanghai: Clinical Pharmacy (Grant No. PWZxk2022-27).