Background: Rheumatoid arthritis (RA) is a chronic systemic autoimmune disease characterized by inflammatory cell infiltration, which can lead to joint destruction, loss of function and chronic disability. The pathogenesis of RA remains unclear. This study aimed to explore potential biomarkers and immune-related molecular mechanisms of RA through machine learning-assisted bioinformatics analysis, in order to provide a reference for the early diagnosis and treatment of RA.
Methods: RA gene expression microarray datasets were retrieved from the public Gene Expression Omnibus (GEO) database, and batch correction across the different RA datasets was performed using Strawberry Perl. Differentially expressed genes (DEGs) were identified using the limma package in R, and functional enrichment analyses were performed, including Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), Disease Ontology (DO) and gene set enrichment analysis (GSEA). Three machine learning methods, least absolute shrinkage and selection operator (LASSO) regression, support vector machine recursive feature elimination (SVM-RFE) and random forest, were used to identify potential biomarkers of RA. A validation dataset was then used to confirm the expression and diagnostic value of the candidate genes. In addition, the CIBERSORT algorithm was used to evaluate immune cell infiltration in RA and control samples, and the correlation between the confirmed RA diagnostic biomarkers and immune cells was analyzed.
Results: Feature screening yielded 79 key DEGs, mainly involved in the response to virus, the Parkinson's disease pathway, dermatitis and cell junction components. LASSO regression selected 29 hub genes, SVM-RFE selected 34 and random forest selected 39. Combining the results of the three algorithms yielded a total of 12 hub genes. After verification of expression and diagnostic value in the validation dataset, 7 genes were preliminarily confirmed as candidate diagnostic biomarkers for RA. Correlation analysis of immune cells showed that γδ T cells, activated CD4+ memory T cells, activated dendritic cells and other immune cells were positively correlated with multiple RA diagnostic biomarkers, whereas naive CD4+ T cells, regulatory T cells and other immune cells were negatively correlated with multiple RA diagnostic biomarkers.
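The step of combining the three algorithms' selections can be illustrated with a minimal Python sketch. It assumes that "combining" means taking the genes selected by all three methods (their intersection), which the abstract does not state explicitly. The function name `consensus_genes` and the placeholder gene names are hypothetical and are not the study's data.

```python
from functools import reduce


def consensus_genes(*selections):
    """Return the genes present in every selection, sorted alphabetically."""
    if not selections:
        return []
    # Intersect the gene sets pairwise across all methods.
    common = reduce(set.intersection, (set(s) for s in selections))
    return sorted(common)


# Hypothetical placeholder lists standing in for the genes chosen by each
# method; these are NOT the gene lists reported in the study.
lasso_genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
svm_rfe_genes = ["GENE_B", "GENE_C", "GENE_D", "GENE_E"]
random_forest_genes = ["GENE_C", "GENE_D", "GENE_F"]

hub_genes = consensus_genes(lasso_genes, svm_rfe_genes, random_forest_genes)
print(hub_genes)  # ['GENE_C', 'GENE_D']
```

Taking the intersection is a conservative choice: a gene is kept only if all three selection methods agree on it, which reduces the candidate list at the cost of possibly discarding genes that a single method ranks highly.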
Conclusions: Analysis of novel characteristic genes of RA showed that KYNU, EVI2A, CD52, C1QB, BATF, AIM2 and NDC80 had good diagnostic and clinical value for RA and were closely associated with immune cells. These seven DEGs may therefore serve as new diagnostic and immunotherapy markers for RA.
Keywords: Bioinformatics; Biomarkers; Diagnostic genes; Immune cells; Machine learning; Rheumatoid arthritis.
© 2024 The Authors.