Predicting the microalgae lipid profile obtained by supercritical fluid extraction using a machine learning model

Front Chem. 2024 Oct 25:12:1480887. doi: 10.3389/fchem.2024.1480887. eCollection 2024.

Abstract

In this study, a machine learning model was employed to predict the lipid profile obtained by supercritical fluid extraction (SFE) of the microalga Galdieria sp. USBA-GBX-832 under different temperature (40, 50, 60°C), pressure (150, 250 bar), and ethanol flow (0.6, 0.9 mL min⁻¹) conditions. Six machine learning regression models were trained using 33 independent variables: 29 RDKit molecular descriptors, three extraction conditions, and the infinite dilution activity coefficient (IDAC). Lipidomic characterization identified 139 features, of which 89 were annotated as lipids, primarily glycerophospholipids and glycerolipids, and used as the entries of the model. A methodology was proposed for selecting representative lipids from the lipidomic analysis using an unsupervised learning method; the results were compared with Tanimoto scores and with IDAC calculations using the COSMO-SAC-HB2 model. The models based on decision trees, particularly XGBoost, outperformed the others (RMSE: 0.035, 0.095, 0.065 and coefficient of determination (R²): 0.971, 0.933, 0.946 for train, test, and experimental validation, respectively), accurately predicting lipid profiles for unseen conditions. Machine learning methods provide a cost-effective way to optimize SFE conditions and are applicable to other biological samples.
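
For readers less familiar with the two metrics reported above, the following minimal sketch shows how RMSE and the coefficient of determination (R²) are conventionally computed from observed and predicted values. It is not the authors' code; the function names `rmse` and `r_squared` and the sample values are hypothetical and chosen only for illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1 - ss_res / ss_tot


# Hypothetical observed vs. predicted values (illustrative only,
# not data from the study).
observed = [0.10, 0.20, 0.30, 0.40]
predicted = [0.12, 0.18, 0.33, 0.37]

print(f"RMSE = {rmse(observed, predicted):.4f}")
print(f"R^2  = {r_squared(observed, predicted):.4f}")
```

A lower RMSE and an R² closer to 1 indicate a closer match between predictions and observations, which is how the train, test, and experimental validation figures above should be read.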

Keywords: COSMO-SAC; extremophile microalgae; lipidomic; regression models; supercritical fluid extraction.

Grants and funding

The author(s) declare that financial support was received for the research, authorship, and/or publication of this article. The resources of this project were provided by Sistema General de Regalías (SGR) Asignación para la Ciencia, Tecnología e Innovación. BPIN 2020000100356. Bogotá, 2019.