Applications of geographically weighted machine learning models for predicting soil heavy metal concentrations across mining sites

Sci Total Environ. 2024 Nov 21:177667. doi: 10.1016/j.scitotenv.2024.177667. Online ahead of print.

Abstract

Accurate prediction of soil heavy metal contamination is crucial for the effective environmental management of abandoned mining areas. However, conventional machine learning models (CMLMs) often fail to account for the spatial heterogeneity of soil contamination, which limits their predictive accuracy. This study evaluated the performance of geographically weighted machine learning models (GWMLMs) in predicting soil Cd and Pb concentrations at abandoned mines in the Republic of Korea. We compared two GWMLMs (Geographically Weighted Random Forest and Geographically Weighted Extreme Gradient Boosting) with four CMLMs (Random Forest, Gradient Boosting, Light Gradient Boosting, and Extreme Gradient Boosting). The data comprised soil samples from six abandoned mining sites, together with various geographical and soil input variables. The GWMLMs consistently outperformed the CMLMs in predicting heavy metal contamination. For Cd predictions, the GWMLMs had, on average, root mean square error (RMSE) and mean absolute error (MAE) values that were each 0.02 lower, and R² values that were 0.26 higher, than those of the CMLMs. Similarly, for Pb predictions, the GWMLMs had RMSE and MAE values that were 0.18 and 0.13 lower, respectively, and R² values that were 0.17 higher than those of the CMLMs. These findings demonstrate the usefulness of GWMLMs for predicting the spatial distribution of soil heavy metals. SHapley Additive exPlanations (SHAP) analysis identified elevation and distance from abandoned mining sites as the most influential factors in predicting both Cd and Pb concentrations. This study highlights the value of GWMLMs, which incorporate spatial heterogeneity into CMLMs, for enhancing prediction accuracy and providing crucial insights for environmental management in mining-impacted regions.
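
The three accuracy measures used in the comparison above (root mean square error, mean absolute error, and R²) can be illustrated with a minimal Python sketch. The function names and the sample values are hypothetical and chosen only for illustration; they are not data or code from the study, and the abstract does not state how the authors computed the metrics.

```python
import math


def rmse(observed, predicted):
    """Root mean square error: square root of the mean squared residual."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean absolute error: mean of the absolute residuals."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


def r2(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical concentrations for illustration only (not from the study).
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]

print(f"RMSE = {rmse(observed, predicted):.3f}")
print(f"MAE  = {mae(observed, predicted):.3f}")
print(f"R2   = {r2(observed, predicted):.3f}")
```

Lower RMSE and MAE and higher R² indicate a closer fit. RMSE and MAE carry the units of the predicted concentration, whereas R² is unitless.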

Keywords: Conventional machine learning model (CMLM); Geographically weighted machine learning model (GWMLM); Soil heavy metal; Spatial heterogeneity.