Prediction of Soil Pollution Risk Based on Machine Learning and SHAP Interpretable Models in the Nansi Lake, China
2025
Min Wang | Ruilin Zhang | Beibei Yan | Chengyuan Song | Yang Lv | Hengyi Zhao
To assess and predict soil pollution risk around Nansi Lake, we evaluated soil environmental quality in the region using machine learning techniques, combined with the SHapley Additive exPlanations (SHAP) model for interpretability. The primary objective was to predict the level of soil pollution caused by heavy metals, incorporating the traditional Pollution Load Index (PLI) and Potential Ecological Risk Index (PERI) methods. By integrating statistical characteristics with the PLI and PERI evaluations, a new assessment method was created that categorizes soil pollution into three classes: “Class0: no risk”, “Class1: low risk”, and “Class2: high risk”. Several machine learning models, including Support Vector Machine (SVM), Decision Tree Classifier (DT), Random Forest (RF), and XGBoost, were used to predict soil quality based on these indices. XGBoost performed best, with a prediction accuracy of 93%. SHAP analysis was then applied to explain the machine learning model. It indicated that the accumulation of key pollutants such as cadmium (Cd) and mercury (Hg) may contribute substantially to soil pollution risk, and that targeted management measures need to be developed for these pollutants.
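The abstract names PLI and PERI without giving their formulas. As a rough illustration only, the sketch below uses the commonly cited definitions: a contamination factor CF = C / B (measured concentration over background value), PLI as the geometric mean of the contamination factors, and the potential ecological risk index RI as the sum of CF weighted by Hakanson's toxic-response factors. The function names, the sample concentrations, and the background values are hypothetical and are not taken from the paper, and the paper's combined three-class scheme is not reproduced here because the abstract does not specify it.

```python
import numpy as np

# Widely used Hakanson toxic-response factors; whether the paper uses
# exactly these values is an assumption.
TOXIC_RESPONSE = {"Hg": 40, "Cd": 30, "As": 10, "Pb": 5,
                  "Cu": 5, "Ni": 5, "Cr": 2, "Zn": 1}


def contamination_factors(conc, background):
    """CF_i = C_i / B_i for each metal present in `conc`."""
    return {metal: conc[metal] / background[metal] for metal in conc}


def pollution_load_index(cf):
    """PLI = (CF_1 * CF_2 * ... * CF_n) ** (1 / n), i.e. the geometric mean."""
    values = np.array(list(cf.values()), dtype=float)
    return float(np.exp(np.log(values).mean()))


def potential_ecological_risk(cf):
    """RI = sum over metals of T_i * CF_i."""
    return float(sum(TOXIC_RESPONSE[metal] * cf[metal] for metal in cf))


# Hypothetical concentrations and background values (mg/kg), for illustration.
sample = {"Cd": 0.30, "Hg": 0.08, "Pb": 30.0, "Zn": 80.0}
background = {"Cd": 0.10, "Hg": 0.04, "Pb": 25.0, "Zn": 64.0}

cf = contamination_factors(sample, background)
pli = pollution_load_index(cf)
ri = potential_ecological_risk(cf)
print(f"PLI = {pli:.3f}, RI = {ri:.2f}")
```

In this made-up sample, Cd and Hg account for most of RI because of their large toxic-response factors, which is in line with the abstract's finding that these two metals matter most for risk.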