Application of random forest regression and comparison of its performance to multiple linear regression in modeling groundwater nitrate concentration at the African continent scale | Application de la méthode de régression dite des forêts aléatoires et comparaison de ses performances avec la régression linéaire multiple pour la modélisation de la concentration en nitrates des eaux souterraines à l’échelle du continent africain | Aplicación de la regresión de bosques aleatorios y comparación de su desempeño con la regresión lineal múltiple en el modelado de la concentración de nitrato de agua subterránea a escala del continente africano | 在模拟非洲大陆尺度上地下水硝酸盐含量中随机预测回归分析的应用及其针对多重线性回归性能的比较 | Aplicação de regressão de floresta aleatória e comparação de seu desempenho com a regressão linear múltipla na modelagem da concentração de nitrato de águas subterrâneas na escala do continente Africano
2019
Ouedraogo, Issoufou | Defourny, Pierre | Vanclooster, Marnik
Groundwater management decisions require robust methods that allow accurate predictive modeling of pollutant occurrences. In this study, random forest regression (RFR) was used for modeling groundwater nitrate contamination at the African continent scale. When compared to more conventional techniques, key advantages of RFR include its nonparametric nature, its high predictive accuracy, and its capability to determine variable importance. The latter can be used to better understand the individual role and the combined effect of explanatory variables in a predictive model. In the absence of a systematic groundwater monitoring program at the African continent scale, the study used the groundwater nitrate contamination database for the continent obtained from a meta-analysis to test the modeling approach; 250 groundwater nitrate pollution studies from the African continent were compiled using the literature data. A geographic information system database of 13 spatial attributes was collected, related to land use, soil type, hydrogeology, topography, climatology, type of region, and nitrogen fertilizer application rate, and these were assigned as predictors. The RFR performance was evaluated in comparison to the multiple linear regression (MLR) methods. By using RFR, it was possible to establish which explanatory variables influence the occurrence of nitrate pollution in groundwater (population density, rainfall, recharge, etc.). Both the RFR and MLR techniques identified population density as the most important variable explaining reported nitrate contamination. However, RFR has a much higher predictive power (R² = 0.97) than a traditional linear regression model (R² = 0.64). RFR is therefore considered a very promising technique for large-scale modeling of groundwater nitrate pollution.
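The comparison described in the abstract (fit RFR and MLR on the same predictors, compare R², then rank variables by importance) can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes scikit-learn is available, uses purely synthetic data, and borrows three predictor names from the abstract only as labels. The response function, sample size, and train/test split are hypothetical choices made for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the study's database): three predictors
# scaled to [0, 1]; the names are labels only.
rng = np.random.default_rng(0)
names = ["population_density", "rainfall", "recharge"]
X = rng.uniform(0.0, 1.0, size=(600, 3))

# Hypothetical nonlinear response with an interaction term plus noise.
y = (4.0 * np.sin(2.0 * np.pi * X[:, 0])
     + 2.0 * X[:, 1] * X[:, 2]
     + rng.normal(0.0, 0.2, size=600))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Multiple linear regression baseline.
mlr = LinearRegression().fit(X_train, y_train)
r2_mlr = r2_score(y_test, mlr.predict(X_test))

# Random forest regression on the same predictors.
rfr = RandomForestRegressor(n_estimators=300, random_state=0)
rfr.fit(X_train, y_train)
r2_rfr = r2_score(y_test, rfr.predict(X_test))

# Impurity-based variable importance; the values sum to 1.
importances = rfr.feature_importances_
top_variable = names[int(np.argmax(importances))]

print(f"MLR R^2 (held-out): {r2_mlr:.2f}")
print(f"RFR R^2 (held-out): {r2_rfr:.2f}")
for name, score in sorted(zip(names, importances), key=lambda p: -p[1]):
    print(f"{name}: {score:.3f}")
```

The sketch scores both models on held-out data. The abstract does not state how the reported R² values were computed, so the numbers printed here are not comparable to those in the study.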
This record was provided by the National Agricultural Library