Results 1-10 of 283
Physics-informed machine learning algorithms for forecasting sediment yield: an analysis of physical consistency, sensitivity, and interpretability
2024
El Bilali, A. | Brouziyne, Youssef | Attar, O. | Lamane, H. | Hadri, A. | Taleb, A.
Sediment transport, involving the movement of bedload and suspended sediment in basins, is a critical environmental concern that worsens water scarcity and leads to degradation of land and its ecosystems. Machine learning (ML) algorithms have emerged as powerful tools for predicting sediment yield. However, their limited use by decision-makers can be attributed to concerns regarding their consistency with the physical processes involved. In light of this issue, this study aims to develop a physics-informed ML approach for predicting sediment yield. To achieve this objective, Gaussian, Center, Regular, and Direct Copulas were employed to generate virtual combinations of physical characteristics of the sub-basins and hydrological datasets. These datasets were then used to train deep neural network (DNN), convolutional neural network (CNN), Extra Tree, and XGBoost (XGB) models. The performance of these models was compared with the modified universal soil loss equation (MUSLE), which serves as a process-based model. The results demonstrated that the ML models outperformed the MUSLE model, exhibiting improvements in Nash–Sutcliffe efficiency (NSE) of approximately 10%, 18%, 32%, and 41% for the DNN, CNN, Extra Tree, and XGB models, respectively. Furthermore, Sobol sensitivity and Shapley additive explanation–based interpretability analyses revealed that the Extra Tree model displayed greater consistency with the physical processes underlying sediment transport as modeled by MUSLE. The proposed framework provides new insights into enhancing the accuracy and applicability of ML models in forecasting sediment yield while maintaining consistency with natural processes. Consequently, it can prove valuable in simulating process-related strategies aimed at mitigating sediment transport at watershed scales, such as the implementation of best management practices.
Dynamic model to predict the association between air quality, COVID-19 cases, and level of lockdown
2021
Tadano, Yara S. | Potgieter-Vermaak, Sanja | Kachba, Yslene R. | Chiroli, Daiane M.G. | Casacio, Luciana | Santos-Silva, Jéssica C. | Moreira, Camila A.B. | Machado, Vivian | Alves, Thiago Antonini | Siqueira, Hugo | Godoi, Ricardo H.M.
Studies have reported significant reductions in air pollutant levels due to the global lockdowns imposed during the COVID-19 outbreak. Nevertheless, all of the reports are limited to comparisons with data from the same period over the past few years, providing mainly an overview of past events, with no future predictions. Lockdown level can be directly related to the number of new COVID-19 cases, air pollution, and economic restriction. As lockdown status varies considerably across the globe, there is a window for mega-cities to determine the optimum lockdown flexibility. To that end, we first employed four different Artificial Neural Networks (ANNs) to examine their compatibility with the original levels of CO, O₃, NO₂, NO, PM₂.₅, and PM₁₀ for São Paulo City, the current pandemic epicenter in South America. After checking compatibility, we simulated four hypothetical scenarios (10%, 30%, 70%, and 90% lockdown) to predict air pollution levels. To our knowledge, ANNs have not been applied to air pollution prediction by lockdown level. Using a limited database, the Multilayer Perceptron neural network has proven to be robust (with Mean Absolute Percentage Error ∼ 30%), with acceptable predictive power to estimate air pollution changes. We illustrate that air pollutant levels can effectively be controlled and predicted when flexible lockdown measures are implemented. The models will be a useful tool for governments to manage the delicate balance among lockdown, number of COVID-19 cases, and air pollution.
Spatial patterning of chlorophyll a and water-quality measurements for determining environmental thresholds for local eutrophication in the Nakdong River basin
2021
Kim, Hyo Gyeom | Hong, Sungwon | Chon, Tae Soo | Joo, Gea-Jae
Management of water quality in a river ecosystem needs to focus on regions susceptible to eutrophication, based on proper measurements. The stress–response relationships between nutrients and primary productivity of phytoplankton allow the derivation of ecologically acceptable thresholds of stressors under field conditions. However, spatio-temporal variations in heterogeneous environmental conditions have hindered the development of locally applicable criteria. To address these issues, we utilized a combination of a geographically specialized artificial neural network (Geo-SOM, geo-self-organizing map) and linear mixed-effect models (LMMs). The model was applied to a 24-month dataset of 54 stations that spanned a wide spatial gradient in the Nakdong River basin. The Geo-SOM classified 1286 observations in the basin into 13 clusters that were regionally and seasonally distinct. Inclusion of the random effects of Geo-SOM clustering improved the performance of each LMM, which suggests that there were significant spatio-temporal variations in the Chla–stressor relationships. These variations arise owing to differences in background seasonality and the effects of local pollutant variables and land-use patterns. Among the 16 environmental variables, the major stressors for Chla were total phosphate (TP) as a nutrient and biological oxygen demand (BOD) as a non-nutrient, according to the results of both Geo-SOM and LMM analyses. Based on LMMs with the random effect of the Geo-SOM clusters on the intercept and the slope, we propose recommended thresholds for TP (18.5 μg L⁻¹) and BOD (1.6 mg L⁻¹) in the Nakdong River. The combined method of LMM and Geo-SOM will be useful in guiding appropriate local water-quality-management strategies and in the global development of large-scale nutrient criteria.
Estimate hourly PM2.5 concentrations from Himawari-8 TOA reflectance directly using geo-intelligent long short-term memory network
2021
Wang, Bin | Yuan, Qiangqiang | Yang, Qian | Zhu, Liye | Li, Tongwen | Zhang, Liangpei
Fine particulate matter (PM₂.₅) has attracted extensive attention because of its baneful influence on human health and the environment. However, the sparse distribution of PM₂.₅ measuring stations limits its application to public utility and scientific research, which can be remedied by satellite observations. Therefore, we developed a Geo-intelligent long short-term memory network (Geoi-LSTM) to estimate hourly ground-level PM₂.₅ concentrations in 2017 in the Wuhan Urban Agglomeration (WUA). We conducted contrast experiments to verify the effectiveness of our model and explored the optimal modeling strategy. Geoi-LSTM with TOA reflectance, meteorological conditions, and NDVI as inputs performed best. The station-based cross-validation R², root mean squared error, and mean absolute error are 0.82, 15.44 μg/m³, and 10.63 μg/m³, respectively. Based on model results, we revealed spatiotemporal characteristics of PM₂.₅ in WUA. Generally speaking, during the day, PM₂.₅ concentration remained stable at a relatively high level in the morning and decreased continuously in the afternoon. Over the year, PM₂.₅ concentrations were highest in winter, lowest in summer, and in between in spring and autumn. Combined with meteorological conditions, we further analyzed the whole process of a PM₂.₅ pollution event. Finally, we discussed the loss incurred by removing cloud-covered pixels and compared our model with several popular models. Overall, our results can reflect hourly PM₂.₅ concentrations seamlessly and accurately with a spatial resolution of 5 km, which benefits PM₂.₅ exposure evaluations and policy regulations.
Development of Artificial Neural Network for prediction of radon dispersion released from Sinquyen Mine, Vietnam
2021
Duong, Van-Hao | Ly, Hai-Bang | Trinh, Dinh Huan | Nguyễn, Thái Sơn | Pham, Binh Thai
Understanding the radon dispersion released from this mine is an important target, as radon dispersion is used to assess radiological hazard to humans. In this paper, the main objective is to develop and optimize a machine learning model, namely an Artificial Neural Network (ANN), for quick and accurate prediction of radon dispersion released from Sinquyen mine, Vietnam. For this purpose, a total of million data collected from the study area, which include input variables (the gamma data of uranium concentration from a 3 × 3 m grid survey inside the mine, 21 CR-39 detectors inside dwellings surrounding the mine, and gamma dose at 1 m above the ground surface) and an output variable (radon dispersion), were used for training and validating the predictive model. Various validation measures, namely coefficient of determination (R²), Mean Absolute Error (MAE), and Root Mean Squared Error (RMSE), were used. In addition, partial dependence plots (PDP) were used to evaluate the effect of each input variable on the predicted output variable. The results show that the ANN performed well for prediction of radon dispersion, with low error values (i.e., R² = 0.9415, RMSE = 0.0589, and MAE = 0.0203 for the testing dataset). Increasing the number of hidden layers in the ANN structure leads to an increase in the accuracy of the predictive results. The sensitivity results show that all input variables govern the radon dispersion activity with different amplitudes and are fitted with different equations, but the gamma dose is the most influential and important variable in comparison with the strike, distance, and uranium concentration variables for prediction of radon dispersion.
Associations between persistent organic pollutants and endometriosis: A multipollutant assessment using machine learning algorithms
2020
Endometriosis is a gynaecological disease characterised by the presence of endometriotic tissue outside of the uterus, impacting a significant fraction of women of childbearing age. Evidence from epidemiological studies suggests a relationship between risk of endometriosis and exposure to some organochlorine persistent organic pollutants (POPs). However, these chemicals are numerous and occur in complex and highly correlated mixtures, and to date, most studies have not accounted for this simultaneous exposure. Linear and logistic regression models are constrained in adjusting for multiple exposures when variables are highly intercorrelated, resulting in unstable coefficients and arbitrary findings. Advanced machine learning models, of emerging use in epidemiology, today appear as a promising option to address these limitations. In this study, different machine learning techniques were compared on a dataset from a case-control study conducted in France to explore associations between mixtures of POPs and deep endometriosis. The battery of models encompassed regularised logistic regression, artificial neural network, support vector machine, adaptive boosting, and partial least-squares discriminant analysis with some additional sparsity constraints. These techniques were applied to identify the biomarkers of internal exposure in adipose tissue most associated with endometriosis and to compare model classification performance. The five tested models revealed a consistent selection of the POPs most associated with deep endometriosis, including octachlorodibenzofuran, cis-heptachlor epoxide, polychlorinated biphenyl 77 and trans-nonachlor, among others. The high classification performance of all five models confirmed that machine learning may be a promising complementary approach in modelling highly correlated exposure biomarkers and their associations with health outcomes. Regularised logistic regression provided a good compromise between the interpretability of traditional statistical approaches and the classification capacity of machine learning approaches. Applying a battery of complementary algorithms may be a strategic approach to decipher complex exposome-health associations when the underlying structure is unknown.
Relative performance of different data mining techniques for nitrate concentration and load estimation in different type of watersheds
2020
Li, Shiyang | Bhattarai, Rabin | Cooke, Richard A. | Verma, Siddhartha | Huang, Xiangfeng | Markus, Momcilo | Christianson, Laura
The increasing availability of water quality datasets has led to a greater focus on hydrologic and water quality analysis, thus requiring more efficient and accurate modelling methods. Data mining techniques have been increasingly used for water quality analysis and prediction of the concentration and load of nitrogen pollutants instead of more traditional simulation methods. In this study, we tested the multilayer perceptron (MLP), k-nearest neighbor (k-NN), random forest (RF), and reduced error pruning tree (REPTree) methods, along with traditional linear regression, to predict nitrate levels based on long-term data from six watersheds with different land-use practices in the midwestern United States. Both the concentration and load results indicated that REPTree had the best performance, with an R² of 0.61–0.85 and a relative absolute error of <75.8%. The different watershed types, however, influenced the performance of the data mining methods: all four methods showed higher accuracy for the urban-dominant watershed and lower accuracy for agricultural and forested watersheds. Of these four methods, the classification tree methods (REPTree and RF) performed better than the cluster methods (MLP and k-NN) for agricultural and forested watersheds. Our results indicated that both the data structure based on the dominant land use and the type of algorithmic method should be carefully considered when selecting a data mining method to predict nitrate concentration and load for a watershed.
Long-term calibration models to estimate ozone concentrations with a metal oxide sensor
2020
Sayahi, Tofigh | Garff, Alicia | Quah, Timothy | Lê, Katrina | Becnel, Thomas | Powell, Kody M. | Gaillardon, Pierre-Emmanuel | Butterfield, Anthony E. | Kelly, Kerry E.
Ozone (O₃) is a potent oxidant associated with adverse health effects. Low-cost O₃ sensors, such as metal oxide (MO) sensors, can complement regulatory O₃ measurements and enhance the spatiotemporal resolution of measurements. However, the quality of MO sensor data remains a challenge. The University of Utah has a network of low-cost air quality sensors (called AirU) that primarily measures PM₂.₅ concentrations around the Salt Lake City valley (Utah, U.S.). The AirU package also contains a low-cost MO sensor ($8) that measures oxidizing/reducing species. These MO sensors exhibited excellent laboratory response to O₃, although they exhibited some intra-sensor variability. Field performance was evaluated by placing eight AirUs at two Division of Air Quality (DAQ) monitoring stations with O₃ federal equivalence methods for one year to develop long-term multiple linear regression (MLR) and artificial neural network (ANN) calibration models to predict O₃ concentrations. Six sensors served as train/test sets. The remaining two sensors served as a holdout set to evaluate the applicability of the new calibration models in predicting O₃ concentrations for other sensors of the same type. A rigorous variable selection was also performed using least absolute shrinkage and selection operator (LASSO), MLR, and ANN models. The variable selection indicated that the AirU’s MO oxidizing species and temperature measurements and DAQ’s solar radiation measurements were the most important variables. The MLR calibration model exhibited moderate performance (R² = 0.491), and the ANN exhibited good performance (R² = 0.767) for the holdout set. We also evaluated the performance of the MLR and ANN models in predicting O₃ for five months after the calibration period, and the results showed moderate correlations (R²s of 0.427 and 0.567, respectively). These low-cost MO sensors combined with a long-term ANN calibration model can complement reference measurements to understand geospatial and temporal differences in O₃ levels.
Artificial neural network model to predict transport parameters of reactive solutes from basic soil properties
2019
Mojid, M.A. | Hossain, A.B.M.Z. | Ashraf, M.A.
Measurement of solute-transport parameters through soils for a wide range of solute- and soil-types is time-consuming, laborious, expensive and practically impossible. So, indirect methods for estimating the transport parameters by pedo-transfer functions are now advancing. This study developed and evaluated an Artificial Neural Network (ANN) model for estimating the transport velocity (V), dispersion coefficient (D) and retardation factor (R) of NaAsO₂, Pb(NO₃)₂, Cd(NO₃)₂, C₉H₉N₃O₂ and CaCl₂ from the basic soil properties. Breakthrough data of the solutes were measured in 14 agricultural soils of Bangladesh by time-domain reflectometry (TDR) in repacked soil columns under unsaturated steady-state water-flow conditions. The transport parameters of the chemicals were determined by analyzing the solute breakthrough data. Bulk density (γ), organic carbon (OC), clay (C) content, pH, median grain diameter (D₅₀) and uniformity coefficient (Cᵤ) of the soils were determined. An ANN model for V, D and R was developed by using data of eight soils, validated/tested with the data of five soils and verified with the data of one soil. Clay content and bulk density of the soils were the most sensitive input variables to the ANN model followed by other soil properties (OC, C, pH, D₅₀ and Cᵤ). The model reliably predicted V, D and R with relative root-mean-square error (RRMSE) of 0.028–0.363, mean error (ME) of −0.00004 to 0.0005, bias error (BOE%) of 0–0.003 and modeling efficiency (EF) of >0.99. Thus, the ANN model can significantly enhance prediction of pollution transport through soils in terms of cost and effort.
Space-time PM2.5 mapping in the severe haze region of Jing-Jin-Ji (China) using a synthetic approach
2018
Long- and short-term exposure to PM2.5 is of great concern in China due to its adverse population health effects. Characteristic of the severity of the situation in China is that in the Jing-Jin-Ji region considered in this work a total of 2725 excess deaths have been attributed to short-term PM2.5 exposure during the period January 10–31, 2013. Technically, the processing of large space-time PM2.5 datasets and the mapping of the space-time distribution of PM2.5 concentrations often constitute high-cost projects. To address this situation, we propose a synthetic modeling framework based on the integration of (a) the Bayesian maximum entropy method that assimilates auxiliary information from land-use regression and artificial neural network (ANN) model outputs based on PM2.5 monitoring, satellite remote sensing data, land use and geographical records, with (b) a space-time projection technique that transforms the PM2.5 concentration values from the original spatiotemporal domain onto a spatial domain that moves along the direction of the PM2.5 velocity spread. An interesting methodological feature of the synthetic approach is that its components (methods or models) are complementary, i.e., one component can compensate for the occasional limitations of another component. Insight is gained in terms of a PM2.5 case study covering the severe haze Jing-Jin-Ji region during October 1–31, 2015. The proposed synthetic approach explicitly accounted for physical space-time dependencies of the PM2.5 distribution. Moreover, the assimilation of auxiliary information and the dimensionality reduction achieved by the synthetic approach produced rather impressive results: It generated PM2.5 concentration maps with low estimation uncertainty (even at counties and villages far away from the monitoring stations, whereas during the haze periods the uncertainty reduction was over 50% compared to standard PM2.5 mapping techniques); and it also proved to be computationally very efficient (the reduction in computational time was over 20% compared to standard mapping techniques).