Results 1-10 of 86
Integration of machine learning-based prediction for enhanced Model’s generalization: Application in photocatalytic polishing of palm oil mill effluent (POME)
2020
Ng, Kim Hoong | Gan, Y.S. | Cheng, Chin Kui | Liu, Kun-Hong | Liong, Sze-Teng
In predicting palm oil mill effluent (POME) degradation efficiency, a previously developed quadratic model quantitatively evaluated the effects of O₂ flowrate, TiO₂ loading and initial POME concentration in a lab-scale photocatalytic system, but suffered from low generalization due to overfitting. Its high RMSE (131.61) and low R² (−630.49) indicate that it cannot describe POME degradation at unseen factor ranges, confirming its poor generalization. To overcome this issue, several models were developed via machine learning-assisted techniques, namely Gaussian Process Regression (GPR), Linear Regression (LR), Decision Tree (DT), Support Vector Machine (SVM) and Regression Tree Ensemble (RTE), and were then assessed systematically. To achieve high generalization, all models were subjected to a ‘train-all-test-all’ strategy, 5-fold and 10-fold cross validation. Specifically, the GPR model achieved high accuracy in the ‘train-all-test-all’ strategy, judging from its low RMSE (1.0394) and high R² (0.9962), although this result carries a risk of overfitting. In contrast, despite the relatively poorer RMSE and R² (1.7964 and 0.9886) obtained in 5-fold cross validation, the GPR model showed the highest generalization while sufficiently preserving its accuracy. The SVM and RTE models also demonstrated promising R² (0.9372 and 0.9208), which were however overshadowed by their high RMSEs (4.2174 and 4.7366). Furthermore, the strong generalization of the GPR model was also verified in 10-fold cross validation. The lowest RMSE (2.1624) and highest R² (0.9835), obtained with a feature number of 36, confirmed its sufficiency in both generalization and accuracy. The other models all yielded slightly lower R² (> 0.9), plausibly due to their higher RMSE (> 4.0).
According to the GPR model, optimized POME degradation (52.52%) can be obtained at 70 mL/min of O₂, 70.0 g/L of TiO₂ and 250 ppm of POME concentration, with only ∼3% error as compared to the actual data.
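The abstract above reports a strongly negative R² for the earlier quadratic model. A minimal sketch of how RMSE and R² are commonly computed, and of why R² falls below zero when predictions are worse than simply predicting the mean, is given here. The function names and the numbers are hypothetical illustrations, not data or code from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical degradation efficiencies (%), invented for illustration only.
observed = [40.0, 45.0, 50.0, 55.0]
good_fit = [41.0, 44.0, 51.0, 54.0]    # close to the observations
bad_fit = [140.0, 5.0, 150.0, -20.0]   # far worse than predicting the mean

print(rmse(observed, good_fit), r_squared(observed, good_fit))
print(rmse(observed, bad_fit), r_squared(observed, bad_fit))
```

Because SS_res can exceed SS_tot without limit, R² has no lower bound, so a large negative value signals predictions that are badly off on the evaluated data.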
Comparison of land use regression and random forests models on estimating noise levels in five Canadian cities
2020
Liu, Ying | Goudreau, Sophie | Oiamo, Tor | Rainham, Daniel | Hatzopoulou, Marianne | Chen, Hong | Davies, Hugh | Tremblay, Mathieu | Johnson, James | Bockstael, Annelies | Leroux, Tony | Smargiassi, Audrey
Chronic exposure to environmental noise is associated with sleep disturbance and cardiovascular diseases. Assessment of the population exposed to environmental noise is critical for controlling exposure and mitigating adverse health effects, but it is limited by a lack of routine noise sampling. Land use regression (LUR) models have recently been applied to estimate environmental exposure to noise. Machine-learning approaches offer opportunities to improve the noise estimates from LUR models. In this study, we employed a random forests (RF) model to estimate environmental noise levels in five Canadian cities and compared noise estimates between the RF and LUR models. A total of 729 measurements and 33 built environment-related variables were used to estimate spatial variation in environmental noise at the global (multi-city) and local (individual city) scales. Leave-one-out cross-validation suggested that noise estimates derived from the RF global model explained a greater proportion of variation (R²: RF = 0.58, LUR = 0.47) with lower root mean squared errors (RF = 4.44 dB(A), LUR = 4.99 dB(A)). The cross-validation also indicated that the RF models had better general performance than the LUR models at the city scale. By applying the global models to estimate noise levels at the postal code level, we found that noise levels were higher in Montreal and Longueuil than in other major Canadian cities.
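The leave-one-out cross-validation mentioned in the abstract above can be sketched as follows. This is a minimal illustration only: it uses a one-predictor least-squares line as a stand-in model, not the LUR or RF models of the study, and the predictor, the noise values and the function names are all hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loocv_rmse(xs, ys):
    """Hold out each point once, fit on the rest, and pool the squared errors."""
    squared_errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        prediction = slope * xs[i] + intercept
        squared_errors.append((ys[i] - prediction) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical road-density index vs. measured noise in dB(A); invented values.
road_density = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
noise_dba = [52.0, 55.5, 57.0, 61.0, 62.5, 66.0]

print(loocv_rmse(road_density, noise_dba))
```

Each held-out point is predicted by a model that never saw it, so the pooled error estimates performance on unseen sites rather than goodness of fit on the training data.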
Determining and mapping the spatial mismatch between soil and rice cadmium (Cd) pollution based on a decision tree model
2020
Wang, Yuanmin | Wu, Shaohua | Yan, Daohao | Li, Fufu | Chengcheng, Wang | Min, Cheng | Wenyu, Sun
Environmental complexity leads to differences in the spatial distribution of heavy metal pollution in soil and rice. Such spatial differences seriously affect the safety of planted rice and can impact regional management and control. Revealing these spatial differences scientifically is an urgent problem. In this study, the spatial mismatch between Cd pollution in soil and rice grains (brown rice) was first explored by the interpolation method. To further reveal the causes of this mismatch, specific recognition rules for the spatial relationship of Cd pollution were extracted based on a decision tree model, and the results were mapped. The results revealed a spatial mismatch in Cd pollution between the soil and rice grains in the study area, and the main results are as follows: (i) slight soil pollution and safe rice accounted for 68.88% of the area; (ii) slight soil pollution and serious rice pollution accounted for 13.39% of the area; and (iii) safe soil and serious rice pollution accounted for 11.63% of the area. In addition, 11 recognition rules for the spatial relationship of Cd pollution between soil and rice were proposed, and the main environmental factors were determined: SOM (soil organic matter), Dis-residence (distance from residential area), soil pH and LAI (leaf area index). The average accuracy of rule recognition was 75.90%. The study reveals the spatial mismatch of heavy metal pollution in soil and crops, providing decision-making references for the spatially accurate identification and targeted prevention of heavy metal pollution.
Quadratic discriminant analysis model for assessing the risk of cadmium pollution for paddy fields in a county in China
2018
Wang, Xiumei | Li, Xiujian | Ma, Ruoyu | Li, Yue | Wang, Wei | Huang, Hanyu | Xu, Chenzi | An, Yi
In China, the cadmium (Cd) levels in paddy fields have increased, which has led to the excessive uptake of Cd into rice grains. In this study, we determined the physicochemical properties of soil samples, including the pH, soil organic matter (SOM) content, cation exchange capacity (CEC), and total Cd content (Cdsoil) in order to establish a quadratic discriminant analysis (QDA) model for assessing the risk of Cd in rice and to calculate its prior probability. Decision tree and logistic regression models were also established for comparison. The results showed that the accuracy rate was 74% with QDA, which was significantly higher than that obtained using the decision tree (67%) and logistic regression (68%) models. The correlation coefficients between the soil pH and the other three factors (CEC, SOM, and Cdsoil) were higher in the inaccurate set than the accurate set, whereas the correlation coefficients were smaller in the inaccurate set than the accurate set.
A sustainable Decision Support System for soil bioremediation of toluene incorporating UN sustainable development goals
2022
Akbarian, Hadi | Jalali, Farhad Mahmoudi | Gheibi, Mohammad | Hajiaghaei-Keshteli, Mostafa | Akrami, Mehran | Sarmah, Ajit K.
A Decision Support System (DSS) is a novel approach for smart, sustainable control of environmental phenomena and purification processes. Toluene is one of the most widely used petroleum products, and it adversely impacts human health. In this study, Fusarium solani fungi are utilized as the engine of the toluene bioremediation procedure for the monitoring part of the DSS. Experiments are optimized by Central Composite Design (CCD) - Response Surface Methodology (RSM), and the behavior of the fungi is estimated by the M5 Pruned model tree (M5P), Gaussian Processes (GP), and Sequential Minimal Optimization (SMOreg) algorithms as the prediction section of the DSS. Finally, the control stage of the DSS is provided by integrated Petri Net modeling and Failure Modes and Effects Analysis (FMEA). The findings showed that Aeration Intensity (AI) and Fungi load/Biological Waste (F/BW) are the most influential mechanical and biological factors, with P-values of 0.0001 and 0.0003, respectively. Likewise, the optimal values of the main mechanical parameters, AI and the space between pipes (S), are 13.76 m³/h and 15.99 cm, respectively. Also, the optimum conditions for the biological factors, F/BW and pH, are 0.001 mg/g and 7.56. According to the kinetic study, bioremediation of toluene by Fusarium solani follows a first-order reaction with a kinetic coefficient of 0.034 s⁻¹. Finally, the machine learning practices showed that GP (R² = 0.98) and M5P (R² = 0.94) have the highest precision for predicting Removal Percentage (RP) for mechanical and biological factors, respectively. The research concludes that by controlling seven possible risk factors in the bioremediation operation through the FMEA-Petri Net technique, the efficiency of the process can be adjusted to its optimum value.
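For orientation, the first-order kinetics reported in the abstract above can be written out as a short worked equation. The symbols C (toluene concentration), C_0 (initial concentration) and t_{1/2} (half-life) are notation assumed here and do not appear in the abstract; the half-life is a derived illustration based only on the stated coefficient k = 0.034 s⁻¹.

```latex
\frac{dC}{dt} = -kC
\quad\Longrightarrow\quad
C(t) = C_0\, e^{-kt},
\qquad
t_{1/2} = \frac{\ln 2}{k} \approx \frac{0.693}{0.034\ \mathrm{s^{-1}}} \approx 20.4\ \mathrm{s}
```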
Predicting nanotoxicity by an integrated machine learning and metabolomics approach
2020
Peng, Ting | Wei, Changhong | Yu, Fubo | Xu, Jing | Zhou, Qixing | Shi, Tonglei | Hu, Xiangang
Predicting the biological responses to engineered nanoparticles (ENPs) is critical to their environmental health assessment. The disturbances of metabolic pathways reflect the global profile of biological responses to ENPs but are difficult to predict due to the highly heterogeneous data from complicated biological systems and various ENP properties. Herein, integrating multiple machine learning models and metabolomics enabled accurate prediction of the disturbance of metabolic pathways induced by 33 ENPs. Screening nine typical properties of ENPs identified type and size as the top features determining the effects on metabolic pathways. Similarity network analysis and decision tree models overcame the highly heterogeneous data sources to visualize and judge the occurrence of metabolic pathways depending on the sorting priority features. The model accuracy was verified by animal experiments and reached 75%–100%, even for the prediction of ENPs outside of databases. The models also predicted metabolic pathway-related histopathology. This work provides an approach for the quick assessment of environmental health risks induced by known and unknown ENPs.
Spatial soil zinc content distribution from terrain parameters: A GIS-based decision-tree model in Lebanon
2010
Kheir, Rania Bou | Greve, Mogens H. | Abdallah, Chadi | Dalgaard, Tommy
Heavy metal contamination has been and continues to be a worldwide phenomenon that has attracted a great deal of attention from governments and regulatory bodies. In this context, our study proposes a regression-tree model to predict the concentration level of zinc in the soils of northern Lebanon (as a case study of Mediterranean landscapes) under a GIS environment. The developed tree model explained 88% of the variance in zinc concentration using pH (100% in relative importance), surroundings of waste areas (90%), proximity to roads (80%), nearness to cities (50%), distance to drainage line (25%), lithology (24%), land cover/use (14%), slope gradient (10%), conductivity (7%), soil type (7%), organic matter (5%), and soil depth (5%). The overall accuracy of the quantitative zinc map produced (at 1:50,000 scale) was estimated to be 78%. The proposed tree model is relatively simple and may also be applied to other areas. GIS regression-tree analysis explained 88% of the variability in field/laboratory zinc concentrations.
Self-organizing feature map (neural networks) as a tool to select the best indicator of road traffic pollution (soil, leaves or bark of Robinia pseudoacacia L.)
2009
Samecka-Cymerman, A. | Stankiewicz, A. | Kolon, K. | Kempers, A.J.
Concentrations of the elements Cd, Co, Cr, Cu, Fe, Mn, Ni, Pb and Zn were measured in the leaves and bark of Robinia pseudoacacia and the soil in which it grew, in the town of Oleśnica (SW Poland) and at a control site. We selected this town because emission from motor vehicles is practically the only source of air pollution, and it seemed interesting to evaluate its influence on soil and plants. The self-organizing feature map (SOFM) yielded distinct groups of soils and R. pseudoacacia leaves and bark, depending on traffic intensity. Only the map classifying bark samples identified an additional group of highly polluted sites along the main highway from Wrocław to Warszawa. The bark of R. pseudoacacia seems to be a better bioindicator of long-term cumulative traffic pollution in the investigated area, while leaves are good indicators of short-term seasonal accumulation trends. Once trained, SOFM could be used in the future to recognize types of pollution.
Future climate scenarios and rainfall-runoff modelling in the Upper Gallego catchment (Spain)
2007
Burger, C.M. | Kolditz, O. | Fowler, H.J. | Blenkinsop, S.
Global climate change may have large impacts on water supplies, drought or flood frequencies and magnitudes in local and regional hydrologic systems. Water authorities therefore rely on computer models for quantitative impact prediction. In this study we present kernel-based learning machine river flow models for the Upper Gallego catchment of the Ebro basin. Different learning machines were calibrated using daily gauge data. The models posed two major challenges: (1) estimation of the rainfall-runoff transfer function from the available time series is complicated by anthropogenic regulation and mountainous terrain and (2) the river flow model is weak when only climate data are used, but additional antecedent flow data seemed to lead to delayed peak flow estimation. These types of models, together with the presented downscaled climate scenarios, can be used for climate change impact assessment in the Gallego, which is important for the future management of the system. Future climate change and data-based rainfall-runoff predictions are presented for the Upper Gallego.
Improved anthropogenic heat flux model for fine spatiotemporal information in Southeast China
2022
Qian, Jiangkang | Meng, Qingyan | Zhang, Linlin | Hu, Die | Hu, Xinli | Liu, Wenxiu
Anthropogenic heat emission (AHE) is an important driver of urban heat islands (UHIs). Further, both urban thermal environment research and sustainable development planning require an efficient estimation of anthropogenic heat flux (AHF). Therefore, this study proposed an improved multi-source AHF model, which was constructed using diverse data sources and small-scale samples, to better represent the spatiotemporal distribution of AHF. The performances of three machine learning algorithms (Cubist, gradient boosting decision tree, and simple linear regression) were quantitatively evaluated, and the impact of spatiotemporal heterogeneity on AHF estimation was considered for the first time. The results showed that multi-source datasets and sophisticated algorithms could more effectively reduce the estimation error and improve the accuracy of the spatiotemporal distribution of AHF than simple linear regression. In practical applications, the Cubist model performed better, with prediction errors being less than 0.9 W·m⁻². Further, the characteristics of different heat sources from the model outputs varied widely, and the building metabolic heat exhibited significant seasonal spatiotemporal variations, which were largely determined by the regional climate. In contrast, industrial and transportation heat showed marginal monthly fluctuations. Similarly, spatiotemporal heterogeneity significantly affected the estimation of building metabolic heat (0.62 W·m⁻²), but it did not affect other heat sources. The proposed improved AHF model was verified to effectively capture the spatiotemporal variations of building heat and solve the issue of overestimation of industrial heat in urban regions. This study provides new methods and ideas for the accurate spatiotemporal quantification of AHF that can supplement future studies on climate warming, UHI, and air pollution.