Integration of machine learning-based prediction for enhanced Model’s generalization: Application in photocatalytic polishing of palm oil mill effluent (POME)

Ng, Kim Hoong; Gan, Y.S.; Cheng, Chin Kui; Liu, Kun-Hong; Liong, Sze-Teng


2020


A previously developed quadratic model for predicting palm oil mill effluent (POME) degradation efficiency quantified the effects of O₂ flowrate, TiO₂ loading and initial POME concentration in a lab-scale photocatalytic system, but it generalized poorly because of overfitting. Its high RMSE (131.61) and low R² (−630.49) show that it cannot describe POME degradation at unseen factor ranges, which confirms the poor generalization. To overcome this issue, several models were developed with machine learning techniques, namely Gaussian Process Regression (GPR), Linear Regression (LR), Decision Tree (DT), Support Vector Machine (SVM) and Regression Tree Ensemble (RTE), and were then assessed systematically. To achieve high generalization, all models were subjected to a ‘train-all-test-all’ strategy and to 5-fold and 10-fold cross validation. The GPR model was highly accurate under the ‘train-all-test-all’ strategy, judging from its low RMSE (1.0394) and high R² (0.9962), but this result carries a risk of overfitting. In contrast, despite the relatively poorer RMSE and R² (1.7964 and 0.9886) obtained in 5-fold cross validation, the GPR model showed the highest generalization while sufficiently preserving its accuracy. The SVM and RTE models also showed promising R² (0.9372 and 0.9208), which were however overshadowed by their high RMSEs (4.2174 and 4.7366). The strong generalization of the GPR model was also verified in 10-fold cross validation: the lowest RMSE (2.1624) and highest R² (0.9835), obtained with a feature number of 36, confirmed its sufficiency in both generalization and accuracy. The other models all had slightly lower R² (> 0.9), plausibly due to their higher RMSE (> 4.0). According to the GPR model, an optimized POME degradation of 52.52% can be obtained at 70 mL/min of O₂, 70.0 g/L of TiO₂ and 250 ppm of POME concentration, with only ∼3% error relative to the actual data.
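The contrast the abstract draws between a ‘train-all-test-all’ score and a k-fold cross-validated score can be illustrated with a minimal sketch. This is not the authors' code or data: the data set below is synthetic, the 1-nearest-neighbour regressor is only a stand-in for any model that can memorize its training points, and the helper names (`rmse`, `r2`, `fit_1nn`, `k_fold_predictions`) are hypothetical. It also shows why R² can be strongly negative, as reported for the quadratic model, whenever predictions are worse than simply predicting the mean.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination; negative when worse than the mean predictor."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_1nn(xs, ys):
    """Stand-in model (assumption): 1-nearest-neighbour on one feature.

    It memorizes the training set, so it scores perfectly on data it has seen.
    """
    pairs = list(zip(xs, ys))

    def predict(x):
        return min(pairs, key=lambda p: abs(p[0] - x))[1]

    return predict


def k_fold_predictions(xs, ys, k, fit):
    """Predict every sample with a model trained on the other k-1 folds."""
    n = len(xs)
    preds = [None] * n
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in test:
            preds[i] = model(xs[i])
    return preds


# Synthetic data (assumption): a noisy linear trend, not POME measurements.
random.seed(0)
xs = [i / 2 for i in range(40)]
ys = [2.0 * x + 5.0 + random.gauss(0.0, 3.0) for x in xs]

# 'Train-all-test-all': fit and score on the same samples.
model_all = fit_1nn(xs, ys)
train_all = [model_all(x) for x in xs]

# 5-fold cross validation: each sample is scored by a model that never saw it.
cv5 = k_fold_predictions(xs, ys, 5, fit_1nn)

print("train-all-test-all:", rmse(ys, train_all), r2(ys, train_all))
print("5-fold CV:         ", rmse(ys, cv5), r2(ys, cv5))
```

In this sketch the train-all-test-all score is perfect only because the model has already seen every sample, whereas the cross-validated score reflects performance on held-out points. This is the sense in which the abstract treats the 5-fold and 10-fold results as the more reliable indicator of generalization.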


AGROVOC keywords

decision support systems; models; pollution; prediction; regression analysis; risk; titanium dioxide

Bibliographic information

Published in

Environmental pollution

Volume 267, Page 115500, ISSN 0269-7491

Publisher

John Wiley & Sons, Ltd.

Other subjects

Photocatalytic treatment; Generalization enhancement; Photocatalysis; Normal distribution; Palm oil mill effluent; Oil mill effluents; Machine learning predictive models

Language

English

Notes

Nal-ap-2-clean

Type

Journal Article; Text

Date added to AGRIS: 2024-02-28

Format: MODS

Data provider

This record is provided by the National Agricultural Library
