Integration of machine learning-based prediction for enhanced Model’s generalization: Application in photocatalytic polishing of palm oil mill effluent (POME)
2020
Ng, Kim Hoong | Gan, Y.S. | Cheng, Chin Kui | Liu, Kun-Hong | Liong, Sze-Teng
A previously developed quadratic model for predicting palm oil mill effluent (POME) degradation efficiency quantified the effects of O₂ flowrate, TiO₂ loading and initial POME concentration in a lab-scale photocatalytic system, but suffered from poor generalization due to overfitting. Its high RMSE (131.61) and low R² (−630.49) show that it cannot describe POME degradation at unseen factor ranges, confirming the poor generalization. To overcome this issue, several models were developed via machine learning-assisted techniques, namely Gaussian Process Regression (GPR), Linear Regression (LR), Decision Tree (DT), Support Vector Machine (SVM) and Regression Tree Ensemble (RTE), and were then assessed systematically. To achieve high generalization, all models were subjected to a ‘train-all-test-all’ strategy, 5-fold cross validation and 10-fold cross validation. The GPR model achieved high accuracy under the ‘train-all-test-all’ strategy, judging from its low RMSE (1.0394) and high R² (0.9962), although this strategy carries a risk of overfitting. In contrast, despite its somewhat poorer RMSE and R² (1.7964 and 0.9886) in 5-fold cross validation, the GPR model showed the highest generalization while largely preserving the accuracy obtained during development. The SVM and RTE models also gave promising R² values (0.9372 and 0.9208), which were however overshadowed by their high RMSEs (4.2174 and 4.7366). The strong generalization of the GPR model was also confirmed in 10-fold cross validation, where the lowest RMSE (2.1624) and highest R² (0.9835), obtained with a feature number of 36, indicate that it is adequate in terms of both generalization and accuracy. All other models gave slightly lower R² (> 0.9), plausibly owing to their higher RMSE (> 4.0).
According to the GPR model, the optimized POME degradation (52.52%) is obtained at an O₂ flowrate of 70 mL/min, a TiO₂ loading of 70.0 g/L and a POME concentration of 250 ppm, with only ∼3% error relative to the actual data.
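The two metrics and the two evaluation strategies named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic, a one-variable least-squares line stands in for the GPR, SVM and tree models, and the helper names (`rmse`, `r2`, `k_fold_indices`, `fit_line`, `cross_validate`) are hypothetical. It shows how RMSE and R² are computed, why R² can be strongly negative when a model predicts worse than the mean of the observations, and how a 'train-all-test-all' score differs from a k-fold cross-validated one.

```python
import math
import random


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot.

    It is negative whenever the predictions are worse than simply
    predicting the mean of y_true.
    """
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (stand-in model)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def cross_validate(xs, ys, k, seed=0):
    """Fit on k-1 folds, predict the held-out fold, pool all predictions."""
    y_true, y_pred = [], []
    for test in k_fold_indices(len(xs), k, seed):
        held_out = set(test)
        train = [i for i in range(len(xs)) if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in test:
            y_true.append(ys[i])
            y_pred.append(a + b * xs[i])
    return rmse(y_true, y_pred), r2(y_true, y_pred)


# Synthetic, assumed data: a linear trend plus a small deterministic wiggle.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 + 0.5 * math.sin(x) for x in xs]

# 'Train-all-test-all': fit and score on the same points.
a, b = fit_line(xs, ys)
fitted = [a + b * x for x in xs]
print("train-all-test-all:", rmse(ys, fitted), r2(ys, fitted))

# 5-fold and 10-fold cross validation: score only on held-out points.
for k in (5, 10):
    print(f"{k}-fold CV:", *cross_validate(xs, ys, k))
```

Because the 'train-all-test-all' score is computed on the same points used for fitting, it cannot reveal overfitting; the cross-validated scores are the ones that speak to generalization.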