Cereal and Rapeseed Yield Forecast in Poland at Regional Level Using Machine Learning and Classical Statistical Models
2025
Edyta Okupska | Dariusz Gozdowski | Rafał Pudełko | Elżbieta Wójcik-Gront
This study carried out in-season yield prediction, about 2–3 months before harvest, for cereals and rapeseed at the province level in Poland for 2009–2024. Several models were compared, including machine learning algorithms and multiple linear regression. The satellite-derived normalized difference vegetation index (NDVI) and the climatic water balance (CWB), calculated from meteorological data, were used as predictors of crop yield. Model accuracy was compared to identify the best approach. The strongest correlations with crop yield were observed for the NDVI at the beginning of March, with coefficients ranging from 0.454 for rapeseed to 0.503 for rye. The best-performing model differed by crop, with the highest R<sup>2</sup> values ranging from 0.654 for rapeseed (random forest) to 0.777 for basic cereals (linear regression). The random forest model was best for rapeseed, while for cereals the best predictions came from multiple linear regression or neural network models. For all studied crops and models, mean absolute errors and root mean squared errors did not exceed 6 dt/ha, which is relatively small, being under 20% of the mean yield. For the best models, relative errors were in most cases no higher than 10% of the mean yield. The results indicate that linear regression and machine learning models give similar predictions, likely because of the relatively small sample size (256 observations).
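The accuracy measures named in the abstract (mean absolute error, root mean squared error, R<sup>2</sup>, and errors relative to the mean yield) can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `regression_metrics` and the yield values are hypothetical, and R<sup>2</sup> is assumed here to be the coefficient of determination, which the abstract does not specify.

```python
import math


def regression_metrics(observed, predicted):
    """Return MAE, RMSE, R^2, and MAE/RMSE as a percentage of mean observed yield.

    Hypothetical helper for illustration; R^2 is assumed to be the
    coefficient of determination (1 - SS_res / SS_tot).
    """
    if not observed or len(observed) != len(predicted):
        raise ValueError("observed and predicted must be non-empty and of equal length")

    n = len(observed)
    mean_obs = sum(observed) / n
    residuals = [o - p for o, p in zip(observed, predicted)]

    ss_res = sum(r * r for r in residuals)               # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")

    mae = sum(abs(r) for r in residuals) / n
    rmse = math.sqrt(ss_res / n)

    return {
        "mae": mae,
        "rmse": rmse,
        "r2": 1.0 - ss_res / ss_tot,
        "mae_pct": 100.0 * mae / mean_obs,    # relative error, % of mean yield
        "rmse_pct": 100.0 * rmse / mean_obs,
    }


# Made-up yields in dt/ha, NOT data from the study.
observed_yield = [40.0, 45.0, 50.0, 55.0, 60.0]
predicted_yield = [42.0, 44.0, 49.0, 57.0, 58.0]

metrics = regression_metrics(observed_yield, predicted_yield)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Expressing MAE and RMSE as a percentage of the mean observed yield is what makes thresholds such as "under 20%" or "not higher than 10%" comparable across crops with different yield levels.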