Predictive Modelling of Maize Yield Using Multimodal Deep Learning Integrating Genotypic, Management and Weather Data : Exploring the feasibility of integrating disparate datasets for yield prediction
2025
Nguyen, VINH | Helsingin yliopisto, Maatalous-metsätieteellinen tiedekunta | University of Helsinki, Faculty of Agriculture and Forestry | Helsingfors universitet, Agrikultur-forstvetenskapliga fakulteten
Maintaining the production of maize (Zea mays L.), one of the world's most important crops, is essential to ensuring food security for a growing population in the face of climate change and other challenges. Genomic prediction models, which predict breeding values from genotypic data, are crucial for selecting superior individuals in plant breeding programs. This study explores the emerging potential of multimodal deep learning to develop a foundational maize yield prediction model that integrates three disparate, unrelated datasets: (1) single nucleotide polymorphisms (SNPs) from MaizeSNPDB, (2) yield and management data from the historical long-term Morrow Plots dataset, and (3) weather data for the Morrow Plots' location. An intermediate fusion strategy was used to combine the three modalities: each modality first extracts meaningful representations on its own, and these representations are then passed to the fusion network for the final yield prediction. Given the disparate nature of the datasets, the model achieved solid evaluation metrics: an R² of 0.61, an RMSE of 33.66 bu/ac and an MAE of 24.72 bu/ac. These metrics were further supported by a subsequent biological evaluation through genotypic simulation, which confirmed the model's ability to predict yield reliably, rank the top-performing genotypes, capture complex genotype × environment × management interactions, and reflect established agricultural and biological knowledge. The genotypic simulation also uncovered genotypes that are highly flexible across different conditions and resilient under high stress; these can be studied further to develop resilient hybrids. Together, these results confirm the feasibility of using disparate datasets for yield prediction, which could become a new focus alongside the approach that relies on matched datasets from the same field trial.
The results reinforce the potential and superiority of multimodal deep learning, which combines data from multiple sources, over conventional unimodal deep learning methods, and offer a foundation for future applications and developments that can provide further insight to plant breeding programs.
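As an illustration of the intermediate fusion idea and the three reported metrics, the following is a minimal sketch in plain Python, not the thesis's implementation. The class name `IntermediateFusion`, the layer sizes, the random untrained weights, the 0/1/2 SNP coding and all sample values are assumptions made for the example. Each modality passes through its own encoder, the encoded representations are concatenated, and a fusion layer feeds a single yield output. `regression_metrics` computes R², RMSE and MAE from their standard definitions.

```python
import math
import random


def dense(inputs, weights, biases):
    """One fully connected layer with ReLU activation: relu(W x + b)."""
    return [max(0.0, sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def init_layer(rng, n_in, n_out):
    """Random weights and zero biases for an untrained layer (illustrative only)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class IntermediateFusion:
    """Hypothetical three-branch model: encode each modality, then fuse."""

    def __init__(self, dims, hidden=4, seed=0):
        rng = random.Random(seed)
        # One separate encoder per modality (e.g. SNPs, management, weather).
        self.encoders = {name: init_layer(rng, n, hidden) for name, n in dims.items()}
        # The fusion layer sees the concatenated modality representations.
        self.fusion = init_layer(rng, hidden * len(dims), hidden)
        self.head = init_layer(rng, hidden, 1)

    def encode(self, sample):
        """Concatenate the per-modality representations into one vector."""
        fused = []
        for name, (weights, biases) in self.encoders.items():
            fused.extend(dense(sample[name], weights, biases))
        return fused

    def predict(self, sample):
        hidden = dense(self.encode(sample), *self.fusion)
        weights, biases = self.head
        # Linear output unit: a single predicted yield value.
        return sum(w * h for w, h in zip(weights[0], hidden)) + biases[0]


def regression_metrics(y_true, y_pred):
    """R², RMSE and MAE computed from their standard definitions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
    }


# Made-up feature sizes and values, purely for illustration.
model = IntermediateFusion({"snp": 5, "management": 3, "weather": 4})
sample = {
    "snp": [0, 1, 2, 0, 1],             # assumed allele-count coding
    "management": [1.0, 0.0, 0.5],      # assumed scaled management features
    "weather": [0.2, 0.7, 0.1, 0.9],    # assumed scaled weather features
}
prediction = model.predict(sample)
metrics = regression_metrics([100.0, 150.0, 200.0], [110.0, 140.0, 210.0])
```

Keeping the encoders separate until the fusion layer is what distinguishes intermediate fusion from early fusion, where raw features would be concatenated before any modality-specific processing. In a trained model the weights would be learned jointly from yield observations rather than drawn at random as here.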
This bibliographic record has been provided by University of Helsinki