Interpretable Machine Learning Models for Early Detection of Subclinical Mastitis using Routine Milk Composition Data
2025
YALÇİN, Hamza
Subclinical mastitis (SCM) imposes substantial economic losses on the global dairy industry. Conventional diagnostics are often ill-suited to rapid, large-scale screening, which highlights the need for novel diagnostic approaches. The objective of this study was to evaluate seven machine learning (ML) and deep learning (DL) models for predicting SCM (somatic cell count (SCC) > 200,000 cells mL⁻¹) from routine milk composition data. Using a dataset of 1,391 milk records, we evaluated models based on fat, protein, lactose, total solids (TS), and milk urea nitrogen (MUN) features. The training data were balanced using the Synthetic Minority Over-sampling Technique (SMOTE) to prevent bias towards the majority class. The best model's predictions were interpreted using SHapley Additive exPlanations (SHAP) to identify key predictive factors. The Extreme Gradient Boosting (XGBoost) model delivered the highest performance, achieving 82.3% accuracy and an 87.8% F1-score on the unaltered test set. Tree-based ensemble models also outperformed DL and simpler classifiers. SHAP analysis identified lactose and protein as the most decisive features; lower lactose and higher protein levels were highly predictive of SCM, which is consistent with established pathophysiology. The results show that an interpretable model using routine milk data offers a robust, non-invasive, and cost-effective framework for early SCM detection. This provides a valuable decision-support tool for improving udder health management and farm sustainability.
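The labelling, balancing, and scoring steps named in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the study's implementation: the interpolation routine is a simplified stand-in for SMOTE, the neighbour count `k`, the seed, and all function names are hypothetical, and the toy rows are made-up numbers rather than study data. Only the SCC cutoff of 200,000 cells mL⁻¹ and the five feature names come from the abstract.

```python
import random

SCC_THRESHOLD = 200_000  # cells per mL; the SCM cutoff stated in the abstract


def label_scm(scc):
    """Return 1 (SCM) when the somatic cell count exceeds the cutoff, else 0."""
    return 1 if scc > SCC_THRESHOLD else 0


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic rows by interpolating between a minority row and
    one of its k nearest minority neighbours (the core idea behind SMOTE).
    k and seed are illustrative assumptions, not values from the study."""
    if len(minority) < 2:
        raise ValueError("need at least two minority rows to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted(
            (row for row in minority if row is not base),
            key=lambda row: squared_distance(base, row),
        )[:k]
        partner = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to partner
        synthetic.append([b + gap * (p - b) for b, p in zip(base, partner)])
    return synthetic


def balance_training_set(X, y, k=3, seed=0):
    """Oversample the less frequent class of the TRAINING split until both
    classes have the same number of rows. The test split is left untouched."""
    ones = [row for row, label in zip(X, y) if label == 1]
    zeros = [row for row, label in zip(X, y) if label == 0]
    minority, minority_label = (ones, 1) if len(ones) < len(zeros) else (zeros, 0)
    n_new = abs(len(ones) - len(zeros))
    extra = smote_like_oversample(minority, n_new, k=k, seed=seed) if n_new else []
    return X + extra, y + [minority_label] * len(extra)


def accuracy_and_f1(y_true, y_pred):
    """Accuracy and F1 for the positive (SCM) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    return accuracy, f1


# Made-up toy rows: [fat, protein, lactose, TS, MUN] and an SCC reading.
toy_features = [
    [3.8, 3.2, 4.8, 12.5, 14.0],
    [4.0, 3.3, 4.7, 12.8, 13.0],
    [3.6, 3.1, 4.9, 12.3, 15.0],
    [3.9, 3.6, 4.3, 12.6, 16.0],
    [4.1, 3.7, 4.2, 12.9, 17.0],
    [3.7, 3.5, 4.4, 12.4, 15.5],
    [4.2, 3.8, 4.1, 13.0, 16.5],
]
toy_scc = [90_000, 120_000, 150_000, 350_000, 500_000, 260_000, 800_000]
toy_labels = [label_scm(scc) for scc in toy_scc]
X_balanced, y_balanced = balance_training_set(toy_features, toy_labels)
```

Balancing only the training split, as the abstract describes, keeps the reported test-set metrics tied to the original class distribution. In practice a maintained SMOTE implementation and a fitted classifier would replace the hand-written routine shown here.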
Bibliographic information
This bibliographic record has been provided by Institute of Economic Development and Social Researches