Predicting Asthma Hospitalizations from Climate and Air Pollution Data: A Machine Learning-Based Approach
2025
Jean Souza dos Reis | Rafaela Lisboa Costa | Fabricio Daniel dos Santos Silva | Ediclê Duarte Fernandes de Souza | Taisa Rodrigues Cortes | Rachel Helena Coelho | Sofia Rafaela Maito Velasco | Danielson Jorge Delgado Neves | José Firmino Sousa Filho | Cairo Eduardo Carvalho Barreto | Jório Bezerra Cabral Júnior | Herald Souza dos Reis | Keila Rêgo Mendes | Mayara Christine Correia Lins | Thomás Rocha Ferreira | Mário Henrique Guilherme dos Santos Vanderlei | Marcelo Felix Alonso | Glauber Lopes Mariano | Heliofábio Barros Gomes | Helber Barros Gomes
This study explores the predictability of monthly asthma notifications in Maceió, a municipality with a tropical climate in northeastern Brazil, using models built with different machine learning techniques. Two sets of predictors were tested: the first, called exp1, contained meteorological variables and pollutants; the second, called exp2, contained only meteorological variables. For both experiments, additional tests incorporated lagged information from the time series of asthma records. The models were trained on 80% of the data and validated on the remaining 20%. Five methods were evaluated: random forest (RF), eXtreme Gradient Boosting (XGBoost), multiple linear regression (MLR), support vector machine (SVM), and K-nearest neighbors (KNN). The RF models showed superior performance, notably the exp1 models that incorporated lagged asthma notifications as an additional predictor. Minimum temperature and sulfur dioxide emerged as key variables, probably due to their associations with respiratory health and pollution levels, emphasizing their role in asthma exacerbation. Because lagged variables were included in some experiments, the autocorrelation of the residuals was also assessed. The results highlight the importance of pollutant and meteorological factors in predicting asthma cases, with implications for public health monitoring. Despite the limitations presented and discussed, this study demonstrates that forecast accuracy improves when a wider range of lagged variables is used, and indicates that RF is suitable for health datasets with complex time series.
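The modelling setup summarized above (lagged asthma counts as extra predictors, an 80/20 train/validation split, an RF regressor, and a residual-autocorrelation check) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: it uses synthetic monthly data in place of the Maceió records, assumes the 20% validation block is the most recent part of the series (the abstract does not say how the split was made), uses scikit-learn's RandomForestRegressor with arbitrary hyperparameters, and checks residual autocorrelation with the Durbin-Watson statistic and the lag-1 autocorrelation. The helper names (make_synthetic_series, add_lags, chrono_split, evaluate) are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error


def make_synthetic_series(n_months=120, seed=0):
    """Hypothetical stand-in data (not the study's): columns are [minimum temperature, SO2]."""
    rng = np.random.default_rng(seed)
    months = np.arange(n_months)
    tmin = 22.0 + 2.0 * np.sin(2 * np.pi * months / 12) + rng.normal(0, 0.5, n_months)
    so2 = rng.gamma(2.0, 3.0, n_months)
    y = np.empty(n_months)
    y[0] = 45.0
    for t in range(1, n_months):
        # Synthetic monthly notifications depend on last month's count plus both predictors.
        y[t] = 0.5 * y[t - 1] + 80.0 - 3.0 * tmin[t] + 1.5 * so2[t] + rng.normal(0, 3)
    return np.column_stack([tmin, so2]), y


def add_lags(X, y, n_lags):
    """Append y[t-1], ..., y[t-n_lags] as extra columns and drop the first n_lags rows."""
    if n_lags == 0:
        return X, y
    n = len(y)
    lagged = np.column_stack([y[n_lags - k : n - k] for k in range(1, n_lags + 1)])
    return np.hstack([X[n_lags:], lagged]), y[n_lags:]


def chrono_split(X, y, train_frac=0.8):
    """Assumed chronological split: first 80% of months for training, last 20% for validation."""
    cut = int(len(y) * train_frac)
    return X[:cut], X[cut:], y[:cut], y[cut:]


def durbin_watson(resid):
    """Durbin-Watson statistic; values near 2 suggest little lag-1 autocorrelation."""
    return float(np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2))


def lag1_autocorr(resid):
    """Sample lag-1 autocorrelation of the residuals."""
    r = resid - resid.mean()
    return float(np.sum(r[1:] * r[:-1]) / np.sum(r ** 2))


def evaluate(X, y, n_lags, seed=0):
    """Fit an RF on the training block and score it on the validation block."""
    Xl, yl = add_lags(X, y, n_lags)
    X_tr, X_te, y_tr, y_te = chrono_split(Xl, yl)
    model = RandomForestRegressor(n_estimators=200, random_state=seed)
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    resid = y_te - pred
    return {
        "mae": float(mean_absolute_error(y_te, pred)),
        "dw": durbin_watson(resid),
        "acf1": lag1_autocorr(resid),
    }


if __name__ == "__main__":
    X, y = make_synthetic_series()
    for n_lags in (0, 1, 3):
        print(n_lags, evaluate(X, y, n_lags))
```

Comparing `evaluate(X, y, 0)` with larger `n_lags` mirrors the comparison of experiments with and without lagged notifications, and passing `X[:, :1]` (temperature only) gives a meteorology-only predictor set in the spirit of exp2. On real data the outcome depends entirely on the series; nothing about the study's results is implied by this sketch.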
Bibliographic information
This bibliographic record was provided by Multidisciplinary Digital Publishing Institute