EpidGPT: A combined strategy to discriminate between redundant and new information for epidemiological surveillance systems
2024
Menya, Edmond | Roche, Mathieu | Interdonato, Roberto | Owuor, Dickson | Territoires, Environnement, Télédétection et Information Spatiale (UMR TETIS) ; Centre de Coopération Internationale en Recherche Agronomique pour le Développement (Cirad)-AgroParisTech-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE) | Strathmore University | European Commission;EC;UE;http://dx.doi.org/10.13039/501100000780 | Ambassade de France à Nairobi;;KEN; | Direction générale de l'alimentation;DGAL;FRA; | Rapp Amon (ed.) | Di Caro Luigi (ed.) | Meziane Farid (ed.) | Sugumaran Vijayan (ed.) | European Project: 874850,H2020-SC1-2019-Single-Stage-RTD,MOOD(2020)
Source Agritrop Cirad (https://agritrop.cirad.fr/610401/)
International audience
Textual documents such as online news articles have become a key source for epidemiological surveillance, for example in detecting new and re-emerging diseases. However, such sources suffer from redundancy, which creates a need to automate the identification of novel information. In this paper, we propose a framework for learning novel thematic information in epidemiological news documents. Our approach involves both extraction and classification of new, duplicate, additional and/or missing pieces of relevant information in epidemiological news documents. First, we propose an initial step to address the limited-data problem: few gold-labelled datasets exist for training text-based epidemiological surveillance systems. This step uses an extractive question answering technique to automate the extraction of relevant thematic features, including disease and host names, the location and date of reported events, and the reported number of cases, in order to create a large silver-labelled dataset. We then propose a main step in which we build a novelty classification model trained on this large silver-labelled dataset. We test our novelty classifier alongside competitive models on the task of detecting whether a target epidemiological news article contains novel, redundant and/or missing information. Finally, we carry out ablation studies on the most informative document segments in epidemiological news articles.
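To make the distinction between new, duplicate, additional and missing information concrete, the following Python sketch compares the thematic features of a target article against those of a reference article. It is an illustration only, not the model described in the paper: the paper trains a classifier, whereas this sketch uses plain string comparison. The field names, the label "absent", the helper functions and the example values are all assumptions made for this example.

```python
from typing import Dict, Optional

# Thematic features named in the abstract; the field names are hypothetical.
FIELDS = ("disease", "host", "location", "date", "cases")

Features = Dict[str, Optional[str]]


def normalise(value: Optional[str]) -> Optional[str]:
    """Lower-case and collapse whitespace; empty strings count as absent."""
    if value is None:
        return None
    cleaned = " ".join(value.lower().split())
    return cleaned or None


def compare_features(target: Features, reference: Features) -> Dict[str, str]:
    """Label each field of the target article relative to the reference."""
    labels: Dict[str, str] = {}
    for field in FIELDS:
        t = normalise(target.get(field))
        r = normalise(reference.get(field))
        if t is None and r is None:
            labels[field] = "absent"      # neither article reports it
        elif t is None:
            labels[field] = "missing"     # only the reference reports it
        elif r is None:
            labels[field] = "additional"  # only the target reports it
        elif t == r:
            labels[field] = "duplicate"   # same value in both articles
        else:
            labels[field] = "new"         # both report it, values differ
    return labels


def is_novel(labels: Dict[str, str]) -> bool:
    """Treat the target as novel if any field is new or additional."""
    return any(label in ("new", "additional") for label in labels.values())


# Made-up example values, for illustration only.
reference = {"disease": "Avian influenza", "host": "poultry",
             "location": "Kenya", "date": "2023-01-10", "cases": "120"}
target = {"disease": "avian influenza", "host": "Poultry",
          "location": "Uganda", "date": "2023-01-12", "cases": None}

labels = compare_features(target, reference)
print(labels)
print("novel:", is_novel(labels))
```

In a setting like the one the abstract describes, the feature dictionaries would come from an extractive question answering step rather than being written by hand, and the final decision would be made by a trained classifier rather than exact matching.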
Bibliographic information
This record was provided by Institut national de la recherche agronomique