EpidGPT: A combined strategy to discriminate between redundant and new information for epidemiological surveillance systems

2024

Menya, Edmond | Roche, Mathieu | Interdonato, Roberto | Owuor, Dickson | Territoires, Environnement, Télédétection et Information Spatiale (UMR TETIS); Centre de Coopération Internationale en Recherche Agronomique pour le Développement (Cirad)-AgroParisTech-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE) | Strathmore University | European Commission (EC; UE; http://dx.doi.org/10.13039/501100000780) | Ambassade de France à Nairobi (KEN) | Direction générale de l'alimentation (DGAL; FRA) | Rapp, Amon (ed.) | Di Caro, Luigi (ed.) | Meziane, Farid (ed.) | Sugumaran, Vijayan (ed.) | European Project: 874850, H2020-SC1-2019-Single-Stage-RTD, MOOD (2020)

Source Agritrop Cirad (https://agritrop.cirad.fr/610401/)

International audience

English. Textual documents such as online news articles have become a key source for epidemiological surveillance, including the detection of new and re-emerging diseases. However, such sources contain substantial redundancy, which makes it necessary to automate the identification of novel information. In this paper, we propose a framework for learning novel thematic information in epidemiological news documents. Our approach involves both the extraction and the classification of new, duplicate, additional and/or missing pieces of relevant information in epidemiological news documents. First, we propose an initial step to address the limited-data problem: few gold-labelled datasets exist for training text-based epidemiological surveillance systems. This initial step uses an extractive question answering technique to automate the extraction of relevant thematic features, including disease and host names, the location and date of reported events, and the reported number of cases, in order to create a large silver-labelled dataset. We then propose a main step in which we build a novelty classification model trained on this large silver-labelled dataset. We test our novelty classifier alongside competitive models on the task of detecting whether a target epidemiological news article contains novel, redundant and/or missing information. Finally, we carry out ablation studies on the most informative document segments in epidemiological news articles.

AGROVOC keywords

text mining

Bibliographic information

Publisher

CCSD, Springer

Other subjects

[SDV] Life Sciences [q-bio]; Language model; Animal disease surveillance

Language

English

ISBN

978-3-031-70238-9

ISSN

05182297

Type

Conference Part; Conference Paper

Source

Natural language processing and information systems: 29th International Conference on Applications of Natural Language to Information Systems, NLDB 2024, Turin, Italy, June 25–27, 2024, Proceedings, Part I. Natural Language Processing and Information Systems (NLDB 2024), Jun 2024, Turin, Italy. pp. 439-454, ⟨10.1007/978-3-031-70239-6_30⟩, https://hal.science/hal-05182297

In AGRIS since: 2025-09-02

Date modified: 2026-02-03

File type: Dublin Core

Data provider

This record was provided by Institut national de la recherche agronomique

Links

DOI https://hal.science/hal-05182297
