Atopic dermatitis or eczema? Consequences of ambiguity in disease name for biomedical literature mining
2021
Frainay, Clément | Pitarch, Yoann | Filippi, Sarah | Evangelou, Marina | Custovic, Adnan
Imperial College London
Métabolisme et Xénobiotiques (ToxAlim-MeX) ; ToxAlim (ToxAlim) ; Université Toulouse III - Paul Sabatier (UT3) ; Université de Toulouse (UT)-Université de Toulouse (UT)-Ecole Nationale Vétérinaire de Toulouse (ENVT) ; Institut National Polytechnique (Toulouse) (Toulouse INP) ; Université de Toulouse (UT)-Université de Toulouse (UT)-Institut National Polytechnique (Toulouse) (Toulouse INP) ; Université de Toulouse (UT)-Ecole d'Ingénieurs de Purpan (INP - PURPAN) ; Institut National Polytechnique (Toulouse) (Toulouse INP) ; Université de Toulouse (UT)-Université de Toulouse (UT)-Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE)-Université Toulouse III - Paul Sabatier (UT3) ; Université de Toulouse (UT)-Université de Toulouse (UT)-Ecole Nationale Vétérinaire de Toulouse (ENVT) ; Institut National Polytechnique (Toulouse) (Toulouse INP) ; Université de Toulouse (UT)-Université de Toulouse (UT)-Institut National Polytechnique (Toulouse) (Toulouse INP) ; Université de Toulouse (UT)-Ecole d'Ingénieurs de Purpan (INP - PURPAN) ; Institut National Polytechnique (Toulouse) (Toulouse INP) ; Université de Toulouse (UT)-Université de Toulouse (UT)-Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE)
Recherche d’Information et Synthèse d’Information (IRIT-IRIS) ; Institut de recherche en informatique de Toulouse (IRIT) ; Université Toulouse Capitole (UT Capitole) ; Université de Toulouse (UT)-Université de Toulouse (UT)-Université Toulouse - Jean Jaurès (UT2J) ; Université de Toulouse (UT)-Université de Toulouse (UT)-Université Toulouse III - Paul Sabatier (UT3) ; Université de Toulouse (UT)-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP) ; Université de Toulouse (UT)-Toulouse Mind & Brain Institut (TMBI) ; Université Toulouse - Jean Jaurès (UT2J) ; Université de Toulouse (UT)-Université de Toulouse (UT)-Université Toulouse III - Paul Sabatier (UT3) ; Université de Toulouse (UT)-Université Toulouse III - Paul Sabatier (UT3) ; Université de Toulouse (UT)-Université Toulouse Capitole (UT Capitole) ; Université de Toulouse (UT)-Université de Toulouse (UT)-Université Toulouse - Jean Jaurès (UT2J) ; Université de Toulouse (UT)-Université de Toulouse (UT)-Université Toulouse III - Paul Sabatier (UT3) ; Université de Toulouse (UT)-Centre National de la Recherche Scientifique (CNRS)-Institut National Polytechnique (Toulouse) (Toulouse INP) ; Université de Toulouse (UT)-Toulouse Mind & Brain Institut (TMBI) ; Université Toulouse - Jean Jaurès (UT2J) ; Université de Toulouse (UT)-Université de Toulouse (UT)-Université Toulouse III - Paul Sabatier (UT3) ; Université de Toulouse (UT)-Université Toulouse III - Paul Sabatier (UT3) ; Université de Toulouse (UT)
National Heart and Lung Institute [London] (NHLI) ; Imperial College London-Royal Brompton and Harefield NHS Foundation Trust
International audience
English.
Background: Biomedical research increasingly relies on computational approaches to extract relevant information from large corpora of publications.
Objective: To investigate the consequences of the ambiguity between the terms "eczema" and "atopic dermatitis" (AD) from an information retrieval perspective, and its impact on meta-analyses, systematic reviews and text mining.
Methods: Articles were retrieved by querying PubMed with the Medical Subject Headings (MeSH) terms "eczema" (D004485) and "dermatitis, atopic" (D003876). We used machine learning to investigate the differences between the contexts in which each term is used: we trained a decision tree model to predict whether an article would be indexed with the eczema or the AD tag. We used text-mining tools to extract biological entities associated with eczema and AD, and investigated discrepancies in the retrieval of key findings according to the terminology used.
Results: The atopic dermatitis query yielded more articles related to veterinary science, biochemistry, and cellular and molecular biology; the eczema query yielded more articles linked to public health, infectious disease and the respiratory system. MeSH terms associated with AD and with eczema differed, with 52% agreement between the top-40 lists. Terms related to cellular mechanisms, especially allergies and inflammation, characterized the AD literature. The metabolites mentioned more frequently than expected in articles with the AD tag differed from those in articles indexed with eczema. Fewer enriched genes were retrieved with the eczema query than with the AD query.
Conclusions and clinical relevance: There is a considerable discrepancy when using text mining to extract bio-entities related to eczema or AD. Our results suggest that any systematic approach (particularly when looking for metabolites or genes related to the condition) should be performed using both terms jointly. We propose decision tree learning as a tool to spot and characterize ambiguity, and provide the source code for disambiguation at https://github.com/cfrainay/ResearchCodeBase.
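As an illustration of two quantities mentioned in the abstract, the following minimal Python sketch computes (a) the agreement between two top-k term lists and (b) the information gain of a single term, which is one common splitting criterion inside decision tree learners. It is not the authors' code: the function names, the definition of agreement as the shared fraction of two equal-length lists, the choice of information gain as the criterion, and the toy articles are all assumptions made here for illustration.

```python
from collections import Counter
from math import log2


def top_k_terms(articles, k):
    """Return the k terms that occur in the most articles (each article is a set of terms)."""
    counts = Counter(term for terms in articles for term in set(terms))
    return [term for term, _ in counts.most_common(k)]


def top_k_agreement(list_a, list_b):
    """Fraction of terms shared by two equal-length top-k lists (assumed definition)."""
    if len(list_a) != len(list_b) or not list_a:
        raise ValueError("lists must be non-empty and of equal length")
    return len(set(list_a) & set(list_b)) / len(list_a)


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(articles, labels, term):
    """Entropy reduction obtained by splitting the articles on presence of `term`."""
    with_term = [lab for terms, lab in zip(articles, labels) if term in terms]
    without_term = [lab for terms, lab in zip(articles, labels) if term not in terms]
    remainder = sum(
        len(part) / len(labels) * entropy(part)
        for part in (with_term, without_term)
        if part  # skip an empty side of the split
    )
    return entropy(labels) - remainder


# Invented toy data: four hypothetical articles, each a set of indexing terms,
# labelled with the tag under which the article is assumed to be indexed.
ARTICLES = [
    {"Immunoglobulin E", "Inflammation", "Child"},
    {"Inflammation", "Cytokines", "Animals"},
    {"Public Health", "Child", "Asthma"},
    {"Asthma", "Prevalence", "Child"},
]
LABELS = ["AD", "AD", "eczema", "eczema"]

if __name__ == "__main__":
    # Rank candidate terms by how well their presence separates the two tags.
    vocabulary = sorted(set().union(*ARTICLES))
    ranked = sorted(
        vocabulary,
        key=lambda t: information_gain(ARTICLES, LABELS, t),
        reverse=True,
    )
    for term in ranked:
        print(f"{term}: {information_gain(ARTICLES, LABELS, term):.3f}")
```

A term whose presence separates the two tags perfectly receives the maximum gain, while a term spread across both tags scores low; a full decision tree applies such a criterion recursively, which this sketch does not do.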
Bibliographic information
This record was provided by Institut national de la recherche agronomique