Yet Another Ranking Function for Automatic Multiword Term Extraction
2014
Lossio-Ventura, Juan Antonio | Jonquet, Clement | Roche, Mathieu | Teisseire, Maguelonne | ADVanced Analytics for data SciencE (ADVANSE) ; Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier (LIRMM) ; Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS) | Système Multi-agent, Interaction, Langage, Evolution (SMILE) ; Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier (LIRMM) ; Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS) | Polytech'Montpellier ; Université Montpellier 2 - Sciences et Techniques (UM2) | Territoires, Environnement, Télédétection et Information Spatiale (UMR TETIS) ; Centre de Coopération Internationale en Recherche Agronomique pour le Développement (Cirad)-AgroParisTech-Institut national de recherche en sciences et technologies pour l'environnement et l'agriculture (IRSTEA) | A. Przepiórkowski | M. Ogrodniczuk | ANR-12-JS02-0010,SIFR,Indexation sémantique de ressources biomédicales francophones(2012)
[Departement_IRSTEA]Territoires [TR1_IRSTEA]SYNERGIE [Axe_IRSTEA]TETIS-SISO
International audience
English. Term extraction is an essential task in domain knowledge acquisition. We propose two new measures to extract multiword terms from a domain-specific text. The first measure is both linguistically and statistically based. The second measure is graph-based, allowing assessment of the importance of a multiword term within a domain. Existing measures often address some, but not all, of the problems related to term extraction, e.g., noise, silence, low frequency, large corpora, and the complexity of the multiword term extraction process. Instead, we focus on managing the entire set of problems, e.g., detecting rare terms and overcoming the low-frequency issue. By comparing them with state-of-the-art reference measures, we show that the two proposed measures outperform the precision results previously reported for automatic multiword term extraction.
Bibliographic information
This bibliographic record has been provided by Institut national de la recherche agronomique