Semantic-based multilingual document clustering via tensor modeling

2014

Romeo, S. | Tagarelli, A. | Ienco, Dino | Dipartimento di Ingegneria Informatica, Modellistica, Elettronica e Sistemistica [Calabria] (DIMES) ; Università della Calabria [Arcavacata di Rende, Italia] = University of Calabria [Italy] = Université de Calabre [Italie] (UniCal) | Territoires, Environnement, Télédétection et Information Spatiale (UMR TETIS) ; Centre de Coopération Internationale en Recherche Agronomique pour le Développement (Cirad)-AgroParisTech-Institut national de recherche en sciences et technologies pour l'environnement et l'agriculture (IRSTEA)

[Departement_IRSTEA]Territoires [TR1_IRSTEA]SYNERGIE [Axe_IRSTEA]TETIS-SISO

EMNLP, Conference on Empirical Methods in Natural Language Processing, Doha, QAT, 25/10/2014 - 29/10/2014

International audience

English. A major challenge in document clustering research arises from the growing amount of text data written in different languages. Previous approaches depend on language-specific solutions (e.g., bilingual dictionaries, sequential machine translation) to evaluate document similarities, and the required transformations may alter the original document semantics. To cope with this issue, we propose a new document clustering approach for multilingual corpora that (i) exploits a large-scale multilingual knowledge base, (ii) takes advantage of the multi-topic nature of the text documents, and (iii) employs a tensor-based model to deal with high dimensionality and sparseness. Results show the significance of our approach and its better performance with respect to classic document clustering approaches, on both a balanced and an unbalanced corpus.

AGROVOC Keywords

algorithm; modelling

Bibliographic information

Publisher

CCSD

Other Subjects

[sde]environmental sciences; Computer analysis; Clustering; Modelling

Language

English

License

info:eu-repo/semantics/openAccess

ISSN

01130094

Type

info:eu-repo/semantics/conferenceObject; Conference Papers

Source

EMNLP, Conference on Empirical Methods in Natural Language Processing, Oct 2014, Doha, Qatar. 10 p. https://hal.science/hal-01130094

In AGRIS since: 2025-01-30

Modification date: 2025-04-17

Format: Dublin Core

Data Provider

This bibliographic record has been provided by AgroParisTech

Links

https://hal.science/hal-01130094
https://hal.science/hal-01130094v1/document
https://hal.science/hal-01130094v1/file/mt2014-pub00042217.pdf
