Luhn’s Point of View: Median-Based Term Weighting Schemes
2019
KOCABAŞ, İlker | KARAOĞLAN, Bahar | DINÇER, Bekir Taner
In this study, we replace the TF component of the TFxIDF term weighting method with a parameter derived from Luhn’s claim on term importance. Luhn claims that words with mid-range frequencies are the most important, and that the importance of a word falls as its frequency increases or decreases away from that range. We take the median frequency of the words in a document as the base and assess the importance of a word by the difference between its frequency and the median frequency. The weighting functions are varied by two normalization approaches, one using the median itself and one using the standard deviation of the medians, and are tested on the TREC-6 through TREC-8 ad hoc tracks. In the experiments, the weightings normalized by the median itself achieve better retrieval performance than basic TFxIDF and BM25 with respect to the MAP and R-P measures.
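The idea of scoring a word by its distance from the document's median term frequency can be illustrated with a minimal Python sketch. The abstract does not give the exact weighting functions, so everything here is an assumption made for illustration: the function name median_weights, the decay form 1 / (1 + |f - median| / median), and the plain logarithmic IDF are hypothetical choices, not the authors' formulas. Only the median-normalized variant is sketched.

```python
import math
from collections import Counter
from statistics import median


def median_weights(doc_tokens, doc_freq, n_docs):
    """Illustrative median-based term weights for a single document.

    doc_tokens: list of terms in the document
    doc_freq:   dict mapping term -> number of documents containing it
    n_docs:     total number of documents in the collection

    The decay function below is an assumption; the abstract only says that
    importance depends on the difference from the median frequency.
    """
    tf = Counter(doc_tokens)
    if not tf:
        return {}

    # Median of the within-document term frequencies (always >= 1 here).
    med = median(tf.values())

    weights = {}
    for term, f in tf.items():
        # Distance from the median, normalized by the median itself.
        dist = abs(f - med) / med
        # Assumed decay: highest at the median, falling as distance grows.
        importance = 1.0 / (1.0 + dist)
        # Plain logarithmic IDF, also an illustrative choice.
        idf = math.log(n_docs / doc_freq[term])
        weights[term] = importance * idf
    return weights


# Hypothetical toy data: "b" sits exactly at the median frequency.
tokens = "a a a a b b c".split()
toy_weights = median_weights(tokens, {"a": 5, "b": 5, "c": 5}, n_docs=10)
```

With equal IDF for all three terms in this toy input, the term at the median frequency receives the largest weight, and the most frequent term receives the smallest, which mirrors Luhn's claim as summarized above.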
Bibliographic information
This bibliographic record was provided by Anatolia Academy of Sciences Ltd.