Luhn’s Point of View: Median-Based Term Weighting Schemes
2019
KOCABAŞ, İlker | KARAOĞLAN, Bahar | DINÇER, Bekir Taner
In this study, we replace the TF component of the TFxIDF term weighting method with a parameter derived from Luhn’s claim on term importance. Luhn claims that words with mid-range frequencies are the most important, and that the importance of a word falls as its frequency moves above or below that range. We take the median frequency of the words in a document as the base and assess the importance of a word by the difference between its frequency and the median frequency. The weighting functions are varied by two normalization approaches, one using the median itself and the other using the standard deviation of medians, and are tested on the TREC-6 through TREC-8 ad hoc tracks. In the experiments, the weightings normalized by the median itself achieve better retrieval performance than basic TFxIDF and BM25 with respect to the MAP and R-P measures.
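The abstract does not give the exact weighting functions, so the following is only a minimal sketch of the general idea, not the authors' formula. It assumes a hypothetical "closeness" factor of the form 1 / (1 + |tf - median| / median), which peaks when a term's frequency equals the document's median term frequency and decays as the frequency moves away from it, and it pairs that factor with a plain logarithmic IDF. The function name `median_weights` and its parameters are illustrative assumptions.

```python
import math
from collections import Counter
from statistics import median


def median_weights(doc_tokens, doc_freq, n_docs):
    """Illustrative median-based term weights for a single document.

    doc_tokens: list of tokens in the document
    doc_freq:   mapping term -> number of documents containing the term
    n_docs:     total number of documents in the collection

    NOTE: the closeness formula below is an assumption made for
    illustration; the paper's actual functions are not stated in the
    abstract.
    """
    counts = Counter(doc_tokens)
    # Median of the within-document term frequencies (always >= 1).
    med = median(counts.values())
    weights = {}
    for term, tf in counts.items():
        # Distance from the median, normalized by the median itself.
        distance = abs(tf - med) / med
        # Assumed form: 1.0 at the median, smaller as distance grows.
        closeness = 1.0 / (1.0 + distance)
        # Plain logarithmic IDF (also an assumption).
        idf = math.log(n_docs / doc_freq[term])
        weights[term] = closeness * idf
    return weights
```

With term counts of 5, 3, and 1 in a document, the median is 3, so under this assumed form the term occurring three times receives the highest weight and the other two, being equally far from the median, receive equal lower weights when their IDF values match.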
Bibliographic information
This bibliographic record has been provided by Anatolia Academy of Sciences Ltd.