Luhn’s Point of View: Median-Based Term Weighting Schemes
2019
KOCABAŞ, İlker | KARAOĞLAN, Bahar | DINÇER, Bekir Taner
In this study we replace the TF component of the TFxIDF term weighting method with a parameter derived from Luhn's claim on term importance. Luhn claims that words with mid-range frequencies are the most important ones, and that the importance of a word falls as its frequency increases or decreases away from that range. We take the median frequency of the words in a document as the base and assess the importance of a word by the difference between its frequency and the median frequency. The weighting functions are varied by two normalization approaches, one using the median itself and the other using the standard deviation of medians, and are tested on the TREC-6 through TREC-8 ad hoc tracks. In the experiments, the weightings normalized by the median itself achieve better retrieval than basic TFxIDF and BM25 with respect to the MAP and R-Precision (R-P) measures.
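The abstract does not give the weighting functions themselves, so the following is only a minimal sketch of the general idea, not the authors' formula. It assumes a hypothetical importance score of `1 / (1 + |tf - median| / median)`, which peaks when a term's frequency equals the document's median term frequency and decays on either side, and it multiplies that score by a plain logarithmic IDF. The function name `median_weights` and its parameters are illustrative.

```python
import math
from collections import Counter
from statistics import median


def median_weights(doc_tokens, doc_freq, num_docs):
    """Weight each term of one document by its closeness to the median
    term frequency, multiplied by IDF.

    doc_tokens: list of tokens in the document
    doc_freq:   dict mapping term -> number of documents containing it
    num_docs:   total number of documents in the collection

    The importance function below is an assumption made for illustration;
    the abstract only says that importance is assessed from the difference
    between a term's frequency and the median frequency.
    """
    counts = Counter(doc_tokens)
    # Median of the per-term frequencies in this document (always >= 1).
    med = median(counts.values())

    weights = {}
    for term, tf in counts.items():
        # Distance from the median, normalized by the median itself.
        distance = abs(tf - med) / med
        # Highest (1.0) at the median, decreasing as tf moves away from it.
        importance = 1.0 / (1.0 + distance)
        idf = math.log(num_docs / doc_freq[term])
        weights[term] = importance * idf
    return weights


if __name__ == "__main__":
    tokens = ["a"] * 5 + ["b"] * 2 + ["c"]
    df = {"a": 10, "b": 10, "c": 10}
    for term, w in sorted(median_weights(tokens, df, 100).items()):
        print(term, round(w, 4))
```

With equal IDF values, the term whose frequency equals the median (`b` in the usage example) receives the largest weight, while both the most frequent and the least frequent terms are discounted, which is the behaviour Luhn's claim describes.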
Bibliographic information
This record was provided by Anatolia Academy of Sciences Ltd.