Investigating Search Algorithms for Shorter Documents: A study on how to search for titles
Abstract: The objective of this thesis was to explore whether there are alternatives to the established search ranking algorithm Best Matching 25 (BM25) when searching shorter documents, in particular titles. Five search engines were compared to BM25: three were variants of the BM25 algorithm, and the other two were based on a binary independence model that does not take term frequency or length normalisation into account. The evaluation data consisted of titles of Wikipedia articles from the Fair Ranking Track of the Text REtrieval Conference (TREC), the main conference in the field, and user logs of search queries collected from Spotify. None of the alternative models consistently outperformed standard BM25 across queries q of length 1 ≤ |q| ≤ 8, where |q| is the number of words in q. Yet for shorter queries, |q| ≤ 3, the binary independence model and BM25 adaptive term (BM25adpt) outperformed standard BM25. Furthermore, a 1% increase in Mean Average Precision (MAP) score was obtained with a binary independence model and BM25adpt compared to BM25 when sampling queries from the user log data. However, because of the bias in the evaluation data together with the small increase in MAP score, it was concluded that the potential benefit of the methods explored in this thesis is not enough to justify switching from the BM25 algorithm when searching for titles.
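To make the contrast in the abstract concrete, the following is a minimal Python sketch of the two scoring families it compares, plus the average precision measure that MAP averages over queries. It is not the thesis's implementation: the function names, the toy titles, the smoothed IDF form, and the parameter values k1 = 1.2 and b = 0.75 (commonly used defaults) are all assumptions made for illustration.

```python
import math
from collections import Counter


def idf(term, docs):
    """Smoothed inverse document frequency (assumed form; +1 keeps it positive)."""
    n = sum(1 for d in docs if term in d)
    return math.log((len(docs) - n + 0.5) / (n + 0.5) + 1.0)


def bm25_score(query, doc, docs, k1=1.2, b=0.75):
    """BM25: uses term frequency and document length normalisation."""
    avgdl = sum(len(d) for d in docs) / len(docs)
    tf = Counter(doc)
    score = 0.0
    for t in set(query):
        if tf[t] == 0:
            continue
        norm = tf[t] + k1 * (1 - b + b * len(doc) / avgdl)
        score += idf(t, docs) * tf[t] * (k1 + 1) / norm
    return score


def binary_score(query, doc, docs):
    """Binary independence style score: only term presence matters."""
    present = set(doc)
    return sum(idf(t, docs) for t in set(query) if t in present)


def average_precision(ranked_ids, relevant_ids):
    """Average precision for one query; MAP is the mean of this over queries."""
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            total += hits / rank
    return total / len(relevant_ids) if relevant_ids else 0.0


# Hypothetical tokenised titles, for illustration only.
titles = [
    ["love", "song"],
    ["love", "love", "love"],
    ["the", "long", "road", "home"],
]
query = ["love"]

for i, title in enumerate(titles):
    print(i, bm25_score(query, title, titles), binary_score(query, title, titles))
```

On these toy titles the binary score ties the first two titles, because each contains the query term, while BM25 separates them through term frequency and length normalisation. That difference is the kind of effect the thesis evaluates on short documents.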