Optimizing Search Engine Field Weights with Limited Data: Offline exploration of optimal field weight combinations through regression analysis

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Abstract: Modern search engines, particularly those built on the BM25 ranking algorithm, expose many tunable parameters for refining search results. Among these, the weight assigned to each searchable field plays a crucial role in result quality. Traditional methods of finding good weight combinations are largely exploratory: they demand substantial time and risk delivering substandard results during testing. This thesis proposes a streamlined alternative: an ordinal-regression-based model designed to identify optimal weight combinations from minimal data in an offline testing environment. The evaluation corpus is a comprehensive snapshot of a product search database from Tradera. The top 100 search queries on the Tradera platform, together with their search results pages, were divided into a training set and an evaluation set. The model was trained iteratively on the training set and then tested on the evaluation set with progressively increasing amounts of labeled data, which made it possible to examine how well it derives high-performing weight combinations from limited data. The experiments confirmed that the proposed model produced promising weight combinations even with restricted data and generalized well to the evaluation set. In conclusion, this research supports the significant potential for improving search results by tuning searchable field weights with a regression-based model, even in data-scarce scenarios.
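To make the idea of offline field-weight evaluation concrete, the following is a minimal sketch and not the thesis's ordinal-regression model. A plain grid search stands in for the model, and NDCG is assumed as the quality metric. The field names (`title`, `description`), the per-field scores, the relevance labels, and the weight grid are all hypothetical values chosen for illustration.

```python
import math
from itertools import product

# Hypothetical data for one query: per-field match scores for each result,
# paired with a graded relevance label (0 = irrelevant, 2 = highly relevant).
RESULTS = [
    ({"title": 0.9, "description": 0.1}, 2),
    ({"title": 0.2, "description": 0.8}, 0),
    ({"title": 0.6, "description": 0.3}, 1),
]


def combined_score(field_scores, weights):
    """Weighted sum of per-field scores (a simplification of field boosting)."""
    return sum(weights[field] * score for field, score in field_scores.items())


def dcg(labels):
    """Discounted cumulative gain of relevance labels in ranked order."""
    return sum((2 ** rel - 1) / math.log2(rank + 2) for rank, rel in enumerate(labels))


def ndcg(results, weights):
    """Rank the results under the given weights and compare with the ideal order."""
    ranked = sorted(results, key=lambda r: combined_score(r[0], weights), reverse=True)
    labels = [rel for _, rel in ranked]
    ideal = dcg(sorted(labels, reverse=True))
    return dcg(labels) / ideal if ideal > 0 else 0.0


def best_weights(results, grid=(0.5, 1.0, 2.0, 4.0)):
    """Offline grid search: try every weight pair, keep the one with highest NDCG."""
    candidates = [{"title": t, "description": d} for t, d in product(grid, grid)]
    return max(candidates, key=lambda w: ndcg(results, w))


if __name__ == "__main__":
    weights = best_weights(RESULTS)
    print(weights, ndcg(RESULTS, weights))
```

Because every candidate is scored against stored labels, no live traffic sees a poor configuration. A regression-based approach such as the one the abstract describes would aim to reach good weights from fewer labeled results than an exhaustive search like this needs.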
