Finding Similarities Between Hierarchically Related Thema Subject Categories Using Embeddings

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Author: Erik Björck; [2020]


Abstract: In this thesis, embeddings are used to find similarities between hierarchically related Thema Subject Categories (Thema codes), which are short alphanumeric sequences commonly used to categorize books. More specifically, the graph embedding approach known as DeepWalk was applied to three different models to learn similarities between Thema codes. The data consisted of pairs of Thema codes gathered from user preferences for books in the Swedish online book application Storytel. By constructing graphs from Thema codes and their pairwise occurrences, high-dimensional similarities between Thema codes could be learned. To evaluate the models, three different offline evaluation methods and one online evaluation method were used. In the online evaluation, one week of A/B testing showed that the click-through rate increased in two recommendation lists in the Storytel application when the embeddings were used for Thema code similarities between books. The results show that DeepWalk is suitable for learning embeddings of Thema codes for the task of recommendation. Valuable future research could thus include investigating other embedding approaches for Thema codes.
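
As a rough illustration of the graph construction described in the abstract, the following Python sketch builds an undirected graph from pairs of Thema codes and generates the truncated random walks that DeepWalk then passes to a skip-gram model. It is not the thesis implementation: the example pairs, the function names build_graph and random_walks, and the values for walks_per_node and walk_length are all assumptions made for illustration, and the skip-gram training step is omitted.

```python
import random
from collections import defaultdict


def build_graph(pairs):
    """Build an undirected adjacency list from (code_a, code_b) pairs."""
    graph = defaultdict(list)
    for a, b in pairs:
        # A pair that occurs several times adds several parallel edges,
        # so frequent co-occurrences are visited more often in the walks.
        graph[a].append(b)
        graph[b].append(a)
    return graph


def random_walks(graph, walks_per_node=10, walk_length=8, seed=0):
    """Generate fixed-length uniform random walks starting at every node."""
    rng = random.Random(seed)
    nodes = list(graph)
    walks = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)  # visit start nodes in a fresh order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                # Step to a uniformly chosen neighbour of the current node.
                walk.append(rng.choice(graph[walk[-1]]))
            walks.append(walk)
    return walks


# Hypothetical pairs for illustration only; not data from Storytel.
pairs = [("FB", "FBA"), ("FB", "FF"), ("FF", "FH"), ("FBA", "FF")]
graph = build_graph(pairs)
walks = random_walks(graph)
```

Each walk can be treated as a sentence whose words are Thema codes. Training a skip-gram model on these sentences would yield one embedding vector per code, and codes that often appear near each other in walks would end up with similar vectors.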
