Sentence Embeddings and Automatic Classification of Menu Items

University essay from Uppsala universitet/Institutionen för informationsteknologi

Author: Aljaz Kovac; [2022]


Abstract: Caspeco AB is a company in Uppsala that specializes in providing IT solutions to the hospitality industry. Their customers (restaurants, pubs, etc.) classify their menu items freely, which leads to a classification that is often inconsistent and unreliable. This thesis presents an attempt to automatically and reliably classify menu items based on their names alone. It employs a technique from natural language processing (NLP) called “sentence embedding”, by which sentences are mapped to high-dimensional vectors of real numbers. SentenceTransformers (SBERT), a Python framework for state-of-the-art sentence embeddings, is used to compute embeddings for a multilingual menu from a restaurant in Uppsala, both for raw (as-is) item names and for processed item names. In the first part of the thesis, three sets of embeddings, produced by three different SBERT models, are used in an unsupervised learning context, where K-means and hierarchical clustering algorithms are applied in order to find the set of embeddings and the type of data (raw or processed) that perform best with clustering. The results are inconclusive: neither a set of embeddings nor a type of data that gives superior performance is identified. In the second part of the thesis, the embeddings are used in a custom classifier, as well as in three different implementations of the K-nearest neighbors classifier, and the performances are again compared. The results show that the embeddings of the “distiluse” model give the best performance and that raw data is to be preferred over processed data. Furthermore, the results indicate that, for this particular restaurant, it would be possible to build a reliable automatic classification system based on item names alone, since the best-performing classifier reaches a peak macro F1-score of 0.87 and a peak accuracy of 0.89.
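
To make the second part of the abstract more concrete, the following is a minimal sketch of K-nearest neighbors classification over embedding vectors, scored with accuracy and macro F1. It is not the thesis implementation. The three-dimensional vectors are hand-made stand-ins for SBERT embeddings (which have several hundred dimensions), and the categories "drink" and "food", the function names, and the choice of cosine similarity and k=3 are all assumptions made for illustration.

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def knn_predict(query, train_vecs, train_labels, k=3):
    """Majority vote among the k training vectors most similar to the query."""
    ranked = sorted(range(len(train_vecs)),
                    key=lambda i: cosine(query, train_vecs[i]),
                    reverse=True)
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]

def accuracy(y_true, y_pred):
    """Fraction of items whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for label in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy stand-ins for embeddings of labelled menu items (hypothetical data).
train_vecs = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1],
              [0.1, 0.9, 0.0], [0.0, 0.8, 0.2], [0.2, 0.7, 0.1]]
train_labels = ["drink", "drink", "food", "food", "food"]

# Held-out items; the third is labelled "food" although its vector lies
# close to the "drink" examples, so the classifier is expected to miss it.
test_vecs = [[0.85, 0.15, 0.05], [0.05, 0.95, 0.1], [0.7, 0.3, 0.0]]
test_labels = ["drink", "food", "food"]

predictions = [knn_predict(v, train_vecs, train_labels, k=3) for v in test_vecs]
print("predictions:", predictions)
print("accuracy:", round(accuracy(test_labels, predictions), 2))
print("macro F1:", round(macro_f1(test_labels, predictions), 2))
```

In a setup like the one the abstract describes, the toy vectors would be replaced by embeddings computed from the item names, and the two metrics would be compared across models and across raw and processed names. Macro F1 averages the per-class scores without weighting, so a category with few items counts as much as a large one.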
