'Sorry, I didn't understand that': A comparison of methods for intent classification for social robotics applications

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Author: Mikaela Åstrand; [2020]


Abstract: An important feature of a social robot is the ability to understand natural language. One of the core components in a typical system for natural language understanding (NLU) is so-called intent classification: classifying user utterances according to the user's underlying intent. Previous research on intent classification has mainly been performed on dialogues very different from what can be expected in social robotics, where dialogues are of a more social nature, with utterances often being very short or highly context dependent. It has also been performed under the assumption that all test utterances do indeed belong to one of the predefined intent classes. This is often not the case in an actual application, where the user cannot be expected to know the limitations of the system. In this thesis, a number of intent classification methods are evaluated on two tasks: classifying utterances belonging to one of the predefined classes and identifying utterances that are out of scope. For this, three datasets are used: two existing intent classification datasets and one that was collected as part of this project and is more typical of dialogues in social robotics. The methods evaluated are support vector machine (SVM), logistic regression, the intent classifier in the NLU platform Snips, and the neural language model BERT. For SVM and logistic regression, two feature representation techniques are used: bag-of-words (BoW), with and without tf-idf weighting, and pre-trained GloVe embeddings. Based on the results of these evaluations, three main conclusions are drawn: that simple methods are usually to be preferred over more complicated ones, that out-of-scope detection needs further investigation, and that more datasets typical of different kinds of applications are needed.
BERT generally performs best on both tasks, but SVM and logistic regression are not far behind, with pre-trained word embeddings performing no better than BoW and Snips no better than the simple classifiers. Previous research on out-of-scope detection is very limited, and the results obtained here give no clear indication of the overall best approach or of the performance to be expected in different settings. Finally, intent classification and out-of-scope detection performance differ considerably between datasets, making representative datasets a necessity for drawing conclusions about expected performance in specific applications.
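To make the simpler end of the comparison concrete, the following is a minimal illustrative sketch (not the thesis's actual pipeline) of one of the evaluated combinations: a BoW intent classifier with tf-idf weighting and logistic regression, using scikit-learn. Out-of-scope detection is approximated here by a confidence threshold on the classifier's top class probability; the intent labels and utterances are invented for illustration.

```python
# Sketch of tf-idf + logistic regression intent classification with
# threshold-based out-of-scope detection. Training data is a toy
# example; a real system would use a labeled intent dataset.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_utterances = [
    "hello there", "hi robot", "good morning",
    "what time is it", "tell me the time", "do you know the time",
    "goodbye", "see you later", "bye bye",
]
train_intents = [
    "greeting", "greeting", "greeting",
    "ask_time", "ask_time", "ask_time",
    "farewell", "farewell", "farewell",
]

# BoW features with tf-idf weighting, fed into a logistic regression.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
clf.fit(train_utterances, train_intents)

def classify(utterance, threshold=0.5):
    """Return the predicted intent, or 'out_of_scope' when the top
    class probability falls below the threshold."""
    probs = clf.predict_proba([utterance])[0]
    best = probs.argmax()
    if probs[best] < threshold:
        return "out_of_scope"
    return clf.classes_[best]

print(classify("what time is it now"))   # in-scope utterance
print(classify("order me a pizza"))      # likely rejected as out of scope
```

The threshold value trades precision against recall on the out-of-scope class, which is one reason the abstract notes that no single approach emerges as clearly best across datasets.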
