The use of Machine Learning Algorithms for Adaptive Question Selection in Questionnaire-based Mental Health Data Collection Apps

University essay from Lunds universitet/Avdelningen för Biomedicinsk teknik

Abstract: This report discusses the implementation of machine learning algorithms for personalising question selection in a questionnaire-based self-report app for individuals suffering from mental health issues. A so-called Random Forest - Genetic Algorithm (RFGA) hybrid prediction method is used to find an optimal set of relevant questions to pose to new users in the app. A complementary ad hoc parameter-based question evaluation (PBQE) method is also discussed; it is used to identify questions that are relevant on an individual level and to control the frequency at which questions are posed to recurrent users. The RFGA method was able to identify a dozen highly predictive questions out of a total of 160 questions, and thus to significantly reduce the dimensionality of the feature space. This reduction in the number of features increased the prediction root mean square error of the random forest from 0.11 to 0.15 (on a scale from 0 to 1). However, as the main function of the algorithm is to differentiate between features based on their predictive power, not to provide an optimal prediction, this was considered a reasonable trade-off. The questions that were found to be predictive treated topics of self-esteem, relationships, sleeping habits, attitude towards food, sexuality and physical activity. These questions were formulated in an open way, with simple slider or yes/no answers, and it was found that such questions in general had greater predictive power than more specific questions, for example those with several alternatives. The PBQE method was able to identify relevant questions on an individual level and increase the probability of these questions being posed to the user, while reducing the frequency at which other, less relevant, questions are posed. In conclusion, the results show that it is indeed possible to use machine learning methods on mental health self-report data from apps, given that a sufficient volume of high-quality data is available.
A key insight is that the formulation and format of the questions greatly affect their predictive power, and that questions should therefore be carefully constructed to be relevant. It was further found that from a machine learning perspective, a smaller number of predictive questions may be desirable in self-report apps, rather than a large number of questions with varying predictive power.
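
The abstract does not spell out how the RFGA search works, so the following is only a minimal sketch of the general idea of a genetic algorithm searching over question subsets. Everything in it is an assumption made for illustration: the function name `genetic_select`, the population size, the mutation rate, and above all `toy_fitness`, which stands in for a score that in an RFGA-style method would presumably be derived from the random forest's prediction error on the selected questions. It is not the method used in the essay.

```python
import random


def genetic_select(n_features, fitness, pop_size=30, generations=40,
                   mutation_rate=0.02, seed=0):
    """Search for a high-fitness 0/1 feature mask with a simple genetic algorithm.

    All hyperparameters here are illustrative defaults, not values from the essay.
    """
    rng = random.Random(seed)
    # Each individual is a tuple of 0/1 flags, one flag per question.
    population = [tuple(rng.randint(0, 1) for _ in range(n_features))
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:pop_size // 2]        # truncation selection
        children = [ranked[0]]                  # elitism: carry over the best mask
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation: each flag flips with probability mutation_rate.
            child = tuple(bit ^ 1 if rng.random() < mutation_rate else bit
                          for bit in child)
            children.append(child)
        population = children
        best = max(population + [best], key=fitness)
    return best


# Hypothetical stand-in for the real objective. In an RFGA-style setup the
# fitness would come from a random forest trained on the selected questions
# (for example, negative RMSE); here three arbitrary indices are simply
# treated as "informative" and every selected question costs a small penalty.
INFORMATIVE = {3, 17, 42}


def toy_fitness(mask):
    hits = sum(mask[i] for i in INFORMATIVE)
    return hits - 0.01 * sum(mask)


best = genetic_select(160, toy_fitness)
selected = [i for i, bit in enumerate(best) if bit]
print(len(selected), "questions kept out of", len(best))
```

The size penalty in the toy fitness mirrors the trade-off described above: a smaller question set is preferred even if it costs some predictive accuracy.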
