Natural Language Processing for Patient Data in Clinical Decision Support Systems

University essay from Lunds universitet/Avdelningen för Biomedicinsk teknik

Abstract: In Sweden, prostate cancer is the most common type of cancer among men. The need for prostate cancer care will grow as the population increases and ages, so there is a need to streamline the care pathway. One way to do this is with a clinical decision support system, in which natural language processing (NLP) plays an important role in handling the large amount of free-text data. In this project, we used NLP, more specifically text classification and information extraction, to extract key information related to prostate cancer from free text in electronic health records. The key information we chose to extract was Gleason, PSA and tumour type. Binary classifiers were used to filter out irrelevant texts and thereby reduce the complexity of the information extraction. We tried several classifiers and several information extraction methods; the extraction method that performed best was named entity recognition (NER). Another important part of the project was to map the care pathway and data flow within prostate cancer care. To build our algorithms, we mainly used spaCy, an open-source library for text processing. The classifiers with the best overall performance were random forest (average F-score 0.978) and the Swedish spaCy CNN (average F-score 0.965). For named entity recognition, we used the Swedish spaCy CNN and Swedish BERT, which achieved average F-scores of 0.915 and 0.922, respectively. In the final and best pipeline, we combined one binary classifier (prostate cancer related or not) with the spaCy CNN NER and obtained an F-score of 0.911.
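The two-stage design described in the abstract (a binary relevance filter followed by entity extraction) can be illustrated with a minimal Python sketch. This is not the essay's implementation: the keyword filter and the regular expressions below are hypothetical stand-ins for the trained classifier and the NER model, and the names `is_relevant`, `extract_entities` and `run_pipeline`, as well as the example notes, are made up for illustration.

```python
import re
from typing import Dict, List

# Hypothetical keyword list standing in for a trained binary classifier
# (the essay used models such as random forest and a spaCy CNN).
RELEVANT_TERMS = ("prostata", "gleason", "psa")

# Hypothetical patterns standing in for a trained NER model.
GLEASON_RE = re.compile(r"gleason\s*(?:score\s*)?(\d)\s*\+\s*(\d)", re.IGNORECASE)
PSA_RE = re.compile(r"\bpsa\b\D{0,10}(\d+(?:[.,]\d+)?)", re.IGNORECASE)


def is_relevant(text: str) -> bool:
    """Stage 1: decide whether a note is prostate cancer related."""
    lowered = text.lower()
    return any(term in lowered for term in RELEVANT_TERMS)


def extract_entities(text: str) -> Dict[str, str]:
    """Stage 2: pull out key values from a note that passed the filter."""
    entities: Dict[str, str] = {}
    gleason = GLEASON_RE.search(text)
    if gleason:
        entities["GLEASON"] = f"{gleason.group(1)}+{gleason.group(2)}"
    psa = PSA_RE.search(text)
    if psa:
        # Swedish notes often use a decimal comma; normalise to a point.
        entities["PSA"] = psa.group(1).replace(",", ".")
    return entities


def run_pipeline(notes: List[str]) -> List[Dict[str, str]]:
    """Run extraction only on notes that the filter marks as relevant."""
    return [extract_entities(note) for note in notes if is_relevant(note)]


# Synthetic example notes, not taken from any health record.
example_notes = [
    "Patient med prostatacancer, Gleason 3+4, PSA 6,5.",
    "Patient söker för hosta och feber.",
]
results = run_pipeline(example_notes)
```

Filtering first means the extraction step only sees texts that are likely to contain the target entities, which is the complexity reduction the abstract refers to.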
