Evaluation of New Features for Extractive Summarization of Meeting Transcripts: Improvement of meeting summarization based on functional segmentation, introducing topic model, named entities and domain specific frequency measure

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS, School of Electrical Engineering and Computer Science)

Author: Emilio Marinone; [2019]


Abstract: Automatic summarization of meeting transcripts has been widely studied over the last two decades, with continuous improvements in terms of the standard summarization metric, ROUGE. A user study has shown that people noticeably prefer abstractive summaries to extractive ones. However, a fluent and informative abstract depends heavily on the performance of the Information Extraction method(s) applied.

In this work, basic concepts needed to understand meeting summarization methods, such as Parts-of-Speech (POS), Named Entity Recognition (NER), frequency and similarity measures, and topic models, are introduced together with a broad literature review. The proposed method takes inspiration from the current unsupervised extractive state of the art and introduces new features that improve on the baseline. It is based on functional segmentation: it first aims to divide the preprocessed source transcript into monologues and dialogues. Two different approaches are then used to extract the most important sentences from each segment, and these sentences are concatenated, with redundancy reduction, to create the final summary.

Results show that a topic model trained on an extended corpus, some variations of the proposed parameters, and the use of word tags improve performance in terms of ROUGE Precision, Recall, and F-measure. The proposed method outperforms the current best-performing unsupervised extractive summarization method in terms of ROUGE-1 Precision and F-measure.

A subjective evaluation of the generated summaries shows that the current unsupervised framework is not yet accurate enough for commercial use, but the newly introduced features can help supervised methods achieve acceptable performance. A much larger, non-artificially constructed meeting dataset with reference summaries is also needed, both for training supervised methods and for more accurate algorithm evaluation.
The source code is available on GitHub: https://github.com/marinone94/ThesisMeetingSummarization
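The abstract reports results as ROUGE Precision, Recall, and F-measure. As a rough illustration of what ROUGE-1 measures, the Python sketch below computes clipped unigram overlap between one candidate summary and one reference summary. It assumes lowercasing and whitespace tokenization only. The official ROUGE toolkit also supports stemming, stopword removal, and multiple references, so scores from this sketch are not directly comparable with those reported in the thesis. The function name `rouge_1` and the example sentences are hypothetical and are not taken from the thesis code.

```python
from collections import Counter


def rouge_1(candidate: str, reference: str) -> dict:
    """Hypothetical helper: ROUGE-1 precision, recall and F-measure.

    Simplified sketch: lowercasing and whitespace tokenization only,
    no stemming, no stopword removal, single reference.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped overlap: Counter '&' keeps the minimum count of each unigram,
    # so a word is matched at most as often as it appears in both texts.
    overlap = sum((cand & ref).values())
    # Precision: share of candidate unigrams found in the reference.
    precision = overlap / sum(cand.values()) if cand else 0.0
    # Recall: share of reference unigrams covered by the candidate.
    recall = overlap / sum(ref.values()) if ref else 0.0
    # F-measure: harmonic mean of precision and recall.
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    summary = "the team agreed on the remote design"
    reference = "the team agreed to use a simple remote design"
    print(rouge_1(summary, reference))
```

For the example sentences, 5 of the 7 candidate unigrams match the reference (the second "the" is clipped because the reference contains it only once), giving a precision of 5/7, a recall of 5/9, and an F-measure of 0.625.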
