Evaluation of New Features for Extractive Summarization of Meeting Transcripts : Improvement of meeting summarization based on functional segmentation, introducing topic model, named entities and domain specific frequency measure

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Author: Emilio Marinone; [2019]

Abstract: Automatic summarization of meeting transcripts has been widely studied over the last two decades, achieving continuous improvements in terms of the standard summarization metric (ROUGE). A user study has shown that people noticeably prefer abstractive summarization to the extractive approach. However, a fluent and informative abstract depends heavily on the performance of the Information Extraction method(s) applied. In this work, basic concepts useful for understanding meeting summarization methods, such as Part-of-Speech (POS) tagging, Named Entity Recognition (NER), frequency and similarity measures, and topic models, are introduced together with a broad literature analysis. The proposed method takes inspiration from the current unsupervised extractive state of the art and introduces new features that improve on the baseline. It is based on functional segmentation, meaning that it first aims to divide the preprocessed source transcript into monologues and dialogues. Then, two different approaches are used to extract the most important sentences from each segment; these sentences are concatenated and passed through redundancy reduction to create the final summary. Results show that a topic model trained on an extended corpus, some variations in the proposed parameters, and the consideration of word tags improve the performance in terms of ROUGE Precision, Recall and F-measure. The method outperforms the currently best performing unsupervised extractive summarization method in terms of ROUGE-1 Precision and F-measure. A subjective evaluation of the generated summaries demonstrates that the current unsupervised framework is not yet accurate enough for commercial use, but the newly introduced features can help supervised methods achieve acceptable performance. A much larger, non-artificially constructed meeting dataset with reference summaries is also needed, both for training supervised methods and for more accurate algorithm evaluation.
The source code is available on GitHub: https://github.com/marinone94/ThesisMeetingSummarization
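
For readers unfamiliar with the metric mentioned in the abstract, the following is a minimal sketch of how ROUGE-1 Precision, Recall and F-measure are commonly computed from unigram overlap. It is not taken from the thesis repository: the function name rouge_1, the whitespace tokenization and the lowercasing are illustrative assumptions, and the official ROUGE toolkit applies further options (stemming, stopword handling) that are omitted here.

```python
from collections import Counter


def rouge_1(candidate: str, reference: str) -> dict:
    """Unigram-overlap ROUGE-1 scores (hypothetical helper, simplified tokenization)."""
    # Assumption: lowercase and split on whitespace; real toolkits tokenize more carefully.
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())

    # Clipped overlap: each word counts at most as often as it appears in both texts.
    overlap = sum((cand_counts & ref_counts).values())

    cand_total = sum(cand_counts.values())
    ref_total = sum(ref_counts.values())

    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0

    return {"precision": precision, "recall": recall, "f_measure": f_measure}


if __name__ == "__main__":
    scores = rouge_1(
        "the team agreed on the budget",
        "the team approved the budget today",
    )
    print(scores)
```

Precision measures how much of the generated summary appears in the reference, while Recall measures how much of the reference is covered by the generated summary; the F-measure is their harmonic mean.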
