Abstractive Summarization of Podcast Transcriptions
Abstract: In the rapidly growing medium of podcasts, more and more episodes are transcribed automatically. This has increased the need for good natural language summarization models that can handle the obstacles presented by the transcripts and by the podcast format. This thesis investigates transformer-based sequence-to-sequence models, in which an attention mechanism keeps track of which words in the context are most important for predicting the next word in the sequence. Several summarization models are evaluated on a large-scale, open-domain podcast dataset. The dataset presents challenges such as transcription errors, multiple speakers, varied genres and structures, and long texts. The results show that a sparse attention mechanism using a sliding window increases the average ROUGE-2 F-measure by 21.6% over transformer models that use a short input length with fully connected attention layers.
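To illustrate the difference between fully connected and sliding-window attention, here is a minimal single-head sketch in pure Python. It is not the implementation used in the thesis. It assumes no learned query/key/value projections and no multi-head split, and it symmetrically restricts each position to neighbours within a hypothetical `window` parameter. The function names `attention` and `num_scores` are illustrative only.

```python
import math


def attention(q, k, v, window=None):
    """Scaled dot-product self-attention over lists of vectors (illustrative sketch).

    window=None: fully connected, every position attends to every position.
    window=w:    sliding window, position i attends only to j with |i - j| <= w.
    """
    n, d = len(q), len(q[0])
    out = []
    for i in range(n):
        # Positions that query i is allowed to attend to.
        if window is None:
            js = range(n)
        else:
            js = range(max(0, i - window), min(n, i + window + 1))
        # Scaled dot-product scores against the allowed keys only.
        scores = [sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d) for j in js]
        # Softmax over the allowed positions (shifted by the max for stability).
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Output is the attention-weighted sum of the value vectors.
        row = [0.0] * len(v[0])
        for w_ij, j in zip(weights, js):
            for c in range(len(row)):
                row[c] += w_ij * v[j][c]
        out.append(row)
    return out


def num_scores(n, window=None):
    """Number of query-key scores computed for a sequence of length n."""
    if window is None:
        return n * n
    return sum(min(n, i + window + 1) - max(0, i - window) for i in range(n))


if __name__ == "__main__":
    # Full attention grows quadratically, sliding-window attention roughly as n * (2w + 1).
    for n in (512, 4096):
        print(n, num_scores(n), num_scores(n, window=256))
```

The score count is the point of the design. Full attention needs n² scores, while a window of w needs about n·(2w + 1). This is what makes it feasible to feed much longer transcript inputs to the model instead of truncating them to a short input length.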