Automatic Podcast Chapter Segmentation: A Framework for Implementing and Evaluating Chapter Boundary Models for Transcribed Audio Documents

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Abstract: Podcasts are an exponentially growing audio medium, and serving listeners useful and relevant content requires new methods of information sorting. This thesis is the first to look into the state-of-the-art problem of segmenting podcasts into chapters (structurally and topically coherent sections). Podcast segmentation is more difficult than segmenting structured text because of spontaneous speech and transcription errors from automatic speech recognition systems. The thesis uses author-provided timestamps from podcast descriptions as labels for supervised learning, and performs binary classification on sentences from podcast transcripts. It delivers a general framework for creating a dataset of 21 436 podcast episodes, training a supervised model, and evaluating it. The framework addresses technical challenges such as high data imbalance (there are few chapter transitions per episode) and the choice of an appropriate context size (how many sentences are shown to the model during inference). The proposed model outperformed a baseline model in quantitative metrics and in a human evaluation with 100 transitions. The solution provided in this thesis can be used to chapterize podcasts, which has many downstream applications, such as segment sorting, summarization, and information retrieval.
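
The data preparation described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation. The names (Sentence, label_sentences, context_windows, positive_class_weight), the rule that the first sentence starting at or after a timestamp is the chapter transition, the symmetric window of k sentences, and the negative-to-positive ratio as a class weight are all assumptions made for illustration, and the toy transcript is invented.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Sentence:
    text: str
    start: float  # start time in seconds within the episode


def label_sentences(sentences: List[Sentence], chapter_starts: List[float]) -> List[int]:
    """Assumed rule: label 1 for the first sentence starting at or after each timestamp."""
    labels = [0] * len(sentences)
    for ts in chapter_starts:
        for i, sentence in enumerate(sentences):
            if sentence.start >= ts:
                labels[i] = 1
                break
    return labels


def context_windows(
    sentences: List[Sentence], labels: List[int], k: int
) -> List[Tuple[List[str], int]]:
    """Pair each sentence with up to k neighbours on each side (clipped at the edges)."""
    examples = []
    for i, label in enumerate(labels):
        lo, hi = max(0, i - k), min(len(sentences), i + k + 1)
        examples.append(([s.text for s in sentences[lo:hi]], label))
    return examples


def positive_class_weight(labels: List[int]) -> float:
    """Negatives divided by positives, one common way to up-weight the rare class."""
    positives = sum(labels)
    return (len(labels) - positives) / positives if positives else 1.0


# Toy transcript (invented for illustration only).
sentences = [
    Sentence("Welcome to the show.", 0.0),
    Sentence("Today we have a guest.", 4.0),
    Sentence("Let's talk about training.", 62.5),
    Sentence("It started years ago.", 66.0),
    Sentence("Now, on to nutrition.", 130.0),
    Sentence("Thanks for listening.", 190.0),
]
chapter_starts = [60.0, 128.0]  # hypothetical author-provided timestamps, in seconds

labels = label_sentences(sentences, chapter_starts)
examples = context_windows(sentences, labels, k=1)
weight = positive_class_weight(labels)
```

In this sketch the context size k is the quantity one would tune, and the class weight is one possible way to counter the imbalance between the few transition sentences and the many non-transition sentences.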
