Essays about: "audio segmentation"

Showing results 1-5 of 12 essays containing the words "audio segmentation".

  1. Analysis of speaking time and content of the various debates of the presidential campaign : Automated AI analysis of speech time and content of presidential debates based on the audio using speaker detection and topic detection

    University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

    Author : Axel Valentin Maza; [2023]
    Keywords : Artificial Intelligence; Speaker detection; Speaker recognition; Speaker diarization; Speaker identification; Debate; Politics; Deep Learning;

    Abstract : The field of artificial intelligence (AI) has grown rapidly in recent years, and its applications are becoming more widespread in many domains, including politics. In particular, presidential debates have become a crucial part of election campaigns, and it is important to analyze the information exchanged in these debates objectively so that voters can choose without being influenced by biased data.

  2. Swedish Language End-to-End Automatic Speech Recognition for Media Monitoring using Deep Learning

    University essay from Luleå tekniska universitet/Institutionen för system- och rymdteknik

    Author : Hector Nyblom; [2022]
    Keywords : Automatic Speech Recognition; Deep Learning; Machine Learning; Natural Language Processing; Media Monitoring;

    Abstract : In order to extract relevant information from speech recordings, the general approach is to first convert the audio into transcribed text. The text can then be analysed using well-researched methods. NewsMachine AB provides customers with an overview of how they are represented in media by analysing articles in text form.

  3. Automatic Podcast Chapter Segmentation : A Framework for Implementing and Evaluating Chapter Boundary Models for Transcribed Audio Documents

    University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

    Author : Adam Feldstein Jacobs; [2022]
    Keywords : Machine Learning; Natural Language Processing; Speech Technology; Deep Learning; Podcast Segmentation;

    Abstract : Podcasts are an exponentially growing audio medium, and serving useful and relevant content from them requires new methods of information sorting. This thesis is the first to look into the problem of segmenting podcasts into chapters (structurally and topically coherent sections).

  4. Speaker Diarization System for Call-center data

    University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

    Author : Yi Li; [2020]
    Keywords : MFCC-vector Speaker Diarization; Speaker Verification; Voice Activity Detection; Gaussian Mixture Model; Hierarchical Clustering;

    Abstract : Speaker diarization (SD), which answers the question "who spoke when?", is a critical step in many practical speech applications. The task of our project is to build an MFCC-vector-based speaker diarization system on top of a speaker verification (SV) system, an existing call-center application that checks a customer's identity from a phone call.

  5. Text and Speech Alignment Methods for Speech Translation Corpora Creation : Augmenting English LibriVox Recordings with Italian Textual Translations

    University essay from Uppsala universitet/Institutionen för lingvistik och filologi

    Author : Giuseppe Della Corte; [2020]
    Keywords : speech translation; parallel corpora; bilingual sentence alignment; sentence embeddings; cosine similarity; forced alignment; text collection; corpora creation; audio signal processing;

    Abstract : The recent rise of end-to-end speech translation models requires a new generation of parallel corpora, composed of large amounts of source-language speech utterances aligned with their target-language textual translations. We present a pipeline and a set of methods to collect hundreds of hours of English audiobook recordings and align them with their Italian textual translations, using exclusively public-domain resources gathered semi-automatically from the web.
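Essay 4 (Yi Li, 2020) lists voice activity detection, MFCC vectors, a Gaussian mixture model and hierarchical clustering among its keywords, but the abstract shown here does not say how they are combined. The Python sketch below illustrates only the clustering step of a generic "who spoke when" pipeline, under loudly stated assumptions: speech segments have already been found (for example by a VAD), each segment has already been reduced to a fixed-length vector (for example averaged MFCC frames), and segments are compared with cosine similarity instead of any GMM-based scoring. All function names and the `threshold` value are hypothetical, and this is not the essay's implementation. Summing turn durations per label, as `speaking_time` does, is the kind of speaking-time measurement essay 1 describes for debates.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_segments(embeddings: List[Vector], threshold: float) -> List[int]:
    """Average-linkage agglomerative clustering; returns one speaker label per segment.

    Hypothetical helper: segment vectors are assumed to be precomputed.
    """
    clusters = [[i] for i in range(len(embeddings))]  # start: one cluster per segment

    def linkage(c1: List[int], c2: List[int]) -> float:
        # Average pairwise cosine similarity between two clusters.
        total = sum(cosine_similarity(embeddings[i], embeddings[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the most similar pair of clusters.
        best_pair, best_sim = (0, 1), -math.inf
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                sim = linkage(clusters[a], clusters[b])
                if sim > best_sim:
                    best_pair, best_sim = (a, b), sim
        if best_sim < threshold:  # stop: no pair is similar enough to be one speaker
            break
        a, b = best_pair  # a < b, so deleting index b leaves index a valid
        clusters[a].extend(clusters[b])
        del clusters[b]

    # Number speakers 0, 1, 2, ... in order of first appearance.
    labels = [0] * len(embeddings)
    for speaker, members in enumerate(sorted(clusters, key=min)):
        for i in members:
            labels[i] = speaker
    return labels


def diarize(segments: List[Tuple[float, float]], embeddings: List[Vector],
            threshold: float = 0.8) -> List[Tuple[float, float, int]]:
    """Return (start, end, speaker) turns, merging consecutive same-speaker segments."""
    turns: List[Tuple[float, float, int]] = []
    for (start, end), speaker in zip(segments, cluster_segments(embeddings, threshold)):
        if turns and turns[-1][2] == speaker:
            turns[-1] = (turns[-1][0], end, speaker)  # extend the current turn
        else:
            turns.append((start, end, speaker))
    return turns


def speaking_time(turns: List[Tuple[float, float, int]]) -> Dict[int, float]:
    """Total seconds attributed to each speaker label."""
    totals: Dict[int, float] = {}
    for start, end, speaker in turns:
        totals[speaker] = totals.get(speaker, 0.0) + (end - start)
    return totals
```

Average linkage is used here because it is simple and less sensitive to single outlier segments than single linkage; the stopping threshold effectively decides how many speakers are found, since the number of speakers is not known in advance.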
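Essay 5 (Giuseppe Della Corte, 2020) lists sentence embeddings, cosine similarity and bilingual sentence alignment among its keywords. The sketch below shows one generic way to combine them and is not the pipeline from the thesis. It assumes each English and Italian sentence has already been encoded into a vector by some multilingual sentence encoder (the abstract does not say which), then runs a dynamic-programming alignment that keeps the original sentence order, pairs sentences whose cosine similarity reaches a hypothetical `min_sim` cutoff, and skips sentences with no good counterpart. The function name `align_sentences` is likewise hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def align_sentences(src: List[Vector], tgt: List[Vector],
                    min_sim: float = 0.5) -> List[Tuple[int, int]]:
    """Monotonic 1-to-1 alignment maximising total cosine similarity.

    Returns (source_index, target_index) pairs in order; sentences
    without a sufficiently similar counterpart are skipped.
    """
    n, m = len(src), len(tgt)
    # score[i][j]: best total similarity using the first i source and first j target sentences.
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    move: List[List[Optional[str]]] = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        move[i][0] = "skip_src"
    for j in range(1, m + 1):
        move[0][j] = "skip_tgt"

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best, step = score[i - 1][j], "skip_src"  # leave source sentence i-1 unaligned
            if score[i][j - 1] > best:  # leave target sentence j-1 unaligned
                best, step = score[i][j - 1], "skip_tgt"
            sim = cosine_similarity(src[i - 1], tgt[j - 1])
            if sim >= min_sim and score[i - 1][j - 1] + sim > best:  # pair the two sentences
                best, step = score[i - 1][j - 1] + sim, "match"
            score[i][j], move[i][j] = best, step

    # Trace back from the bottom-right cell to recover the chosen pairs.
    pairs: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 or j > 0:
        step = move[i][j]
        if step == "match":
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif step == "skip_src":
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return pairs
```

Keeping the alignment monotonic encodes the assumption that a translation follows the order of its source text; sentence splits and merges (1-to-2 or 2-to-1 alignments) are not handled in this simplified version.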