Showing results 1-5 of 11 essays matching the search criteria.

  1. Diffusion-based Vocoding for Real-Time Text-To-Speech

    University essay from Lunds universitet/Matematisk statistik

    Author : Lukas Gardberg; [2023]
    Keywords : Diffusion; Vocoding; Text-to-Speech; Machine Learning; Mathematics and Statistics;

    Abstract : The emergence of machine learning based text-to-speech systems has made fully automated customer service voice calls, spoken personal assistants, and the creation of synthetic voices seem well within reach. However, there are still many technical challenges in creating a system that can generate audio quickly and at high enough quality.

  2. Multi-objective optimization for model selection in music classification

    University essay from KTH/Optimeringslära och systemteori

    Author : Rintaro Ujihara; [2021]
    Keywords : Music emotion recognition; Mel spectrogram; MFCC; CENS; Onset; Tonnetz; HPSS; 1D convolutional neural network; Attention LSTM; 1DCNN BiLSTM; Pareto optimality;

    Abstract : With the breakthrough of machine learning techniques, research on music emotion classification has made notable progress by combining various audio features with state-of-the-art machine learning models. Still, it is known that how music samples should be preprocessed and which classification algorithm should be used depend on the data set and the objective of each project.

  3. Wavebender GAN: Deep architecture for high-quality and controllable speech synthesis through interpretable features and exchangeable neural synthesizers

    University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

    Author : Gustavo Teodoro Döhler Beck; [2021]
    Keywords : Mel-spectrogram; Speech Synthesis; Wavebender GAN; HiFi-GAN; Controllability; Interpretability; Low-level Signal Properties;

    Abstract : Modeling human speech is a challenging task that originally required a coalition between phoneticians and speech engineers. Yet the latter, disengaged from phoneticians, have strived for ever more natural speech synthesis in the absence of an awareness of speech modeling, owing to data-driven and ever-growing deep learning models.

  4. Noisy recognition of perceptual mid-level features in music

    University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

    Author : Simon Mossmyr; [2021]

    Abstract : Self-training with Noisy Student is a consistency-based semi-supervised self-training method that achieved state-of-the-art accuracy on ImageNet image classification upon its release. It makes use of data noise and model noise when fitting a model to both labelled data and a large amount of artificially labelled data.

  5. Evaluation of transferability of Convolutional Neural Network pre-training with regard to image characteristics

    University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

    Author : Fanny Ekblad Voltaire; Noah Mannberg; [2021]

    Abstract : This study evaluates the impact of pre-training on a medical classification task and investigates which characteristics of images affect the transferability of learned features from the pre-training. Cardiotocography (CTG) is a combined electronic measurement of the fetal heart rate (FHR) and maternal uterine contractions during labor and delivery, and is commonly analyzed to prevent hypoxia.