Unsupervised Audio Spectrogram Compression using Vector Quantized Autoencoders
Abstract: Despite the recent successes of neural networks in a variety of domains, musical audio modeling is still considered a hard task, with inputs typically spanning tens of thousands of dimensions. By formulating audio data compression as an unsupervised learning task, this project investigates the applicability of vector quantized neural network autoencoders for compressing spectrograms – image-like representations of audio. Using a recently proposed gradient-based method for approximating waveforms from reconstructed (real-valued) spectrograms, the discrete pipeline produces listenable reconstructions of surprising fidelity compared to the uncompressed versions, even for out-of-domain examples. The results suggest that the learned discrete quantization method achieves roughly 9x stronger spectrogram compression than its continuous counterpart, while yielding similar reconstructions, both qualitatively and in terms of quantitative error metrics.
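The discrete compression the abstract describes hinges on the vector quantization step of a VQ autoencoder: each continuous latent vector from the encoder is snapped to its nearest entry in a learned codebook, so only small integer indices need to be stored. A minimal NumPy sketch of that lookup (codebook size, latent dimension, and variable names are illustrative assumptions, not the essay's actual configuration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 512 discrete codes, each a 64-dimensional vector.
codebook = rng.normal(size=(512, 64))
# Encoder output for 100 spectrogram patches (stand-in for real latents).
latents = rng.normal(size=(100, 64))

# Nearest-neighbour lookup: squared Euclidean distance to every code.
dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)

# Compressed representation: one small integer index per patch.
indices = dists.argmin(axis=1)

# Quantized latents fed to the decoder at reconstruction time.
quantized = codebook[indices]

print(indices.shape, quantized.shape)  # (100,) (100, 64)
```

Storing a single index per latent vector (here, 9 bits for a 512-entry codebook instead of 64 floats) is what makes the discrete pipeline compress so much harder than a continuous-latent autoencoder.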