ARMAS: Active Reconstruction of Missing Audio Segments

University essay from Blekinge Tekniska Högskola/Institutionen för datavetenskap (Blekinge Institute of Technology, Department of Computer Science)

Abstract: Background: Audio signal reconstruction using machine/deep learning algorithms has been increasingly explored in recent years, and it has many applications in digital signal processing. Many research works combine audio reconstruction techniques such as linear interpolation, phase coding, and tone insertion with AI models. However, no research work has reconstructed audio signals by fusing Steganoflage (an adaptive approach to image steganography) with AI models. Thus, in our thesis work, we focus on audio reconstruction combining Steganoflage and AI models. Objectives: This thesis aims to explore the possible enhancement of audio reconstruction using machine/deep learning models fused with the Steganoflage technique. Furthermore, the suitable models implemented with the fusion of Steganoflage are analyzed and compared based on performance metrics. Methods: We conducted a systematic literature review (SLR) followed by an experiment to answer our research questions. The models implemented in the thesis were selected based on the results of the SLR. In the experiments, we fused the RF (Random Forest), SVR (Support Vector Regression), and LSTM (Long Short-Term Memory) models with Steganoflage for possible enhancement of the reconstruction of lost audio signals. Then, the models were trained to estimate approximate reconstructions of the lost signals. Finally, we observed the performance of the models and compared the reconstructed audio signals with the original signals (ground truth) using four performance metrics: Pearson linear correlation, PSNR (peak signal-to-noise ratio), WPSNR (weighted PSNR), and SSIM (structural similarity index). Results: The results from the SLR show that among machine learning models, RF and SVR were mainly used for signal reconstruction and work well with time-series data. Among deep learning models, the recurrent neural network LSTM was the first choice, as the literature survey demonstrated that the model is suitable for time-series forecasting.
From the experiments, we found that the LSTM model performed better than the RF and SVR models. Moreover, reconstruction was better when a single short region of the audio signal was dropped than when multiple regions were dropped. Conclusions: We conclude that Steganoflage, when fused with machine/deep learning models, enhances the reconstruction of lost audio signals. We also conclude that the LSTM model is more accurate than the RF and SVR models in reconstructing lost audio signals for a single drop region, for both short and long gaps. However, we observed that reconstruction for multiple drops needs improvement when the gaps are long. Furthermore, the reconstructed audio signals could be improved by exploring newer AI methods and optimization techniques.
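
Two of the four metrics named above can be illustrated with a short, dependency-free sketch of how a reconstructed segment might be scored against the ground truth. This is an assumption-based illustration, not the thesis's own code: the function names `pearson_correlation` and `psnr` are hypothetical, and PSNR assumes samples normalised so the peak amplitude is 1.0 (the thesis may use a different peak or definition). WPSNR and SSIM are omitted because their exact variants are not stated here.

```python
import math
from typing import Sequence


def pearson_correlation(original: Sequence[float], reconstructed: Sequence[float]) -> float:
    """Pearson linear correlation between ground-truth and reconstructed samples."""
    if len(original) != len(reconstructed) or len(original) < 2:
        raise ValueError("signals must have the same length (at least 2 samples)")
    n = len(original)
    mean_x = sum(original) / n
    mean_y = sum(reconstructed) / n
    # Covariance and variances (the common 1/n factor cancels in the ratio).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(original, reconstructed))
    var_x = sum((x - mean_x) ** 2 for x in original)
    var_y = sum((y - mean_y) ** 2 for y in reconstructed)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant signal")
    return cov / math.sqrt(var_x * var_y)


def psnr(original: Sequence[float], reconstructed: Sequence[float], peak: float = 1.0) -> float:
    """PSNR in dB; `peak` is an assumed maximum amplitude of the normalised signal."""
    if len(original) != len(reconstructed) or len(original) == 0:
        raise ValueError("signals must be non-empty and of the same length")
    mse = sum((x - y) ** 2 for x, y in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical signals: no reconstruction error
    return 10.0 * math.log10(peak ** 2 / mse)


if __name__ == "__main__":
    truth = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5]
    estimate = [0.02, 0.48, 0.97, 0.52, -0.01, -0.49]
    print(pearson_correlation(truth, estimate), psnr(truth, estimate))
```

A correlation near 1 and a higher PSNR both indicate a reconstruction closer to the original; in practice these would be computed over the dropped region(s) only, or over the whole signal, depending on the evaluation protocol.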
