Stable diffusion for HRIR extrapolation: A novel approach with deep learning

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Abstract: Humans perceive and interact with their environment through multiple sensory channels. Among these, hearing plays a pivotal role, enabling humans to navigate their surroundings effectively. Sound localization is a complex process that relies on the brain's ability to distinguish subtle differences between sound waves that have interacted with the listener's anthropometric features. When headphones are used in virtual environments, however, this natural interaction between sound waves and the listener is altered. To replicate it, head-related filters (HR filters) must be acquired to transform non-spatial audio into its spatial representation. Unfortunately, recording HR filters is arduous and resource-intensive, which leaves spatial gaps in datasets, particularly in the regions above and below the subject, which are more challenging to capture. Extrapolation methods are needed to fill in these incomplete HR filter sets. While distance extrapolation has been explored previously, research on extrapolating HR filters to missing elevations remains scarce. Hence, this study introduces a novel approach that builds on Stable Diffusion, a pre-trained deep learning model, so that training is efficient. The results show high precision and fidelity in the extrapolation of HR filters at both high and low elevations for virtual auditory environments. With the proposed approach, HR filters are successfully extended beyond their original recording boundaries, allowing an enhanced spatial representation of sound sources situated at varying heights. The extrapolation achieves high accuracy while preserving intricate spatial details, enabling a more immersive and realistic auditory experience for users.
These findings mark a significant advance in the field of virtual acoustics and have substantial implications for applications such as virtual reality, gaming, and audio engineering.
