Implementation and Evaluation of Encoder Tools for Multi-Channel Audio

University essay from Luleå tekniska universitet (Luleå University of Technology), Institutionen för system- och rymdteknik

Abstract: The increasing interest in immersive experiences in areas such as augmented and virtual reality makes high-quality 3D sound more important than ever before. One technique for capturing and rendering 3D audio that has received growing attention over the last twenty years is Higher Order Ambisonics (HOA). HOA is a scene-based audio format with several advantages over other standard formats. However, one problem with HOA is that it requires a lot of bandwidth. For example, transmitting an uncoded high-quality HOA signal requires 49 channels to be sent simultaneously, which takes a bandwidth of about 40 Mbps. Considerable effort has gone into coding HOA signals over the last ten years. This thesis takes two different approaches to coding HOA signals. In the first approach, called Sound Field Rotation (SFR) in this thesis, the microphone that records the sound field is virtually rotated to see whether some of the channels can be made zero. The second approach, called Sound Field Decomposition (SFD) in this thesis, uses principal component analysis (PCA) to decompose a sound field into a foreground and a background component. The SFD approach is inspired by the emerging MPEG-H 3D Audio standard for coding HOA signals. The results show that the SFR method only works for very simple sound scenes; it was also shown that a 49-channel HOA signal can be reduced to as few as 7 channels if the sound scene consists of a point source. The SFD method worked for more complex sound scenes, and it was shown that an MPEG-like system could be improved. Results from MUSHRA (MUltiple Stimuli with Hidden Reference and Anchor) listening tests showed that, at a bitrate of 256 kbps, the improved MPEG-like system reached a MUSHRA score of about 78, while the original MPEG-like system reached 55.
Without coding each mono channel with the 3GPP EVS (Enhanced Voice Services) codec, the improved MPEG-like system reached a MUSHRA score of 85. At 256 kbps, the improved MPEG-like system coded the HOA signal into six channels instead of the 49 channels of the uncoded signal. The objective results showed that the improved MPEG-like system had the largest effect at low bitrates.
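The channel counts and bitrate quoted in the abstract, and the kind of PCA foreground/background split that SFD relies on, can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the thesis implementation. The 48 kHz / 16-bit PCM parameters behind the bitrate figure are assumptions. The zonal-harmonic explanation for the 7-channel result is also an assumption, and the helper names `hoa_channels`, `zonal_channels`, `raw_bitrate_mbps` and `decompose` are hypothetical. A full-sphere HOA signal of order N has (N + 1)^2 channels, so 49 channels corresponds to order 6. If a single point source is rotated onto the pole, only the N + 1 zonal (m = 0) harmonics are non-zero, which is one plausible way to arrive at the 7-channel figure.

```python
import numpy as np


def hoa_channels(order: int) -> int:
    """Channels in a full-sphere HOA signal of the given order: (N + 1)^2."""
    return (order + 1) ** 2


def zonal_channels(order: int) -> int:
    """Number of m = 0 (zonal) harmonics up to the given order: N + 1.

    Assumption: a point source rotated onto the pole excites only these,
    because spherical harmonics with m != 0 vanish at the poles.
    """
    return order + 1


def raw_bitrate_mbps(channels: int, sample_rate_hz: int = 48_000, bits: int = 16) -> float:
    """Uncoded PCM bitrate in Mbit/s; 48 kHz and 16 bits are assumed defaults."""
    return channels * sample_rate_hz * bits / 1e6


def decompose(hoa: np.ndarray, n_foreground: int):
    """Hypothetical PCA split of an HOA block (channels x samples).

    Returns the foreground signals (n_foreground x samples), the spatial
    basis vectors (channels x n_foreground), the foreground projected back
    into the HOA domain, and the residual background (channels x samples).
    """
    centered = hoa - hoa.mean(axis=1, keepdims=True)
    # Left singular vectors of the centered data are the principal directions.
    u, _, _ = np.linalg.svd(centered, full_matrices=False)
    basis = u[:, :n_foreground]
    fg_signals = basis.T @ centered
    foreground = basis @ fg_signals
    # The background keeps everything else, including the per-channel mean.
    background = hoa - foreground
    return fg_signals, basis, foreground, background


order = 6
n_ch = hoa_channels(order)
print(n_ch, zonal_channels(order), raw_bitrate_mbps(n_ch))  # 49 7 37.632

# Synthetic test scene: one dominant source plus weak diffuse noise.
rng = np.random.default_rng(0)
gains = rng.standard_normal(n_ch)   # stand-in spatial encoding vector
source = rng.standard_normal(4800)  # 0.1 s of signal at 48 kHz
hoa = np.outer(gains, source) + 0.01 * rng.standard_normal((n_ch, 4800))

fg_signals, basis, foreground, background = decompose(hoa, n_foreground=1)
share = np.sum(foreground**2) / np.sum(hoa**2)
print(fg_signals.shape, round(float(share), 3))
```

With 16-bit samples the raw rate comes out at about 37.6 Mbps, consistent with the "about 40 Mbps" in the abstract; 24-bit samples would instead give about 56.4 Mbps. In the synthetic scene, a single foreground component captures nearly all of the energy. Choosing `n_foreground` trades transmitted channels against how much of the scene ends up in the background. The abstract does not describe the selection rule the essay actually uses.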
