Lightweight Remote Collaboration System based on WebRTC: Improving Remote Collaboration Flexibility

University essay from Blekinge Tekniska Högskola/Institutionen för kommunikationssystem


Context. The introduction of efficient multimedia technologies, combined with the spread of high-speed internet connections around the world, has led to a continuous increase in demand for multimedia services, particularly video and audio. One major demand in companies is for flexible, interoperable and cost-effective lightweight remote collaboration systems. Web Real-Time Communication (WebRTC) is an emerging peer-to-peer technology that promises to be a solution to many digital real-time communication challenges. With its strong one-to-one communication capabilities, WebRTC supports fast and smooth audio calls, video calls, conferencing, data (media file, document and screen) sharing, gaming and all kinds of message exchange, all directly from the browser. However, investigations and interviews supported by Ericsson AB and Semcon AB as part of the MERCO (Mediated Effective Remote Collaboration) international project show that many corporate remote collaboration use cases involve applications beyond conventional one-to-one communication. Present videoconferencing (telepresence) systems limit collaboration flexibility because they cannot adapt their system resource usage, and therefore tend to be too heavy for less powerful devices (laptops, tablets, phones). Moreover, their installation and maintenance costs are too high for small companies. New flexible, lightweight and less expensive solutions for remote collaboration therefore need to be developed.

Objectives. The main objective of this thesis is to identify technical solutions to the challenge of resource usage flexibility in WebRTC multi-party remote collaboration systems. Despite the concurrent development of both commercial and free solutions that provide multi-party videoconferencing services using WebRTC, present architectures such as the conventional Multipoint Control Unit (MCU), the Selective Forwarding Unit (SFU) and the fully meshed topology suffer from excessive resource usage and cannot deliver an acceptable quality of experience in many use cases, particularly in mobile environments. The aim of this thesis is to investigate lightweight technical solutions that can improve system resource usage in WebRTC multi-party conferencing systems. A prototype is developed as a proof of concept by understanding the architectural designs, benchmarking the performance of various technologies used in WebRTC, and selecting the most suitable techniques.
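
The resource cost of the three architectures named above can be illustrated by counting the media streams each participant must handle. The following Python sketch is a simplified model, not taken from the thesis: it assumes every participant sends one audio/video stream, wants to receive every other participant, and that no simulcast or layered encoding is used. The function name `streams_per_client` is hypothetical.

```python
def streams_per_client(architecture: str, n: int) -> tuple[int, int]:
    """Return (uploaded, downloaded) media streams per participant in an n-party call.

    Simplified model (assumption): one outgoing audio/video stream per participant,
    every participant views every other participant, no simulcast.
    """
    if n < 2:
        raise ValueError("a conference needs at least two participants")
    others = n - 1
    if architecture == "mesh":
        # Each peer typically encodes and sends a separate copy to every other peer.
        return others, others
    if architecture == "sfu":
        # One upload to the forwarding unit, which relays every other participant's stream.
        return 1, others
    if architecture == "mcu":
        # One upload; the MCU decodes, mixes and re-encodes a single composite stream.
        return 1, 1
    raise ValueError(f"unknown architecture: {architecture!r}")


for arch in ("mesh", "sfu", "mcu"):
    up, down = streams_per_client(arch, 6)
    print(f"{arch}: upload {up}, download {down}")
# mesh: upload 5, download 5
# sfu: upload 1, download 5
# mcu: upload 1, download 1
```

Under this model, a six-party full mesh makes every device encode and upload five streams, which is why meshed calls quickly overwhelm laptops, tablets and phones. An MCU reduces each client to one stream in each direction, but moves the decoding, mixing and re-encoding cost to the server.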

Methods. The first part of the thesis is dedicated to a comprehensive study of the fundamentals, background and related work on WebRTC. This provides knowledge of the technologies, techniques and performance evaluation metrics that support appropriate technical decisions during the experimental development of WebRTC solutions. The second part of the thesis is an experimental investigation in which two WebRTC signaling technologies (XSockets and NodeJs) are evaluated based on call setup time in a WebRTC group call. Two lightweight technical solutions for improving resource usage flexibility (switching video quality based on speech, and using emotions and gestures instead of video) are evaluated based on system resource usage (CPU, memory, disk and network) and user experience.
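
The speech-based technique can be sketched as a small controller that receives periodic audio levels for each participant and decides which video streams should be sent in high quality. This Python sketch is an illustration under stated assumptions, not the thesis prototype: the class name `SpeechQualitySwitcher`, the 0.05 level threshold, the hangover count and the two quality labels are all hypothetical. In a browser, audio levels could be read from the `audioLevel` field of `RTCPeerConnection.getStats()` reports, and a lower quality applied through `RTCRtpSender.setParameters()` (for example `scaleResolutionDownBy` or `maxBitrate` on an encoding); whether the prototype does so is not stated here.

```python
from dataclasses import dataclass, field


@dataclass
class SpeechQualitySwitcher:
    """Hypothetical controller: speaking participants get high-quality video,
    silent participants get low-quality video to save CPU and bandwidth."""

    threshold: float = 0.05  # assumed audio level (0.0-1.0) that counts as speech
    hangover: int = 3        # assumed number of quiet samples before demotion
    _quiet_samples: dict[str, int] = field(default_factory=dict, init=False, repr=False)

    def update(self, audio_levels: dict[str, float]) -> dict[str, str]:
        """Take one audio-level sample per participant; return each one's video quality."""
        qualities: dict[str, str] = {}
        for peer, level in audio_levels.items():
            if level >= self.threshold:
                # Speech detected: reset the quiet counter.
                self._quiet_samples[peer] = 0
            else:
                # Silence: count quiet samples (capped); unseen peers start as already quiet.
                previous = self._quiet_samples.get(peer, self.hangover)
                self._quiet_samples[peer] = min(previous + 1, self.hangover + 1)
            speaking = self._quiet_samples[peer] <= self.hangover
            qualities[peer] = "high" if speaking else "low"
        return qualities


switcher = SpeechQualitySwitcher()
print(switcher.update({"alice": 0.30, "bob": 0.01, "carol": 0.02}))
# {'alice': 'high', 'bob': 'low', 'carol': 'low'}
```

The hangover keeps a speaker in high quality through short pauses, so video does not flicker between quality levels mid-sentence.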

Results. Based on the call setup time of WebRTC multi-party calls, the experimental results indicate that XSockets is a better signaling technology than NodeJs. The two proposed lightweight solutions show a marked improvement in system resource usage. A 15% reduction in CPU usage is observed when using speech-controlled video quality switching, and a further 10% reduction is observed when video is replaced by emotions and gestures.

Conclusions. Despite the lower resource usage achieved by the emotions technique, this solution has usability issues, as it cannot detect emotions in poorly lit environments. Consequently, switching video quality based on speech is chosen for further implementation. Although this technique could be further improved using machine learning techniques, the current implementation can significantly reduce CPU, memory, disk and network usage, allowing up to 6 participants to join a single conference call while maintaining an acceptable quality of experience.
