Using Quantization and Serialization to Improve AI Super-Resolution Inference Time on Cloud Platform

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Author: Wai-hong Anton Fu; [2023]


Abstract: AI Super-Resolution is a branch of Artificial Intelligence whose goal is to upscale a low-resolution image into a high-resolution image. These models are usually deep learning models based on Convolutional Neural Networks (CNNs) and/or transformers. Model compression techniques aim to simplify a model by decreasing its size and/or inference time without any significant loss in performance; such techniques include quantization and serialization. This thesis is conducted in cooperation with Blankt, an online design engine company. The aim is to find a suitable AI Super-Resolution model that yields good upscaling image quality with efficient and predictable inference time. The evaluation takes into account factors such as image size and hardware configuration on Amazon Web Services (AWS), a popular cloud computing platform. The image upscaling quality is also evaluated, both quantitatively and qualitatively. While numerous research papers have made significant contributions to enhancing the performance of AI Super-Resolution, there has been limited exploration of the inference time aspect, particularly when the model is deployed in real-world settings such as cloud platforms. In this study SWIN-IR, a state-of-the-art deep-learning model for AI Super-Resolution, is selected, and compression techniques such as serialization and quantization are applied to deploy the x2, x3, and x4 upscaling models on the AWS cloud platform. We measured the inference time and model loading latency for various image types as part of our investigation. The inference time of the models exhibited a linear relationship with the input image size, enabling inference time prediction. Furthermore, the results showed that the upscaled images are of good quality, although the models lack generalization for certain types of shapes. As for model compression, the results showed that neither serialization nor quantization affected the image quality negatively; however, they also did not yield any substantial improvement in inference time.
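The following is a minimal sketch, not taken from the thesis, of the two compression steps the abstract names, assuming a PyTorch workflow: post-training dynamic quantization followed by TorchScript serialization. The tiny convolutional network, file names, and image sizes are placeholders; a model such as SWIN-IR would contain the nn.Linear layers (attention/MLP blocks) that dynamic quantization actually targets.

import torch
import torch.nn as nn

# Placeholder stand-in for a super-resolution network; the real model,
# weights, and upscaling factor used in the thesis are assumptions here.
model = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(64, 3 * 4, kernel_size=3, padding=1),
    nn.PixelShuffle(2),          # x2 upscaling of the spatial resolution
)
model.eval()

# Post-training dynamic quantization: weights of the listed module types
# (nn.Linear here) are stored in int8 and dequantized at inference time.
# For this conv-only placeholder it is a no-op, but for a transformer-based
# model like SWIN-IR the many Linear layers would be quantized.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# Serialization with TorchScript: trace the model with an example input and
# save a self-contained artifact that can be shipped to the cloud instance.
example = torch.randn(1, 3, 128, 128)
scripted = torch.jit.trace(quantized, example)
scripted.save("sr_model_x2_quantized.pt")

# On the deployment side (e.g. an AWS instance), load and run inference:
loaded = torch.jit.load("sr_model_x2_quantized.pt")
with torch.no_grad():
    upscaled = loaded(example)   # shape (1, 3, 256, 256) for x2 upscaling

The reported linear relationship between inference time and input size suggests prediction by a simple least-squares line over pixel count; the snippet below illustrates the idea with hypothetical timings, not measurements from the thesis.

import numpy as np

pixels  = np.array([128 * 128, 256 * 256, 512 * 512], dtype=float)
seconds = np.array([0.4, 1.6, 6.5])                # hypothetical timings
a, b = np.polyfit(pixels, seconds, deg=1)          # fit t ~ a * pixels + b
predicted = a * (384 * 384) + b                    # predict for a new size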
