Optimizing web camera based eye tracking system : An investigating of the effect of network pruning and image resolution

University essay from KTH/Skolan för elektroteknik och datavetenskap (EECS)

Abstract: Deep learning has made possible things that could previously only be imagined. In eye tracking, advances in deep learning have made it possible to predict gaze using the integrated camera found in most mobile and desktop devices today. As a result, the technique no longer requires advanced eye tracking equipment and is available to anyone with such a device. More accurate gaze prediction, however, requires a more advanced neural network and more computational power. This study investigates how a convolutional neural network for eye tracking with a desktop web camera can be optimized in terms of computational cost without compromising the accuracy of the network. Two methods for decreasing the computational cost are investigated and their impact on accuracy is evaluated: pruning, and reducing the resolution of the input images fed to the convolutional neural network. Pruning removes weights from a neural network to make the network sparser. The results show that pruning works for regression tasks such as eye tracking with a desktop web camera without compromising accuracy. When the convolutional neural network is pruned to 80% of its original weights in the convolutional layers, the accuracy improves by 6.8% compared to the same network without pruning. The results also show that reducing the number of pixels in the input images improves the accuracy of the neural network. This is investigated further by injecting noise into the input images used for testing, which shows that the network trained with a lower-resolution face image is more robust to noise than the baseline model. This could be one explanation for the improvement when the face image is downsampled to a lower resolution. It is also shown that a model trained with the face and eye inputs reduced by a factor of four decreases its computational time by 85.7% compared to a baseline model.
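The two cost-reduction ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation. The abstract does not say which pruning criterion or downsampling method was used, so the sketch assumes magnitude-based pruning (the smallest-magnitude weights are zeroed) and average pooling, with the factor applied per image side. The function names `prune_by_magnitude` and `downsample` are hypothetical.

```python
def prune_by_magnitude(weights, keep_fraction):
    """Zero the smallest-magnitude weights, keeping about keep_fraction of them.

    Assumption: magnitude is the pruning criterion; the abstract does not specify one.
    """
    if not 0.0 <= keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in [0, 1]")
    n_keep = round(len(weights) * keep_fraction)
    # Indices ordered by descending absolute value; the first n_keep survive.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    keep = set(order[:n_keep])
    return [w if i in keep else 0.0 for i, w in enumerate(weights)]


def downsample(image, factor):
    """Average-pool a 2D list of pixel values by an integer factor per side.

    Assumption: average pooling; rows and columns that do not fill a
    complete block are dropped.
    """
    rows = len(image) // factor
    cols = len(image[0]) // factor
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            # Collect the factor x factor block that maps to output pixel (r, c).
            block = [image[r * factor + dr][c * factor + dc]
                     for dr in range(factor) for dc in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out
```

Calling `prune_by_magnitude(layer_weights, 0.8)` on a flattened list of one layer's weights mirrors the "pruned to 80% of its original weights" setting, and `downsample(face_image, 4)` shrinks each side of an image by a factor of four. Whether the thesis applies the factor per side or to the total pixel count is not stated in the abstract.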
