Recognizing Semantics in Human Actions with Object Detection

University essay from KTH/Skolan för datavetenskap och kommunikation (CSC)

Abstract: Two-stream convolutional neural networks are currently one of the most successful approaches for human action recognition. The two-stream convolutional networks separates spatial and temporal information into a spatial stream and a temporal stream. The spatial stream accepts a single RGB frame, while the temporal stream accepts a sequence of optical flow. There have been attempts to further extend the work of the two-stream convolutional network framework. For instance there have been attempts to extend with a third network for auxiliary information, which this thesis mainly focuses on. We seek to extend the two-stream convolutional neural network by introducing a semantic stream by using object detection systems. Two contributions are made in thesis: First we show that this semantic stream can provide slight improvements over two-stream convolutional neural networks for human action recognition on standard benchmarks. Secondly, we attempt to seek divergence enhancements techniques to force our new semantic stream to complement the spatial and the temporal streams by modifying the loss function during training. Slight gains are seen using these divergence enhancement techniques.

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)