A Study of Accumulation Times in Translation from Event Streams to Video for the Purpose of Lip Reading

University essay from KTH/Datavetenskap

Abstract: Visually extracting textual content from lip movements is a pattern-matching problem, which is why machine learning approaches are frequently used for the classification task. Previous research has mostly relied on audiovisual (multimodal) approaches and conventional cameras. This study isolates the visual medium and uses event-based cameras instead of conventional cameras. Classifying visual features is computationally expensive, and minimising excessive data can be important for performance, which motivates the use of event cameras. Event cameras are inspired by biological vision and capture only changes in the scene, while offering high temporal resolution (the counterpart of frame rate in conventional cameras). This study investigates the importance of temporal resolution for the task of lip reading by modifying the ∆time used for accumulating events. No correlation could be observed within the collected data set. The paper cannot draw any conclusions about the suitability of the chosen approach for this particular application. Multiple other variables could affect the results, which makes it hard to dismiss the technology's potential within the domain.
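As a rough illustration of the ∆time idea the abstract describes (accumulating an event stream into frames before classification), the following sketch bins events into fixed-length time windows. This is an assumption-laden toy, not the thesis's actual pipeline: events are assumed to arrive as `(timestamp, x, y, polarity)` tuples, and each frame simply counts signed events per pixel.

```python
import numpy as np

def accumulate_events(events, dt, width, height):
    """Bin an event stream into frames, one frame per dt window.

    Hypothetical helper for illustration: `events` is a sequence of
    (t, x, y, polarity) tuples, with t in the same unit as dt.
    Each frame holds the signed event count (+1/-1) per pixel for
    its window, so a larger dt means fewer, denser frames.
    """
    events = list(events)
    if not events:
        return np.zeros((0, height, width))
    t0 = events[0][0]
    n_frames = int((events[-1][0] - t0) // dt) + 1
    frames = np.zeros((n_frames, height, width))
    for t, x, y, p in events:
        i = int((t - t0) // dt)          # which dt window this event falls into
        frames[i, int(y), int(x)] += 1 if p > 0 else -1
    return frames
```

Varying `dt` here corresponds to varying the temporal resolution studied in the essay: a small `dt` preserves fine motion detail at the cost of many sparse frames, while a large `dt` merges lip movements into fewer frames.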
