End-to-End Neuromorphic Lip-Reading

Hugo Bulzomi, Marcel Schweiker, Amélie Gruel, Jean Martinet; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2023, pp. 4101-4108

Abstract

Human speech perception is intrinsically a multi-modal task, since speech production requires the speaker to move the lips, producing visual cues in addition to auditory information. Lip reading consists of visually interpreting the movements of the lips to understand speech, without the use of sound. It is an important task, since it can either complement an audio-based speech recognition system or replace it when sound is not available. In this paper, we introduce a neuromorphic model for lip reading that takes as input events produced by an event-based sensor capturing lip motion, and classifies short event sequences into word categories using a spiking neural network (SNN) architecture. Experimental results show that the proposed model successfully leverages key advantages of neuromorphic approaches, such as energy efficiency and low latency, which are central features in real-time embedded scenarios. To the best of our knowledge, this is the first proposal of an end-to-end neuromorphic lip-reading model.
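The event-to-word pipeline sketched in the abstract (event stream in, spiking network, word class out) can be illustrated with a minimal toy example. This is only a hedged sketch, not the authors' model: the time-binning of events, the single leaky integrate-and-fire (LIF) layer, the random weights, and the five word classes are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def events_to_frames(events, n_bins, height, width):
    """Bin (t, x, y, polarity) events into an (n_bins, H*W) spike tensor."""
    frames = np.zeros((n_bins, height * width))
    t_max = events[:, 0].max() + 1e-9
    for t, x, y, p in events:
        b = min(int(n_bins * t / t_max), n_bins - 1)
        frames[b, int(y) * width + int(x)] += 1.0
    return frames

def lif_layer(spikes_in, weights, tau=0.9, v_th=1.0):
    """One leaky integrate-and-fire layer; returns output spike counts."""
    v = np.zeros(weights.shape[1])
    counts = np.zeros(weights.shape[1])
    for frame in spikes_in:
        v = tau * v + frame @ weights   # leaky integration of input current
        fired = v >= v_th
        counts += fired                 # accumulate output spikes
        v[fired] = 0.0                  # reset membrane after a spike
    return counts

# Toy demo: random events, random weights, predict the arg-max class.
H = W = 8
events = np.column_stack([
    np.sort(rng.random(200)),    # timestamps
    rng.integers(0, W, 200),     # x coordinates
    rng.integers(0, H, 200),     # y coordinates
    rng.integers(0, 2, 200),     # polarity
])
frames = events_to_frames(events, n_bins=10, height=H, width=W)
weights = rng.normal(0, 0.1, size=(H * W, 5))  # 5 hypothetical word classes
pred = int(np.argmax(lif_layer(frames, weights)))
print("predicted word class:", pred)
```

In a real system the weights would be trained (e.g. with surrogate gradients) and the network would be deeper, but the structure — asynchronous events binned in time, integrated by spiking neurons, and read out by spike counts per word class — is the same.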

Related Material

@InProceedings{Bulzomi_2023_CVPR,
    author    = {Bulzomi, Hugo and Schweiker, Marcel and Gruel, Am\'elie and Martinet, Jean},
    title     = {End-to-End Neuromorphic Lip-Reading},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2023},
    pages     = {4101-4108}
}