PixelRNN: In-pixel Recurrent Neural Networks for End-to-end-optimized Perception with Neural Sensors

Haley M. So, Laurie Bose, Piotr Dudek, Gordon Wetzstein; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 25233-25244

Abstract


Conventional image sensors digitize high-resolution images at fast frame rates, producing a large amount of data that needs to be transmitted off the sensor for further processing. This is challenging for perception systems operating on edge devices because communication is power-inefficient and induces latency. Fueled by innovations in stacked image sensor fabrication, emerging sensor-processors offer programmability and processing capabilities directly on the sensor. We exploit these capabilities by developing an efficient recurrent neural network architecture, PixelRNN, that encodes spatio-temporal features on the sensor using purely binary operations. PixelRNN reduces the amount of data to be transmitted off the sensor by factors of up to 256 compared to the raw sensor data while offering competitive accuracy for hand gesture recognition and lip reading tasks. We experimentally validate PixelRNN using a prototype implementation on the SCAMP-5 sensor-processor platform.
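
To make the reported reduction factor concrete, the following back-of-the-envelope sketch computes how many bits would leave the sensor per frame. The frame size (256 x 256) and bit depth (8) are illustrative assumptions rather than values stated in the abstract, and the helper name `transmitted_bits` is hypothetical; only the factor of 256 comes from the text above.

```python
def transmitted_bits(width, height, bits_per_pixel, reduction_factor):
    """Return (raw_bits, reduced_bits) for a single frame.

    raw_bits is the size of the undigested sensor readout; reduced_bits is
    what remains after on-sensor encoding shrinks it by reduction_factor.
    Integer division is used, so the factor is assumed to divide evenly.
    """
    raw_bits = width * height * bits_per_pixel
    return raw_bits, raw_bits // reduction_factor


# Assumed frame geometry and bit depth (illustrative only).
raw, reduced = transmitted_bits(256, 256, 8, 256)
print(f"raw: {raw} bits, after 256x reduction: {reduced} bits")
```

Under these assumptions, a frame that would otherwise require 524288 bits is represented by 2048 bits off-sensor.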

Related Material


@InProceedings{So_2024_CVPR,
    author    = {So, Haley M. and Bose, Laurie and Dudek, Piotr and Wetzstein, Gordon},
    title     = {PixelRNN: In-pixel Recurrent Neural Networks for End-to-end-optimized Perception with Neural Sensors},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2024},
    pages     = {25233-25244}
}