@InProceedings{Lee_2026_CVPR,
    author    = {Lee, Seyeon and Ye, Juncheol and Kim, Jaehong and Han, Dongsu},
    title     = {Neural-Centric Video Processing Pipeline for Unified Multi-Task Inference},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2026},
    pages     = {18555-18564}
}
Neural-Centric Video Processing Pipeline for Unified Multi-Task Inference
Abstract
Videos are increasingly used as inputs to machine learning systems, where repeated decoding and processing across diverse downstream tasks dominate computational cost. However, existing video pipelines remain inefficient: traditional codecs such as H.264 and H.265 are optimized for human perception and require full pixel decoding for every query; compressed-domain methods are tied to specific codec structures and offer limited flexibility; and machine-oriented video coding approaches often rely on task-specific encoders and separate representations, without supporting human visualization. We propose Neural Video Pipeline (NVP), a framework that leverages implicit neural representations to directly extract task-specific features from intermediate layers, eliminating pixel reconstruction overhead. NVP introduces lightweight micro adapters that map these features into the representation space of downstream models, bypassing both decoding and early-stage feature extraction. Across four representative tasks (image classification, object detection, action recognition, and segmentation), NVP reduces latency by up to 89.5% and inference FLOPs by up to 29.9%, while supporting multiple tasks using a single unified representation.
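The data flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the network architecture, the adapter design, or any training procedure. The names `ToyINR` and `MicroAdapter`, the sine activations, the layer widths, and the random (untrained) weights are all assumptions chosen for illustration. The sketch only shows one shared intermediate feature vector being computed once and then mapped by small per-task adapters, without passing through the pixel-decoding head.

```python
import math
import random


def make_linear(n_in, n_out, rng):
    """Random weights for a dense layer (a stand-in for trained parameters)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias


def linear(layer, x):
    """Apply a dense layer to a vector."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def sine(x):
    """Element-wise sine activation (an assumed choice, not taken from the paper)."""
    return [math.sin(v) for v in x]


class ToyINR:
    """Hypothetical coordinate MLP mapping (x, y, t) to RGB."""

    def __init__(self, hidden=16, seed=0):
        rng = random.Random(seed)
        self.layer1 = make_linear(3, hidden, rng)
        self.layer2 = make_linear(hidden, hidden, rng)
        self.to_rgb = make_linear(hidden, 3, rng)

    def features(self, coord):
        """Intermediate activations; the RGB head is never evaluated here."""
        return sine(linear(self.layer2, sine(linear(self.layer1, coord))))

    def decode_pixel(self, coord):
        """Full pixel path, kept so the same representation can be visualized."""
        return linear(self.to_rgb, self.features(coord))


class MicroAdapter:
    """Hypothetical single dense layer mapping INR features to a task model's width."""

    def __init__(self, n_in, n_out, seed=1):
        self.layer = make_linear(n_in, n_out, random.Random(seed))

    def __call__(self, feats):
        return linear(self.layer, feats)


inr = ToyINR(hidden=16)
# One adapter per downstream task; the output widths are arbitrary placeholders.
adapters = {
    "classification": MicroAdapter(16, 8, seed=1),
    "detection": MicroAdapter(16, 12, seed=2),
}

coord = [0.25, 0.5, 0.0]        # normalized (x, y, t)
shared = inr.features(coord)    # computed once, reused by every task
task_inputs = {name: adapter(shared) for name, adapter in adapters.items()}
```

In this toy setup, `task_inputs` holds one adapted feature vector per task, all derived from the same `shared` activations, while `inr.decode_pixel(coord)` remains available when a human-viewable pixel is needed.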

