-
[pdf]
[supp]
[bibtex]@InProceedings{Huang_2025_CVPR, author = {Huang, Zhe and Zhao, Yizhe and Xiao, Hao and Wu, Chenyan and Ge, Lingting}, title = {DuoSpaceNet: Leveraging Both Bird's-Eye-View and Perspective View Representations for 3D Object Detection}, booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR) Workshops}, month = {June}, year = {2025}, pages = {2560-2570} }
DuoSpaceNet: Leveraging Both Bird's-Eye-View and Perspective View Representations for 3D Object Detection
Abstract
Multi-view camera-only 3D object detection largely follows two primary paradigms: exploiting bird's-eye-view (BEV) representations or focusing on perspective-view (PV) features, each with distinct advantages. Although several recent approaches explore combining BEV and PV, many rely on partial fusion or maintain separate detection heads. In this paper, we propose DuoSpaceNet, a novel framework that fully unifies BEV and PV feature spaces within a single detection pipeline for comprehensive 3D perception. Our design includes a decoder to integrate BEV-PV features into unified detection queries, as well as a feature enhancement strategy that enriches different feature representations. In addition, DuoSpaceNet can be extended to handle multi-frame inputs, enabling more robust temporal analysis. Extensive experiments on the nuScenes dataset show that DuoSpaceNet surpasses both BEV-based baselines (e.g., BEVFormer) and PV-based baselines (e.g., Sparse4D) in 3D object detection and BEV map segmentation, verifying the effectiveness of our proposed design.
Related Material