RGB Road Scene Material Segmentation

Sudong Cai, Ryosuke Wakaki, Shohei Nobuhara, Ko Nishino; Proceedings of the Asian Conference on Computer Vision (ACCV), 2022, pp. 3051-3067


We address RGB road scene material segmentation, i.e., per-pixel segmentation of materials in real-world driving views with pure RGB images, by building a new tailored benchmark dataset and model for it. Our new dataset, KITTI-Materials, based on the well-established KITTI dataset, consists of 1000 frames covering 24 different road scenes of urban/suburban landscapes, annotated with one of 20 material categories for every pixel in high quality. It is the first dataset tailored to RGB material segmentation in realistic driving scenes which allows us to train and test any RGB material segmentation model. Based on an analysis on KITTI-Materials, we identify the extraction and fusion of texture and context as the key to robust road scene material appearance. We introduce Road scene Material Segmentation Network (RMSNet), a new Transformer-based framework which will serve as a baseline for this challenging task. RMSNet encodes multi-scale hierarchical features with self-attention. We construct the decoder of RMSNet based on a novel lightweight self-attention model, which we refer to as SAMixer. SAMixer achieves adaptive fusion of informative texture and context cues across multiple feature levels. It also significantly accelerates self-attention for feature fusion with a balanced query-key similarity measure. We also introduce a built-in bottleneck of local statistics to achieve further efficiency and accuracy. Extensive experiments on KITTI-Materials validate the effectiveness of our RMSNet. We believe our work lays a solid foundation for further studies on RGB road scene material segmentation.

Related Material

[pdf] [supp] [code]
@InProceedings{Cai_2022_ACCV, author = {Cai, Sudong and Wakaki, Ryosuke and Nobuhara, Shohei and Nishino, Ko}, title = {RGB Road Scene Material Segmentation}, booktitle = {Proceedings of the Asian Conference on Computer Vision (ACCV)}, month = {December}, year = {2022}, pages = {3051-3067} }