HARD : Hardware-Aware lightweight Real-time semantic segmentation model Deployable from Edge to GPU

YoungWook Kwon, WanSoo Kim, HyunJin Kim; Proceedings of the Asian Conference on Computer Vision (ACCV), 2024, pp. 3552-3569

Abstract


Two-branch models achieve high performance in semantic segmentation. However, the additional branch causes the fusion of high-resolution and low-resolution contexts to corrupt the surrounding context and increases the computational overhead. Existing methods with many parameters and high computational costs are poorly suited to the low-power devices used in applications such as autonomous driving and robotics. This study proposes a robust semantic segmentation architecture deployable on any kind of device, from GPUs to edge devices. We introduce five model variations, collectively called HARD, which achieve fast inference speeds while maintaining strong performance across devices. Notably, the proposed Dual Atrous Pooling Module (DAP) effectively fuses contexts of varying resolutions without decreasing inference speed. In addition, a lightweight decoder named the Serialized Atrous Module (SA) is proposed to extract global context. The proposed models are evaluated on both GPUs and embedded computing devices from NVIDIA, as well as an ARM Cortex-M CPU. In experiments on the Cityscapes, CamVid, and COCO-Stuff datasets, the proposed HARD variants achieve 73.8, 76.3, and 41.0 mIoU, respectively, outperforming existing state-of-the-art models.
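The abstract does not specify the internal design of the DAP module, only that it fuses high- and low-resolution branch contexts via atrous pooling without slowing inference. The following PyTorch-style sketch is a minimal illustration of that general idea, not the paper's actual implementation; the class name, channel counts, and dilation rates are all assumptions.

```python
# Hypothetical sketch only: the paper's actual DAP design is not given in the
# abstract. Module name, channel counts, and dilation rates are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualAtrousPoolingSketch(nn.Module):
    """Fuses a high-resolution branch with a low-resolution branch using
    atrous (dilated) convolutions at two rates, then merges the contexts."""
    def __init__(self, high_ch=64, low_ch=128, out_ch=128, rates=(2, 4)):
        super().__init__()
        # Dilated convolutions enlarge the receptive field on the low-res context
        self.atrous1 = nn.Conv2d(low_ch, out_ch, 3, padding=rates[0],
                                 dilation=rates[0], bias=False)
        self.atrous2 = nn.Conv2d(low_ch, out_ch, 3, padding=rates[1],
                                 dilation=rates[1], bias=False)
        self.high_proj = nn.Conv2d(high_ch, out_ch, 1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)

    def forward(self, high, low):
        # Aggregate low-resolution context at two dilation rates
        ctx = self.atrous1(low) + self.atrous2(low)
        # Upsample the pooled context to the high-resolution spatial size
        ctx = F.interpolate(ctx, size=high.shape[2:],
                            mode="bilinear", align_corners=False)
        # Fuse with the projected high-resolution features
        return F.relu(self.bn(self.high_proj(high) + ctx))

# Example usage with assumed feature-map shapes
high = torch.randn(1, 64, 128, 256)   # high-resolution branch features
low = torch.randn(1, 128, 32, 64)     # low-resolution branch features
fused = DualAtrousPoolingSketch()(high, low)
print(fused.shape)  # torch.Size([1, 128, 128, 256])
```

The sketch keeps the fusion cheap (two dilated 3x3 convolutions plus a 1x1 projection), which is consistent with the abstract's claim that context fusion should not reduce inference speed on low-power hardware.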

Related Material


[bibtex]
@InProceedings{Kwon_2024_ACCV,
    author    = {Kwon, YoungWook and Kim, WanSoo and Kim, HyunJin},
    title     = {HARD : Hardware-Aware lightweight Real-time semantic segmentation model Deployable from Edge to GPU},
    booktitle = {Proceedings of the Asian Conference on Computer Vision (ACCV)},
    month     = {December},
    year      = {2024},
    pages     = {3552-3569}
}