SMDAF: A Scalable Sidewalk Material Data Acquisition Framework with Bidirectional Cross-Modal Knowledge Distillation

Liu, Jiawei; Lam, Wayne; Zhu, Zhigang; Tang, Hao

Jiawei Liu, Wayne Lam, Zhigang Zhu, Hao Tang; Proceedings of the Winter Conference on Applications of Computer Vision (WACV), 2025, pp. 2983-2992

Abstract

Ensuring safe and independent navigation poses considerable difficulties for individuals who are blind or have low vision (BLV), as it requires detailed knowledge of their immediate environment. Our research highlights the critical need for accessible data on sidewalk materials and objects, data that are currently lacking in existing map services. To bridge this gap, we present the Sidewalk Material Data Acquisition Framework (SMDAF), designed for large-scale data collection. The framework includes (1) a lightweight data collection system embedded in a white cane, which captures audio data from the interaction of the cane tip with the sidewalk surface, together with a mobile app that facilitates data storage and management, resulting in a novel multimodal dataset comprising both image and audio data; and (2) a unique Cross-Modal Knowledge Distillation (CMKD) technique for an enhanced audio material classifier. Our CMKD approach employs an image-based model as the teacher to improve the audio model, incorporating an Enhanced Bidirectional learning method with an intuitive filtering technique: Bidirectional Correct Sample Filtering (BCSF). BCSF filters for correctly classified samples so that incorrect knowledge is not distilled, addressing the problem of inaccurate cross-modal learning. This approach yields a 1.84% improvement in Macro Accuracy, achieving an overall accuracy of 87.62% and surpassing all state-of-the-art KD and CMKD methods. This study underscores the efficacy of SMDAF and provides a practical CMKD technique for future cross-modal learning tasks. Code and dataset are available at https://github.com/FgSurewin/SMDAF-CMKD.
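The abstract describes correct-sample filtering only at a high level. As a rough illustration of the general idea, the Python sketch below computes a standard temperature-scaled distillation loss (KL divergence between softened teacher and student distributions, scaled by T²) and averages it only over samples where the teacher's prediction matches the ground-truth label. This is not the authors' implementation: the temperature of 4.0, the filtering criterion, the function names, and the bidirectional variant (each modality also teaching the other, with each direction filtered by the *source* model's correctness) are all assumptions made for illustration. The actual BCSF rule and Enhanced Bidirectional method are defined in the paper and its repository.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax, stabilized by subtracting the max logit."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float], eps: float = 1e-12) -> float:
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def argmax(xs: Sequence[float]) -> int:
    return max(range(len(xs)), key=lambda i: xs[i])


def filtered_kd_loss(teacher_logits: Sequence[Sequence[float]],
                     student_logits: Sequence[Sequence[float]],
                     labels: Sequence[int],
                     temperature: float = 4.0) -> float:
    """Hypothetical correct-sample-filtered KD loss (assumed criterion).

    Only samples the teacher classifies correctly contribute, so wrong
    teacher predictions are never distilled. Returns 0.0 if none pass.
    """
    losses = []
    for t, s, y in zip(teacher_logits, student_logits, labels):
        if argmax(t) != y:
            continue  # teacher is wrong on this sample: skip it
        p_teacher = softmax(t, temperature)
        p_student = softmax(s, temperature)
        losses.append(temperature ** 2 * kl_divergence(p_teacher, p_student))
    return sum(losses) / len(losses) if losses else 0.0


def bidirectional_filtered_kd(image_logits, audio_logits, labels,
                              temperature: float = 4.0) -> Tuple[float, float]:
    """Hypothetical bidirectional variant: image->audio and audio->image,
    each direction filtered by whether its source model is correct."""
    image_to_audio = filtered_kd_loss(image_logits, audio_logits, labels, temperature)
    audio_to_image = filtered_kd_loss(audio_logits, image_logits, labels, temperature)
    return image_to_audio, audio_to_image


# Toy two-class example with made-up logits.
labels = [0, 1]
image_logits = [[3.0, 0.0], [2.0, 0.5]]  # image model: right on sample 0, wrong on sample 1
audio_logits = [[1.0, 0.5], [0.2, 1.5]]  # audio model: right on both samples

i2a, a2i = bidirectional_filtered_kd(image_logits, audio_logits, labels)
print(f"image->audio loss: {i2a:.4f}, audio->image loss: {a2i:.4f}")
```

In this toy run, only sample 0 contributes to the image-to-audio loss, because the image model mislabels sample 1, while both samples contribute to the audio-to-image loss. Filtering by the source model's correctness is one simple way to keep a wrong teacher from pulling the student toward an incorrect class.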

Related Material

@InProceedings{Liu_2025_WACV,
    author    = {Liu, Jiawei and Lam, Wayne and Zhu, Zhigang and Tang, Hao},
    title     = {SMDAF: A Scalable Sidewalk Material Data Acquisition Framework with Bidirectional Cross-Modal Knowledge Distillation},
    booktitle = {Proceedings of the Winter Conference on Applications of Computer Vision (WACV)},
    month     = {February},
    year      = {2025},
    pages     = {2983--2992}
}