A Multimodal Video and Radar Fusion Framework for High-Accuracy Isolated Sign Language Recognition

Manjur, Sultan Mohammad; Biswas, Sabyasachi; Gurbuz, Ali C.

Sultan Mohammad Manjur, Sabyasachi Biswas, Ali C. Gurbuz; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2025, pp. 5061-5070

Abstract

Sign language serves as the primary mode of communication for individuals with auditory and speech impairments. However, for those without such conditions, understanding sign language typically requires specialized training, creating barriers in everyday interactions and critical domains such as healthcare. Automatic sign language recognition (SLR) systems offer the potential to bridge this gap, yet the inherent complexity of sign language, characterized by nuanced hand, finger, and facial gestures, makes this a challenging task. While traditional approaches have relied heavily on RGB video, recent work has explored multimodal systems that combine visual and non-visual data streams such as skeletal dynamics or radar. Each modality brings unique strengths and limitations: for instance, RGB video provides rich visual detail but is vulnerable to lighting conditions and background, while radar is resilient to such conditions but lacks fine-grained appearance information. To leverage the complementary advantages of these modalities, we present a multimodal framework for isolated Italian Sign Language recognition that integrates RGB video, skeletal keypoints, and radar-based Range-Doppler Maps (RDMs). We adopt the sign language graph convolution network (SL-GCN) module from prior literature to extract spatiotemporal features from the skeleton modality, while introducing a novel radar processing branch tailored to capture complementary motion information from RDMs. Our approach fuses all three modalities for robust recognition. Evaluated on the 1st Multimodal Isolated Italian Sign Language Recognition Challenge, our model achieves 99.614% accuracy on the validation set and 99.708% on the test set, demonstrating the effectiveness of multimodal fusion for sign language understanding.
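The abstract does not say how the three modality branches are combined, so the following is only a minimal sketch of one common option, score-level (late) fusion, and not the authors' method. It assumes each branch (RGB, skeleton, radar) already outputs per-class logits for one sign; the names `late_fusion` and `predict`, the modality keys, and the equal weights are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(logits_by_modality, weights):
    """Weighted average of per-modality class probabilities.

    Assumption: every modality scores the same ordered set of classes.
    Weights are normalized so the fused vector still sums to 1.
    """
    total_w = sum(weights[name] for name in logits_by_modality)
    num_classes = len(next(iter(logits_by_modality.values())))
    fused = [0.0] * num_classes
    for name, logits in logits_by_modality.items():
        probs = softmax(logits)
        w = weights[name] / total_w
        for i, p in enumerate(probs):
            fused[i] += w * p
    return fused


def predict(logits_by_modality, weights):
    """Return the index of the class with the highest fused probability."""
    fused = late_fusion(logits_by_modality, weights)
    return max(range(len(fused)), key=fused.__getitem__)


# Illustrative values only: RGB leans toward class 0 (for example, under
# poor lighting), while skeleton and radar agree on class 1.
example_logits = {
    "rgb": [2.0, 1.0, 0.0],
    "skeleton": [0.0, 3.0, 0.0],
    "radar": [0.0, 2.5, 0.5],
}
example_weights = {"rgb": 1.0, "skeleton": 1.0, "radar": 1.0}
fused_probs = late_fusion(example_logits, example_weights)
predicted_class = predict(example_logits, example_weights)
```

In this toy case the two modalities that agree outvote the RGB branch, which is the kind of complementarity the abstract describes. Feature-level fusion or learned fusion weights are equally plausible designs.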

Related Material

[bibtex]

@InProceedings{Manjur_2025_ICCV,
    author    = {Manjur, Sultan Mohammad and Biswas, Sabyasachi and Gurbuz, Ali C.},
    title     = {A Multimodal Video and Radar Fusion Framework for High-Accuracy Isolated Sign Language Recognition},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
    month     = {October},
    year      = {2025},
    pages     = {5061-5070}
}