Audio-Visual Classification of Sports Types

Rikke Gade, Mohamed Abou-Zleikha, Mads Graesboll Christensen, Thomas B. Moeslund; Proceedings of the IEEE International Conference on Computer Vision (ICCV) Workshops, 2015, pp. 51-56


In this work we propose a method for classifying sports types from combined audio and visual features extracted from thermal video. From the audio, Mel Frequency Cepstral Coefficients (MFCC) are extracted, and PCA is applied to reduce the feature space to 10 dimensions. From the visual modality, short trajectories are constructed to represent the motion of players. From these trajectories, four motion features are extracted and combined directly with the audio features for classification. A k-nearest neighbour classifier is applied to 180 one-minute video sequences from three sports types. Using 10-fold cross-validation, a correct classification rate of 96.11% is obtained with multimodal features, compared to 86.67% and 90.00% using only visual or audio features, respectively.
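The classification stage described above (direct concatenation of audio and visual features, a k-nearest neighbour classifier, 10-fold cross-validation) can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic Gaussian clusters standing in for the 10 PCA-reduced MFCC dimensions and the 4 motion features, MFCC extraction, PCA, and trajectory construction are not shown, and the value k = 3, the Euclidean distance, and all function names are assumptions made for illustration.

```python
import math
import random
from collections import Counter


def fuse(audio_vec, visual_vec):
    """Early fusion: concatenate the audio and visual feature vectors."""
    return list(audio_vec) + list(visual_vec)


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k nearest training samples (Euclidean distance).

    k=3 is an assumed value; the abstract does not state k.
    """
    nearest = sorted(range(len(train_X)),
                     key=lambda i: math.dist(train_X[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def cross_validate(X, y, folds=10, k=3, seed=0):
    """Return the fraction of samples classified correctly over all folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    correct = 0
    for f in range(folds):
        test_idx = idx[f::folds]          # every folds-th shuffled sample
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct += sum(knn_predict(train_X, train_y, X[i], k) == y[i]
                       for i in test_idx)
    return correct / len(X)


def make_synthetic_dataset(n_per_class=60, n_classes=3, seed=0):
    """Synthetic stand-in data (NOT the paper's data): 3 classes x 60 samples."""
    rng = random.Random(seed)
    X, y = [], []
    for label in range(n_classes):
        for _ in range(n_per_class):
            audio = [rng.gauss(label * 5.0, 1.0) for _ in range(10)]  # 10 audio dims
            visual = [rng.gauss(label * 5.0, 1.0) for _ in range(4)]  # 4 motion dims
            X.append(fuse(audio, visual))
            y.append(label)
    return X, y


X, y = make_synthetic_dataset()
accuracy = cross_validate(X, y)
print(f"10-fold CV accuracy on synthetic data: {accuracy:.2%}")
```

Because the two modalities are simply concatenated, features with larger numeric ranges dominate the Euclidean distance; whether and how the features are scaled before fusion is not stated in the abstract. If each of the 180 sequences is counted once, the reported rates of 96.11%, 86.67%, and 90.00% correspond to 173, 156, and 162 correctly classified sequences.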

Related Material

author = {Gade, Rikke and Abou-Zleikha, Mohamed and Graesboll Christensen, Mads and Moeslund, Thomas B.},
title = {Audio-Visual Classification of Sports Types},
booktitle = {Proceedings of the IEEE International Conference on Computer Vision (ICCV) Workshops},
month = {December},
year = {2015}