Video-based Object Recognition using Novel Set-of-Sets Representations

Yang Liu, Youngkyoon Jang, Woontack Woo, Tae-Kyun Kim; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2014, pp. 519-526


We address the problem of object recognition in egocentric videos, where a user arbitrarily moves a mobile camera around an unknown object. A video that captures variation in an object's appearance due to camera motion (more viewpoints, scales, clutter and lighting conditions) allows evidence to be accumulated, improving object recognition accuracy. Most previous work has taken a single image as input, or has treated a video simply as a collection, i.e., a sum of frame-based recognition scores. In this paper, going beyond frame-based recognition, we propose two novel set-of-sets representations of a video sequence for object recognition. We combine bag-of-words techniques, suited to spatially distributed and thus heterogeneous data, with manifold techniques, suited to temporally smooth and homogeneous data, to construct the two proposed set-of-sets representations. We also propose matching methods for each of the two representations. The representations and matching techniques are evaluated on our video-based object recognition datasets, which contain 830 videos of ten objects under four environmental variations. Experiments on these challenging new datasets show that our proposed solution significantly outperforms traditional frame-based methods.
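To make the "set of sets" idea concrete, the following sketch illustrates the general structure the abstract describes: each frame contributes a set of local features, which are quantized into a bag-of-words histogram, and the video (a set of frames) is then pooled into a single representation rather than classified frame by frame. This is only an illustrative simplification with an assumed NumPy setup and a hypothetical random codebook, not the paper's actual representations or matching methods.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical codebook: 3 codewords in a 4-D local-descriptor space.
codebook = rng.normal(size=(3, 4))

def bow_histogram(features, codebook):
    """Quantize local features to their nearest codeword; return a normalized histogram."""
    # Pairwise distances: (n_features, n_codewords)
    d = np.linalg.norm(features[:, None, :] - codebook[None, :, :], axis=2)
    assignments = d.argmin(axis=1)
    hist = np.bincount(assignments, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

# A video is a set of frames, and each frame holds a set of local features:
# a "set of sets". Here, 5 frames with 20 random 4-D features each.
video = [rng.normal(size=(20, 4)) for _ in range(5)]

# Pool frame-level histograms into one video-level descriptor, instead of
# summing independent per-frame recognition scores.
frame_hists = np.stack([bow_histogram(f, codebook) for f in video])
video_hist = frame_hists.mean(axis=0)
```

The paper's contribution lies in richer set-of-sets representations (combining bag-of-words and manifold models) and dedicated matching schemes; the simple mean pooling above only stands in for that final aggregation step.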

Related Material

@InProceedings{Liu_2014_CVPR_Workshops,
  author    = {Liu, Yang and Jang, Youngkyoon and Woo, Woontack and Kim, Tae-Kyun},
  title     = {Video-based Object Recognition using Novel Set-of-Sets Representations},
  booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
  month     = {June},
  year      = {2014}
}