Unsupervised Random Forest Manifold Alignment for Lipreading

Yuru Pei, Tae-Kyun Kim, Hongbin Zha; Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2013, pp. 129-136


Lipreading from visual channels remains a challenging topic because speaking characteristics vary widely. In this paper, we propose an efficient lipreading approach based on unsupervised random forest manifold alignment (RFMA). A density random forest is employed to estimate the affinity of patch trajectories in videos of speaking faces. We propose novel criteria for node splitting to avoid rank deficiency when learning density forests. By virtue of the hierarchical structure of random forests, the trajectory affinities are measured efficiently and are then used to find embeddings of the speaking video clips with a graph-based algorithm. Lipreading is formulated as matching between the manifolds of query and reference video clips. We employ a manifold alignment technique for matching, in which an L∞-norm-based manifold-to-manifold distance is proposed to find the matching pairs. We apply this random forest manifold alignment technique to various video data sets captured by consumer cameras. The experiments demonstrate that lipreading can be performed effectively and that the approach outperforms state-of-the-art methods.
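
The pipeline outlined in the abstract (forest-based affinity, graph-based embedding, norm-based matching) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes each sample has already been routed through a trained forest, so the input is a hypothetical array of leaf indices, and it swaps in generic stand-ins: a shared-leaf co-occurrence affinity, a Laplacian-eigenmaps-style embedding, and a plain L∞ norm between two already aligned embeddings. The names `leaf_affinity`, `spectral_embedding`, and `linf_distance` are illustrative only.

```python
import numpy as np


def leaf_affinity(leaves):
    """Affinity from forest leaf assignments (assumed input, not from the paper).

    leaves: (n_samples, n_trees) array of integer leaf ids.
    Returns an (n_samples, n_samples) matrix whose entry (i, j) is the
    fraction of trees in which samples i and j land in the same leaf.
    """
    leaves = np.asarray(leaves)
    same_leaf = leaves[:, None, :] == leaves[None, :, :]
    return same_leaf.mean(axis=2)


def spectral_embedding(affinity, dim):
    """Laplacian-eigenmaps-style embedding of a symmetric affinity matrix."""
    w = np.array(affinity, dtype=float)
    np.fill_diagonal(w, 0.0)  # ignore self-affinity
    degree = w.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(degree, 1e-12))
    # Symmetric normalized graph Laplacian: I - D^{-1/2} W D^{-1/2}
    laplacian = np.eye(len(w)) - d_inv_sqrt[:, None] * w * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    return vecs[:, 1:dim + 1]  # drop the first (smallest-eigenvalue) vector


def linf_distance(a, b):
    """L-infinity norm of the difference between two same-shaped embeddings."""
    return float(np.max(np.abs(np.asarray(a) - np.asarray(b))))


if __name__ == "__main__":
    # Four samples routed through two trees (made-up leaf ids).
    leaves = [[0, 1], [0, 1], [2, 1], [3, 4]]
    affinity = leaf_affinity(leaves)
    embedding = spectral_embedding(affinity, dim=2)
    print(affinity)
    print(embedding.shape)
```

Reading affinities off leaf co-occurrence avoids computing pairwise distances in the original feature space, which is one way the hierarchical structure of a forest can make affinity estimation cheap.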

Related Material

author = {Pei, Yuru and Kim, Tae-Kyun and Zha, Hongbin},
title = {Unsupervised Random Forest Manifold Alignment for Lipreading},
booktitle = {Proceedings of the IEEE International Conference on Computer Vision (ICCV)},
month = {December},
year = {2013}