Beyond Deep Feature Averaging: Sampling Videos Towards Practical Facial Pain Recognition

Xiang Xiang; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2019, pp. 37-42

Abstract


In hospitals, automatic identification of patients with cameras can greatly broaden the applicability of intelligent patient monitoring. However, patients who are unaware of being monitored do not adjust their behavior, which makes pose variation a challenge. We argue that the frame-wise feature mean cannot characterize the variation among frames. If the video feature is to represent the subject's identity, we propose preserving the overall pose diversity. Because pose varies even within a single video, a representation that keeps this diversity leaves identity as the only source of variation across videos. Following this variation-disentanglement idea, we present a pose-robust face verification algorithm in which each video is represented as an ensemble of frame-wise CNN features. Another challenge is that patients may move at any time, which makes real-time processing of the video stream a necessity. Instead of simply using all frames, the algorithm's key step is key-frame selection by pose quantization, using pose distances to K-means centroids. This reduces the number of feature vectors from hundreds to K while still preserving the overall pose diversity. We analyze why this video sampling strategy is better than random sampling. Using the proposed video representation, we develop an end-to-end face recognition algorithm for real-time patient identification that produces a rank list of one-to-one similarities. It works well in practice and builds a private patient dataset on the fly. On the official 5,000 video pairs of the public YouTube Faces dataset, our algorithm achieves performance comparable to the state-of-the-art approach that averages deep features over all frames. In summary, the main contribution of this paper is a video-versus-video consensus with discriminative metric learning on the fly, which is verified in a working patient monitoring system.
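The key-frame selection and video-versus-video comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation. Each frame's head pose is assumed to be a (yaw, pitch, roll) tuple from some pose estimator. K-means is seeded with deterministic farthest-point initialization, which is a stand-in for whatever seeding the paper uses. The helper `video_similarity` is hypothetical: it scores two videos by the mean cosine similarity over all cross-video pairs of key-frame features. The paper's discriminative metric learning is not reproduced.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points: List[Vector], k: int) -> List[List[float]]:
    """Assumed deterministic seeding: start at the first point, then repeatedly
    add the point farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points: List[Vector], k: int, iters: int = 20) -> List[List[float]]:
    """Plain Lloyd's algorithm; an empty cluster keeps its previous centroid."""
    centroids = farthest_point_init(points, k)
    for _ in range(iters):
        clusters: List[List[Vector]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids


def select_key_frames(poses: List[Vector], k: int) -> List[int]:
    """Quantize head poses with K-means, then keep, for each centroid, the index
    of the frame whose pose is closest to it (at most k sorted indices)."""
    if k >= len(poses):
        return list(range(len(poses)))
    centroids = kmeans(poses, k)
    chosen = {min(range(len(poses)), key=lambda i: sq_dist(poses[i], c))
              for c in centroids}
    return sorted(chosen)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def video_similarity(feats_a: List[Vector], feats_b: List[Vector]) -> float:
    """Hypothetical set-to-set score: mean cosine similarity over all
    cross-video pairs of key-frame features, with no feature averaging."""
    scores = [cosine(a, b) for a in feats_a for b in feats_b]
    return sum(scores) / len(scores)


# Toy pose track: three yaw clusters (-30, 0, +30 degrees), five frames each.
poses = [(yaw + 0.1 * j, 0.0, 0.0) for yaw in (-30.0, 0.0, 30.0) for j in range(5)]
print(select_key_frames(poses, 3))  # [2, 7, 12]
```

On this toy track, 15 frames collapse to one representative per pose cluster, the frame nearest each centroid. Only these K frames would then need CNN features. Scoring with pairwise similarities keeps each key frame's contribution separate instead of collapsing the set into one mean vector.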

Related Material


@InProceedings{Xiang_2019_CVPR_Workshops,
author = {Xiang, Xiang},
title = {Beyond Deep Feature Averaging: Sampling Videos Towards Practical Facial Pain Recognition},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
month = {June},
year = {2019}
}