Spatiotemporal Initialization for 3D CNNs With Generated Motion Patterns

Hirokatsu Kataoka, Kensho Hara, Ryusuke Hayashi, Eisuke Yamagata, Nakamasa Inoue; Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2022, pp. 1279-1288

Abstract


We propose a Formula-Driven Supervised Learning (FDSL) framework for spatiotemporal initialization. Our FDSL approach automatically and simultaneously generates motion patterns and their video labels from a simple formula based on Perlin noise. We designed the dataset of generated motion patterns so that 3D CNNs can learn a better basis set for natural videos. The resulting Video Perlin Noise (VPN) dataset can be used to initialize a model before pre-training on large-scale video datasets such as Kinetics-400/700, improving performance on the target task. Our spatiotemporal initialization with the VPN dataset (VPN initialization) outperforms the previous initialization method, the inflated 3D ConvNet (I3D), which uses the 2D ImageNet dataset. The proposed method increases the top-1 video-level accuracy of a Kinetics-400 pre-trained model on the Kinetics-400, UCF-101, HMDB-51, and ActivityNet datasets. In particular, it improves the accuracy of the Kinetics-400 pre-trained model by 10.3 pt on ActivityNet. We also report that the relative performance improvements over the baseline are greater for 3D CNNs than for other models.
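As an illustration of how a formula can produce both a motion pattern and its label, the following is a minimal sketch in pure Python. It is not the authors' implementation. It assumes that a category is defined by a pair of spatial and temporal noise frequencies, which the abstract does not state, and the names `perlin3`, `generate_clip`, and `generate_dataset` are hypothetical.

```python
import math
import random


def _fade(u):
    # Perlin's quintic fade curve: 6u^5 - 15u^4 + 10u^3
    return u * u * u * (u * (u * 6 - 15) + 10)


def _lerp(a, b, w):
    return a + w * (b - a)


def make_gradients(size, seed):
    """Random unit gradient vectors on a periodic size^3 lattice over (x, y, t)."""
    rng = random.Random(seed)
    grads = {}
    for i in range(size):
        for j in range(size):
            for k in range(size):
                norm = 0.0
                while norm < 1e-9:
                    v = (rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1))
                    norm = math.sqrt(sum(c * c for c in v))
                grads[(i, j, k)] = tuple(c / norm for c in v)
    return grads


def perlin3(x, y, t, grads, size):
    """Gradient (Perlin-style) noise at a point in space-time."""
    x0, y0, t0 = math.floor(x), math.floor(y), math.floor(t)
    fx, fy, ft = x - x0, y - y0, t - t0

    def corner(dx, dy, dt):
        # Dot product of the corner gradient with the offset to the sample point.
        g = grads[((x0 + dx) % size, (y0 + dy) % size, (t0 + dt) % size)]
        return g[0] * (fx - dx) + g[1] * (fy - dy) + g[2] * (ft - dt)

    u, v, w = _fade(fx), _fade(fy), _fade(ft)
    # Trilinear interpolation of the eight corner contributions.
    a = _lerp(_lerp(corner(0, 0, 0), corner(1, 0, 0), u),
              _lerp(corner(0, 1, 0), corner(1, 1, 0), u), v)
    b = _lerp(_lerp(corner(0, 0, 1), corner(1, 0, 1), u),
              _lerp(corner(0, 1, 1), corner(1, 1, 1), u), v)
    return _lerp(a, b, w)


def generate_clip(freq_xy, freq_t, frames=8, height=16, width=16, seed=0, lattice=8):
    """Return a clip as nested lists [frame][row][col] of 8-bit gray values."""
    grads = make_gradients(lattice, seed)
    clip = []
    for f in range(frames):
        frame = []
        for r in range(height):
            row = []
            for c in range(width):
                n = perlin3(c * freq_xy / width, r * freq_xy / height,
                            f * freq_t / frames, grads, lattice)
                row.append(min(255, max(0, int((n + 1) * 127.5))))
            frame.append(row)
        clip.append(frame)
    return clip


def generate_dataset(spatial_freqs, temporal_freqs, clips_per_class=2):
    """Assumed labeling rule: one class per (spatial, temporal) frequency pair."""
    dataset, label = [], 0
    for fs in spatial_freqs:
        for ft in temporal_freqs:
            for i in range(clips_per_class):
                dataset.append((generate_clip(fs, ft, seed=label * 1000 + i), label))
            label += 1
    return dataset
```

Calling `generate_dataset([1, 2], [1, 2])` under these assumptions yields four classes of small grayscale clips, each paired with an integer label that comes from the generating parameters rather than from human annotation.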

Related Material


[bibtex]
@InProceedings{Kataoka_2022_WACV,
    author    = {Kataoka, Hirokatsu and Hara, Kensho and Hayashi, Ryusuke and Yamagata, Eisuke and Inoue, Nakamasa},
    title     = {Spatiotemporal Initialization for 3D CNNs With Generated Motion Patterns},
    booktitle = {Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV)},
    month     = {January},
    year      = {2022},
    pages     = {1279-1288}
}