MFAS: Multimodal Fusion Architecture Search

Juan-Manuel Perez-Rua, Valentin Vielzeuf, Stephane Pateux, Moez Baccouche, Frederic Jurie; The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019, pp. 6966-6975


We tackle the problem of finding good architectures for multimodal classification problems. We propose a novel and generic search space that spans a large number of possible fusion architectures. To find an optimal architecture for a given dataset in the proposed search space, we leverage an efficient sequential model-based exploration approach that is tailored to the problem. We demonstrate the value of posing multimodal fusion as a neural architecture search problem through extensive experimentation on a toy dataset and two real multimodal datasets. We discover fusion architectures that exhibit state-of-the-art performance on problems from different domains and with different dataset sizes, including the NTU RGB+D dataset, the largest multimodal action recognition dataset available.
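
To make the idea of sequential model-based exploration over a fusion search space concrete, here is a minimal Python sketch. It is not the authors' implementation: the encoding of a fusion layer as a triple (feature index from modality A, feature index from modality B, activation index), the `MeanSurrogate` class, the `toy_score` function standing in for validation accuracy, and all parameter names are assumptions made for illustration only.

```python
import itertools


def enumerate_fusion_layers(n_a, n_b, n_act):
    """All single fusion-layer choices: (layer of A, layer of B, activation)."""
    return list(itertools.product(range(n_a), range(n_b), range(n_act)))


def toy_score(arch):
    """Hypothetical stand-in for validation accuracy of a fusion architecture."""
    return sum((a + b) / (1 + act) for a, b, act in arch) / len(arch)


class MeanSurrogate:
    """Toy surrogate: predicts a score from the average score of each choice seen."""

    def __init__(self):
        self.totals = {}
        self.counts = {}

    def update(self, arch, score):
        for choice in arch:
            self.totals[choice] = self.totals.get(choice, 0.0) + score
            self.counts[choice] = self.counts.get(choice, 0) + 1

    def predict(self, arch):
        vals = [self.totals[c] / self.counts[c] for c in arch if c in self.counts]
        return sum(vals) / len(vals) if vals else 0.0


def sequential_search(n_a, n_b, n_act, max_layers, k, evaluate):
    """Grow architectures one fusion layer at a time, guided by the surrogate."""
    choices = enumerate_fusion_layers(n_a, n_b, n_act)
    surrogate = MeanSurrogate()
    evaluated = {}

    # Step 1: evaluate every single-layer architecture to seed the surrogate.
    frontier = [(c,) for c in choices]
    for arch in frontier:
        evaluated[arch] = evaluate(arch)
        surrogate.update(arch, evaluated[arch])

    # Step 2: expand the best k architectures, rank the expansions with the
    # surrogate, and spend real evaluations only on the top k candidates.
    for _ in range(2, max_layers + 1):
        parents = sorted(frontier, key=evaluated.get, reverse=True)[:k]
        candidates = [p + (c,) for p in parents for c in choices]
        candidates.sort(key=surrogate.predict, reverse=True)
        frontier = candidates[:k]
        for arch in frontier:
            evaluated[arch] = evaluate(arch)
            surrogate.update(arch, evaluated[arch])

    best = max(evaluated, key=evaluated.get)
    return best, evaluated[best], len(evaluated)


if __name__ == "__main__":
    best, score, n_evals = sequential_search(3, 3, 2, max_layers=3, k=4,
                                             evaluate=toy_score)
    print(best, score, n_evals)
```

The point of the sketch is the evaluation budget: the full space of three-layer architectures here has 18^3 members, while the loop evaluates only the 18 single-layer options plus `k` candidates per additional layer. In a real setting `evaluate` would train and validate a fusion network, and the surrogate would typically be a learned predictor rather than a running average.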

Related Material

author = {Perez-Rua, Juan-Manuel and Vielzeuf, Valentin and Pateux, Stephane and Baccouche, Moez and Jurie, Frederic},
title = {MFAS: Multimodal Fusion Architecture Search},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2019}