Egocentric Biochemical Video-and-Language Dataset

Taichi Nishimura, Kojiro Sakoda, Atsushi Hashimoto, Yoshitaka Ushiku, Natsuko Tanaka, Fumihito Ono, Hirotaka Kameko, Shinsuke Mori; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2021, pp. 3129-3133


This paper proposes a novel biochemical video-and-language (BioVL) dataset, which consists of experimental videos, corresponding protocols, and annotations of alignment between events in the video and instructions in the protocol. The key strength of the dataset is its user-oriented design of data collection. We imagine that biochemical researchers easily take videos and share them for another researcher's replication in the future. To minimize the burden of video recording, we adopted an unedited first-person video as a visual source. As a result, we collected 16 videos from four protocols with a total length of 1.6 hours. In our experiments, we conduct two zero-shot video-and-language tasks on the BioVL dataset. Our experimental results show a large room for improvement for practical use even utilizing the state-of-the-art pre-trained video-and-language joint embedding model. We are going to release the BioVL dataset. To our knowledge, this work is the first attempt to release the biochemical video-and-language dataset.

Related Material

@InProceedings{Nishimura_2021_ICCV, author = {Nishimura, Taichi and Sakoda, Kojiro and Hashimoto, Atsushi and Ushiku, Yoshitaka and Tanaka, Natsuko and Ono, Fumihito and Kameko, Hirotaka and Mori, Shinsuke}, title = {Egocentric Biochemical Video-and-Language Dataset}, booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops}, month = {October}, year = {2021}, pages = {3129-3133} }