Half&Half: New Tasks and Benchmarks for Studying Visual Common Sense

Ashish Singh, Hang Su, SouYoung Jin, Huaizu Jiang, Chetan Manjesh, Geng Luo, Ziwei He, Li Hong, Erik Learned-Miller, Rosie Cowell; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2019, pp. 1-4

Abstract

Suppose you're in an unfamiliar apartment looking for a TV. You're more likely to look in a room with a couch than in one with a sink, or in a room with carpeting than in one with a tile floor. Making such intelligent decisions about unseen objects given only partial observations is a fundamental component of visual common sense. These capabilities can take various forms and can benefit an intelligent agent in different scenarios. For example, to find an object efficiently, an agent must ask questions such as: (1) is the current direction a promising choice? and, if not, (2) given observations in other directions, which one is preferred? A less specific but nonetheless essential capability is to directly predict the next visual observations, which can enable an agent to prepare for imminent encounters. In this work, we formalize three specific prediction tasks critical to visual common sense and introduce benchmarks, the Half&Half benchmarks, to measure an agent's ability to perform these tasks. We show that pre-existing data sets can be modified with minimal effort to produce large training and test sets for learning these new tasks. Our trained models exhibit large improvements over naive baselines. Preliminary evaluations of these tasks in simple visual navigation scenarios demonstrate the utility of our models and the potential power of future intelligent agents equipped with visual common sense.

Related Material

[bibtex]

@InProceedings{Singh_2019_CVPR_Workshops,
author = {Singh, Ashish and Su, Hang and Jin, SouYoung and Jiang, Huaizu and Manjesh, Chetan and Luo, Geng and He, Ziwei and Hong, Li and Learned-Miller, Erik and Cowell, Rosie},
title = {Half\&Half: New Tasks and Benchmarks for Studying Visual Common Sense},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
month = {June},
year = {2019}
}