Learning Contextual Causality Between Daily Events From Time-Consecutive Images
Conventional text-based causal knowledge acquisition methods typically require laborious and expensive human annotation, so their scale is often limited. Moreover, because no context is provided during annotation, the resulting causal knowledge resources (e.g., ConceptNet) typically do not account for context. In this paper, we move beyond the textual domain to explore a more scalable way of acquiring causal knowledge and investigate the possibility of learning contextual causality from visual signals. Specifically, we first present a high-quality dataset, Vis-Causal, and then conduct experiments demonstrating that, with good language and visual representations, it is possible to discover meaningful causal knowledge from videos. Further analysis shows that causal relations are indeed contextual, and that modeling this contextual property helps better predict the causal relations between events. The Vis-Causal dataset and experiment code are available at https://github.com/HKUST-KnowComp/Vis_Causal.