Discovering Hidden Visual Concepts Beyond Linguistic Input in Infant Learning

Xueyi Ke, Satoshi Tsutsui, Yayun Zhang, Bihan Wen; Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR), 2025, pp. 4343-4352

Abstract


Infants rapidly develop complex visual understanding, even before acquiring linguistic skills. As computer vision seeks to replicate the human vision system, understanding infant visual development may offer valuable insights. In this paper, we present an interdisciplinary study exploring this question: can a computational model that imitates the infant learning process develop broader visual concepts that extend beyond the vocabulary it has heard, similar to how infants naturally learn? To investigate this, we analyze a recently published model in Science by Vong et al., which is trained on longitudinal, egocentric images of a single child paired with transcribed parental speech. We perform neuron labeling to identify visual concept neurons hidden in the model's internal representations. We then demonstrate that these neurons can recognize objects beyond the model's original vocabulary. Furthermore, we compare the representations of this infant-inspired model with those of modern computer vision models, such as CLIP and ImageNet pre-trained models. Ultimately, our work bridges cognitive science and computer vision by analyzing the internal representations of a computational model trained on an infant's visual and linguistic inputs. Our code is available at https://github.com/Kexueyi/discover_infant_vis.
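To give a sense of what "neuron labeling" means here, the sketch below is a deliberately simplified, hypothetical illustration (not the paper's actual method): given a probing set of images with known concept labels, each neuron is assigned the concept whose images activate it most strongly. The function name `label_neurons` and the toy activation matrix are assumptions for illustration only.

```python
import numpy as np

def label_neurons(activations, image_concepts):
    """Assign each neuron the concept whose images activate it most.

    activations: (n_images, n_neurons) array of neuron activations
        collected on a probing image set.
    image_concepts: one concept label per image.
    Returns one concept label per neuron.
    """
    concepts = sorted(set(image_concepts))
    image_concepts = np.asarray(image_concepts)
    # Mean activation of each neuron over the images of each concept:
    # shape (n_concepts, n_neurons).
    means = np.stack([activations[image_concepts == c].mean(axis=0)
                      for c in concepts])
    # Label each neuron with its maximally activating concept.
    return [concepts[i] for i in means.argmax(axis=0)]

# Toy example: 4 probing images (two "cat", two "car"), 2 neurons.
acts = np.array([[0.9, 0.1],
                 [0.8, 0.2],
                 [0.1, 0.7],
                 [0.2, 0.9]])
labels = label_neurons(acts, ["cat", "cat", "car", "car"])  # ["cat", "car"]
```

A neuron labeled with a concept that never appears in the training vocabulary is a candidate "hidden" visual concept of the kind the paper searches for.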

Related Material


[pdf] [supp] [arXiv]
[bibtex]
@InProceedings{Ke_2025_CVPR,
    author    = {Ke, Xueyi and Tsutsui, Satoshi and Zhang, Yayun and Wen, Bihan},
    title     = {Discovering Hidden Visual Concepts Beyond Linguistic Input in Infant Learning},
    booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)},
    month     = {June},
    year      = {2025},
    pages     = {4343-4352}
}