A3D: Studying Pretrained Representations With Programmable Datasets

Ye Wang, Norman Mu, Daniele Grandi, Nicolas Savva, Jacob Steinhardt; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2022, pp. 4878-4889

Abstract


Rendered images have been used to debug models, study inductive biases, and understand transfer learning. To scale up rendered datasets, we construct a pipeline covering 40 classes of images, including furniture and consumer products, backed by 48,716 distinct object models, 480 environments, and 563 materials. We can easily vary dataset diversity along four axes (object diversity, environment, material, and camera angle), making the dataset "programmable". Using this ability, we systematically study how these axes of data characteristics influence pretrained representations. We generate 21 datasets by reducing diversity along different axes, and study performance on five downstream tasks. We find that reducing environment diversity has the biggest impact on performance and is the hardest to recover from after fine-tuning. We corroborate this by visualizing the models' representations, finding that models trained on diverse environments learn more visually meaningful features.
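The "programmable" aspect of the dataset can be pictured as enumerating rendering specifications over the four axes and then shrinking the pool along one axis to produce a low-diversity variant. The following Python sketch is illustrative only; the function and field names are hypothetical and do not reflect the paper's actual pipeline or API.

```python
from itertools import product

def render_specs(objects, environments, materials, camera_angles):
    """Enumerate one rendering spec per (object, env, material, angle) combination."""
    return [
        {"object": o, "environment": e, "material": m, "camera": c}
        for o, e, m, c in product(objects, environments, materials, camera_angles)
    ]

def reduce_axis(axes, axis, k=1):
    """Copy the axis pools with one axis restricted to its first k values,
    simulating a reduced-diversity variant of the dataset."""
    reduced = dict(axes)
    reduced[axis] = axes[axis][:k]
    return reduced

# Toy pools; the real dataset has 48,716 objects, 480 environments, 563 materials.
axes = {
    "objects": ["chair_01", "sofa_07"],
    "environments": ["studio", "kitchen", "garden"],
    "materials": ["oak", "steel"],
    "camera_angles": [0, 90],
}

full = render_specs(*axes.values())                              # 2*3*2*2 = 24 specs
low_env = render_specs(*reduce_axis(axes, "environments").values())  # 2*1*2*2 = 8 specs
print(len(full), len(low_env))  # → 24 8
```

Each of the 21 datasets in the study corresponds to one such reduced configuration, pretraining on which reveals how much each axis of diversity matters downstream.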

Related Material


[pdf]
[bibtex]
@InProceedings{Wang_2022_CVPR,
    author    = {Wang, Ye and Mu, Norman and Grandi, Daniele and Savva, Nicolas and Steinhardt, Jacob},
    title     = {A3D: Studying Pretrained Representations With Programmable Datasets},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2022},
    pages     = {4878-4889}
}