Context Embedding Networks

Kun Ho Kim, Oisin Mac Aodha, Pietro Perona; Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 8679-8687


Low dimensional embeddings that capture the main variations of interest in collections of data are important for many applications. One way to construct these embeddings is to acquire estimates of similarity from the crowd. Similarity is a multi-dimensional concept that varies from individual to individual. However, existing models for learning crowd embeddings typically make simplifying assumptions such as all individuals estimate similarity using the same criteria, the list of criteria is known in advance, or that the crowd workers are not influenced by the data that they see. To overcome these limitations we introduce Context Embedding Networks (CENs). In addition to learning interpretable embeddings from images, CENs also model worker biases for different attributes along with the visual context i.e. the attributes highlighted by a set of images. Experiments on three noisy crowd annotated datasets show that modeling both worker bias and visual context results in more interpretable embeddings compared to existing approaches.

Related Material

[pdf] [supp] [arXiv] [video]
author = {Kim, Kun Ho and Mac Aodha, Oisin and Perona, Pietro},
title = {Context Embedding Networks},
booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2018}