On the Importance of Label Quality for Semantic Segmentation

Aleksandar Zlateski, Ronnachai Jaroensri, Prafull Sharma, Frédo Durand; The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018, pp. 1479-1487


Convolutional networks (ConvNets) have become the dominant approach to semantic image segmentation. Producing the accurate, pixel-level labels required for this task is tedious and time-consuming; producing approximate, coarse labels can take only a fraction of the time and effort. We investigate the relationship between label quality and the performance of ConvNets for semantic segmentation. We create a very large synthetic dataset of perfectly labeled street-view scenes. From these perfect labels, we synthetically generate coarsened labels of varying quality and estimate the human-hours required to produce them. We then run a series of experiments, training ConvNets while varying both the number of training images and the label quality. We find that the performance of ConvNets depends mostly on the time spent creating the training labels. That is, a larger coarsely annotated dataset can yield the same performance as a smaller finely annotated one. Furthermore, fine-tuning coarsely pre-trained ConvNets with a few finely annotated labels can yield performance comparable or superior to training them with a large number of finely annotated labels alone, at a fraction of the labeling cost. We demonstrate that this result also holds for different network architectures and for various object classes in an urban scene.
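To make the idea of synthetically coarsening perfect labels concrete, here is a minimal sketch in Python. The abstract does not specify how the coarsening is done, so the block-wise rule below, the function names `coarsen_labels` and `pixel_agreement`, and the toy 8x8 label map are all illustrative assumptions, not the authors' procedure.

```python
import numpy as np


def coarsen_labels(labels: np.ndarray, block: int) -> np.ndarray:
    """Return a blocky approximation of a 2-D integer label map.

    Assumed rule (hypothetical, not taken from the paper): every
    block x block tile is filled with the label of its top-left pixel.
    """
    h, w = labels.shape
    # Keep one label per tile, then expand each one back to a full tile.
    sampled = labels[::block, ::block]
    tiled = np.repeat(np.repeat(sampled, block, axis=0), block, axis=1)
    # Crop in case the image size is not a multiple of the block size.
    return tiled[:h, :w]


def pixel_agreement(a: np.ndarray, b: np.ndarray) -> float:
    """Fraction of pixels on which two label maps agree."""
    return float(np.mean(a == b))


# Toy "perfect" label map: a square of class 1 on background class 0.
fine = np.zeros((8, 8), dtype=np.int64)
fine[1:6, 1:6] = 1

# A larger block size stands in for a quicker, rougher annotation.
coarse = coarsen_labels(fine, block=4)
agreement = pixel_agreement(fine, coarse)
```

Sweeping `block` over several values would produce a family of label sets of decreasing quality from the same perfect annotation, which is the kind of controlled variation the experiments rely on.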

Related Material

author = {Zlateski, Aleksandar and Jaroensri, Ronnachai and Sharma, Prafull and Durand, Frédo},
title = {On the Importance of Label Quality for Semantic Segmentation},
booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2018}