Pro-CCaps: Progressively Teaching Colourisation to Capsules
Automatic image colourisation studies how to colourise greyscale images. Existing approaches exploit convolutional layers that extract image-level features learning the colourisation on the entire image, but miss entities-level ones due to pooling strategies. We believe that entity-level features are of paramount importance to deal with the intrinsic multimodality of the problem (i.e., the same object can have different colours, and the same colour can have different properties). Models based on capsule layers aim to identify entity-level features in the image from different points of view, but they do not keep track of global features. Our network architecture integrates entity-level features into the image-level features to generate a plausible image colourisation. We observed that results obtained with direct integration of such two representations are largely dominated by the image-level features, thus resulting in unsaturated colours for the entities. To limit such an issue, we propose a gradual growth of the reconstruction phase of the model while training. By advantaging of prior knowledge from each growing step, we obtain a stable collaboration between image-level and entity-level features that ultimately generates stable and vibrant colourisations. Experimental results on three benchmark datasets, and a user study, demonstrate that our approach has competitive performance with respect to the state-of-the-art and provides more consistent colourisation. Code available at omitted-for-reviewing-purposes.