Progressive Knowledge-Embedded Unified Perceptual Parsing for Scene Understanding
Human can naturally understand scenes in depth with the help of various knowledge accumulated and by a comprehensive visual concept organization including category labels and different-level attributes. This inspires us to unify professional knowledge at different levels with deep neural network architectures progressively for scene understanding. Different from the general embedding approaches, we construct different knowledge graphs for different levels of vision tasks by organizing the rich visual concepts accordingly. We employ a gated graph neural network and relational graph convolutional networks to propagate node messages for different levels of tasks and generate progressively different levels of knowledge representation through the graph. Compared with existing methods, our framework has a main appealing property leading to a novel progressive knowledge-embedded representation learning framework that incorporates different level knowledge graphs into the learning of networks at corresponding level. Extensive experiments on the widely used Broden+ dataset demonstrate the superiority of the proposed framework over other existing state-of-the-art methods.