Part-Aligned Bilinear Representations for Person Re-Identification

Yumin Suh, Jingdong Wang, Siyu Tang, Tao Mei, Kyoung Mu Lee; Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 402-419

Abstract


Comparing the appearance of corresponding body parts is essential for person re-identification. As body parts are frequently misaligned between detected human boxes, an image representation that can handle this misalignment is required. In this paper, we propose a network that learns a part-aligned representation for person re-identification. Our model consists of a two-stream network, which generates appearance and body part feature maps respectively, and a bilinear pooling layer that fuses the two feature maps into an image descriptor. We show that this yields a compact descriptor in which the image matching similarity is equivalent to an aggregation of the local appearance similarities of the corresponding body parts. Because the image similarity does not depend on the relative positions of parts, our approach significantly alleviates the part misalignment problem. Training the network does not require any part annotations on the person re-identification datasets. Instead, we simply initialize the part sub-stream using a pre-trained sub-network of an existing pose estimation network and train the whole network to minimize the re-identification loss. We validate the effectiveness of our approach by demonstrating its superiority over state-of-the-art methods on the standard benchmark datasets, including Market-1501, CUHK03, CUHK01, and DukeMTMC, as well as the standard video dataset MARS.
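The sketch below illustrates the bilinear fusion described in the abstract: summing per-location outer products of the appearance and part feature maps so that the resulting descriptor's inner product aggregates local appearance similarities weighted by part-map agreement. This is a minimal illustration assuming PyTorch tensors and the hypothetical function name part_aligned_bilinear_pool, not the authors' released implementation.

import torch
import torch.nn.functional as F

def part_aligned_bilinear_pool(appearance, part):
    """Fuse appearance and part maps into a compact image descriptor.

    appearance: (N, C_a, H, W) tensor from the appearance stream.
    part:       (N, C_p, H, W) tensor from the part stream.
    Returns an L2-normalized descriptor of shape (N, C_a * C_p).
    """
    n, ca, h, w = appearance.shape
    cp = part.shape[1]
    a = appearance.reshape(n, ca, h * w)      # (N, C_a, HW)
    p = part.reshape(n, cp, h * w)            # (N, C_p, HW)
    # Sum over spatial locations of the outer products a(x) p(x)^T.
    f = torch.bmm(a, p.transpose(1, 2))       # (N, C_a, C_p)
    f = f.reshape(n, ca * cp)
    # Signed square-root and L2 normalization, a common post-processing
    # step for bilinear descriptors (an assumption here, not a claim
    # about the paper's exact normalization).
    f = torch.sign(f) * torch.sqrt(f.abs() + 1e-12)
    return F.normalize(f, dim=1)

Because the dot product between two such descriptors expands into a sum over location pairs of appearance similarities multiplied by part-map similarities, the matching score is insensitive to where a part appears inside the box, which is the misalignment-robustness property the abstract refers to.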

Related Material


[pdf] [arXiv]
[bibtex]
@InProceedings{Suh_2018_ECCV,
author = {Suh, Yumin and Wang, Jingdong and Tang, Siyu and Mei, Tao and Lee, Kyoung Mu},
title = {Part-Aligned Bilinear Representations for Person Re-Identification},
booktitle = {Proceedings of the European Conference on Computer Vision (ECCV)},
month = {September},
year = {2018}
}