An End-to-End Vision Transformer Approach for Image Copy Detection

Lee, Jiahe Steven; Hsu, Wynne; Lee, Mong Li

Jiahe Steven Lee, Wynne Hsu, Mong Li Lee; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024, pp. 6997-7006

Abstract

Image copy detection is one of the pivotal tools for safeguarding online information integrity. The challenge lies in determining whether a query image is an edited copy, which requires identifying candidate source images through a retrieval process. This process requires discriminative features comprising both global descriptors that are designed to be augmentation-invariant and local descriptors that can capture salient foreground objects, so as to assess whether a query image is an edited copy of some source reference image. This work describes an end-to-end solution that leverages a Vision Transformer model to learn such discriminative features and perform implicit matching between the query image and the reference image.
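The retrieval step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' method: it assumes each image has already been reduced to a fixed-length global descriptor and ranks reference images by cosine similarity to the query. The function names (`l2_normalize`, `retrieve_candidates`), the `top_k` parameter, and the toy 3-dimensional vectors are all hypothetical and chosen only for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def retrieve_candidates(query, references, top_k=2):
    """Rank reference descriptors by cosine similarity to the query descriptor.

    Assumption: `references` maps a reference image id to its global descriptor.
    """
    q = l2_normalize(query)
    scored = []
    for ref_id, ref_vec in references.items():
        r = l2_normalize(ref_vec)
        similarity = sum(a * b for a, b in zip(q, r))
        scored.append((similarity, ref_id))
    # Highest similarity first; keep only the top_k candidates.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(ref_id, score) for score, ref_id in scored[:top_k]]


# Toy descriptors standing in for learned global features (illustrative only).
references = {
    "ref_a": [1.0, 0.0, 0.0],
    "ref_b": [0.0, 1.0, 0.0],
    "ref_c": [0.9, 0.1, 0.0],
}
query = [1.0, 0.05, 0.0]
candidates = retrieve_candidates(query, references, top_k=2)
```

In a pipeline of this kind, the shortlisted candidates would then be passed to a finer matching stage that decides whether the query is an edited copy. The abstract indicates that the described approach learns the features and performs this matching implicitly within a single Vision Transformer model, which this sketch does not attempt to reproduce.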

Related Material

@InProceedings{Lee_2024_CVPR,
    author    = {Lee, Jiahe Steven and Hsu, Wynne and Lee, Mong Li},
    title     = {An End-to-End Vision Transformer Approach for Image Copy Detection},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2024},
    pages     = {6997-7006}
}