TAB: Text-Align Anomaly Backbone Model for Industrial Inspection Tasks

Lee, Ho-Weng; Lai, Shang-Hong

Ho-Weng Lee, Shang-Hong Lai; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2024, pp. 3921-3929

Abstract

In recent years, the focus on anomaly detection and localization in industrial inspection tasks has intensified. While existing studies have demonstrated impressive outcomes, they often rely heavily on extensive training datasets or on robust features extracted from models pre-trained on diverse datasets such as ImageNet. In this work, we propose a novel framework that leverages the visual-linguistic CLIP model to train a backbone model tailored to the manufacturing domain. Our approach concurrently considers visual and text-aligned embedding spaces for normal and abnormal conditions. The resulting pre-trained backbone markedly enhances performance in industrial downstream tasks, particularly anomaly detection and localization. This improvement is substantiated through experiments on multiple datasets, including MVTec AD, BTAD, and KSDD2. Furthermore, using our pre-trained backbone weights allows existing methods to achieve superior performance in few-shot scenarios with less training data. The proposed anomaly backbone provides a foundation model for more precise anomaly detection and localization.
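The abstract describes an embedding space in which visual features are aligned with text embeddings for normal and abnormal conditions. The following is a minimal sketch of how such a text-aligned space can be scored and trained in the style of generic CLIP zero-shot classification; it is not the authors' TAB training procedure. It assumes precomputed, fixed-size embeddings (toy 2-D arrays stand in for CLIP image and text encoder outputs), exactly two text prompts (row 0 "normal", row 1 "abnormal"), and a fixed temperature of 0.07 (CLIP's initial value). All function names are hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to unit length so dot products become cosine similarities."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def text_aligned_logits(image_emb, text_emb, temperature=0.07):
    """Temperature-scaled cosine similarity between image embeddings (N, D)
    and text embeddings (2, D): row 0 = 'normal', row 1 = 'abnormal'."""
    img = l2_normalize(image_emb)
    txt = l2_normalize(text_emb)
    return (img @ txt.T) / temperature  # shape (N, 2)


def softmax(logits):
    """Row-wise softmax, shifted by the row max for numerical stability."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def anomaly_score(image_emb, text_emb, temperature=0.07):
    """Probability assigned to the 'abnormal' text embedding for each image."""
    return softmax(text_aligned_logits(image_emb, text_emb, temperature))[:, 1]


def alignment_loss(image_emb, text_emb, labels, temperature=0.07):
    """Hypothetical training objective: cross-entropy that pulls each image
    toward the text embedding of its condition (0 = normal, 1 = abnormal)."""
    labels = np.asarray(labels)
    probs = softmax(text_aligned_logits(image_emb, text_emb, temperature))
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked + 1e-12)))


if __name__ == "__main__":
    # Toy stand-ins for encoder outputs (not real CLIP features).
    text_emb = np.array([[1.0, 0.0],    # hypothetical "normal" prompt embedding
                         [0.0, 1.0]])   # hypothetical "abnormal" prompt embedding
    image_emb = np.array([[0.9, 0.1],   # image close to the normal prompt
                          [0.1, 0.9]])  # image close to the abnormal prompt
    print(anomaly_score(image_emb, text_emb))  # first value near 0, second near 1
    print(alignment_loss(image_emb, text_emb, [0, 1]))
```

Scoring by comparing against both a "normal" and an "abnormal" text embedding, rather than thresholding distance to normal samples alone, is what makes the space usable with few or no normal training images; a real pipeline would replace the toy arrays with encoder outputs and could apply the same comparison per patch to obtain a localization map.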

Related Material

BibTeX

@InProceedings{Lee_2024_CVPR,
    author    = {Lee, Ho-Weng and Lai, Shang-Hong},
    title     = {TAB: Text-Align Anomaly Backbone Model for Industrial Inspection Tasks},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2024},
    pages     = {3921-3929}
}