Learning Interpretable Forensic Representations via Local Window Modulation

Sowmen Das, Md. Ruhul Amin; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2023, pp. 436-447

Abstract


The majority of existing image forgeries involve manipulating a specific region of the source image, which leaves detectable artifacts and forensic traces. These distinguishing features are mostly found in and around the local neighborhood of the manipulated pixels. However, patch-based detection approaches quickly become intractable due to inefficient computation and low robustness. In this work, we investigate how to effectively learn these forensic representations using local window-based attention techniques. We propose the Forensic Modulation Network (ForMoNet), which uses focal modulation and gated attention layers to automatically identify the long- and short-range context for any query pixel. Furthermore, the network is more interpretable and computationally efficient than standard self-attention, which is critical for real-world applications. Our evaluation on various benchmarks shows that ForMoNet outperforms existing transformer-based forensic networks by 6% to 11% on different forgeries.
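The focal modulation operation the abstract builds on (introduced outside this paper, in FocalNets by Yang et al.) aggregates hierarchical local context around each query pixel and uses it to modulate a per-pixel query, avoiding pairwise attention. The following is a minimal NumPy sketch of that idea, not the authors' ForMoNet implementation; all names and shapes (`focal_modulation`, `Wq`, `Wg`, `Wm`, the `windows` sizes) are illustrative assumptions, and simple window averaging stands in for the learned depth-wise convolutions.

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def avg_pool_window(x, k):
    """Average each channel over a (2k+1) x (2k+1) local window (zero-padded).
    Stands in for a learned depth-wise convolution. x: (H, W, C)."""
    H, W, C = x.shape
    pad = np.pad(x, ((k, k), (k, k), (0, 0)))
    out = np.zeros_like(x)
    for i in range(H):
        for j in range(W):
            out[i, j] = pad[i:i + 2 * k + 1, j:j + 2 * k + 1].mean(axis=(0, 1))
    return out

def focal_modulation(x, Wq, Wg, Wm, windows=(1, 2, 4)):
    """Toy focal modulation.
    x: (H, W, C) features; Wq, Wm: (C, C) projections;
    Wg: (C, L+1) gate projection for L focal levels plus one global level."""
    L = len(windows)
    q = x @ Wq                       # per-pixel query
    gates = x @ Wg                   # (H, W, L+1) per-level gates
    ctx = gelu(x)
    m = np.zeros_like(x)
    for l, k in enumerate(windows):  # progressively larger local contexts
        ctx = gelu(avg_pool_window(ctx, k))
        m += gates[..., l:l + 1] * ctx
    # global context: mean over all positions, gated like the local levels
    m += gates[..., L:L + 1] * ctx.mean(axis=(0, 1), keepdims=True)
    return q * (m @ Wm)              # modulate the query with aggregated context
```

A hypothetical usage: on an `(8, 8, 16)` feature map with random projections, the output keeps the input shape, and each output pixel depends only on its windowed neighborhoods plus the gated global summary, which is what makes the context mixing cheaper and easier to inspect than full self-attention.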

Related Material


[bibtex]
@InProceedings{Das_2023_ICCV,
  author    = {Das, Sowmen and Amin, Md. Ruhul},
  title     = {Learning Interpretable Forensic Representations via Local Window Modulation},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
  month     = {October},
  year      = {2023},
  pages     = {436-447}
}