Perception Over Time: Temporal Dynamics for Robust Image Understanding
While deep learning surpasses human-level performance in specific vision tasks, it is fragile and overconfident in its classifications. For example, minor transformations in perspective, illumination, or object deformation in the image space can result in drastically different labels. This is especially apparent when adversarial perturbations are present. In contrast, human visual perception is orders of magnitude more robust to changes in the input stimulus. Neuroscience research suggests that biological perception is a dynamic process that converges over time, even for static images and scenes. Almost all machine perception frameworks lack this convergence property, which makes them vulnerable to minor perturbations. Motivated by our human task results, we introduce a novel framework for incorporating temporal dynamics into static image understanding. We demonstrate a biologically plausible model that decomposes a single image into a series of coarse-to-fine images, mimicking the integration of visual information in the human brain. Our model uses this information "over time", resulting in significant improvements in accuracy, robustness, and cost-effectiveness over standard CNNs. We explicitly quantify the adversarial robustness properties of our coarse-to-fine framework through multiple studies. Our quantitative and qualitative results demonstrate consistent improvements over standard architectures.
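To make the coarse-to-fine decomposition concrete, the following is a minimal sketch of one way a single image could be expanded into a sequence of progressively sharper versions. It is not the authors' implementation: the abstract does not specify the coarsening operator, so block averaging is an assumption chosen for simplicity, and the names `block_average`, `coarse_to_fine`, and the `factors` schedule are hypothetical.

```python
import numpy as np


def block_average(image, factor):
    """Coarsen a 2-D image by averaging non-overlapping factor x factor blocks.

    Each block mean is repeated across its block, so the output keeps the
    input shape. Block averaging is an assumed stand-in for whatever
    coarsening operator the paper actually uses.
    """
    h, w = image.shape
    if h % factor or w % factor:
        raise ValueError("image sides must be divisible by factor")
    # Split each axis into (blocks, within-block) and average within blocks.
    pooled = image.reshape(h // factor, factor, w // factor, factor).mean(axis=(1, 3))
    # Upsample back to the original resolution by repeating block means.
    return np.repeat(np.repeat(pooled, factor, axis=0), factor, axis=1)


def coarse_to_fine(image, factors=(8, 4, 2, 1)):
    """Return a list of images ordered from coarsest to finest.

    The last entry (factor 1) equals the original image.
    """
    return [block_average(image, f) for f in factors]


# Hypothetical input: a random 32 x 32 grayscale image.
rng = np.random.default_rng(0)
image = rng.random((32, 32))
sequence = coarse_to_fine(image)
```

A model that works "over time" would consume `sequence` in order, seeing the coarse structure first and the fine detail last. How the per-step outputs are combined is left open here, since the abstract does not describe it.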