ScanpathNet: A Recurrent Mixture Density Network for Scanpath Prediction

Ryan Anthony Jalova de Belen, Tomasz Bednarz, Arcot Sowmya; Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2022, pp. 5010-5020

Abstract


Understanding the mechanisms underlying human visual attention is an important research problem in cognitive neuroscience and computer vision. While existing models predict salient regions (i.e., saliency maps) and temporal sequences of eye fixations (i.e., scanpaths) in images, their designs often follow theoretical frameworks only partially. Here, we introduce ScanpathNet, a deep learning model inspired by the latest theoretical model in neuroscience. It is 'guided' by a dynamic priority map influenced by semantic content and fixation history. The model leverages convolutional neural networks to extract rich semantic features, convolutional long short-term memory networks to model the inhibition of return mechanism and sequential dependencies of fixations, and mixture density networks to predict probability distributions of fixations for each pixel. Simulated human scanpaths can then be generated by sequentially sampling the output of the proposed model. Despite its simplicity, ScanpathNet showed promising qualitative and quantitative scanpath prediction performance in extensive experiments on numerous eye-tracking benchmark datasets.
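
The sampling step mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each time step yields a 2D Gaussian mixture with diagonal covariance, and the names `sample_fixation`, `sample_scanpath`, and the parameter values are hypothetical. In the actual model the mixture at each step would come from the network and depend on fixation history; here the parameters are fixed by hand.

```python
import numpy as np

def sample_fixation(weights, means, stds, rng):
    """Draw one (x, y) fixation from a 2D Gaussian mixture (diagonal covariance)."""
    k = rng.choice(len(weights), p=weights)  # pick a mixture component by its weight
    return rng.normal(means[k], stds[k])     # sample around that component's mean

def sample_scanpath(mixtures, seed=0):
    """Sample one fixation per time step; returns an array of shape (T, 2)."""
    rng = np.random.default_rng(seed)
    return np.array([sample_fixation(w, m, s, rng) for (w, m, s) in mixtures])

# Hypothetical per-step mixture parameters in normalized image coordinates.
# Assumption: a real model would predict these from image features and history.
mixtures = [
    (np.array([0.7, 0.3]),
     np.array([[0.5, 0.5], [0.2, 0.8]]),
     np.array([[0.05, 0.05], [0.10, 0.10]])),
    (np.array([0.5, 0.5]),
     np.array([[0.3, 0.4], [0.7, 0.6]]),
     np.array([[0.05, 0.05], [0.05, 0.05]])),
    (np.array([1.0]),
     np.array([[0.8, 0.2]]),
     np.array([[0.02, 0.02]])),
]

scanpath = sample_scanpath(mixtures)
print(scanpath.shape)  # one (x, y) pair per time step
```

Fixing the seed makes a sampled scanpath reproducible; different seeds give different plausible scanpaths from the same predicted distributions.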

Related Material


[bibtex]
@InProceedings{de_Belen_2022_CVPR,
    author    = {de Belen, Ryan Anthony Jalova and Bednarz, Tomasz and Sowmya, Arcot},
    title     = {ScanpathNet: A Recurrent Mixture Density Network for Scanpath Prediction},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops},
    month     = {June},
    year      = {2022},
    pages     = {5010-5020}
}