- [pdf] [supp]
ScanpathNet: A Recurrent Mixture Density Network for Scanpath Prediction
Understanding the mechanisms underlying human visual attention is an important research problem in cognitive neuroscience and computer vision. While existing models predict salient regions (i.e., saliency maps) and temporal sequences of eye fixations (i.e., scanpaths) in images, their designs often partially follow theoretical frameworks. Here, we introduce ScanpathNet, a deep learning model inspired by the latest theoretical model in neuroscience. It is 'guided' by a dynamic priority map influenced by semantic content and fixation history. The model leverages convolutional neural networks to extract rich semantic features, convolutional long short-term memory networks to model the inhibition of return mechanism and sequential dependencies of fixations, and mixture density networks to predict probability distributions of fixations for each pixel. Simulated human scanpaths can then be generated by sequentially sampling the output of the proposed model. Despite its simplicity, ScanpathNet showed promising qualitative and quantitative scanpath prediction performance in extensive experiments on numerous eye-tracking benchmark datasets.