Crowding reveals fundamental differences in local vs. global processing in humans and machines

Vision Res. 2020 Feb;167:39-45. doi: 10.1016/j.visres.2019.12.006. Epub 2020 Jan 7.

Abstract

Feedforward Convolutional Neural Networks (ffCNNs) have become state-of-the-art models in both computer vision and neuroscience. However, human-like performance of ffCNNs does not necessarily imply human-like computations. Previous studies have suggested that current ffCNNs do not make use of global shape information. It is unclear, though, whether this reflects fundamental differences between ffCNN and human processing or is merely an artefact of how ffCNNs are trained. Here, we use visual crowding as a well-controlled, specific probe to test global shape computations. Our results provide evidence that ffCNNs cannot produce human-like global shape computations for principled architectural reasons. We lay out approaches that may address these shortcomings of ffCNNs and yield better models of the human visual system.

Keywords: Convolutional Neural Networks; Crowding; Deep Neural Networks; Global processing; Grouping; Segmentation.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Computer Simulation
  • Crowding*
  • Form Perception / physiology*
  • Humans
  • Image Processing, Computer-Assisted*
  • Neural Networks, Computer*
  • Vision, Ocular / physiology*