Shape-Based Measures Improve Scene Categorization

Morteza Rezanejad; John Wilder; Dirk B Walther; Allan D Jepson; Sven Dickinson; Kaleem Siddiqi

doi:10.1109/TPAMI.2023.3333352

IEEE Trans Pattern Anal Mach Intell. 2024 Apr;46(4):2041-2053. doi: 10.1109/TPAMI.2023.3333352. Epub 2024 Mar 6.

PMID: 38039177

Abstract

Converging evidence indicates that deep neural network models that are trained on large datasets are biased toward color and texture information. Humans, on the other hand, can easily recognize objects and scenes from images as well as from bounding contours. Mid-level vision is characterized by the recombination and organization of simple primary features into more complex ones by a set of so-called Gestalt grouping rules. While described qualitatively in the human literature, a computational implementation of these perceptual grouping rules is so far missing. In this article, we contribute a novel set of algorithms for the detection of contour-based cues in complex scenes. We use the medial axis transform (MAT) to locally score contours according to these grouping rules. We demonstrate the benefit of these cues for scene categorization in two ways: (i) Both human observers and CNN models categorize scenes most accurately when perceptual grouping information is emphasized. (ii) Weighting the contours with these measures boosts performance of a CNN model significantly compared to the use of unweighted contours. Our work suggests that, even though these measures are computed directly from contours in the image, current CNN models do not appear to extract or utilize these grouping cues.
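
To make the medial axis transform (MAT) mentioned in the abstract more concrete, here is a minimal sketch. It is not the authors' algorithm, and it does not implement their grouping scores. It only illustrates the underlying idea: each medial point is the center of a disc inscribed in the shape, paired with that disc's radius. The function names `distance_transform` and `approximate_medial_axis` are hypothetical. Taking local maxima of a brute-force distance transform is a crude stand-in for a real MAT computation, and it can miss branches such as the diagonals that run into a rectangle's corners.

```python
import math


def distance_transform(mask):
    """Euclidean distance from each foreground pixel to the nearest background pixel.

    Brute force, for illustration only; assumes the mask contains at least
    one background (0) pixel.
    """
    h, w = len(mask), len(mask[0])
    background = [(r, c) for r in range(h) for c in range(w) if not mask[r][c]]
    dist = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if mask[r][c]:
                dist[r][c] = min(
                    math.hypot(r - br, c - bc) for br, bc in background
                )
    return dist


def approximate_medial_axis(mask):
    """Return {(row, col): radius} for pixels that are local maxima of the
    distance transform within their 8-neighborhood.

    This is an assumed, simplified approximation of the MAT, not the method
    used in the paper.
    """
    dist = distance_transform(mask)
    h, w = len(mask), len(mask[0])
    axis = {}
    for r in range(h):
        for c in range(w):
            if not mask[r][c]:
                continue
            neighbors = [
                dist[rr][cc]
                for rr in range(max(0, r - 1), min(h, r + 2))
                for cc in range(max(0, c - 1), min(w, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(dist[r][c] >= n for n in neighbors):
                axis[(r, c)] = dist[r][c]
    return axis


# A 3x7 filled rectangle surrounded by a one-pixel background border.
mask = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]
axis = approximate_medial_axis(mask)
```

For this rectangle, the sketch returns the central stretch of the middle row, which is the part of the axis lying midway between the two long sides. Scoring contours "according to these grouping rules", as the abstract puts it, would require additional measures defined on such medial points and radii. The abstract does not specify those measures, so they are not attempted here.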