Auditory cortex encodes information about nonlinear combinations of spectrotemporal sound features. Convolutional neural networks (CNNs) provide an architecture for generalizable encoding models that can predict time-varying neural activity evoked by natural sounds with substantially greater accuracy than established models. However, the complexity of CNNs makes it difficult to discern the computational properties that support their improved performance. To address this limitation, we developed a method to visualize the tuning subspace captured by a CNN. Single-unit data were recorded with high-channel-count microelectrode arrays from primary auditory cortex (A1) of awake, passively listening ferrets during presentation of a large natural sound set. A CNN was fit to the data, replicating approaches from previous work. To measure the tuning subspace, the dynamic spectrotemporal receptive field (dSTRF) was computed as the locally linear filter approximating the input-output relationship of the CNN at each stimulus timepoint. Principal component analysis was then used to reduce this large set of filters to a compact subspace; typically, 2-10 filters accounted for 90% of dSTRF variance. The stimulus was projected into the subspace for each neuron, and a new model was fit using only the projected values. The subspace model predicted time-varying spike rates nearly as accurately as the full CNN. Sensory responses could be plotted in the subspace, providing a compact visualization of the model. This analysis revealed a diversity of nonlinear responses, consistent with contrast gain control and emergent invariance to spectrotemporal modulation phase. Within local populations, neurons formed a sparse representation by tiling the tuning subspace. Narrow-spiking, putative inhibitory neurons showed distinct patterns of tuning that may reflect their position in the cortical circuit. These results demonstrate a conceptual link between CNN and subspace models and establish a framework for interpreting deep learning-based models.
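To make the dSTRF-to-subspace procedure concrete, the following is a minimal Python/JAX sketch, not the authors' code: the toy network (cnn_rate), its parameters, and all array dimensions are hypothetical stand-ins for a fitted CNN. It illustrates the core steps described above: computing the locally linear filter (the Jacobian of the model output with respect to the stimulus history) at each timepoint, reducing the resulting filter set with PCA to the components capturing 90% of variance, and projecting the stimulus into that subspace.

```python
# Minimal sketch of the dSTRF + PCA subspace procedure (hypothetical model).
import jax
import jax.numpy as jnp
import numpy as np

rng = np.random.default_rng(0)
F, T, H = 18, 500, 25  # frequency channels, timepoints, history window

# Hypothetical stand-in for a fitted CNN: maps an F x H spectrogram
# snippet (the stimulus history at one timepoint) to a spike rate.
W1 = jnp.asarray(rng.normal(size=(8, F * H)) * 0.1)
w2 = jnp.asarray(rng.normal(size=(8,)) * 0.1)

def cnn_rate(snippet):
    h = jnp.tanh(W1 @ snippet.ravel())  # nonlinear hidden layer
    return jax.nn.softplus(w2 @ h)      # nonnegative output rate

spec = jnp.asarray(rng.normal(size=(F, T)))  # toy stimulus spectrogram

# dSTRF at time t: the Jacobian of the model output with respect to the
# stimulus history, i.e., the locally linear filter at that timepoint.
dstrf_fn = jax.jacobian(cnn_rate)
snippets = jnp.stack([spec[:, t - H:t] for t in range(H, T)])
dstrfs = jax.vmap(dstrf_fn)(snippets)  # shape (T - H, F, H)

# PCA over the set of dSTRFs: keep enough filters for 90% of variance.
X = np.asarray(dstrfs).reshape(len(dstrfs), -1)
X = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X, full_matrices=False)
var = S**2 / np.sum(S**2)
k = int(np.searchsorted(np.cumsum(var), 0.90)) + 1
subspace = Vt[:k]  # k basis filters, each of length F * H

# Project the stimulus history into the k-dimensional tuning subspace;
# a new (subspace) model would then be fit on these projections alone.
proj = X @ subspace.T  # shape (T - H, k)
print(f"{k} filters capture 90% of dSTRF variance; projection {proj.shape}")
```

In practice the subspace model fit on these projections would replace the full CNN for visualization, with each neuron's response plotted as a function of its 2-10 subspace coordinates.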
Significance statement: Auditory cortex mediates the representation and discrimination of complex sound features. Many models have been proposed for cortical sound encoding, varying in their generality, interpretability, and ease of fitting. It has been difficult to determine whether, and which, functional properties are captured by different models. This study shows that two families of encoding models, convolutional neural networks (CNNs) and tuning subspace models, account for the same functional properties, providing an important analytical link between accurate models that are easy to fit (CNNs) and models that are straightforward to interpret (tuning subspaces).