SpatialCTD: A Large-Scale Tumor Microenvironment Spatial Transcriptomic Dataset to Evaluate Cell Type Deconvolution for Immuno-Oncology

J Comput Biol. 2024 Sep;31(9):871-885. doi: 10.1089/cmb.2024.0532. Epub 2024 Aug 8.

Abstract

Recent technological advancements have enabled spatially resolved transcriptomic profiling but at a multicellular resolution that is more cost-effective. The task of cell type deconvolution has been introduced to disentangle discrete cell types from such multicellular spots. However, existing benchmark datasets for cell type deconvolution are either generated from simulation or limited in scale, predominantly encompassing data on mice and are not designed for human immuno-oncology. To overcome these limitations and promote comprehensive investigation of cell type deconvolution for human immuno-oncology, we introduce a large-scale spatial transcriptomic deconvolution benchmark dataset named SpatialCTD, encompassing 1.8 million cells and 12,900 pseudo spots from the human tumor microenvironment across the lung, kidney, and liver. In addition, SpatialCTD provides more realistic reference than those generated from single-cell RNA sequencing (scRNA-seq) data for most reference-based deconvolution methods. To utilize the location-aware SpatialCTD reference, we propose a graph neural network-based deconvolution method (i.e., GNNDeconvolver). Extensive experiments show that GNNDeconvolver often outperforms existing state-of-the-art methods by a substantial margin, without requiring scRNA-seq data. To enable comprehensive evaluations of spatial transcriptomics data from flexible protocols, we provide an online tool capable of converting spatial transcriptomic data from various platforms (e.g., 10× Visium, MERFISH, and sci-Space) into pseudo spots, featuring adjustable spot size. The SpatialCTD dataset and GNNDeconvolver implementation are available at https://github.com/OmicsML/SpatialCTD, and the online converter tool can be accessed at https://omicsml.github.io/SpatialCTD/.

Keywords: benchmark dataset; cell type deconvolution; graph neural networks; human immuno-oncology; spatial transcriptomic data; tumor microenvironment.

MeSH terms

  • Algorithms
  • Animals
  • Computational Biology / methods
  • Gene Expression Profiling / methods
  • Humans
  • Mice
  • Neoplasms / genetics
  • Neoplasms / pathology
  • Neural Networks, Computer
  • Single-Cell Analysis / methods
  • Software
  • Transcriptome* / genetics
  • Tumor Microenvironment* / genetics