Robust single-cell Hi-C clustering by convolution- and random-walk-based imputation

Proc Natl Acad Sci U S A. 2019 Jul 9;116(28):14011-14018. doi: 10.1073/pnas.1901423116. Epub 2019 Jun 24.

Abstract

Three-dimensional genome structure plays a pivotal role in gene regulation and cellular function. Single-cell analysis of genome architecture has been achieved using imaging and chromatin conformation capture methods such as Hi-C. To study variation in chromosome structure between different cell types, computational approaches are needed that can utilize sparse and heterogeneous single-cell Hi-C data. However, few methods exist that are able to accurately and efficiently cluster such data into constituent cell types. Here, we describe scHiCluster, a single-cell clustering algorithm for Hi-C contact matrices that is based on imputations using linear convolution and random walk. Using both simulated and real single-cell Hi-C data as benchmarks, scHiCluster significantly improves clustering accuracy when applied to low coverage datasets compared with existing methods. After imputation by scHiCluster, topologically associating domain (TAD)-like structures (TLSs) can be identified within single cells, and their consensus boundaries were enriched at the TAD boundaries observed in bulk cell Hi-C samples. In summary, scHiCluster facilitates visualization and comparison of single-cell 3D genomes.

Keywords: 3D chromosome structure; Hi-C; random walk; single cell.

Publication types

  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Algorithms
  • Chromatin / ultrastructure*
  • Chromosome Structures / ultrastructure*
  • Cluster Analysis
  • Computational Biology*
  • Genome / genetics
  • Humans
  • Molecular Conformation
  • Single-Cell Analysis*

Substances

  • Chromatin