Recent studies using long-read sequencing technologies have revealed a previously unexplored population of long cell-free DNA (cfDNA) molecules in human plasma. However, the biological properties of long cfDNA molecules (>500 bp) remain largely unknown. To address this gap, we investigated the origins of long cfDNA molecules from different genomic elements. Long-read sequencing of plasma cfDNA reveals that long molecules are unevenly distributed across the genome. Long cfDNA molecules are overrepresented in euchromatic regions of the genome, in sharp contrast to short cfDNA molecules. The abundance of long molecules correlates more strongly with mRNA gene expression levels than does the abundance of short molecules (Pearson's r = 0.71 vs. -0.14). Moreover, long and short molecules show distinct fragmentation patterns surrounding CpG sites. Leveraging these cleavage preferences, the combined cleavage ratios of long and short molecules can differentiate patients with hepatocellular carcinoma (HCC) from non-HCC subjects (AUC = 0.87). We also compared wild-type mice with knockout mice in which selected nuclease genes had been inactivated. Compared with wild-type mice, the proportion of long molecules originating from transcription start sites is lower in Dffb-deficient mice but higher in Dnase1l3-deficient mice. This work thus provides new insights into the biological properties and potential clinical applications of long cfDNA molecules.
© 2024 Che et al.; Published by Cold Spring Harbor Laboratory Press.