Managing false positives during detection of pathogen sequences in shotgun metagenomics datasets

BMC Bioinformatics. 2024 Dec 3;25(1):372. doi: 10.1186/s12859-024-05952-x.

Abstract

Background: Culture-independent diagnostic tests are gaining popularity as tools for detecting pathogens in food. Shotgun sequencing holds substantial promise for food testing because it provides abundant information on microbial communities, but the challenge lies in analyzing large and complex sequencing datasets with both high sensitivity and high specificity. Falsely classifying sequencing reads as originating from pathogens can lead to unnecessary food recalls or production shutdowns, while low sensitivity, which produces false negatives, could lead to preventable illness.

Results: We used simulated and published shotgun sequencing datasets containing Salmonella-derived reads to explore the appearance and mitigation of false positive results using the popular taxonomic annotation tools Kraken2 and MetaPhlAn4. Using default parameters, Kraken2 is sensitive but prone to false positives, while MetaPhlAn4 is more specific but unable to detect Salmonella at low abundance. We then developed a bioinformatic pipeline for identifying and removing reads falsely identified as Salmonella by Kraken2 while retaining high sensitivity.

Conclusions: Carefully considering software parameters and database choices is essential to avoiding false positive sample calls. With well-chosen parameters plus additional steps to confirm the taxonomic origin of reads, it is possible to detect pathogens with very high specificity and sensitivity.
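The abstract does not spell out the individual steps of the pipeline, so the following Python sketch is only an illustration of the general idea, not the authors' method. It collects reads that Kraken2 assigned to a set of target taxa (candidates for a later confirmation step) and computes sensitivity and specificity from sample-level counts. The function names `reads_assigned_to` and `sensitivity_specificity` are hypothetical. The sketch assumes Kraken2's standard tab-separated per-read output produced without `--use-names`, so that the third column holds a numeric NCBI taxonomy ID, and it assumes the caller supplies every relevant taxid (590 for the genus Salmonella and 28901 for Salmonella enterica are used here only as examples; a real analysis would also need descendant taxa).

```python
from typing import Iterable, Set, Tuple


def reads_assigned_to(kraken_lines: Iterable[str], target_taxids: Set[str]) -> Set[str]:
    """Return IDs of reads that Kraken2 classified to any taxid in target_taxids.

    Assumes the standard per-read output columns:
    status (C/U), read ID, taxid, read length, LCA k-mer mapping.
    """
    hits = set()
    for line in kraken_lines:
        fields = line.rstrip("\n").split("\t")
        # Skip malformed lines and unclassified ("U") reads.
        if len(fields) < 3 or fields[0] != "C":
            continue
        if fields[2] in target_taxids:
            hits.add(fields[1])
    return hits


def sensitivity_specificity(tp: int, fp: int, tn: int, fn: int) -> Tuple[float, float]:
    """Compute (sensitivity, specificity) from sample-level confusion counts."""
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


if __name__ == "__main__":
    # Made-up example lines in Kraken2 per-read format (illustrative only).
    example = [
        "C\tread1\t590\t150\t590:116",
        "U\tread2\t0\t150\t0:116",
        "C\tread3\t562\t150\t562:116",
        "C\tread4\t28901\t150\t28901:116",
    ]
    candidates = reads_assigned_to(example, {"590", "28901"})
    print(sorted(candidates))
    print(sensitivity_specificity(tp=9, fp=2, tn=8, fn=1))
```

In a workflow of this kind, the candidate read IDs would then be passed to an independent check of their taxonomic origin before a sample is called positive; which check to use, and how many confirmed reads to require, are choices this sketch leaves open.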

Keywords: Metagenomics; Pathogen detection; Salmonella; Shotgun sequencing.

MeSH terms

  • Computational Biology / methods
  • False Positive Reactions
  • Food Microbiology / methods
  • Metagenomics* / methods
  • Salmonella* / genetics
  • Sensitivity and Specificity
  • Sequence Analysis, DNA / methods
  • Software*