Raising awareness of uncertain choices in empirical data analysis: A teaching concept toward replicable research practices

PLoS Comput Biol. 2024 Mar 28;20(3):e1011936. doi: 10.1371/journal.pcbi.1011936. eCollection 2024 Mar.

Abstract

Throughout their education and when reading the scientific literature, students may get the impression that there is a unique and correct analysis strategy for every data analysis task and that this analysis strategy will always yield a significant and noteworthy result. This expectation conflicts with a growing realization that there is a multiplicity of possible analysis strategies in empirical research, which will lead to overoptimism and nonreplicable research findings if it is combined with result-dependent selective reporting. Here, we argue that students are often ill-equipped for real-world data analysis tasks and unprepared for the dangers of selectively reporting the most promising results. We present a seminar course intended for advanced undergraduates and beginning graduate students in data analysis fields such as statistics, data science, or bioinformatics. The course aims to increase awareness of uncertain choices in the analysis of empirical data and to present ways to deal with these choices through theoretical modules and practical hands-on sessions.
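The link between multiple analysis strategies and overoptimism can be illustrated with a small simulation. The sketch below is not taken from the course material; it is a minimal illustration under stated assumptions. Data are drawn from a standard normal distribution with no true effect, the four "analysis strategies" (full sample, first half, second half, every other observation) are hypothetical stand-ins for real analytic choices, and a z-test with known unit variance is assumed so that only the Python standard library is needed.

```python
import math
import random


def two_sided_p(sample):
    """Two-sided z-test p-value for H0: mean = 0, assuming known unit variance."""
    z = sum(sample) / math.sqrt(len(sample))
    return math.erfc(abs(z) / math.sqrt(2))


def candidate_p_values(sample):
    """P-values from four hypothetical analysis strategies on the same data."""
    half = len(sample) // 2
    strategies = [
        sample,          # prespecified analysis: use all observations
        sample[:half],   # "subgroup" 1: first half only
        sample[half:],   # "subgroup" 2: second half only
        sample[::2],     # "subgroup" 3: every other observation
    ]
    return [two_sided_p(s) for s in strategies]


def simulate(n_sims=2000, n_obs=40, alpha=0.05, seed=1):
    """Return (honest, selective) false positive rates under a true null effect."""
    rng = random.Random(seed)
    honest_hits = 0
    selective_hits = 0
    for _ in range(n_sims):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        p_values = candidate_p_values(sample)
        # Honest reporting: only the prespecified full-sample analysis counts.
        honest_hits += p_values[0] < alpha
        # Selective reporting: report whichever strategy gives the smallest p-value.
        selective_hits += min(p_values) < alpha
    return honest_hits / n_sims, selective_hits / n_sims


honest_rate, selective_rate = simulate()
print(f"prespecified analysis: {honest_rate:.3f}")
print(f"best of four analyses: {selective_rate:.3f}")
```

Because the full-sample analysis is one of the four candidates, the selective rate can never fall below the honest rate. Under these assumptions the honest rate should stay near the nominal 5% level, while picking the smallest p-value pushes the false positive rate noticeably higher, even though no effect exists in the simulated data.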

MeSH terms

  • Curriculum
  • Humans
  • Students*
  • Teaching*

Grants and funding

The authors gratefully acknowledge the funding by DFG grants BO3139/7-1 and BO3139/9-1 to A-LB. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.