In 2018, De Los Reyes and Langer expanded the scope of the Evidence Base Updates series to include reviews of psychological assessment techniques. In keeping with the goal of offering clear "take-home messages" about the evidence underlying each technique, experts have proposed a rubric for evaluating the strength of its reliability and validity support. Changes in the research environment, pressures in the peer review process, and a lack of familiarity with some statistical methods have created a situation in which many findings that earn an "excellent" rating on the rubric are likely to be "too good to be true," in the sense that they are unlikely to generalize to clinical settings or to be reproduced in independent samples. We describe several common scenarios in which published results are often too good to be true, involving internal consistency, interrater reliability, correlations, standardized mean differences, diagnostic accuracy, and global model fit statistics. Simple practices could go a long way toward improving the design, reporting, and interpretation of findings. When effect sizes fall in the "excellent" range for problems that have historically been challenging, readers should scrutinize before celebrating. When benchmarks are available from theory or meta-analyses, results that are moderately better than expected in the favorable direction (i.e., Cohen's q ≥ +.30) likewise invite critical appraisal and replication before application. If readers and reviewers press for transparency and do not unduly penalize authors who provide it, then improvements in research quality will come faster, and both generalizability and reproducibility are likely to benefit.
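
As context for the +.30 benchmark: Cohen's q is the difference between two correlations after Fisher's r-to-z transformation,

\[
q = z_1 - z_2, \qquad z_i = \operatorname{arctanh}(r_i) = \tfrac{1}{2}\ln\!\frac{1 + r_i}{1 - r_i}.
\]

By Cohen's conventional standards, q values of roughly .10, .30, and .50 correspond to small, medium, and large differences, so an observed correlation that exceeds its theoretical or meta-analytic benchmark by q ≥ +.30 reflects at least a medium-sized discrepancy.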