Background: Misclassification of adverse events in clinical trials can have serious consequences. Each of the many steps involved, from a patient's adverse experience to its presentation in the tables of a publication, should therefore be as standardised as possible, minimising the scope for interpretation. Adverse events are categorised using a predefined dictionary, e.g. MedDRA, which is updated twice a year, often with many new categories. The objective of this paper is to study interobserver variation and other challenges of coding adverse events.
Methods: Systematic review conducted according to the PRISMA guidelines. We searched PubMed, EMBASE and the Cochrane Library. All studies were screened for eligibility by two authors.
Results: Our search returned 520 unique studies, of which 12 were included. Only one study investigated interobserver variation. It reported that 12% of the codes were evaluated differently by two coders; independent physicians found that 8% of all codes deviated from the original description. Other studies found that product summaries could be substantially affected by the choice of dictionary. With the introduction of MedDRA, it appears to have become harder to identify adverse events statistically, because each code is divided into subgroups that spread related events across several categories. To account for this, lumping techniques have been developed, but they are rarely used and the guidance on when to apply them is vague. An additional challenge is that adverse events are censored if they have already occurred in the run-in period of a trial. As there are more than 26 ways of determining whether an event has already occurred, this can lead to bias, particularly because data analysis is rarely performed blindly.
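To make the dilution effect concrete, the following minimal Python sketch uses invented counts and hypothetical term names (not data from any of the included studies, and Fisher's exact test merely as a stand-in for whatever analysis a given trial would use) to show how spreading one adverse event across several MedDRA preferred terms can mask a signal that reappears once the terms are lumped into a single group:

```python
# Hypothetical illustration: splitting one adverse event across several
# MedDRA preferred terms dilutes the statistical signal; lumping the
# terms into one higher-level group recovers it. All counts and term
# names are invented, and each count is assumed to be a distinct patient.
from scipy.stats import fisher_exact

n_treat, n_ctrl = 1000, 1000  # hypothetical arm sizes

# The same underlying adverse event, coded under three related
# preferred terms (treatment count, control count per term).
split_counts = {
    "Oedema peripheral": (10, 4),
    "Fluid retention":   (10, 4),
    "Swelling":          (10, 4),
}

# Tested term by term, no single code reaches conventional significance.
for term, (t, c) in split_counts.items():
    _, p = fisher_exact([[t, n_treat - t], [c, n_ctrl - c]])
    print(f"{term:20s} p = {p:.3f}")

# Lumped into one group, the signal becomes apparent.
t_all = sum(t for t, _ in split_counts.values())
c_all = sum(c for _, c in split_counts.values())
_, p = fisher_exact([[t_all, n_treat - t_all], [c_all, n_ctrl - c_all]])
print(f"{'Lumped group':20s} p = {p:.3f}")
```

With these invented numbers, each individual term yields p ≈ 0.11 while the lumped group yields p ≈ 0.005, illustrating why the choice of coding level, and of whether to lump at all, can determine whether an adverse event is detected.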
Conclusion: There is a lack of evidence that coding of adverse events is a reliable, unbiased and reproducible process. The increase in the number of categories has made detecting adverse events harder, potentially compromising safety. It is crucial that readers of medical publications are aware of these challenges. Comprehensive interobserver studies are needed.