Objective: The aim of this study is to determine the interrater agreement in a clinical practice environment for the most commonly used magnetic resonance enterography (MRE) features of Crohn's disease (CD).
Methods: CD patients with MREs before and after treatment were retrospectively identified using search queries over a 7-year period (May 2017-September 2017). MRE features of CD comprising components of multiple CD scoring indices were scored by radiologists in the same segment of bowel. Agreement for nominal categorical and continuous variables was assessed using the κ statistic and intraclass correlation coefficients, respectively.
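As an illustration of the κ statistic named in the Methods, the following is a minimal sketch of Cohen's κ for two raters scoring a nominal feature. It assumes exactly two raters and unweighted κ; the abstract does not state the number of raters or the κ variant used, and the ratings shown are hypothetical values, not study data.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same items.

    Assumption: two raters and nominal labels; this is an illustrative
    sketch, not the study's actual analysis code.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)

    # Observed agreement: fraction of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of each rater's marginal label frequencies,
    # summed over all labels used by either rater.
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    labels = set(count_a) | set(count_b)
    p_e = sum(count_a[c] * count_b[c] for c in labels) / (n * n)

    if p_e == 1.0:
        # Both raters used one identical label throughout; kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical ratings (1 = feature present, 0 = absent) for ten bowel segments.
reader_1 = [1, 1, 0, 0, 1, 0, 1, 0, 1, 1]
reader_2 = [1, 0, 0, 0, 1, 1, 1, 0, 1, 0]
kappa = cohen_kappa(reader_1, reader_2)
print(round(kappa, 2))  # observed agreement 0.7, chance agreement 0.5
```

A κ near zero or below, as reported for mucosal ulceration, means the observed agreement is no better than the agreement expected from the raters' marginal frequencies alone.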
Results: Eighty scans comprised the study population. Moderate interrater agreement was seen in both the pre- and post-treatment MREs for presence of diffusion restriction (κ = 0.43, 0.48; pre- and post-treatment), stricturing disease (κ = 0.51, 0.52), and overall degree of severity (κ = 0.49, 0.59). Substantial agreement was seen in pre- and post-treatment scans for length of involvement (intraclass correlation coefficient = 0.67, 0.61). The presence of mucosal ulceration had no agreement (κ = -0.07, -0.042).
Conclusion: Many MRE features of active CD comprising the major CD scoring indices are reproducible when interpreted by non-CD-focused abdominal radiologists. However, the presence of mucosal ulcerations had no agreement and may need more investigation before this feature is included as a driver in therapeutic decision making.
Advances in knowledge: This study demonstrates the unreliability of mucosal ulceration assessment by non-CD-focused abdominal radiologists, identifying a potential area for future education.
Key Points: The majority of MRE findings incorporated into many CD scoring indices have fair to moderate interrater agreement even when read by non-MRE expert radiologists. Substantial agreement was seen in the length of involved bowel, but this feature is incorporated into only one of the CD scoring indices. Presence of mucosal ulcerations, a feature that is heavily weighted by several CD scoring indices, had no interrater agreement in our study. Research should focus on improving the features that have poor interrater agreement.