Reliability of assessment tools in rehabilitation: an illustration of appropriate statistical analyses

Clin Rehabil. 1998 Jun;12(3):187-99. doi: 10.1191/026921598672178340.

Abstract

Objective: To provide a practical guide to appropriate statistical analysis of a reliability study using real-time ultrasound for measuring muscle size as an example.

Design: Inter-rater and intra-rater (between-scans and between-days) reliability.

Subjects: Ten normal subjects (five male) aged 22-58 years.

Method: The cross-sectional area (CSA) of the anterior tibial muscle group was measured using real-time ultrasonography.

Main outcome measures: Intraclass correlation coefficients (ICCs) with their 95% confidence intervals (CIs), and the Bland and Altman method for assessing agreement, which comprises the mean difference between measures (d), the 95% CI for d, the standard deviation of the differences (SDdiff), the 95% limits of agreement and a reliability coefficient.
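As an illustration of the ICC forms named in this abstract, the following minimal sketch computes ICC(1,1) and ICC(3,1) from a subjects-by-measurements table, assuming the labels follow the Shrout and Fleiss convention (one-way random effects and two-way mixed effects, single measure). The function name and the example ratings are hypothetical and are not taken from the study; confidence intervals are not computed here.

```python
from statistics import mean


def icc_single_measure(ratings):
    """Return (ICC(1,1), ICC(3,1)) for a table with one row per subject
    and one column per rater or scan. Assumes no missing values."""
    n = len(ratings)        # number of subjects
    k = len(ratings[0])     # number of measurements per subject
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    # Sums of squares from the analysis of variance decomposition.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters

    ms_rows = ss_rows / (n - 1)
    ms_within = (ss_total - ss_rows) / (n * (k - 1))
    ms_error = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))

    icc_1_1 = (ms_rows - ms_within) / (ms_rows + (k - 1) * ms_within)
    icc_3_1 = (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)
    return icc_1_1, icc_3_1


# Hypothetical data: three subjects, two raters, rater 2 reads 1 unit higher.
example = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]
icc11, icc31 = icc_single_measure(example)
print(f"ICC(1,1) = {icc11:.2f}, ICC(3,1) = {icc31:.2f}")
```

In this made-up example the constant offset between raters lowers ICC(1,1) but leaves ICC(3,1) at 1, which is one reason an ICC on its own can hide a systematic bias between measures.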

Results: Inter-rater reliability was high: ICC(3,1) was 0.92 (95% CI 0.72 to 0.98). There was reasonable agreement between measures on the Bland and Altman test: d was -0.63 cm², the 95% CI for d was -1.4 to 0.14 cm², the SDdiff was 1.08 cm², the 95% limits of agreement were -2.73 to 1.53 cm² and the reliability coefficient was 2.4. Between-scans repeatability was high: ICC(1,1) values were 0.94 and 0.93, with 95% CIs of 0.8 to 0.99 and 0.75 to 0.98, for days 1 and 2 respectively. Measures showed good agreement on the Bland and Altman test: d was 0.15 cm² for day 1 and -0.32 cm² for day 2; the 95% CIs for d were -0.51 to 0.81 cm² for day 1 and -0.98 to 0.34 cm² for day 2; SDdiff was 0.93 cm² for both days; the 95% limits of agreement were -1.71 to 2.01 cm² for day 1 and -2.18 to 1.54 cm² for day 2; and the reliability coefficient was 1.80 for day 1 and 1.88 for day 2. The between-days ICC(1,2) was 0.92 (95% CI 0.69 to 0.98). The d was -0.98 cm², the SDdiff was 1.25 cm², the 95% limits of agreement were -3.48 to 1.52 cm² and the reliability coefficient was 2.8. The 95% CI for d (-1.88 to -0.08 cm²) and the distribution graph showed a bias towards a larger measurement on day 2.
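The Bland and Altman quantities reported above can be approximated from the summary statistics alone. The sketch below is illustrative and rests on several assumptions that the abstract does not state: that all ten subjects contributed to each comparison (so the t critical value for 9 degrees of freedom, 2.262, applies), that the limits of agreement are d ± 2 × SDdiff, and that the reliability coefficient is 2 × sqrt(Σdᵢ²/n), i.e. twice the root mean square of the differences. The function name is hypothetical.

```python
import math

# Two-sided 95% Student t critical value for 9 degrees of freedom.
# Assumption: n = 10 paired measurements in every comparison.
T_CRIT_DF9 = 2.262


def bland_altman_from_summary(d_mean, sd_diff, n, t_crit=T_CRIT_DF9):
    """Derive Bland and Altman statistics from the mean difference,
    the SD of the differences and the number of pairs."""
    # 95% CI for the mean difference: d +/- t * SDdiff / sqrt(n).
    se = sd_diff / math.sqrt(n)
    ci_d = (d_mean - t_crit * se, d_mean + t_crit * se)

    # 95% limits of agreement, assumed here to be d +/- 2 * SDdiff.
    limits = (d_mean - 2 * sd_diff, d_mean + 2 * sd_diff)

    # Assumed reliability coefficient: 2 * sqrt(sum(d_i^2) / n).
    # The sum of squared differences is recovered from the mean and SD.
    sum_sq = (n - 1) * sd_diff ** 2 + n * d_mean ** 2
    coefficient = 2 * math.sqrt(sum_sq / n)

    return {"ci_d": ci_d, "limits": limits, "coefficient": coefficient}


# Between-scans figures for day 1, taken from the Results above.
day1 = bland_altman_from_summary(d_mean=0.15, sd_diff=0.93, n=10)
print(day1)
```

Under these assumptions the day 1 inputs give values close to the reported CI for d, limits of agreement and reliability coefficient; small discrepancies are to be expected because the inputs are themselves rounded. A CI for d that excludes zero, as in the between-days comparison, is what signals a systematic bias.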

Conclusions: The ICC and Bland and Altman tests are appropriate for analysis of reliability studies of similar design to that described, but neither test alone provides sufficient information and it is recommended that both are used.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adult
  • Female
  • Humans
  • Male
  • Middle Aged
  • Muscle, Skeletal / diagnostic imaging*
  • Observer Variation
  • Outcome Assessment, Health Care / statistics & numerical data*
  • Rehabilitation* / statistics & numerical data
  • Reproducibility of Results
  • Ultrasonography