Purpose: To evaluate both qualitative and quantitative scoring methods for the cosmetic result after breast-conserving therapy (BCT), and to compare the usefulness and reliability of these methods.
Methods and materials: In EORTC trial 22881/10882, stage I and II breast cancer patients were treated with tumorectomy and axillary dissection. A total of 5318 patients were randomized to receive either no boost or a boost of 16 Gy following whole-breast irradiation of 50 Gy. The cosmetic result was assessed for 731 patients in two ways. A panel scored the qualitative appearance of the breast using photographs taken after surgery and 3 years later. Nipple displacement was also measured on these photographs with a digitizer in order to calculate the breast retraction assessment (BRA). The cosmetic results at 3-year follow-up were used to analyze the correlation between the panel evaluation and the digitizer measurements.
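The abstract does not state how BRA is derived from the nipple displacement. As a minimal sketch, assuming the commonly cited definition in which BRA is the straight-line distance between the nipple positions of the treated and untreated breast, each expressed as a lateral distance from the midline and a vertical distance from the suprasternal notch, the calculation could look like this. The function name, the coordinate convention, and the example values are illustrative assumptions, not taken from the trial.

```python
import math


def breast_retraction_assessment(treated, untreated):
    """Return the BRA in the same unit as the input coordinates (e.g. mm).

    ASSUMPTION: each argument is an (x, y) pair, where x is the lateral
    distance of the nipple from the midline and y is its vertical distance
    from the suprasternal notch. The abstract does not define these
    reference points; this convention is illustrative only.
    """
    dx = treated[0] - untreated[0]  # difference in lateral position
    dy = treated[1] - untreated[1]  # difference in vertical position
    return math.hypot(dx, dy)       # Euclidean length of the displacement


# Hypothetical digitizer readings in mm (not trial data).
example_bra = breast_retraction_assessment((100.0, 180.0), (103.0, 184.0))
```

Under this assumed definition, a larger BRA means a larger asymmetry in nipple position between the two breasts.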
Results: For the panel evaluation, the intraobserver agreement for the global cosmetic score, as measured by the simple kappa statistic, was 0.42, considered moderate agreement. The multiple kappa statistic for interobserver agreement for the global cosmetic score was 0.28, considered fair agreement. The specific cosmetic items scored by the panel were all significantly related to the global cosmetic score; breast size and shape influenced the global score most. For the digitizer measurements, relative to the average value of 30.0 mm, the standard deviation was 2.3 mm (7.7%) for the intraobserver variability and 2.6 mm (8.7%) for the interobserver variability. The two methods were significantly, though moderately, correlated; some items scored by the panel were correlated with the digitizer measurements only if the tumor was not located in the inferior quadrant of the breast.
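To illustrate the agreement statistic reported above, the following is a minimal sketch of the simple (two-rating) kappa: observed agreement corrected for the agreement expected by chance. It assumes the standard Cohen formulation and assumes that the labels "fair" and "moderate" follow the widely used Landis and Koch bands; the abstract states neither explicitly. The function names and the example ratings are hypothetical, and the multiple kappa used for the interobserver comparison is not implemented here.

```python
from collections import Counter


def simple_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two equally long sequences of categorical scores."""
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(ratings_a)
    # Proportion of cases on which the two ratings agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from the marginal score frequencies.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (observed - expected) / (1.0 - expected)


def agreement_label(kappa):
    """Map kappa to a verbal label (ASSUMPTION: Landis and Koch bands)."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical scores given to four photographs in two scoring sessions.
first_session = [1, 1, 2, 2]
second_session = [1, 2, 2, 2]
example_kappa = simple_kappa(first_session, second_session)
```

With these assumed bands, the reported values of 0.42 and 0.28 map to "moderate" and "fair", matching the wording used in the results.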
Conclusions: The intra- and interobserver variability of the digitizer evaluation of cosmesis was smaller than that of the panel evaluation. However, some treatment sequelae, such as disturbing scars and skin changes, cannot be evaluated by BRA measurements. Therefore, the methods of cosmetic evaluation used in a study must be chosen in a way that balances reliability and comprehensiveness.