The similarity in meaning assigned to response choice labels from the SF-36 Health Survey (SF-36) was evaluated across countries. Convenience samples of judges (range, 10 to 117; median = 48) from 13 countries rated translations of response choice labels, using a variation of the Thurstone method of equal-appearing intervals. Judges marked a point on a 10-cm line representing the magnitude of a response choice label (e.g., "good" relative to the anchors of "poor" and "excellent"). Ratings were evaluated to determine the ordinal consistency of response choice labels within a response scale; the degree to which differences between adjacent response choice labels were equal interval; and the amount of variance due to response choice label, country, judge, and the interaction between response choice label and country. Results confirmed the hypothesized ordering of response choice labels; the percentage of ordinal pairs ranged from 88.7% to 100% (median = 98.2%) across countries and response scales. Examination of the average magnitudes of response choice labels supported the "quasi-interval" nature of the scales. Analysis of variance (ANOVA) results supported the generalizability of response choice magnitudes across countries; labels explained 64% to 77% of the variance in ratings, and country explained 1% to 3%. These results support the equivalence of SF-36 response choice labels across countries. Departures from the assumption of equal intervals, when observed, were similar across countries and were greatest for the two response scales that are recalibrated under standard SF-36 scoring. Results provide justification for scoring translations of individual items using standard SF-36 scoring; whether these items form the same scales in other countries as they do in the United States is evaluated with tests of scaling assumptions.
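As an illustration of the ordinal-consistency check described above, the following is a minimal sketch of how a "percentage of ordinal pairs" might be computed for a single judge. It assumes that every pair of labels within a response scale is compared against the hypothesized ordering and that ties count as not ordinal; the abstract does not specify either detail. The function name `percent_ordinal_pairs` and the ratings are hypothetical and are not study data.

```python
from itertools import combinations


def percent_ordinal_pairs(ratings, expected_order):
    """Percentage of label pairs whose ratings follow the hypothesized order.

    ratings: dict mapping each label to the point marked on the 10-cm line (cm).
    expected_order: labels listed from lowest to highest hypothesized magnitude.
    Assumption of this sketch: a tie between two labels is not counted as ordinal.
    """
    pairs = list(combinations(expected_order, 2))
    consistent = sum(1 for low, high in pairs if ratings[low] < ratings[high])
    return 100.0 * consistent / len(pairs)


# Hypothetical ratings from one judge (illustrative values only, not study data).
# "poor" and "excellent" are the fixed anchors at the ends of the line.
order = ["poor", "fair", "good", "very good", "excellent"]
judge = {"poor": 0.0, "fair": 2.4, "good": 5.1, "very good": 4.8, "excellent": 10.0}

print(percent_ordinal_pairs(judge, order))  # 9 of 10 pairs are ordered: 90.0
```

In this example the judge placed "very good" slightly below "good", so one of the ten pairs violates the hypothesized ordering. Averaging such percentages over judges within a country and response scale would give one plausible summary of the kind reported above.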