Accurate neuropsychological assessment of older individuals from heterogeneous backgrounds is a major challenge. Education, ethnicity, language, and age are associated with scale-level differences in test scores, but item-level bias might also contribute to these differences. We evaluated several strategies for dealing with item- and scale-level demographic influences on a measure of executive abilities defined by working memory and fluency tasks. We determined the impact of differential item functioning (DIF) on test scores and compared composite scoring strategies on the basis of their relationships with volumetric magnetic resonance imaging (MRI) measures of brain structure. Participants were 791 Hispanic, white, and African American older adults. DIF had a salient impact on test scores for 9% of the sample. MRI data were available for a subset of 153 participants. Validity in comparison with structural MRI was higher after scale-level adjustment for education, ethnicity/language, and gender, but item-level adjustment did not have a major impact on validity. Age adjustment at the scale level had a negative impact on relationships with MRI, most likely because age adjustment removes variance related to age-associated diseases.
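
The idea of scale-level adjustment can be illustrated with a minimal sketch. The abstract does not state how the adjustment was computed, so the sketch assumes one common approach, regression residualization: the composite score is regressed on a demographic covariate and the residual is used as the adjusted score. All data are simulated, and every name and coefficient (`education`, `brain_volume`, `ols_residuals`, the effect sizes) is hypothetical rather than taken from the study.

```python
import random


def ols_residuals(y, covariates):
    """Residuals of y after least-squares regression on covariates plus an intercept."""
    n = len(y)
    X = [[1.0] + [float(v) for v in row] for row in covariates]
    p = len(X[0])
    # Build the normal equations (X'X) beta = X'y.
    A = [[sum(X[i][r] * X[i][c] for i in range(n)) for c in range(p)] for r in range(p)]
    b = [sum(X[i][r] * y[i] for i in range(n)) for r in range(p)]
    # Solve by Gaussian elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    beta = [0.0] * p
    for r in range(p - 1, -1, -1):
        tail = sum(A[r][c] * beta[c] for c in range(r + 1, p))
        beta[r] = (b[r] - tail) / A[r][r]
    return [y[i] - sum(X[i][c] * beta[c] for c in range(p)) for i in range(n)]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


# Simulated (hypothetical) sample: the score depends on brain volume and,
# independently of brain volume, on years of education.
rng = random.Random(0)
n_participants = 300
education = [rng.gauss(12.0, 3.0) for _ in range(n_participants)]
brain_volume = [rng.gauss(0.0, 1.0) for _ in range(n_participants)]
score = [
    0.5 * bv + 0.3 * (ed - 12.0) + rng.gauss(0.0, 0.5)
    for bv, ed in zip(brain_volume, education)
]

# Scale-level adjustment: remove the education-related variance from the score.
adjusted = ols_residuals(score, [[ed] for ed in education])

raw_r = pearson(score, brain_volume)
adjusted_r = pearson(adjusted, brain_volume)
print(f"raw r = {raw_r:.3f}, education-adjusted r = {adjusted_r:.3f}")
```

In this simulation the covariate is unrelated to brain structure, so removing it strengthens the relationship between score and the imaging measure. The same mechanism works in reverse for a covariate such as age that shares variance with the brain measure: residualizing on it would remove part of the signal of interest, which is consistent with the explanation the abstract offers for the age-adjustment result.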