Corpus of Mandarin Child Language: a preliminary study on the acquisition of semantic content categories in Mandarin-speaking preschoolers

Front Psychol. 2023 Nov 10;14:1234525. doi: 10.3389/fpsyg.2023.1234525. eCollection 2023.

Abstract

Research on child language acquisition has focused largely on form and lexical semantics. This study aims to establish a child language database annotated both syntactically, with part of speech, and semantically, with semantic content category, to support the study of child language acquisition in the semantic domain beyond the lexical level. To this end, the Corpus of Mandarin Child Language (CMCL), which documents the production of different semantic content categories by Mandarin-speaking children, was established. Naturalistic language samples were obtained from 82 native Mandarin-speaking children aged 25-60 months, divided into three age groups. In addition to part-of-speech annotation, each utterance was tagged with the semantic content categories it encoded, following the coding schemes of previous studies. Mean length of utterance (MLU) and lexical diversity were examined, and the usage and acquisition of the different semantic content categories were analyzed. The results for syntactic complexity and lexical diversity replicated the typical language acquisition pattern reported in previous studies, supporting the validity of the CMCL data. To investigate the trajectory of acquisition of the semantic content categories by age, a 90% acquisition criterion was used. The acquisition order of the semantic content categories was largely in line with previous studies, with some minor differences. The observed order is largely explained by the cognitive and syntactic complexity associated with each semantic content category, with additional influence from language-specific properties and culture-specific factors of Mandarin. In addition, with tags for both part of speech and semantic content category, the CMCL potentially provides a platform for examining the form-content interface in early child language acquisition, which has significant theoretical and clinical implications.
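The abstract does not spell out how the 90% acquisition criterion or MLU were operationalized, so the following is only a minimal sketch under stated assumptions: a category counts as acquired in an age group when at least 90% of the children in that group produce it at least once, and MLU is the mean number of pre-segmented word tokens per utterance. The function names, the data layout, and the category labels in the usage note are hypothetical and are not taken from the CMCL.

```python
from collections import defaultdict


def mean_length_of_utterance(utterances):
    """Mean number of tokens per utterance.

    Assumption: each utterance is a list of already segmented word tokens.
    """
    if not utterances:
        return 0.0
    return sum(len(u) for u in utterances) / len(utterances)


def acquired_categories(children, threshold=0.9):
    """Return, per age group, the categories meeting the criterion.

    children: iterable of (age_group, set_of_categories_produced) pairs,
    one pair per child. A category is treated as acquired in a group when
    the proportion of children producing it is >= threshold (assumed
    reading of the 90% criterion).
    """
    group_sizes = defaultdict(int)
    counts = defaultdict(lambda: defaultdict(int))
    for group, categories in children:
        group_sizes[group] += 1
        for category in categories:
            counts[group][category] += 1
    return {
        group: sorted(
            category
            for category, n in counts[group].items()
            if n / group_sizes[group] >= threshold
        )
        for group in group_sizes
    }
```

With toy data for one hypothetical group of ten children, where every child produces the placeholder category "existence", nine produce "action", and eight produce "causality", `acquired_categories` would report "action" and "existence" as acquired and leave out "causality". Comparing the resulting lists across age groups gives one rough way to read off an acquisition order.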

Keywords: Mandarin-speaking children; acquisition; cognitive and syntactic complexity; language corpus; semantic content category.