Off Script: Idiosyncratic Words in Childhood Language Environments

Dublin Core

Title

Off Script: Idiosyncratic Words in Childhood Language Environments

Creator

Alexander Stern, 3rd-Year, Urban Studies

Date

2023 URS

Contributor

Dr. Marisa Casillas, Comparative Human Development, Chatter Lab, University of Chicago; Kennedy Casey, Chatter Lab, University of Chicago

Text Item Type Metadata

Text

Which words are the most common for children to hear? Which are the rarest? Word frequencies in natural language are Zipfian-distributed, meaning some words are used very frequently, while the vast majority of words are used infrequently and are therefore considered lexically rare. Higher lexical rarity is associated with more adult-like language because lower-frequency words are generally learned later. Consequently, lexical rarity measures can be used to track the maturation of children’s speech. Lexical rarity is often calculated using the relative frequency of different words in naturalistic recordings of child-produced and child-directed speech. A widely used collection of transcribed audio recordings is the Child Language Data Exchange System (CHILDES) database. Its English subset includes 980 children who heard nearly 3.5 million instances of 24,000 unique words. Lexical rarity is simple to calculate from existing transcripts, yet estimating it from these raw data introduces methodological confounds. For instance, not every transcribed word in CHILDES is an actual English word, which could skew measures of lexical rarity. Additionally, some words, such as names, are highly idiosyncratic: frequently uttered within certain children’s language environments but broadly uncommon when counted across all families. Determining lexical rarity through relative frequency alone, as many researchers do, overlooks these nuances. After manually checking all unique transcribed words in CHILDES, we identified an error rate of 36.1%, meaning that more than 1 in 3 of these words were not actual English words. Lower-frequency transcribed words were more likely to be non-words. Additionally, an exploratory analysis of some of the rarest words revealed that the most common word type within this set was proper names. Thus, we find evidence for a potentially high incidence of idiosyncrasies in child language environments. This discovery emphasizes the importance of studying children’s individual environments, not only interpreting experiences in aggregate.
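To make the contrast between aggregate rarity and idiosyncrasy concrete, here is a minimal Python sketch. It is not the authors' pipeline: the function names (`rarity_table`, `idiosyncratic`), the `child_share` measure, the thresholds, and the toy data are all assumptions made for illustration. It computes each word's relative frequency across the pooled corpus alongside the share of children who heard the word at all, so that a word heard often by only one child stands out.

```python
from collections import Counter, defaultdict


def rarity_table(tokens_by_child):
    """Summarize each word's pooled frequency and spread across children.

    tokens_by_child: dict mapping a child ID to the list of word tokens
    that child heard (hypothetical input format, not the CHILDES format).
    """
    counts = Counter()
    children_per_word = defaultdict(set)
    for child, tokens in tokens_by_child.items():
        for token in tokens:
            word = token.lower()
            counts[word] += 1
            children_per_word[word].add(child)

    total_tokens = sum(counts.values())
    n_children = len(tokens_by_child)
    return {
        word: {
            # Share of all pooled tokens that are this word.
            "rel_freq": n / total_tokens,
            # Share of children who heard this word at least once.
            "child_share": len(children_per_word[word]) / n_children,
        }
        for word, n in counts.items()
    }


def idiosyncratic(table, min_rel_freq, max_child_share):
    """Words that are frequent in the pooled data but heard by few children."""
    return sorted(
        word
        for word, row in table.items()
        if row["rel_freq"] >= min_rel_freq and row["child_share"] <= max_child_share
    )


# Toy data (invented): "momo" stands in for a family-specific name.
toy = {
    "c1": ["the", "dog", "momo", "momo", "momo"],
    "c2": ["the", "dog", "the", "ball"],
    "c3": ["the", "ball", "dog", "the"],
}
toy_table = rarity_table(toy)
flagged = idiosyncratic(toy_table, min_rel_freq=0.2, max_child_share=0.34)
```

In the toy data, "dog" and "momo" have the same pooled relative frequency (3 of 13 tokens), but only "momo" is confined to a single child, so relative frequency alone cannot tell them apart while the per-child spread can.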

Original Format

Digital Abstract

Files

AlexanderStern_Poster_URS2023.pdf

Citation

Alexander Stern, 3rd-Year, Urban Studies, “Off Script: Idiosyncratic Words in Childhood Language Environments,” 2023 University of Chicago Undergraduate Research Symposium, accessed May 3, 2024, https://ugradresearchsymposium.omeka.net/items/show/103.