Abstract
A crucial component of language acquisition involves
organizing words into grammatical categories and discovering
relations between them. Many studies have argued that
phonological or semantic cues or multiple correlated cues are
required for learning. Here we examine how distributional
variables shift learners from forming a category of lexical
items to maintaining lexical specificity. In a series of
artificial language learning experiments, we vary a number of
distributional variables relevant to category structure and test how
adult learners use this information to inform their hypotheses
about categorization. Our results show that learners are
sensitive to the contexts in which each word occurs, the
overlap in contexts across words, the non-overlap of contexts
(or systematic gaps), and the size of the data set. These
variables taken together determine whether learners fully
generalize or preserve lexical specificity.
Introduction
Language acquisition crucially involves finding the
grammatical categories of words in the input. The
organization of elements into categories, and the
generalization of patterns from observed element
combinations to novel ones, account for important aspects
of the expansion of linguistic knowledge in early stages of
language acquisition. One hypothesis of how learners
approach the problem of categorization is that the categories
(but not their contents) are innately specified prior to
experiencing any linguistic input, with the assignment of
tokens to categories accomplished with minimal exposure.
A second possibility is that the categories are formed around
a semantic definition. A third hypothesis, explored in the
present research, is that the distributional information in the
environment is sufficient (along with a set of learning
biases) to extract the categorical structure of natural
language. While it is likely that each of these sources of
evidence makes important contributions to language
acquisition, this third hypothesis regarding distributional
learning has often been thought to be an unlikely
contributor, given the information processing limitations of
young children and the complexity of the computational
processes that would be entailed.
Furthermore, it has been difficult to test the importance of
such a distributional learning mechanism because the cues
to category structure in natural languages are highly
correlated. In fact, it has been argued in many artificial
language studies that the formation of linguistic categories
(e.g., noun, verb) depends crucially on some perceptual
property linking items within the category (Braine, 1987).
This perceptual similarity relation might arise from identity
or repetition of elements in grammatical sequences, or a
phonological or semantic cue identifying words across
different sentences as similar to one another (for example,
words ending in –a are feminine, or words referring to
concrete objects are nouns). Learners of artificial languages
have been unable to acquire grammatical categories and to
extend their linguistic contexts to new items correctly
without such cues (Braine et al., 1990; Frigo & McDonald,
1998; Gomez & Gerken, 2000). However, this finding has been
something of a puzzle: Maratsos & Chalkley (1980) argued
that in natural languages, grammatical categories do not
have reliable phonological or semantic cues; rather, learners
must utilize distributional cues about the linguistic contexts
in which words occur to acquire such categories. Mintz,
Newport & Bever (2002), as well as several other
researchers, have shown that computational procedures
utilizing distributional contexts can form elementary
linguistic categories on corpora of mothers’ speech to young
children from the CHILDES database, and Mintz (2002) and
Gerken et al. (2005) have shown that both adults and infants
can learn such categories in a simple version of this paradigm in the
laboratory, at least when there are multiple correlated
distributional cues. In the present series of experiments we
begin by demonstrating that there are distributional
properties that lead to successful learning of linguistic
categories in artificial language paradigms. Importantly,
however, to understand how this mechanism works in human
learners and why many previous experiments have not found
such learning, we then manipulate various aspects of these
distributional variables in order to identify the computational
requirements for successful category learning.
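The notion of overlap in contexts across words can be made concrete with a small sketch. The Python below is illustrative only and is not the procedure used in the experiments reported here. It assumes a hypothetical toy corpus of three-word sentences, treats the (preceding word, following word) frame as a word's context, and uses Jaccard similarity as a stand-in measure of overlap. The function names and toy strings are invented for this example.

```python
from collections import defaultdict
from itertools import combinations


def contexts_by_word(sentences):
    """Map each non-edge word to the set of (preceding, following) frames it occurs in."""
    contexts = defaultdict(set)
    for sentence in sentences:
        words = sentence.split()
        # Skip the first and last word: they lack a full two-sided frame.
        for i in range(1, len(words) - 1):
            contexts[words[i]].add((words[i - 1], words[i + 1]))
    return dict(contexts)


def jaccard(a, b):
    """Shared contexts divided by all contexts seen with either word."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def context_overlap(sentences):
    """Pairwise context overlap for every pair of words that has at least one frame."""
    contexts = contexts_by_word(sentences)
    return {
        (w1, w2): jaccard(contexts[w1], contexts[w2])
        for w1, w2 in combinations(sorted(contexts), 2)
    }


# Hypothetical toy input (assumption): x1 and x2 share all frames,
# while x3 shares only one frame with each of them.
toy_corpus = [
    "a1 x1 b1", "a2 x1 b2",
    "a1 x2 b1", "a2 x2 b2",
    "a1 x3 b1", "a3 x3 b3",
]

overlap = context_overlap(toy_corpus)
for pair, score in overlap.items():
    print(pair, round(score, 2))
```

In this toy input, x1 and x2 occur in identical frames, whereas x3 shares only one of its two frames with each of them. A learner sensitive to such overlap would have stronger grounds for grouping x1 with x2 than for grouping either with x3.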