Aslin & Newport
Statistical Learning: From Acquiring Specific Items to Forming General Rules
Abstract
Statistical learning is a rapid and robust mechanism that enables adults and infants to extract patterns embedded in both
language and visual domains. Statistical learning operates implicitly, without instruction, through mere exposure to a set of
input stimuli. However, much of what learners must acquire about a structured domain consists of principles or rules that
can be applied to novel inputs. It has been claimed that statistical learning and rule learning are separate mechanisms; in this
article, however, we review evidence and provide a unifying perspective that argues for a single statistical-learning mechanism
that accounts for both the learning of input stimuli and the generalization of learned patterns to novel instances. The balance
between instance-learning and generalization is based on two factors: the strength of perceptual and cognitive biases that
highlight structural regularities, and the consistency of elements’ contexts (unique vs. overlapping) in the input.
Introduction:
The problem is that the learner must select the correct structure
in a given set of data from an infinite number of potential
structures, without waiting forever and without the aid of an
instructor who can explain the principles underlying the data
(Chomsky, 1965). Somewhat surprisingly, adults and even
infants are quite good at extracting the organizational structure
of a set of seemingly ambiguous data by merely observing (or
listening to) the input.
Saffran et al. (1996) suggested the term statistical learning to
refer to the process by which learners acquire information about
distributions of elements in the input. In their artificial speech stream, the probability that one syllable followed another
within a word (the transitional probability) was 1.0, whereas
the transitional probability of syllable pairs at word boundaries
was 0.33.
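The two values above can be reproduced with a small simulation. This is only a sketch under stated assumptions: the four trisyllabic words below are illustrative, and the stream is built by drawing words at random with no immediate repetition, which is one design that yields a boundary probability near 1/3. The function names are hypothetical.

```python
import random
from collections import Counter

# Illustrative trisyllabic nonsense words; the specific syllables are assumptions.
WORDS = [("tu", "pi", "ro"), ("go", "la", "bu"), ("bi", "da", "ku"), ("pa", "do", "ti")]


def make_stream(words, n_words, seed=0):
    """Concatenate randomly ordered words, never repeating a word back to back."""
    rng = random.Random(seed)
    stream, previous = [], None
    for _ in range(n_words):
        word = rng.choice([w for w in words if w != previous])
        stream.extend(word)
        previous = word
    return stream


def transitional_probabilities(syllables):
    """Return P(next | current) for every adjacent syllable pair in the stream."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}


stream = make_stream(WORDS, n_words=3000)
tp = transitional_probabilities(stream)
within_word = tp[("tu", "pi")]       # syllable pair inside a word
across_boundary = tp[("ro", "go")]   # last syllable of one word, first of another
print(f"within word: {within_word:.2f}")
print(f"across boundary: {across_boundary:.2f}")
```

Because every syllable belongs to exactly one word, a within-word pair always co-occurs, while a word-final syllable is followed by the first syllable of any of the three other words about equally often.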
Thus, statistical learning is a powerful and domain-general
mechanism available early in development to infants
who are naïve (i.e., uninstructed) about how to negotiate a
complex learning task.
These results show that a statistical-learning mechanism
enables learners to extract one or more statistics and use this
information to make an implicit decision about the stimulus materials
that were present in the input. This ability is important for
learning which syllables form words, for estimating the number
of peaks in a distribution of speech sounds, and for discovering
which visual features form the parts of a scene. But this does not
address the question of how learners form rules—abstractions
about patterns that could be generalized to elements that have
never been seen or heard. How do learners who are exposed to a
subset of the possible patterns in their input go beyond this to
infer a set of general principles or “rules of the game”?
Several studies have documented that infants can make the
inductive leap from observed stimuli to novel stimuli that follow
the same rules.
Some researchers have claimed that statistical learning and
rule learning are two separate mechanisms, because statistical
learning involves learning about elements that have been presented
during exposure, whereas rule learning can be applied
to novel elements and novel combinations (see Endress &
Bonatti, 2007; Marcus, 2000). But why do learners sometimes
keep track of the specific elements in the input they are
exposed to and at other times learn a rule that extends beyond
the specifics of the input? An alternate hypothesis is that these
two processes are in fact not distinct, but rather are different
outcomes of the same learning mechanism.
MyNote: This is the most relevant question they ask in this study.
For example, some stimulus dimensions are naturally more
salient than others. If stimuli are encoded in terms of their
salient dimensions rather than their specific details, then learners
will appear to generalize a rule by applying it to all stimuli
that exhibit the same pattern on these salient dimensions.
MyNote: what triggers encoding in terms of the salient dimensions that apply to all stimuli?
Although perceptual cues can serve as powerful constraints on
statistical learning, perceptual salience is not how most rules
are defined in the natural environment.
Learners acquire rules when patterns in the input indicate
that several elements occur interchangeably in the same contexts, but acquire specific instances when the patterns
apply only to the individual elements.
MyNote: CRUCIAL POINT: what features of the input indicate that elements occur interchangeably? How much such evidence is needed before learners generalize to new elements?
For example, Xu and
Tenenbaum (2007) have shown that if children hear the word
“glim” applied to three different dogs, they will infer that
“glim” means dog. In contrast, if “glim” is used three times to
refer to the same dog, children interpret it as the dog’s name.
The same contrast between learning items and learning rules
can occur for syllable and word sequences.
Gerken (2006) has made this argument by reconsidering
and modifying the design of the Marcus et al. (1999) rule-learning
experiment (see Fig. 3). Marcus et al. presented 16
different AAB strings in the learning phase of their experiment.
Notice in Figure 3 that four strings ended in di, four
ended in je, four ended in li, and four ended in we. Thus,
infants could have learned the general AAB rule, or they could
have learned a more specific pattern: that every string ended in
di, je, li, or we. The more consistent or reliable cue was the
repetition of the first two syllables—the AAB rule—because it
applied to every string, whereas the “ends in di (or je, or li, or
we)” rule applied to only one-fourth of the strings.
Gerken (2006) asked whether infants presented with a subset
of the 16 strings from the Marcus et al. (1999) study would
favor the “repetition of the first two syllables” rule or the
“ends in di, je, li, or we” rule. Infants who heard only four
AAB strings that ended in the same syllable (e.g., di in the
leftmost column of Fig. 3) were tested on two equally plausible
rules: (1) all strings involve an AAB repetition, and (2) all
strings end in di. These infants failed to generalize the first
rule to a novel string that retained the AAB pattern but did not
end in di. In contrast, infants who heard only four AAB strings
lying along the diagonal in Figure 3 replicated the Marcus
et al. result. Because these strings shared an AAB pattern but
ended in four different syllables, only the AAB rule was
reliable.
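The logic of the two training conditions can be made concrete by checking how many strings each candidate rule covers. This is a minimal sketch, assuming the 16 strings are the full crossing of the A syllables (le, wi, ji, de) with the B syllables (di, je, li, we); the helper names are hypothetical, and "coverage" is simply the fraction of strings a rule describes, not a measure taken from the studies.

```python
A_SYLLABLES = ["le", "wi", "ji", "de"]
B_SYLLABLES = ["di", "je", "li", "we"]

# Full 4 x 4 design, the single-ending column, and the diagonal subset.
all_strings = [(a, a, b) for a in A_SYLLABLES for b in B_SYLLABLES]
column = [(a, a, "di") for a in A_SYLLABLES]
diagonal = [(a, a, b) for a, b in zip(A_SYLLABLES, B_SYLLABLES)]


def is_aab(string):
    """True if the first two syllables repeat and the third differs."""
    return string[0] == string[1] and string[1] != string[2]


def ends_in(syllable):
    """Build a rule that checks for one specific final syllable."""
    return lambda string: string[2] == syllable


def coverage(rule, strings):
    """Fraction of strings that satisfy the rule."""
    return sum(rule(s) for s in strings) / len(strings)


for name, strings in [("all 16", all_strings), ("column", column), ("diagonal", diagonal)]:
    print(name, coverage(is_aab, strings), coverage(ends_in("di"), strings))
```

In the column condition both rules describe every string, so the input does not separate them; in the diagonal condition only the AAB rule does.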
MyNote: QUESTION:
Consider the set of 4 strings: leledi, wiwije, jijili, dedewe
The following rules are equally reliable for all strings:
1. AAB
2. starts with 2x le, wi, ji or de
3. ends in di, je, li, we
Why do learners sometimes stick to the narrow generalizations [2,3] and sometimes make a wider generalization (category-based) [1]?
In recent work, we (Reeder, Newport, & Aslin, 2009, 2010)
demonstrated a similar phenomenon (and described some of
the principles for its operation) in the learning of an artificial-language
grammar. In our experiments, adult learners were
presented with sentences made up of nonsense words that
came from three different grammatical categories (A, X, and
B), much like subjects, verbs, and direct objects in sentences
such as “Bill ate lunch.” Depending on the experiment, the
input included sentences in which all of the words within a
particular category occurred in the same contexts (e.g., words
X1, X2, and X3 all occurred after any of the A words and before
any of the B words), or the input included only sentences in
which the X words occurred in a limited number of overlapping
A-word or B-word contexts.
Adult learners are surprisingly sensitive to these differences.
Our results showed that participants’ tendency to generalize
depended on the precise degree of overlap among word
contexts that they heard in the input, and also on the consistency
with which a particular A or B word was missing from
possible X-word contexts.
Adults generalize rules when the
shared contexts are largely the same, with only an occasional
absence of overlap (i.e., a “gap”). However, when the gaps are
persistent, adults judge them to be legitimate exceptions to the
rule and no longer generalize to these contexts.
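One way to make "degree of overlap" concrete is to compare the sets of (A, B) contexts in which each X word appears. The sketch below is an illustration only: the Jaccard index is an assumed overlap measure, not the analysis used by Reeder et al., and the toy sentence sets and function names are hypothetical.

```python
from itertools import combinations

A_WORDS = ["A1", "A2", "A3"]
B_WORDS = ["B1", "B2", "B3"]
X_WORDS = ["X1", "X2", "X3"]


def contexts_by_word(sentences):
    """Map each X word to the set of (A, B) contexts it occurred in."""
    contexts = {}
    for a, x, b in sentences:
        contexts.setdefault(x, set()).add((a, b))
    return contexts


def jaccard(s, t):
    """Shared contexts divided by all contexts seen with either word."""
    return len(s & t) / len(s | t)


def mean_overlap(sentences):
    """Average pairwise context overlap across all X words."""
    contexts = contexts_by_word(sentences)
    pairs = list(combinations(sorted(contexts), 2))
    return sum(jaccard(contexts[p], contexts[q]) for p, q in pairs) / len(pairs)


# Every X word in every context.
dense = [(a, x, b) for a in A_WORDS for x in X_WORDS for b in B_WORDS]
# Same input with a single missing sentence (one "gap").
one_gap = [s for s in dense if s != ("A1", "X3", "B1")]
# Each X word shares only one context with each other X word.
limited = [("A1", "X1", "B1"), ("A2", "X1", "B2"), ("A2", "X2", "B2"),
           ("A3", "X2", "B3"), ("A3", "X3", "B3"), ("A1", "X3", "B1")]

for name, data in [("dense", dense), ("one gap", one_gap), ("limited", limited)]:
    print(name, round(mean_overlap(data), 3))
```

A single gap barely lowers the overlap score, whereas the limited-overlap input scores far lower. Where on that scale learners stop generalizing is an empirical question this sketch does not answer.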
MyNote: This is a broad description of the observed results, with no explanation of why this is the case and no precise definition of “largely” or “persistent”. → How much overlap is enough? How persistent must a gap be? Why?
Thus, similar
to the results of Gerken (2006), our findings showed that it
was the consistency of context cues that led learners to generalize
rules to novel strings, and it was the inconsistency of
context cues that kept learners from generalizing and led them
to treat some strings as exceptions.
The key point here is that in terms of the reliability of context
cues, statistical learning and rule learning are not different
mechanisms (see Orban, Fiser, Aslin, & Lengyel, 2008). When
there are strong perceptual cues, such as the repetition of elements
in an AAB sequence, a statistical-learning mechanism
can compute the regularities of the repetitions (i.e., they are
either present or absent) or of the elements themselves (e.g.,
the particular syllables). And, as hypothesized by Gerken
(2006) and Reeder et al. (2009, 2010), even when there are no
perceptual cues, the consistency of how the context cues are
distributed across strings of input determines whether a rule is
formed—enabling generalization to novel strings—or whether
specific instances are learned. According to this hypothesis,
statistical learning is a single mechanism whose outcome
applies either to elements that have been experienced or to
generalization beyond experienced elements, depending on
the manner and consistency with which elements are patterned
in the learner’s input. Importantly, this balance of learning is
accomplished without instruction, through mere exposure to
structured input.
Conclusion:
Perceptual salience and the patterning of context cues are not
the only factors that can influence what learners acquire via a
statistical-learning mechanism. An extensive literature in linguistics
has argued that languages of the world display a small
number of universal patterns—or a few highly common patterns,
out of many that are possible—and has suggested that
language learners will fail to acquire languages that do not
exhibit these regularities (Chomsky, 1965, 1995).
Recently, a number of studies using
artificial grammars have indeed shown that both children and
adults will more readily acquire languages that observe the
universal or more typologically common patterns found in
natural languages.
For example, Hudson Kam and Newport (2005, 2009) and
Austin and Newport (2011) presented adults and children with
miniature languages containing inconsistent, probabilistically
occurring forms (e.g., nouns were followed by the nonsense
word ka 67% of the time and by the nonsense word po the
remaining 33% of the time). This type of probabilistic variation
is not characteristic of natural languages, but it does occur
in the speech of nonnative speakers who make grammatical
errors. Adult learners in these experiments matched the probabilistic
variation they had heard in their input when they produced
sentences using the miniature language, but young
children formed a regular rule, producing ka virtually all of the
time, thereby restoring to the language the type of regularity
that is more characteristic of natural languages.
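The contrast between the two outcomes can be written as two toy production policies applied to the same input distribution. This is a sketch, not a model from the cited studies: the function names are hypothetical, and the regularizing policy is idealized as always producing the majority form, whereas the children produced ka "virtually" all of the time.

```python
from collections import Counter


def input_distribution(tokens):
    """Relative frequency of each form in the input."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {form: n / total for form, n in counts.items()}


def probability_matching(distribution):
    """Adult-like outcome: reproduce the input proportions."""
    return dict(distribution)


def regularizing(distribution):
    """Child-like outcome (idealized): always produce the most frequent form."""
    majority = max(distribution, key=distribution.get)
    return {form: 1.0 if form == majority else 0.0 for form in distribution}


# Input in which nouns are followed by ka 67% of the time and po 33% of the time.
tokens = ["ka"] * 67 + ["po"] * 33
heard = input_distribution(tokens)
adult_output = probability_matching(heard)
child_output = regularizing(heard)
print("adult:", adult_output)
print("child:", child_output)
```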
It is not always clear why learners acquire certain types of
patterns more easily than others (and why languages therefore
more commonly exhibit these patterns). Some word orders
place prominent words in more consistent positions across different
types of phrases; other patterns are more internally regular
or conform better to the left-to-right biases of auditory
processing. A full understanding of the principles underlying
these learning outcomes awaits further research. What is clear,
however, is that statistical learning is not simply a veridical
reproduction of the stimulus input. Learning is shaped by a
number of constraints on perception and memory, at least
some of which may apply not only to languages but also to
nonlinguistic patterns.