Aslin newport

====== Statistical Learning: From Acquiring Specific Items to Forming General Rules ======

Abstract
Statistical learning is a rapid and robust mechanism that enables adults and infants to extract patterns embedded in both language and visual domains. Statistical learning operates implicitly, without instruction, through mere exposure to a set of input stimuli. However, much of what learners must acquire about a structured domain consists of principles or rules that can be applied to novel inputs. It has been claimed that statistical learning and rule learning are separate mechanisms; in this article, however, we review evidence and provide a unifying perspective that argues for a single statistical-learning mechanism that accounts for both the learning of input stimuli and the generalization of learned patterns to novel instances. The balance between instance-learning and generalization is based on two factors: the strength of perceptual and cognitive biases that highlight structural regularities, and the consistency of elements’ contexts (unique vs. overlapping) in the input.

Introduction:
The problem is that the learner must select the correct structure in a given set of data from an infinite number of potential structures, without waiting forever and without the aid of an instructor who can explain the principles underlying the data (Chomsky, 1965). Somewhat surprisingly, adults and even infants are quite good at extracting the organizational structure of a set of seemingly ambiguous data by merely observing (or listening to) the input.
Saffran et al. (1996) suggested the term statistical learning to refer to the process by which learners acquire information about distributions of elements in the input. In their study, the probability that one syllable followed another within a word (the transitional probability) was 1.0, whereas the transitional probability of syllable pairs at word boundaries was 0.33.
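The transitional-probability statistic can be illustrated with a short sketch. It assumes a hypothetical lexicon of four trisyllabic nonsense words, concatenated in random order with no word immediately repeated. The syllables, the function names, and the stream length are illustrative choices, not the original stimuli or analysis.

```python
import random
from collections import Counter

# Hypothetical lexicon: four trisyllabic nonsense words (illustrative only).
WORDS = [("tu", "pi", "ro"), ("go", "la", "bu"),
         ("bi", "da", "ku"), ("pa", "do", "ti")]

def make_stream(words, n_words, seed=0):
    """Concatenate randomly chosen words, never repeating a word back to back."""
    rng = random.Random(seed)
    stream, previous = [], None
    for _ in range(n_words):
        word = rng.choice([w for w in words if w != previous])
        stream.extend(word)
        previous = word
    return stream

def transitional_probabilities(syllables):
    """Return P(next | current) for every adjacent syllable pair in the stream."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])  # how often each syllable has a successor
    return {pair: n / first_counts[pair[0]] for pair, n in pair_counts.items()}

stream = make_stream(WORDS, n_words=3000)
tp = transitional_probabilities(stream)
within_word = tp[("tu", "pi")]    # a syllable pair inside one word
across_words = tp[("ro", "go")]   # a syllable pair spanning a word boundary
```

Because each syllable belongs to exactly one word in this toy lexicon, every within-word pair has a transitional probability of exactly 1.0. Each word can be followed by any of the three other words, so pairs spanning a boundary come out near 1/3, which matches the 1.0 versus 0.33 contrast described above.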
Thus, statistical learning is a powerful and domain-general mechanism available early in development to infants who are naïve (i.e., uninstructed) about how to negotiate a complex learning task. These results show that a statistical-learning mechanism enables learners to extract one or more statistics and use this information to make an implicit decision about the stimulus materials that were present in the input. This ability is important for learning which syllables form words, for estimating the number of peaks in a distribution of speech sounds, and for discovering which visual features form the parts of a scene. But this does not address the question of how learners form rules, that is, abstractions about patterns that could be generalized to elements that have never been seen or heard. How do learners who are exposed to a subset of the possible patterns in their input go beyond this to infer a set of general principles or “rules of the game”?
Several studies have documented that infants can make the inductive leap from observed stimuli to novel stimuli that follow the same rules.
Some researchers have claimed that statistical learning and rule learning are two separate mechanisms, because statistical learning involves learning about elements that have been presented during exposure, whereas rule learning can be applied to novel elements and novel combinations (see Endress & Bonatti, 2007; Marcus, 2000). But why do learners sometimes keep track of the specific elements in the input they are exposed to and at other times learn a rule that extends beyond the specifics of the input? An alternative hypothesis is that these two processes are in fact not distinct, but rather are different outcomes of the same learning mechanism.
For example, some stimulus dimensions are naturally more salient than others. If stimuli are encoded in terms of their salient dimensions rather than their specific details, then learners will appear to generalize a rule by applying it to all stimuli that exhibit the same pattern on these salient dimensions.
Although perceptual cues can serve as powerful constraints on statistical learning, perceptual salience is not how most rules are defined in the natural environment.
Learners acquire rules when patterns in the input indicate that several elements occur interchangeably in the same contexts, but acquire specific instances when the patterns apply only to the individual elements. For example, Xu and Tenenbaum (2007) have shown that if children hear the word “glim” applied to three different dogs, they will infer that “glim” means dog. In contrast, if “glim” is used three times to refer to the same dog, children interpret it as the dog’s name. The same contrast between learning items and learning rules can occur for syllable and word sequences.
Gerken (2006) has made this argument by reconsidering and modifying the design of the Marcus et al. (1999) rule-learning experiment (see Fig. 3). Marcus et al. presented 16 different AAB strings in the learning phase of their experiment. Notice in Figure 3 that four strings ended in di, four ended in je, four ended in li, and four ended in we. Thus, infants could have learned the general AAB rule, or they could have learned a more specific pattern: that every string ended in di, je, li, or we. The more consistent or reliable cue was the repetition of the first two syllables (the AAB rule), because it applied to every string, whereas the “ends in di (or je, or li, or we)” rule applied to only one-fourth of the strings. Gerken (2006) asked whether infants presented with a subset of the 16 strings from the Marcus et al. (1999) study would favor the “repetition of the first two syllables” rule or the “ends in di, je, li, or we” rule. Infants who heard only four AAB strings that ended in the same syllable (e.g., di in the leftmost column of Fig. 3) were tested on two equally plausible rules: (1) all strings involve an AAB repetition, and (2) all strings end in di. These infants failed to generalize the first rule to a novel string that retained the AAB pattern but did not end in di. In contrast, infants who heard only four AAB strings lying along the diagonal in Figure 3 replicated the Marcus et al. result. Because these strings shared an AAB pattern but ended in four different syllables, only the AAB rule was reliable. In recent work, we (Reeder, Newport, & Aslin, 2009, 2010) demonstrated a similar phenomenon, and described some of the principles for its operation, in the learning of an artificial-language grammar.
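The logic of Gerken's (2006) column-versus-diagonal manipulation can be made concrete with a small sketch. The final syllables (di, je, li, we) come from the text above. The A syllables, the grid layout, and the helper names are assumptions made only for illustration.

```python
# Final (B) syllables are those named in the text; the A syllables are
# placeholders assumed here so that the 4 x 4 grid can be built.
A_SYLLABLES = ["le", "wi", "ji", "de"]
B_SYLLABLES = ["di", "je", "li", "we"]

# Rows vary the repeated A syllable, columns vary the final B syllable.
GRID = [[(a, a, b) for b in B_SYLLABLES] for a in A_SYLLABLES]

ALL_STRINGS = [s for row in GRID for s in row]       # the full set of 16 strings
COLUMN = [row[0] for row in GRID]                    # four strings ending in "di"
DIAGONAL = [GRID[i][i] for i in range(len(GRID))]    # four different endings

def is_aab(string):
    """True when the first two syllables repeat and the third differs."""
    return string[0] == string[1] and string[1] != string[2]

def consistent_rules(strings):
    """List the candidate rules that hold for every string in the set."""
    rules = []
    if all(is_aab(s) for s in strings):
        rules.append("AAB")
    finals = {s[2] for s in strings}
    if len(finals) == 1:                             # one shared final syllable
        rules.append(f"ends in {finals.pop()}")
    return rules

column_rules = consistent_rules(COLUMN)      # ['AAB', 'ends in di']
diagonal_rules = consistent_rules(DIAGONAL)  # ['AAB']
```

In this toy version the column subset is equally well described by two rules, whereas the diagonal subset supports only the AAB rule. That is the contrast in cue reliability that the infant results turn on.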
In our experiments, adult learners were presented with sentences made up of nonsense words that came from three different grammatical categories (A, X, and B), much like subjects, verbs, and direct objects in sentences such as “Bill ate lunch.” Depending on the experiment, the input included sentences in which all of the words within a particular category occurred in the same contexts (e.g., words X1, X2, and X3 all occurred after any of the A words and before any of the B words), or the input included only sentences in which the X words occurred in a limited number of overlapping A-word or B-word contexts. Adult learners are surprisingly sensitive to these differences. Our results showed that participants’ tendency to generalize depended on the precise degree of overlap among word contexts that they heard in the input, and also on the consistency with which a particular A or B word was missing from possible X-word contexts. Adults generalize rules when the shared contexts are largely the same, with only an occasional absence of overlap (i.e., a “gap”). However, when the gaps are persistent, adults judge them to be legitimate exceptions to the rule and no longer generalize to these contexts. Thus, similar to the results of Gerken (2006), our findings showed that it was the consistency of context cues that led learners to generalize rules to novel strings, and it was the inconsistency of context cues that kept learners from generalizing and led them to treat some strings as exceptions. The key point here is that in terms of the reliability of context cues, statistical learning and rule learning are not different mechanisms (see Orban, Fiser, Aslin, & Lengyel, 2008). When there are strong perceptual cues, such as the repetition of elements in an AAB sequence, a statistical-learning mechanism can compute the regularities of the repetitions (i.e., they are either present or absent) or of the elements themselves (e.g., the particular syllables). 
And, as hypothesized by Gerken (2006) and Reeder et al. (2009, 2010), even when there are no perceptual cues, the consistency of how the context cues are distributed across strings of input determines whether a rule is formed—enabling generalization to novel strings—or whether specific instances are learned. According to this hypothesis, statistical learning is a single mechanism whose outcome applies either to elements that have been experienced or to generalization beyond experienced elements, depending on the manner and consistency with which elements are patterned in the learner’s input. Importantly, this balance of learning is accomplished without instruction, through mere exposure to structured input.
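One way to picture "consistency of context cues" is to measure how much the contexts of different X words overlap. The sketch below is only an illustration under stated assumptions: the three-word sentences, the word labels, and the use of Jaccard overlap are hypothetical choices, not the measure or the materials used by Reeder et al.

```python
from itertools import combinations

def contexts(sentences):
    """Map each X word to the set of (A, B) frames it occurred in."""
    frames = {}
    for a, x, b in sentences:
        frames.setdefault(x, set()).add((a, b))
    return frames

def mean_pairwise_overlap(frames):
    """Average Jaccard overlap between the context sets of every pair of X words.

    Assumes at least two X words are present.
    """
    pairs = list(combinations(frames.values(), 2))
    return sum(len(p & q) / len(p | q) for p, q in pairs) / len(pairs)

A_WORDS = ["A1", "A2", "A3"]
X_WORDS = ["X1", "X2", "X3"]
B_WORDS = ["B1", "B2", "B3"]

# Dense input: every X word appears in every A_B frame.
DENSE = [(a, x, b) for a in A_WORDS for x in X_WORDS for b in B_WORDS]

# Sparse input: each X word shares only one frame with each other X word.
SPARSE = [
    ("A1", "X1", "B1"), ("A2", "X1", "B2"),
    ("A2", "X2", "B2"), ("A3", "X2", "B3"),
    ("A3", "X3", "B3"), ("A1", "X3", "B1"),
]

dense_overlap = mean_pairwise_overlap(contexts(DENSE))    # 1.0
sparse_overlap = mean_pairwise_overlap(contexts(SPARSE))  # 1/3
```

On the account summarized above, input like the dense set, where X words are interchangeable across contexts, should favor forming a category-level rule. Input with persistently low overlap should favor learning the specific strings.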