Silvia Rădulescu

Differences

This shows you the differences between two versions of the page.

aslin_newport [2015/11/22 23:16] silvia
aslin_newport [2016/02/13 11:03] (current) silvia
Line 1: Line 1:
-====== Statistical Learning: From Acquiring
-Specific Items to Forming General Rules ======
+====== Statistical Learning: From Acquiring Specific Items to Forming General Rules ======
  
 ---- ----
Line 28: Line 27:
 listening to) the input.
 \\
 +Saffran et al. (1996) suggested the term statistical learning to
 +refer to the process by which learners acquire information about
 +distributions of elements in the input. In their stimuli, the probability that one syllable followed another
 +within a word (the transitional probability) was 1.0, whereas
 +the transitional probability of syllable pairs at word boundaries
 +was 0.33.
 +\\
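A minimal sketch of how transitional probabilities of this kind can be computed, assuming a stream built from four trisyllabic nonsense words in which no word is immediately repeated. The word list, the helper names, and the stream length are illustrative assumptions, not the original stimuli or the authors' procedure.

```python
import random
from collections import Counter

# Hypothetical nonsense words (NOT the original stimuli). Each syllable belongs
# to exactly one word, so every within-word transition is deterministic.
WORDS = [("ba", "di", "ku"), ("go", "la", "tu"), ("pe", "mo", "si"), ("ra", "fe", "no")]


def make_stream(n_words, seed=0):
    """Concatenate words in random order, never repeating a word back to back."""
    rng = random.Random(seed)
    stream, prev = [], None
    for _ in range(n_words):
        word = rng.choice([w for w in WORDS if w != prev])
        stream.extend(word)
        prev = word
    return stream


def transitional_probabilities(stream):
    """Return {(x, y): P(y | x)}, i.e. count(x then y) / count(x with a successor)."""
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])
    return {(x, y): n / first_counts[x] for (x, y), n in pair_counts.items()}


stream = make_stream(3000)
tp = transitional_probabilities(stream)
within = tp[("ba", "di")]  # transition inside a word
across = tp[("ku", "go")]  # transition spanning a word boundary
print(f"within-word TP: {within:.2f}, boundary TP: {across:.2f}")
```

Because each word can be followed by any of the three other words, the boundary value should come out close to one third, while the within-word value is exactly 1.0 by construction.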
 +Thus, statistical learning is a powerful and domain-general
 +mechanism available early in development to infants
 +who are naïve (i.e., uninstructed) about how to negotiate a
 +complex learning task.
 +These results show that a statistical-learning mechanism
 +enables learners to extract one or more statistics and use this
 +information to make an implicit decision about the stimulus materials
 +that were present in the input. This ability is important for
 +learning which syllables form words, for estimating the number
 +of peaks in a distribution of speech sounds, and for discovering
 +which visual features form the parts of a scene. But this does not
 +address the question of how learners form rules—abstractions
 +about patterns that could be generalized to elements that have
 +never been seen or heard. How do learners who are exposed to a
 +subset of the possible patterns in their input go beyond this to
 +infer a set of general principles or “rules of the game”?
 +\\
 +Several studies have documented that infants can make the
 +inductive leap from observed stimuli to novel stimuli that follow
 +the same rules.
 +\\
 +**Some researchers have claimed that statistical learning and
 +rule learning are two separate mechanisms, because statistical
 +learning involves learning about elements that have been presented
 +during exposure, whereas rule learning can be applied
 +to novel elements and novel combinations** (see Endress &
 +Bonatti, 2007; Marcus, 2000). **But why do learners sometimes
 +keep track of the specific elements in the input they are
 +exposed to and at other times learn a rule that extends beyond
 +the specifics of the input? An alternate hypothesis is that these
 +two processes are in fact not distinct, but rather are different
 +outcomes of the same learning mechanism.**
 +\\
 +\\
 +//MyNote//: this is the most relevant question they ask in this study.
 +\\
 +\\
 +For example, some stimulus dimensions are naturally more
 +salient than others. **If stimuli are encoded in terms of their
 +salient dimensions rather than their specific details, then learners
 +will appear to generalize a rule by applying it to all stimuli
 +that exhibit the same pattern on these salient dimensions.**
 +\\
 +\\
 +//MyNote//: what triggers encoding in terms of the salient dimensions that apply to all stimuli?
 +\\
 +\\
 +Although perceptual cues can serve as powerful constraints on
 +statistical learning, perceptual salience is not how most rules
 +are defined in the natural environment.
 +\\
 +**Learners acquire rules when patterns in the input indicate
 +that several elements occur interchangeably in the same contexts,
 +but acquire specific instances when the patterns
 +apply only to the individual elements.**
 +\\
 +\\
 +//MyNote//: **CRUCIAL POINT**: what features of the input indicate that elements occur interchangeably? How much evidence is needed before generalization to new elements occurs?
 +\\
 +\\
 +For example, Xu and
 +Tenenbaum (2007) have shown that if children hear the word
 +“glim” applied to three different dogs, they will infer that
 +“glim” means dog. In contrast, if “glim” is used three times to
 +refer to the same dog, children interpret it as the dog’s name.
 +The same contrast between learning items and learning rules
 +can occur for syllable and word sequences.
 +\\
 +Gerken (2006) has made this argument by reconsidering
 +and modifying the design of the Marcus et al. (1999) rule-learning
 +experiment (see Fig. 3). Marcus et al. presented 16
 +different AAB strings in the learning phase of their experiment.
 +Notice in Figure 3 that four strings ended in di, four
 +ended in je, four ended in li, and four ended in we. Thus,
 +infants could have learned the general AAB rule, or they could
 +have learned a more specific pattern: that every string ended in
 +di, je, li, or we. The more consistent or reliable cue was the
 +repetition of the first two syllables—the AAB rule—because it
 +applied to every string, whereas the “ends in di (or je, or li, or
 +we)” rule applied to only one-fourth of the strings.
 +Gerken (2006) asked whether infants presented with a subset
 +of the 16 strings from the Marcus et al. (1999) study would
 +favor the “repetition of the first two syllables” rule or the
 +“ends in di, je, li, or we” rule. Infants who heard only four
 +AAB strings that ended in the same syllable (e.g., di in the
 +leftmost column of Fig. 3) were tested on two equally plausible
 +rules: (1) all strings involve an AAB repetition, and (2) all
 +strings end in di. These infants failed to generalize the first
 +rule to a novel string that retained the AAB pattern but did not
 +end in di. In contrast, infants who heard only four AAB strings
 +lying along the diagonal in Figure 3 replicated the Marcus
 +et al. result. Because these strings shared an AAB pattern but
 +ended in four different syllables, only the AAB rule was
 +reliable.
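
The contrast between the exposure sets can be made concrete with a small sketch that scores how reliably each candidate rule describes each set. The syllables come from the strings quoted in these notes; the 4x4 grid layout, the `reliability` measure (fraction of exposure strings a rule describes), and all function names are assumptions made for illustration, not Gerken's analysis.

```python
from itertools import product

# Syllables taken from the example strings in the notes. Treating Figure 3 as a
# 4x4 grid (rows = A syllable, columns = B syllable) is an assumption.
A_SYLLABLES = ["le", "wi", "ji", "de"]
B_SYLLABLES = ["di", "je", "li", "we"]

# Each string is a tuple of three syllables, e.g. ("le", "le", "di").
GRID = [(a, a, b) for a, b in product(A_SYLLABLES, B_SYLLABLES)]      # 16 AAB strings
column = [(a, a, "di") for a in A_SYLLABLES]                          # all end in di
diagonal = [(a, a, b) for a, b in zip(A_SYLLABLES, B_SYLLABLES)]      # four endings


def is_aab(s):
    """True if the first two syllables repeat and the third differs."""
    return s[0] == s[1] and s[1] != s[2]


def ends_in(syllable):
    """Build a rule that checks the final syllable."""
    return lambda s: s[2] == syllable


def reliability(rule, strings):
    """Fraction of the exposure strings that the rule describes."""
    return sum(rule(s) for s in strings) / len(strings)


RULES = {"AAB": is_aab, "ends in di": ends_in("di")}
for name, exposure in [("all 16", GRID), ("column", column), ("diagonal", diagonal)]:
    scores = {rule_name: reliability(rule, exposure) for rule_name, rule in RULES.items()}
    print(name, scores)
```

In the column set both rules describe every string, so nothing in the input favors the general rule. In the diagonal set and in the full set of 16, only AAB describes every string, while "ends in di" describes one-fourth of them.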
 +\\
 +\\
 +//MyNote//: **QUESTION**: 
 +\\
 +Consider the set of 4 strings: //leledi, wiwije, jijili, dedewe//
 +\\
 +The following rules are equally reliable for all strings:
 +\\
 +1. AAB
 +\\
 +2. starts with 2x //le, wi, ji or de//
 +\\
 +3. ends in //di, je, li, we//
 +\\
 +Why do learners sometimes stick to the narrow generalizations [2,3] and sometimes make a wider generalization (category-based) [1]? 
 +\\
 +\\
 +In recent work, we (Reeder, Newport, & Aslin, 2009, 2010)
 +demonstrated a similar phenomenon—and described some of
 +the principles for its operation—in the learning of an artificial-language
 +grammar. In our experiments, adult learners were
 +presented with sentences made up of nonsense words that
 +came from three different grammatical categories (A, X, and
 +B), much like subjects, verbs, and direct objects in sentences
 +such as “Bill ate lunch.” Depending on the experiment, the
 +input included sentences in which **all of the words within a
 +particular category occurred in the same contexts** (e.g., words
 +X1, X2, and X3 all occurred after any of the A words and before
 +any of the B words), or **the input included only sentences in
 +which the X words occurred in a limited number of overlapping
 +A-word or B-word contexts**.
 +Adult learners are surprisingly sensitive to these differences.
 +Our results showed that **//participants’ tendency to generalize
 +depended on the precise degree of overlap among word
 +contexts that they heard in the input, and also on the consistency
 +with which a particular A or B word was missing from
 +possible X-word contexts//**. 
 +\\
 +\\
 +**Adults generalize rules when the
 +shared contexts are largely the same, with only an occasional
 +absence of overlap (i.e., a “gap”). However, when the gaps are
 +persistent, adults judge them to be legitimate exceptions to the
 +rule and no longer generalize to these contexts.**
 +\\
 +\\
 +//MyNote//: this is a broad description of the observed results, with no explanation of why this is the case and no precise definition of "largely" or "persistent". How much overlap is enough? How persistent must a gap be? Why?
 +\\
 +\\
 +Thus, similar
 +to the results of Gerken (2006), our findings showed that it
 +was the consistency of context cues that led learners to generalize
 +rules to novel strings, and it was the inconsistency of
 +context cues that kept learners from generalizing and led them
 +to treat some strings as exceptions.
 +The key point here is that in terms of the reliability of context
 +cues, statistical learning and rule learning are not different
 +mechanisms (see Orban, Fiser, Aslin, & Lengyel, 2008). When
 +there are strong perceptual cues, such as the repetition of elements
 +in an AAB sequence, a statistical-learning mechanism
 +can compute the regularities of the repetitions (i.e., they are
 +either present or absent) or of the elements themselves (e.g.,
 +the particular syllables). And, as hypothesized by Gerken
 +(2006) and Reeder et al. (2009, 2010), even when there are no
 +perceptual cues, the consistency of how the context cues are
 +distributed across strings of input determines whether a rule is
 +formed—enabling generalization to novel strings—or whether
 +specific instances are learned. According to this hypothesis,
 +statistical learning is a single mechanism whose outcome
 +applies either to elements that have been experienced or to
 +generalization beyond experienced elements, depending on
 +the manner and consistency with which elements are patterned
 +in the learner’s input. Importantly, this balance of learning is
 +accomplished without instruction, through mere exposure to
 +structured input.
  
 +----
 +**Conclusion:**
 +\\
 +Perceptual salience and the patterning of context cues are not
 +the only factors that can influence what learners acquire via a
 +statistical-learning mechanism. An extensive literature in linguistics
 +has argued that languages of the world display a small
 +number of universal patterns—or a few highly common patterns,
 +out of many that are possible—and has suggested that
 +language learners will fail to acquire languages that do not
 +exhibit these regularities (Chomsky, 1965, 1995).
 +Recently, a number of studies using
 +artificial grammars have indeed shown that both children and
 +adults will more readily acquire languages that observe the
 +universal or more typologically common patterns found in
 +natural languages.
 +For example, Hudson Kam and Newport (2005, 2009) and
 +Austin and Newport (2011) presented adults and children with
 +miniature languages containing inconsistent, probabilistically
 +occurring forms (e.g., nouns were followed by the nonsense
 +word ka 67% of the time and by the nonsense word po the
 +remaining 33% of the time). This type of probabilistic variation
 +is not characteristic of natural languages, but it does occur
 +in the speech of nonnative speakers who make grammatical
 +errors. Adult learners in these experiments matched the probabilistic
 +variation they had heard in their input when they produced
 +sentences using the miniature language, but young
 +children formed a regular rule, producing ka virtually all of the
 +time, thereby restoring to the language the type of regularity
 +that is more characteristic of natural languages.
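
As a toy illustration of the two production patterns described above, the sketch below contrasts a learner that reproduces the input proportions with one that always produces the majority form. Both learner functions are hypothetical simplifications written for these notes, not the authors' model, and the regularizing learner is idealized (the children produced ka virtually, not strictly, all of the time).

```python
import random
from collections import Counter


def matching_learner(input_forms, n, rng):
    """Probability matching: sample each production from the input distribution."""
    return [rng.choice(input_forms) for _ in range(n)]


def regularizing_learner(input_forms, n):
    """Regularization (idealized): always produce the most frequent input form."""
    majority = Counter(input_forms).most_common(1)[0][0]
    return [majority] * n


rng = random.Random(1)
# Input with the proportions from the text: ka 67% of the time, po 33%.
input_forms = ["ka"] * 67 + ["po"] * 33

adult_like = Counter(matching_learner(input_forms, 1000, rng))
child_like = Counter(regularizing_learner(input_forms, 1000))

print("matching learner:    ", {form: n / 1000 for form, n in adult_like.items()})
print("regularizing learner:", {form: n / 1000 for form, n in child_like.items()})
```

The matching learner's output stays near the 67/33 split of the input, whereas the regularizing learner's output contains only ka.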
 +\\
 +\\
 +It is not always clear why learners acquire certain types of
 +patterns more easily than others (and why languages therefore
 +more commonly exhibit these patterns). Some word orders
 +place prominent words in more consistent positions across different
 +types of phrases; other patterns are more internally regular
 +or conform better to the left-to-right biases of auditory
 +processing. A full understanding of the principles underlying
 +these learning outcomes awaits further research. What is clear,
 +however, is that statistical learning is not simply a veridical
 +reproduction of the stimulus input. Learning is shaped by a
 +number of constraints on perception and memory, at least
 +some of which may apply not only to languages but also to
 +nonlinguistic patterns.