Silvia Rădulescu

Differences

This shows you the differences between two versions of the page.

frank_tenenbaum_2011 [2016/02/06 20:03] silvia frank_tenenbaum_2011 [2016/02/08 15:48] (current) silvia
Line 319: Line 319:
 \\ \\
 \\ \\
-{{ ::screenshot_2016-02-06_21.03.21.png?nolink&300 |}}
+{{ :screenshot_2016-02-06_21.03.21.png?nolink |}}
 \\ \\
 \\ \\
 +
 ---- ----
 **Conclusions:** **Conclusions:**
 +\\
 +\\
 +The infant language learning literature has often been
 +framed around the question “rules or statistics?” We suggest
 +that this is the wrong question. Even if infants represent
 +symbolic rules with relations like identity—and there
 +is every reason to believe they do—there is still the question
 +of how they learn these rules, and how they converge
 +on the correct rule so quickly in a large hypothesis space.
 +This challenge requires statistics for guiding generalization
 +from sparse data.
 +\\
 +In our work here we have shown how domain-general
 +statistical inference principles operating over minimal
 +rule-like representations can explain a broad set of results
 +in the rule learning literature.
 +\\
 +The inferential principles encoded in our models—the
 +size principle (or in its more general form, Bayesian Occam’s
 +razor) and the non-parametric tradeoff between
 +complexity and fit to data encoded in the Chinese Restaurant
 +Process—are not only useful in modeling rule learning
 +within simple artificial languages. They are also the same
 +principles that are used in computational systems for natural
 +language processing that are engineered to scale to
 +large datasets. These principles have been applied to tasks
 +as varied as unsupervised word segmentation (Brent,
 +1999; Goldwater, Griffiths, & Johnson, 2009), morphology
 +learning (Albright & Hayes, 2003; Goldwater et al., 2006;
 +Goldsmith, 2001), and grammar induction (Bannard,
 +Lieven, & Tomasello, 2009; Klein & Manning, 2005; Perfors,
 +Tenenbaum, & Regier, 2006).
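The size principle named above can be illustrated with a minimal sketch. This is not the models from the paper: the vocabulary size, the two candidate rules, and the function name `posterior` are all assumptions chosen for illustration. It assumes a uniform prior over rules and that each observed string type is drawn uniformly from the strings a rule allows, so a rule allowing fewer strings assigns each observation a higher likelihood.

```python
from fractions import Fraction


def posterior(hypothesis_sizes, n_examples):
    """Posterior over rules after n_examples strings consistent with every rule.

    Assumes a uniform prior and the size principle: each example has
    likelihood 1/|h| under a rule h that allows |h| distinct strings.
    """
    likelihoods = {
        name: Fraction(1, size) ** n_examples
        for name, size in hypothesis_sizes.items()
    }
    total = sum(likelihoods.values())
    return {name: lik / total for name, lik in likelihoods.items()}


# Hypothetical vocabulary of 8 syllables and three-element strings.
S = 8
sizes = {
    # Positions 1 and 2 are free; position 3 must repeat position 1.
    "first_equals_third": S * S,
    # No constraint at all.
    "any_string": S ** 3,
}

for n in (0, 1, 3):
    print(n, posterior(sizes, n)["first_equals_third"])
```

With no data the two rules are equally probable; each additional consistent string multiplies the odds in favor of the narrower rule by the ratio of the hypothesis sizes (8 here), which is how a few examples can single out a rule in a large hypothesis space.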
 +\\
 +First, our models assumed the minimal machinery
 +needed to capture a range of findings. Rather than making
 +a realistic guess about the structure of the hypothesis
 +space for rule learning, where evidence was limited we assumed
 +the simplest possible structure. For example,
 +although there is some evidence that infants may not always
 +encode absolute positions (Lewkowicz & Berent,
 +2009), there have been few rule learning studies that go
 +beyond three-element strings. We therefore defined our
 +rules based on absolute positions in fixed-length strings.
 +For the same reason, although previous work on adult concept
 +learning has used infinitely expressive hypothesis
 +spaces with prior distributions that penalize complexity
 +(e.g., Goodman, Tenenbaum, Feldman, & Griffiths, 2008;
 +Kemp, Goodman, & Tenenbaum, 2008), we chose a simple
 +uniform prior over rules instead. With the collection of
 +more data from infants, however, we expect that both
 +more complex hypothesis spaces and priors that prefer
 +simpler hypotheses will become necessary.
 +\\
 +Second, our models operated over unique string types
 +as input rather than individual tokens. This assumption
 +highlights an issue in interpreting the α parameter of Models
 +2 and 3: there are likely different processes of forgetting
 +that happen over types and tokens. While individual tokens
 +are likely to be forgotten or misperceived with constant
 +probability, the probability of a type being
 +misremembered or corrupted will grow smaller as more
 +tokens of that type are observed (Frank et al., 2010). An
 +interacting issue concerns serial position effects. Depending
 +on the location of identity regularities within sequences,
 +rules vary in the ease with which they can be
 +learned (Endress, Scholl, & Mehler, 2005; Johnson et al.,
 +2009). Both of these sets of effects could likely be captured
 +by a better understanding of how limits on memory interact
 +with the principles underlying rule learning. Although a
 +model that operates only over types may be appropriate
 +for experiments in which each type is nearly always heard
 +the same number of times, models that deal with linguistic
 +data must include processes that operate over both types
 +and tokens (Goldwater et al., 2006; Johnson, Griffiths, &
 +Goldwater, 2007).
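The contrast between token-level and type-level forgetting can be made concrete with a small sketch. It rests on an assumption the text does not state: that each token of a type is lost independently with the same probability, and that a type is lost only when all of its tokens are. The function name `p_type_lost` is hypothetical.

```python
def p_type_lost(p_token_lost: float, n_tokens: int) -> float:
    """Probability that every token of one type is lost.

    Assumes tokens are lost independently, each with probability
    p_token_lost, and that a type survives if any one token survives.
    """
    if not 0.0 <= p_token_lost <= 1.0:
        raise ValueError("p_token_lost must be a probability")
    if n_tokens < 1:
        raise ValueError("n_tokens must be at least 1")
    return p_token_lost ** n_tokens


# Per-token loss stays constant while per-type loss shrinks with repetition.
for n in (1, 2, 3, 10):
    print(n, p_type_lost(0.5, n))
```

Under these assumptions the per-token loss rate is fixed, but the per-type loss rate falls geometrically with the number of tokens heard, which is one simple way a single noise parameter over types could conflate two different processes.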
 +\\
 +Finally, though the domain-general principles we have
 +identified here do capture many results, there is some
 +additional evidence for domain-specific effects. Learners
 +may acquire expectations for the kinds of regularities that
 +appear in domains like music compared with those that
 +appear in speech (Dawson & Gerken, 2009); in addition, a
 +number of papers have described a striking dissociation
 +between the kinds of regularities that can be learned from
 +vowels and those that can be learned from consonants
 +(Bonatti, Peña, Nespor, & Mehler, 2005; Toro, Nespor,
 +Mehler, & Bonatti, 2008). Both sets of results point to a
 +need for a hierarchical approach to rule learning, in which
 +knowledge of what kinds of regularities are possible in a
 +domain can itself be learned from the evidence. Only
 +through further empirical and computational work can
 +we understand which of these effects can be explained
 +through acquired domain expectations and which are best
 +explained as innate domain-specific biases or constraints.
 \\ \\