\\
\\
----
**Conclusions:**
\\
\\
The infant language learning literature has often been framed around the question "rules or statistics?" We have argued that this is the wrong question. Even if infants represent symbolic rules with relations like identity (and there is every reason to believe they do), there is still the question of how they learn these rules, and how they converge on the correct rule so quickly in a large hypothesis space. This challenge requires statistics for guiding generalization from sparse data.
\\
In our work here we have shown how domain-general statistical inference principles operating over minimal rule-like representations can explain a broad set of results in the rule learning literature.
\\
The inferential principles encoded in our models, namely the size principle (or, in its more general form, Bayesian Occam's razor) and the non-parametric tradeoff between complexity and fit to data encoded in the Chinese Restaurant Process, are not only useful in modeling rule learning within simple artificial languages. They are also the same principles that are used in computational systems for natural language processing that are engineered to scale to large datasets. These principles have been applied to tasks as varied as unsupervised word segmentation (Brent, 1999; Goldwater, Griffiths, & Johnson, 2009), morphology learning (Albright & Hayes, 2003; Goldsmith, 2001; Goldwater et al., 2006), and grammar induction (Bannard, Lieven, & Tomasello, 2009; Klein & Manning, 2005; Perfors, Tenenbaum, & Regier, 2006).
\\
First, our models assumed the minimal machinery needed to capture a range of findings. Rather than making a realistic guess about the structure of the hypothesis space for rule learning, where evidence was limited we assumed the simplest possible structure. For example, although there is some evidence that infants may not always encode absolute positions (Lewkowicz & Berent, 2009), there have been few rule learning studies that go beyond three-element strings. We therefore defined our rules based on absolute positions in fixed-length strings. For the same reason, although previous work on adult concept learning has used infinitely expressive hypothesis spaces with prior distributions that penalize complexity (e.g. Goodman, Tenenbaum, Feldman, & Griffiths, 2008; Kemp, Goodman, & Tenenbaum, 2008), we chose a simple uniform prior over rules instead. With the collection of more data from infants, however, we expect that both more complex hypothesis spaces and priors that prefer simpler hypotheses will become necessary.
\\
Second, our models operated over unique string types as input rather than individual tokens. This assumption highlights an issue in interpreting the α parameter of Models 2 and 3: there are likely different processes of forgetting that happen over types and tokens. While individual tokens are likely to be forgotten or misperceived with constant probability, the probability that a type is misremembered or corrupted will grow smaller as more tokens of that type are observed (Frank et al., 2010). An interacting issue concerns serial position effects. Depending on the location of identity regularities within sequences, rules vary in the ease with which they can be learned (Endress, Scholl, & Mehler, 2005; Johnson et al., 2009). Both of these sets of effects could likely be captured by a better understanding of how limits on memory interact with the principles underlying rule learning. Although a model that operates only over types may be appropriate for experiments in which each type is nearly always heard the same number of times, models that deal with linguistic data must include processes that operate over both types and tokens (Goldwater et al., 2006; Johnson, Griffiths, & Goldwater, 2007).
\\
Finally, though the domain-general principles we have identified here do capture many results, there is some additional evidence for domain-specific effects. Learners may acquire expectations for the kinds of regularities that appear in domains like music compared with those that appear in speech (Dawson & Gerken, 2009); in addition, a number of papers have described a striking dissociation between the kinds of regularities that can be learned from vowels and those that can be learned from consonants (Bonatti, Peña, Nespor, & Mehler, 2005; Toro, Nespor, Mehler, & Bonatti, 2008). Both sets of results point to a need for a hierarchical approach to rule learning, in which knowledge of what kinds of regularities are possible in a domain can itself be learned from the evidence. Only through further empirical and computational work can we understand which of these effects can be explained through acquired domain expectations and which are best explained as innate domain-specific biases or constraints.
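A hedged sketch of what such a hierarchical approach could look like: treat the rate at which identity regularities appear in a domain as a latent parameter with a shared prior, updated from the rules encountered in that domain. The Beta-Bernoulli choice and all numbers below are illustrative assumptions, not a model from the paper:
<code python>
# Illustrative two-level sketch: a Beta prior on the rate of identity
# regularities within a domain, updated from the rules seen there.

def posterior_mean_identity_rate(prior_a, prior_b, rules_seen):
    """Beta(prior_a, prior_b) prior on a domain's identity-regularity
    rate; rules_seen is a list of booleans (True = rule used identity).
    Returns the posterior mean under Beta-Bernoulli updating."""
    hits = sum(rules_seen)
    return (prior_a + hits) / (prior_a + prior_b + len(rules_seen))

# The same weak prior for speech and music; different experience yields
# different learned expectations about which regularities to look for.
speech_rules = [True, True, False, True]    # identity regularities common
music_rules  = [False, False, False, True]  # identity regularities rare
print(posterior_mean_identity_rate(1, 1, speech_rules))  # ~0.67
print(posterior_mean_identity_rate(1, 1, music_rules))   # ~0.33
</code>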
| \\ | \\ | ||