Differences

This shows you the differences between two versions of the page.

--- frank_tenenbaum_2011 [2016/02/06 20:03] – silvia
+++ frank_tenenbaum_2011 [2016/02/08 15:48] (current) – silvia
@@ Line 319: / Line 319: @@
 \\
 \\
-{{ ::screenshot_2016-02-06_21.03.21.png?nolink&300 |}}
+{{ :screenshot_2016-02-06_21.03.21.png?nolink |}}
 \\
 \\
 ----
 **Conclusions:**
+\\
+\\
+The infant language learning literature has often been
+framed around the question “rules or statistics?” We suggest
+that this is the wrong question. Even if infants represent
+symbolic rules with relations like identity—and there
+is every reason to believe they do—there is still the question
+of how they learn these rules, and how they converge
+on the correct rule so quickly in a large hypothesis space.
+This challenge requires statistics for guiding generalization
+from sparse data.
+\\
+In our work here we have shown how domain-general
+statistical inference principles operating over minimal
+rule-like representations can explain a broad set of results
+in the rule learning literature.
+\\
+The inferential principles encoded in our models—the
+size principle (or in its more general form, Bayesian Occam’s
+razor) and the non-parametric tradeoff between
+complexity and fit to data encoded in the Chinese Restaurant
+Process—are not only useful in modeling rule learning
+within simple artificial languages. They are also the same
+principles that are used in computational systems for natural
+language processing that are engineered to scale to
+large datasets. These principles have been applied to tasks
+as varied as unsupervised word segmentation (Brent,
+; Goldwater, Griffiths, & Johnson, 2009), morphology
+learning (Albright & Hayes, 2003; Goldwater et al., 2006;
+Goldsmith, 2001), and grammar induction (Bannard,
+Lieven, & Tomasello, 2009; Klein & Manning, 2005; Perfors,
+Tenenbaum, & Regier, 2006).
+\\
+First, our models assumed the minimal machinery
+needed to capture a range of findings. Rather than making
+a realistic guess about the structure of the hypothesis
+space for rule learning, where evidence was limited we assumed
+the simplest possible structure. For example,
+although there is some evidence that infants may not always
+encode absolute positions (Lewkowicz & Berent,
+), there have been few rule learning studies that go
+beyond three-element strings. We therefore defined our
+rules based on absolute positions in fixed-length strings.
+For the same reason, although previous work on adult concept
+learning has used infinitely expressive hypothesis
+spaces with prior distributions that penalize complexity
+(e.g. Goodman, Tenenbaum, Feldman, & Griffiths, 2008;
+Kemp, Goodman, & Tenenbaum, 2008), we chose a simple
+uniform prior over rules instead. With the collection of
+more data from infants, however, we expect that both
+more complex hypothesis spaces and priors that prefer
+simpler hypotheses will become necessary.
+\\
+Second, our models operated over unique string types
+as input rather than individual tokens. This assumption
+highlights an issue in interpreting the α parameter of Models
+and 3: there are likely different processes of forgetting
+that happen over types and tokens. While individual tokens
+are likely to be forgotten or misperceived with constant
+probability, the probability of a type being
+misremembered or corrupted will grow smaller as more
+tokens of that type are observed (Frank et al., 2010). An
+interacting issue concerns serial position effects. Depending
+on the location of identity regularities within sequences,
+rules vary in the ease with which they can be
+learned (Endress, Scholl, & Mehler, 2005; Johnson et al.,
+). Both of these sets of effects could likely be captured
+by a better understanding of how limits on memory interact
+with the principles underlying rule learning. Although a
+model that operates only over types may be appropriate
+for experiments in which each type is nearly always heard
+the same number of times, models that deal with linguistic
+data must include processes that operate over both types
+and tokens (Goldwater et al., 2006; Johnson, Griffiths, &
+Goldwater, 2007).
+\\
+Finally, though the domain-general principles we have
+identified here do capture many results, there is some
+additional evidence for domain-specific effects. Learners
+may acquire expectations for the kinds of regularities that
+appear in domains like music compared with those that
+appear in speech (Dawson & Gerken, 2009); in addition, a
+number of papers have described a striking dissociation
+between the kinds of regularities that can be learned from
+vowels and those that can be learned from consonants
+(Bonatti, Peña, Nespor, & Mehler, 2005; Toro, Nespor,
+Mehler, & Bonatti, 2008). Both sets of results point to a
+need for a hierarchical approach to rule learning, in which
+knowledge of what kinds of regularities are possible in a
+domain can itself be learned from the evidence. Only
+through further empirical and computational work can
+we understand which of these effects can be explained
+through acquired domain expectations and which are best
+explained as innate domain-specific biases or constraints.
 \\