Frank & Tenenbaum (2011)
Three ideal observer models for rule learning in simple languages
Abstract
The phenomenon of “rule learning” (quick learning of abstract regularities from exposure to a limited set of
stimuli) has become an important model system for understanding generalization in
infancy. Experiments with adults and children have revealed differences in performance
across domains and types of rules. To understand the representational and inferential
assumptions necessary to capture this broad set of results, we introduce three ideal observer
models for rule learning. Each model builds on the one before it, allowing us to test the consequences
of individual assumptions. Model 1 learns a single rule, Model 2 learns a single
rule from noisy input, and Model 3 learns multiple rules from noisy input.
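The abstract contrasts a learner that assumes every string follows the rule (Model 1) with one that allows for noisy input (Model 2). A minimal Python sketch of that difference follows. It assumes a hypothetical six-syllable vocabulary, two hand-picked identity rules (ABA and ABB), a uniform prior, and a simple noise model in which each string is, with probability `alpha`, drawn uniformly from all strings rather than from the rule. These are illustrative assumptions, not the models' published specification.

```python
from itertools import product
from math import prod

# Hypothetical toy vocabulary (not the experimental stimuli).
SYLLABLES = ["ga", "ti", "na", "li", "wi", "di"]
ALL_STRINGS = list(product(SYLLABLES, repeat=3))  # all 216 three-syllable strings


def rule_extension(predicate):
    """Return the set of strings a rule allows (its extension)."""
    return {s for s in ALL_STRINGS if predicate(s)}


# Two candidate rules, each stated as identity relations between positions.
RULES = {
    "ABA": rule_extension(lambda s: s[0] == s[2] and s[0] != s[1]),
    "ABB": rule_extension(lambda s: s[1] == s[2] and s[0] != s[1]),
}


def likelihood(string, extension, alpha):
    """p(string | rule) under an assumed noise model.

    alpha = 0: noise-free learner (in the spirit of Model 1); strings
    outside the rule get probability zero.
    alpha > 0: with probability alpha a string is drawn uniformly from
    all strings (in the spirit of Model 2), so a stray string only lowers
    the rule's score instead of eliminating it.
    """
    from_rule = (1.0 - alpha) / len(extension) if string in extension else 0.0
    return from_rule + alpha / len(ALL_STRINGS)


def posterior(data, alpha):
    """Posterior over RULES for i.i.d. strings; the uniform prior cancels."""
    scores = {name: prod(likelihood(s, ext, alpha) for s in data)
              for name, ext in RULES.items()}
    total = sum(scores.values())
    if total == 0:
        return None  # every rule has been eliminated by some string
    return {name: score / total for name, score in scores.items()}


# Three ABA strings plus one stray ABB string (hypothetical data).
data = [("ga", "ti", "ga"), ("li", "na", "li"), ("wi", "di", "wi"), ("ga", "na", "na")]
print(posterior(data, alpha=0.0))  # None: no rule fits every string
print(posterior(data, alpha=0.1))  # ABA receives nearly all of the mass
```

With `alpha = 0`, the single stray ABB string eliminates both rules, so the sketch returns `None`; with a small nonzero `alpha`, the stray string lowers the ABA score but leaves ABA far ahead.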
Introduction
1. Introduction: from “rules vs. statistics” to statistics over rules
A central debate in the study of language acquisition
concerns the mechanisms by which human infants learn
the structure of their first language. Are structural aspects
of language learned using constrained, domain-specific
mechanisms (Chomsky, 1981; Pinker, 1991), or is this
learning accomplished using more general mechanisms
of statistical inference (Elman et al., 1996; Tomasello,
2003)?
Subsequent studies of rule learning in language acquisition
have addressed all of these questions, but for the most
part have collapsed them into a single dichotomy of “rules
vs. statistics” (Seidenberg & Elman, 1999). The poles of
“rules” and “statistics” are seen as accounts both of how
infants represent their knowledge of language (in explicit
symbolic “rules” or implicit “statistical” associations) and
of which inferential mechanisms are used to induce
their knowledge from limited data (qualitative heuristic
“rules” or quantitative “statistical” inference engines). Formal
computational models have focused primarily on the
“statistical” pole: for example, neural network models designed
to show that the identity relationships present in
ABA-type rules can be captured without explicit rules,
as statistical associations between perceptual inputs across
time (Altmann, 2002; Christiansen & Curtin, 1999;
Dominey & Ramus, 2000; Marcus, 1999; Negishi, 1999;
Shastri, 1999; Shultz, 1999; but cf. Kuehne, Gentner, &
Forbus, 2000).
We believe the simple “rules vs. statistics” debate in
language acquisition needs to be expanded, or perhaps
exploded. On empirical grounds, there is support for both
the availability of rule-like representations and the ability
of learners to perform statistical inferences over these
representations. Abstract, rule-like representations are
implied by findings that infants are able to recognize
identity relationships (Tyrell, Stauffer, & Snowman,
1991; Tyrell, Zingaro, & Minard, 1993) and that even newborns
have differential brain responses to exact repetitions
(Gervain, Macagno, Cogoi, Peña, & Mehler, 2008).
Learners are also able to make statistical inferences about
which rule to learn. For example, infants may have a preference
towards parsimony or specificity in deciding between
competing generalizations: when presented with
stimuli that were consistent with both an AAB rule and
a more specific rule, AAdi (where the last syllable
was constrained to be the syllable di), infants preferred
the narrower generalization (Gerken, 2006, 2010). Following
the Bayesian framework for generalization proposed
by Tenenbaum and Griffiths (2001), Gerken suggests that
these preferences can be characterized as the products of
rational statistical inference.
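In the Bayesian framework of Tenenbaum and Griffiths (2001), examples are assumed to be sampled uniformly from the true hypothesis, so each consistent example has likelihood 1/|h| and n examples give (1/|h|)^n; smaller consistent hypotheses therefore gain support faster (the "size principle"). The Python sketch below applies this to the AAB vs. AAdi contrast. The vocabulary size, the counting of each rule's extension (B must differ from A, and "di" is one of the syllables), and the `prior_aadi` parameter are illustrative assumptions, not values from Gerken's experiments.

```python
def size_principle_posterior(n_examples, vocab_size, prior_aadi=0.5):
    """Posterior probability of the narrow rule AAdi over the broad rule AAB.

    Assumes every training string fits both rules and is sampled uniformly
    from whichever rule generated it, so each likelihood is (1/|h|)^n.
    """
    # Toy extension sizes: A ranges over the vocabulary, B must differ from A,
    # and "di" is one of the vocab_size syllables.
    size_aab = vocab_size * (vocab_size - 1)  # any A, any B != A
    size_aadi = vocab_size - 1                # any A != "di", B fixed to "di"

    like_aab = (1.0 / size_aab) ** n_examples
    like_aadi = (1.0 / size_aadi) ** n_examples

    # Bayes' rule over the two hypotheses.
    joint_aadi = prior_aadi * like_aadi
    joint_aab = (1.0 - prior_aadi) * like_aab
    return joint_aadi / (joint_aadi + joint_aab)


for n in range(1, 5):
    print(n, round(size_principle_posterior(n, vocab_size=6), 4))
```

With equal priors the AAdi posterior works out to 1 / (1 + (1/V)^n) for vocabulary size V, so it approaches 1 as n grows; even a prior tilted towards the broader AAB rule (for example `prior_aadi=0.1`) is overwhelmed after a few consistent examples.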
On theoretical grounds, we see neither a pure “rules”
position nor a pure “statistics” position as sustainable or
satisfying. Without principled statistical inference mechanisms,
the pure “rules” camp has difficulty explaining
which rules are learned or why the right rules are learned
from the observed data. Without explicit rule-based representations,
the pure “statistics” camp has difficulty accounting for what is actually learned; the best neural
network models of language have so far not come close
to capturing the expressive compositional structure of language,
which is why symbolic representations continue to
be the basis for almost all state-of-the-art work in natural
language processing (Chater & Manning, 2006; Manning &
Schütze, 2000).
Driven by these empirical and theoretical considerations,
our work here explores a proposal for how concepts
of “rules” and “statistics” can interact more deeply in
understanding the phenomena of “rule learning” in human
language acquisition.
Our approach is to create computational
models that perform statistical inference over rule-based
representations and test these models on their fit
to the broadest possible set of empirical results. The success
of these models in capturing human performance
across a wide range of experiments lends support to the
idea that statistical inferences over rule-based representations
may capture something important about what human
learners are doing in these tasks.
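The idea of statistical inference over rule-based representations can be illustrated with a small, fully enumerated hypothesis space. In this hedged Python sketch the rule space is hypothetical and not necessarily the one used in the models: each of three syllable positions is either unconstrained (`"*"`), fixed to a particular syllable, or required to repeat an earlier position (`("=", j)`, with 0-based `j`). Each rule's extension is computed over a toy six-syllable vocabulary, and the posterior combines a uniform prior with the size-principle likelihood.

```python
from itertools import product

# Hypothetical toy vocabulary (not the experimental stimuli).
SYLLABLES = ["ga", "ti", "na", "li", "wi", "di"]
ALL_STRINGS = list(product(SYLLABLES, repeat=3))  # all 216 three-syllable strings


def slot_options(i):
    """Constraints allowed at position i (0-based): unconstrained "*",
    a fixed syllable, or ("=", j) meaning "same syllable as position j < i"."""
    return ["*"] + SYLLABLES + [("=", j) for j in range(i)]


def satisfies(string, rule):
    """True if a syllable string meets every positional constraint of a rule."""
    for i, spec in enumerate(rule):
        if spec == "*":
            continue
        if isinstance(spec, tuple):  # identity relation with an earlier slot
            if string[i] != string[spec[1]]:
                return False
        elif string[i] != spec:      # fixed syllable
            return False
    return True


# Hypothesis space: every combination of per-position constraints (504 rules).
RULES = list(product(*(slot_options(i) for i in range(3))))
EXTENSIONS = {rule: frozenset(s for s in ALL_STRINGS if satisfies(s, rule))
              for rule in RULES}


def posterior(data):
    """Posterior over RULES: size-principle likelihood; the uniform prior cancels."""
    scores = {}
    for rule, ext in EXTENSIONS.items():
        if all(s in ext for s in data):  # rules inconsistent with data get zero
            scores[rule] = (1.0 / len(ext)) ** len(data)
    total = sum(scores.values())
    return {rule: score / total for rule, score in scores.items()}


# Three strings that share an ABB-style repetition (hypothetical data).
data = [("ga", "ti", "ti"), ("li", "na", "na"), ("wi", "ga", "ga")]
post = posterior(data)
best = max(post, key=post.get)
print(best, round(post[best], 4))  # ('*', '*', ('=', 1)) 0.9954
```

Only two rules fit these three strings: the unconstrained rule (216 strings) and "the third syllable repeats the second" (36 strings). Because the second is smaller, it takes almost all of the posterior mass. Exhaustive enumeration is practical here only because the toy space has 504 rules.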
Our models are ideal observer models: they provide a
description of the learning problem and show what the
correct inference would be, under a given set of assumptions.
The ideal observer approach has a long history in
the study of perception and is typically used for understanding
the ways in which performance conforms to or
deviates from the ideal (Geisler, 2003).
With few exceptions (Dawson & Gerken, 2009; Johnson
et al., 2009), empirical work on rule learning has been
geared towards showing what infants can do, rather than
providing a detailed pattern of successes and failures
across ages.
Models
Conclusions