Three ideal observer models for rule learning in simple languages

Abstract
The phenomenon of “rule learning”, the quick learning of abstract regularities from exposure to a limited set of stimuli, has become an important model system for understanding generalization in infancy. Experiments with adults and children have revealed differences in performance across domains and types of rules. To understand the representational and inferential assumptions necessary to capture this broad set of results, we introduce three ideal observer models for rule learning. Each model builds on the next, allowing us to test the consequences of individual assumptions. Model 1 learns a single rule, Model 2 learns a single rule from noisy input, and Model 3 learns multiple rules from noisy input.

Introduction:
1. Introduction: from “rules vs. statistics” to statistics over rules
A central debate in the study of language acquisition concerns the mechanisms by which human infants learn the structure of their first language. Are structural aspects of language learned using constrained, domain-specific mechanisms (Chomsky, 1981; Pinker, 1991), or is this learning accomplished using more general mechanisms of statistical inference (Elman et al., 1996; Tomasello, 2003)?
Subsequent studies of rule learning in language acquisition have addressed all of these questions, but for the most part have collapsed them into a single dichotomy of “rules vs. statistics” (Seidenberg & Elman, 1999). The poles of “rules” and “statistics” are seen as accounts of both how infants represent their knowledge of language (in explicit symbolic “rules” or implicit “statistical” associations) and which inferential mechanisms are used to induce that knowledge from limited data (qualitative heuristic “rules” or quantitative “statistical” inference engines). Formal computational models have focused primarily on the “statistical” pole. For example, neural network models have been designed to show that the identity relationships present in ABA-type rules can be captured without explicit rules, as statistical associations between perceptual inputs across time (Altmann, 2002; Christiansen & Curtin, 1999; Dominey & Ramus, 2000; Marcus, 1999; Negishi, 1999; Shastri, 1999; Shultz, 1999, but cf. Kuehne, Gentner, & Forbus, 2000).
We believe the simple “rules vs. statistics” debate in language acquisition needs to be expanded, or perhaps exploded. On empirical grounds, there is support for both the availability of rule-like representations and the ability of learners to perform statistical inferences over these representations. Abstract, rule-like representations are implied by findings that infants are able to recognize identity relationships (Tyrell, Stauffer, & Snowman, 1991; Tyrell, Zingaro, & Minard, 1993) and even newborns have differential brain responses to exact repetitions (Gervain, Macagno, Cogoi, Peña, & Mehler, 2008).
Learners are also able to make statistical inferences about which rule to learn. For example, infants may have a preference towards parsimony or specificity in deciding between competing generalizations: when presented with stimuli that were consistent with both an AAB rule and also a more specific rule, AA di (where the last syllable was constrained to be the syllable di), infants preferred the narrower generalization (Gerken, 2006, 2010). Following the Bayesian framework for generalization proposed by Tenenbaum and Griffiths (2001), Gerken suggests that these preferences can be characterized as the products of rational statistical inference.
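The preference for the narrower generalization can be illustrated with a minimal sketch of the size principle from the Tenenbaum and Griffiths (2001) framework. The sketch rests on several assumptions that are not taken from Gerken's experiments or from the models introduced here: a hypothetical five-syllable inventory, a uniform prior over two candidate rules, the simplification that the final syllable differs from the repeated one, and strong sampling, in which each example is drawn uniformly from the strings a rule allows.

```python
from fractions import Fraction
from itertools import product

# Hypothetical syllable inventory (an assumption; not the experimental stimuli).
SYLLABLES = ["le", "wi", "ji", "de", "di"]


def extension(rule):
    """Return the set of three-syllable strings consistent with a rule."""
    triples = product(SYLLABLES, repeat=3)
    if rule == "AAB":
        # First two syllables identical, third different (simplifying assumption).
        return {t for t in triples if t[0] == t[1] and t[2] != t[0]}
    if rule == "AAdi":
        # Same as AAB, but the third syllable must be "di".
        return {t for t in triples
                if t[0] == t[1] and t[2] == "di" and t[0] != "di"}
    raise ValueError(f"unknown rule: {rule}")


def posterior(data, rules=("AAB", "AAdi")):
    """Posterior over rules under a uniform prior and strong sampling."""
    prior = Fraction(1, len(rules))
    scores = {}
    for rule in rules:
        ext = extension(rule)
        if all(example in ext for example in data):
            # Size principle: each example has likelihood 1 / |extension|.
            scores[rule] = prior * Fraction(1, len(ext)) ** len(data)
        else:
            # A single inconsistent example rules the hypothesis out.
            scores[rule] = Fraction(0)
    total = sum(scores.values())
    if total == 0:
        raise ValueError("no rule is consistent with the data")
    return {rule: score / total for rule, score in scores.items()}


# Three training strings that are consistent with both rules.
data = [("le", "le", "di"), ("wi", "wi", "di"), ("ji", "ji", "di")]
post = posterior(data)
print(post)
```

In this toy setting the AAB rule allows 20 strings and the AA di rule allows 4, so each example is five times more likely under the narrower rule. After three examples the narrower rule holds posterior probability 125/126, even though both rules fit the data equally well.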

Conclusions: