
Bayesian Number Game

This is an interactive demonstration of the Bayesian model of concept learning (Tenenbaum, 1999; Tenenbaum & Griffiths, 2001), using the "number game" as an example. The model illustrates how people generalize from a few positive examples to infer the extension of a concept.

[Interactive demo: enter observed numbers, choose the likelihood (size principle, P(X|h) = (1/|h|)ⁿ, or uniform, P(X|h) = 1), and inspect the posterior P(h|X) ∝ P(X|h) · P(h) over hypotheses together with the generalization histogram P(y ∈ C | X) = Σₕ P(y ∈ h) · P(h|X).]

The Model

Given a set of observed positive examples X = {x₁, x₂, ..., xₙ}, the model infers which hypothesis h best explains the data, and uses this to predict whether new numbers belong to the concept.

Hypotheses

The model considers many hypotheses about what rule generates the numbers:

  • Mathematical rules: powers of 2, powers of 3, square numbers, primes, etc.
  • Multiples: multiples of 2, 3, 4, 5, ..., 12
  • Even/odd: all even or all odd numbers
  • Ending patterns: numbers ending in 0, 1, 2, ..., 9
  • Intervals: ranges like [10, 20], [15, 25], etc.
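A minimal Python sketch of such a hypothesis space, assuming concepts over the integers 1..100. The names, the choice of power bases, and the interval grid below are illustrative assumptions, not necessarily the demo's exact set:

```python
# Build a dictionary mapping hypothesis names to sets of numbers in 1..100.
# The interval grid (start every 5, widths 5/10/20) is an illustrative choice.
N = 100

def hypothesis_space():
    hyps = {}
    hyps["even numbers"] = set(range(2, N + 1, 2))
    hyps["odd numbers"] = set(range(1, N + 1, 2))
    for k in range(2, 13):                        # multiples of 2..12
        hyps[f"multiples of {k}"] = set(range(k, N + 1, k))
    for base in (2, 3, 4):                        # powers of 2, 3, 4 (from base^1)
        hyps[f"powers of {base}"] = {base**i for i in range(1, 8) if base**i <= N}
    hyps["square numbers"] = {i * i for i in range(1, 11)}
    for d in range(10):                           # numbers ending in 0..9
        hyps[f"ending in {d}"] = {x for x in range(1, N + 1) if x % 10 == d}
    for lo in range(1, N + 1, 5):                 # interval hypotheses
        for width in (5, 10, 20):
            hi = lo + width - 1
            if hi <= N:
                hyps[f"[{lo}, {hi}]"] = set(range(lo, hi + 1))
    return hyps
```

Note that "powers of 2" here starts at 2¹, giving the size of 6 used in the worked example below.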

The Size Principle

The key insight is the size principle for likelihood:

P(X|h) = (1/|h|)ⁿ if all x ∈ X are in h, and 0 otherwise

This means:

  • Smaller hypotheses that are consistent with the data get higher likelihood
  • With more examples, this preference for smaller hypotheses grows exponentially

For example, if you see the number 16:

  • "Powers of 2" (size 6) has likelihood 1/6 ≈ 0.17
  • "Even numbers" (size 50) has likelihood 1/50 = 0.02

But if you see 16, 8, 2, and 64:

  • "Powers of 2" has likelihood (1/6)⁴ ≈ 0.00077
  • "Even numbers" has likelihood (1/50)⁴ = 0.00000016

The smaller hypothesis wins by a much larger margin with more data.
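The worked numbers above can be checked with a few lines of Python (hypothesis sizes as stated in the text: |powers of 2| = 6, |even numbers| = 50 over 1..100):

```python
# Size-principle likelihood: P(X|h) = (1/|h|)^n if every example is in h, else 0.
def likelihood(examples, hypothesis):
    if all(x in hypothesis for x in examples):
        return (1.0 / len(hypothesis)) ** len(examples)
    return 0.0

powers_of_2 = {2, 4, 8, 16, 32, 64}
evens = set(range(2, 101, 2))

print(likelihood([16], powers_of_2))            # 1/6 ≈ 0.17
print(likelihood([16], evens))                  # 1/50 = 0.02
print(likelihood([16, 8, 2, 64], powers_of_2))  # (1/6)^4 ≈ 0.00077
print(likelihood([16, 8, 2, 64], evens))        # (1/50)^4 ≈ 1.6e-7
```

With one example the likelihood ratio is about 8; with four examples it grows to roughly 4800, which is the exponential sharpening described above.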

Posterior & Generalization

Using Bayes' rule:

P(h|X) ∝ P(X|h) · P(h)

The marginal probability that a new number y is in the concept:

P(y ∈ C | X) = Σₕ P(y ∈ h) · P(h|X)
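Combining the size-principle likelihood, Bayes' rule, and the marginal gives the whole model in a few lines. This is a sketch under simplifying assumptions: only two hypotheses and a uniform prior, chosen to keep the example small.

```python
# Posterior over hypotheses via Bayes' rule, then the marginal probability
# that a new number y belongs to the concept.
def posterior(examples, hypotheses, prior):
    def lik(h):  # size principle: (1/|h|)^n if consistent, else 0
        return (1.0 / len(h)) ** len(examples) if all(x in h for x in examples) else 0.0
    unnorm = {name: lik(h) * prior[name] for name, h in hypotheses.items()}
    z = sum(unnorm.values())
    return {name: p / z for name, p in unnorm.items()}

def p_in_concept(y, examples, hypotheses, prior):
    post = posterior(examples, hypotheses, prior)
    return sum(post[name] for name, h in hypotheses.items() if y in h)

hyps = {
    "powers of 2": {2, 4, 8, 16, 32, 64},
    "even numbers": set(range(2, 101, 2)),
}
prior = {name: 1.0 / len(hyps) for name in hyps}

X = [16, 8, 2, 64]
print(p_in_concept(32, X, hyps, prior))  # ≈ 1.0: both hypotheses contain 32
print(p_in_concept(10, X, hyps, prior))  # small: only "even numbers" contains 10
```

After seeing 16, 8, 2, 64, the posterior mass sits almost entirely on "powers of 2", so 10 gets only the sliver of probability left on "even numbers".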

Try These Examples

Example 1: The number 16

Enter just "16". Notice that many hypotheses are consistent: powers of 2, powers of 4, multiples of 4, even numbers, etc. The histogram shows broad generalization.

Example 2: Powers of 2

Enter "16, 8, 2, 64". Now "powers of 2" dominates, and the model predicts only 4 and 32 (the other powers of 2 up to 100) are likely to be in the concept.

Example 3: Powers of 4

Enter "4, 16, 64". The model strongly favors "powers of 4" over "powers of 2" because it's a smaller consistent hypothesis.
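The arithmetic behind this preference is easy to verify. Assuming the sizes used elsewhere on this page (3 powers of 4 and 6 powers of 2 up to 100), the likelihood ratio after three examples is (6/3)³ = 8 in favor of the smaller hypothesis:

```python
# Likelihood ratio for {4, 16, 64}: "powers of 4" vs "powers of 2".
powers_of_4 = {4, 16, 64}
powers_of_2 = {2, 4, 8, 16, 32, 64}
X = [4, 16, 64]

lik4 = (1 / len(powers_of_4)) ** len(X)  # (1/3)^3
lik2 = (1 / len(powers_of_2)) ** len(X)  # (1/6)^3
print(lik4 / lik2)                       # (6/3)^3 = 8.0
```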

Example 4: An interval

Enter "16, 23, 19, 20". No mathematical rule fits well, so interval hypotheses like [16, 23] dominate. The model generalizes to nearby numbers.

Example 5: Squares

Enter "81, 25, 4". The model infers "square numbers" and predicts 1, 9, 16, 36, 49, 64, and 100 are likely in the concept.

How to Use

  1. Enter numbers in the input field, separated by commas or spaces
  2. Click example buttons to try preset demonstrations
  3. Click bars in the histogram to add numbers as observations
  4. Click number chips to remove observations
  5. Watch how the posterior distribution and generalization histogram update

Key Insights

  1. Suspicious coincidences: If all examples happen to be powers of 2, that's unlikely to be a coincidence—the concept is probably "powers of 2"

  2. Size matters: The model prefers the smallest hypothesis consistent with the data

  3. More data, sharper inference: With more examples, the posterior concentrates on fewer hypotheses

  4. Rational generalization: The model captures human-like generalization patterns, neither too narrow nor too broad

References

Tenenbaum, J. B. (1999). A Bayesian framework for concept learning. PhD Thesis, MIT.

Tenenbaum, J. B., & Griffiths, T. L. (2001). Generalization, similarity, and Bayesian inference. Behavioral and Brain Sciences, 24(4), 629-640.