World Models

Empowering agents to dream, predict, and master complex environments through internal simulation.

What is a World Model?

A World Model is a representation of an environment that an AI agent builds to understand the consequences of its actions. Instead of just reacting to raw sensory data, the agent learns a compressed representation of the world and uses it to "dream" or simulate future scenarios.

The conceptual foundation of modern World Models traces back to 1943, when Kenneth Craik proposed in his book "The Nature of Explanation" that the brain constructs "small-scale models" of reality to anticipate events and make decisions. Today's learned world models can be read as a computational version of that idea.

[Diagram: Environment, Vision Model (VAE Encoder), Memory Model (RNN / MDN), and Controller (Action Policy), connected by the Latent State (z) and the Action (a).]

The Vision Model

Acts as the sensory system. It compresses high-dimensional input (like pixels) into a compact latent vector z, capturing the essence of the current scene.
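To make the compression step concrete, here is a minimal sketch of a VAE-style encoder in plain Python. It is an illustration under stated assumptions, not the architecture of any particular paper: the function name encode, the 8x8 input size, the 4-dimensional latent, and the random (untrained) weights are all hypothetical stand-ins for a learned network.

```python
import math
import random

def encode(pixels, w_mu, w_logvar, rng):
    """Map a flat list of pixel values to a sampled latent vector z."""
    # Two linear heads: one for the mean, one for the log-variance.
    mu = [sum(w * p for w, p in zip(row, pixels)) for row in w_mu]
    logvar = [sum(w * p for w, p in zip(row, pixels)) for row in w_logvar]
    # Reparameterization: z = mu + sigma * eps, with eps drawn from N(0, 1).
    z = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
         for m, lv in zip(mu, logvar)]
    return z, mu, logvar

rng = random.Random(0)
n_pixels, latent_dim = 64, 4  # assumed sizes: an 8x8 image and a 4-D latent
pixels = [rng.random() for _ in range(n_pixels)]

# Untrained random weights stand in for a learned encoder.
w_mu = [[rng.gauss(0.0, 0.1) for _ in range(n_pixels)]
        for _ in range(latent_dim)]
w_logvar = [[rng.gauss(0.0, 0.1) for _ in range(n_pixels)]
            for _ in range(latent_dim)]

z, mu, logvar = encode(pixels, w_mu, w_logvar, rng)
print(len(pixels), "->", len(z))
```

A real Vision Model would use a convolutional network trained with a reconstruction loss plus a KL term; the sketch only shows the shape of the mapping from many pixels to a few latent numbers.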

The Memory Model

Predicts the future. Given the current latent state, the chosen action, and a recurrent summary of past experience, it forecasts the next latent state, allowing the agent to "hallucinate" the consequences of its actions without touching the environment.
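The prediction loop can be sketched as a single recurrent step that is fed its own output. This is a simplified, deterministic illustration: it returns one predicted latent vector rather than the mixture distribution an MDN would produce, and the names memory_step and random_matrix, the layer sizes, and the random weights are all assumptions made for the example.

```python
import math
import random

def dot(row, vec):
    return sum(w * x for w, x in zip(row, vec))

def memory_step(h, z, a, params):
    """One recurrent step: update the hidden state, then predict the next latent."""
    W_h, W_z, W_a, W_out = params
    h_next = [math.tanh(dot(W_h[i], h) + dot(W_z[i], z) + dot(W_a[i], a))
              for i in range(len(h))]
    z_pred = [dot(row, h_next) for row in W_out]
    return h_next, z_pred

def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.3) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
hidden_dim, latent_dim, action_dim = 8, 4, 1  # assumed sizes
params = (
    random_matrix(hidden_dim, hidden_dim, rng),  # hidden -> hidden
    random_matrix(hidden_dim, latent_dim, rng),  # latent -> hidden
    random_matrix(hidden_dim, action_dim, rng),  # action -> hidden
    random_matrix(latent_dim, hidden_dim, rng),  # hidden -> predicted latent
)

h = [0.0] * hidden_dim
z = [0.1, -0.2, 0.3, 0.0]
dream = []
for _ in range(5):
    # Feed each prediction back in as the next input: no environment needed.
    h, z = memory_step(h, z, [1.0], params)
    dream.append(z)

print(len(dream), "imagined latent states of size", len(dream[0]))
```

With untrained weights the imagined states are meaningless, but the control flow is the point: once the model is trained, the same loop produces a rollout without any call to the real environment.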

The Controller

The decision maker. It uses the representations from Vision and Memory to choose the best action to maximize rewards in the environment.
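A controller of this kind can be very small. The sketch below assumes a single linear layer over the concatenated latent state z and memory state h, squashed by tanh so each action component stays in [-1, 1]; the function name controller and the weights are hypothetical and chosen by hand for illustration.

```python
import math

def controller(z, h, W_c, b_c):
    """Linear policy over the concatenated vision and memory features."""
    features = z + h  # list concatenation: [z_1, ..., z_n, h_1, ..., h_m]
    return [math.tanh(sum(w * x for w, x in zip(row, features)) + b)
            for row, b in zip(W_c, b_c)]

z = [0.5, -1.0]          # latent state from the Vision Model
h = [0.2]                # hidden state from the Memory Model
W_c = [[1.0, 0.5, -2.0]]  # one action dimension, three input features
b_c = [0.1]

action = controller(z, h, W_c, b_c)
print(action)
```

Keeping the policy this small moves most of the model's capacity into Vision and Memory, and leaves few enough controller parameters that simple search or evolutionary methods become practical for optimizing reward.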

Why it Matters

Sample Efficiency: By training partly in its own "dream space," an agent can practice on simulated rollouts and so needs fewer interactions with the real environment, which are often slow or costly.
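A toy example shows where the savings come from. Everything here is an assumption made for illustration: a one-dimensional environment whose true dynamics are x' = x + 0.5 * a, a model with a single learned gain k, and a family of policies a = -g * x. The agent takes five real steps to fit k, then compares all candidate policies entirely inside the learned model.

```python
import random

def env_step(x, a):
    """The real (hidden) dynamics, which the agent can only sample."""
    return x + 0.5 * a

# 1. Collect a handful of real transitions with random actions.
rng = random.Random(0)
real_steps = 0
num, den = 0.0, 0.0
x = 1.0
for _ in range(5):
    a = rng.uniform(-1.0, 1.0)
    x_next = env_step(x, a)
    real_steps += 1
    # Accumulate least-squares statistics for the model x' = x + k * a.
    num += a * (x_next - x)
    den += a * a
    x = x_next
k = num / den  # learned gain of the world model

# 2. Evaluate candidate policies a = -g * x inside the model only.
def dream_cost(g, k, x0=1.0, horizon=10):
    x, cost = x0, 0.0
    for _ in range(horizon):
        x = x + k * (-g * x)  # imagined step, no call to env_step
        cost += abs(x)
    return cost

candidates = [0.5, 1.0, 1.5, 2.0, 2.5]
dream_steps = len(candidates) * 10
best_gain = min(candidates, key=lambda g: dream_cost(g, k))

print("real steps:", real_steps, "dreamed steps:", dream_steps)
print("best policy gain found in the dream:", best_gain)
```

The comparison of policies costs 50 imagined steps and no additional real ones. The caveat is that the chosen policy is only as good as the model: errors in k would carry over into the decision.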

Robustness: Agents with world models can handle noisy environments better by relying on their internal, physics-like model of how the environment evolves rather than on each raw observation alone.

Transfer Learning: A world model learned in one task can often be repurposed for another, much like how humans use their understanding of gravity across different sports.

Existing World Models

Several breakthrough architectures have brought Kenneth Craik's vision to life:

World Models (2018)

Ha & Schmidhuber's seminal paper, which introduced the V, M, C architecture: a VAE for vision (V), an MDN-RNN for memory (M), and a small controller (C).

Dreamer (V1-V3)

Google DeepMind's series of agents that learn a latent world model from experience and then train their behaviors on trajectories imagined in that latent space.

I-JEPA / V-JEPA

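Meta AI's Joint-Embedding Predictive Architectures for images (I-JEPA) and video (V-JEPA), which learn by predicting the representations of masked regions from the visible context rather than by reconstructing pixels.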

Sora / Gen-3

Modern video generation models from OpenAI (Sora) and Runway (Gen-3) that their developers describe as steps toward general-purpose simulators of the physical world.

Interactive "Dreaming" Demo

Move the agent (blue) with your mouse. The World Model (purple) predicts where the agent will be in the future based on its current momentum.
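The prediction described for the demo can be approximated with constant-velocity extrapolation. This is a guess at the simplest model that fits the description, not the demo's actual source: the function name predict_path, the time step dt, and the sample coordinates are all hypothetical.

```python
def predict_path(prev_pos, curr_pos, steps, dt=1.0):
    """Extrapolate future (x, y) positions assuming velocity stays constant."""
    # Estimate velocity from the two most recent positions.
    vx = (curr_pos[0] - prev_pos[0]) / dt
    vy = (curr_pos[1] - prev_pos[1]) / dt
    # Project forward one time step at a time.
    return [(curr_pos[0] + vx * dt * k, curr_pos[1] + vy * dt * k)
            for k in range(1, steps + 1)]

path = predict_path((10.0, 20.0), (12.0, 21.0), steps=3)
print(path)  # [(14.0, 22.0), (16.0, 23.0), (18.0, 24.0)]
```

This is a hand-written world model with one assumption, that momentum persists. A learned world model replaces the fixed rule with dynamics fitted from data.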

Historical Context

Modern research into World Models (such as the work of Ha and Schmidhuber) echoes the hypothesis laid out by Kenneth Craik. In his 1943 work, Craik argued that if an organism carries a "small-scale model" of external reality and of its own possible actions within its head, it can try out various alternatives, conclude which is the best of them, react to future situations before they arise, and use its knowledge of past events in dealing with the present and the future.