How AI learns an internal simulator of the world — predicting what happens next, imagining consequences, and planning without trial and error.
Before you catch a ball, your brain simulates its trajectory. You don't need to try every possible arm position — you predict where the ball will be and move accordingly. This internal simulation is a world model.
An AI world model does the same thing: given the current state and an action, it predicts what the next state will be. If the model is accurate enough, the agent can plan in imagination instead of learning through costly real-world trial and error.
The teal ball follows real physics. The orange trails are imagined futures from the world model. Click to launch a new ball.
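The state-action-next-state interface can be sketched in a few lines. Below is a minimal, illustrative example (the toy physics, the linear model, and all names are invented for this sketch): a "real world" ball under gravity with a horizontal-thrust action, and a world model fit to observed transitions by least squares, then rolled forward in imagination.

```python
import numpy as np

# Toy "real world": a 2-D ball under gravity. State = (x, y, vx, vy);
# the action is a horizontal thrust. All details here are illustrative.
def step_real(state, action, dt=0.05, g=-9.8):
    x, y, vx, vy = state
    return np.array([x + vx * dt, y + vy * dt, vx + action * dt, vy + g * dt])

# Collect transitions from real experience.
rng = np.random.default_rng(0)
states = rng.uniform(-1, 1, size=(500, 4))
actions = rng.uniform(-1, 1, size=(500, 1))
next_states = np.array([step_real(s, a[0]) for s, a in zip(states, actions)])

# World model: next_state = [state, action, 1] @ W, fit by least squares.
# (A linear model suffices only because these toy dynamics are linear.)
X = np.hstack([states, actions, np.ones((500, 1))])
W, *_ = np.linalg.lstsq(X, next_states, rcond=None)

def step_model(state, action):
    return np.concatenate([state, [action, 1.0]]) @ W

# Plan in imagination: roll the model forward with no real-world steps.
s_model = s_real = np.array([0.0, 1.0, 2.0, 0.0])
for _ in range(10):
    s_model = step_model(s_model, 0.5)
    s_real = step_real(s_real, 0.5)

print(np.abs(s_model - s_real).max())  # tiny: the learned model tracks reality
```

Real systems replace the linear map with a neural network, but the contract is the same: `f(state, action) -> next_state`.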
In model-free RL, the agent learns a policy directly from experience: try things, get rewards, adjust. In model-based RL, the agent first learns a dynamics model (how the world works), then uses that model to plan or generate synthetic experience.
Model-based methods are more sample-efficient: they extract more information from each real interaction. Adjust the model accuracy to see the trade-off.
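The distinction can be made concrete with Dyna-Q, the classic algorithm that does both at once: every real transition updates the value function directly (the model-free part) and also trains a dynamics model that generates extra imagined updates (the model-based part). A minimal sketch on a toy corridor environment, invented for illustration:

```python
import random

# Tiny 1-D corridor: states 0..4, reward 1 for reaching state 4.
# Actions: 0 = left, 1 = right.
N, GOAL = 5, 4

def env_step(s, a):
    s2 = max(0, min(N - 1, s + (1 if a == 1 else -1)))
    return s2, float(s2 == GOAL)

random.seed(0)
Q = {(s, a): 0.0 for s in range(N) for a in (0, 1)}
model = {}  # learned dynamics: (s, a) -> (s', r)
alpha, gamma, planning_steps = 0.5, 0.9, 20

for episode in range(30):
    s = 0
    while s != GOAL:
        # Epsilon-greedy action selection.
        a = random.choice([0, 1]) if random.random() < 0.2 \
            else max((0, 1), key=lambda a: Q[(s, a)])
        s2, r = env_step(s, a)
        # Direct RL update from real experience (model-free part).
        Q[(s, a)] += alpha * (r + gamma * max(Q[(s2, 0)], Q[(s2, 1)]) - Q[(s, a)])
        model[(s, a)] = (s2, r)  # learn the dynamics model
        # Planning: replay imagined transitions from the model (model-based part).
        for _ in range(planning_steps):
            (ps, pa), (ps2, pr) = random.choice(list(model.items()))
            Q[(ps, pa)] += alpha * (pr + gamma * max(Q[(ps2, 0)], Q[(ps2, 1)]) - Q[(ps, pa)])
        s = s2

# After training, the greedy policy should move right at every state.
print([max((0, 1), key=lambda a: Q[(s, a)]) for s in range(GOAL)])
```

The 20 planning steps per real step are exactly the "extract more from each real experience" advantage: the same transition is reused many times in imagination.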
Predicting the next image pixel-by-pixel is expensive and wasteful — most pixels don't matter for decision-making. Instead, modern world models work in a latent space: they encode observations into compact representations and predict dynamics in that compressed space.
Watch how a high-dimensional observation (left) is compressed into a small latent vector (middle), and prediction happens there. Toggle between pixel and latent prediction.
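A minimal numeric sketch of the idea, with PCA standing in for the learned encoder and a linear map standing in for the latent dynamics network (real systems learn both with neural networks): observations are 64-dimensional, but prediction happens in a 2-dimensional latent space.

```python
import numpy as np

rng = np.random.default_rng(1)

# Underlying 2-D state (e.g. a ball's position), rendered into a 64-D
# "observation" by a fixed random linear map -- a stand-in for pixels.
render = rng.normal(size=(2, 64))

def dynamics(z):  # true latent dynamics: a slow rotation
    c, s = np.cos(0.1), np.sin(0.1)
    return z @ np.array([[c, -s], [s, c]])

Z = rng.normal(size=(1000, 2))
obs, obs_next = Z @ render, dynamics(Z) @ render

# Encoder: PCA of the observations down to a 2-D latent space.
U, S, Vt = np.linalg.svd(obs - obs.mean(0), full_matrices=False)
enc = Vt[:2].T                       # 64 -> 2 compression
lat, lat_next = obs @ enc, obs_next @ enc

# Latent dynamics model: fit a 2x2 map instead of a 64x64 one.
A, *_ = np.linalg.lstsq(lat, lat_next, rcond=None)

err = np.abs(lat @ A - lat_next).max()
print(err)  # near zero: prediction works in the compact latent space
```

The payoff is the size of the dynamics model: a 2x2 matrix instead of 64x64, and none of its capacity is spent on "pixels" that carry no information about the state.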
Dreamer is the most successful family of world-model agents. The architecture has three components: a world model (an RSSM, or Recurrent State-Space Model), an actor (policy), and a critic (value function). The key innovation: the actor and critic are trained entirely inside the world model's imagination.
| Version | Year | Key Improvement |
|---|---|---|
| Dreamer v1 | 2020 | Latent imagination + value estimation |
| Dreamer v2 | 2021 | Discrete latents, KL balancing, Atari mastery |
| Dreamer v3 | 2023 | Symlog predictions, fixed hyperparameters across domains |
The agent imagines future states from the current state. Teal = real states, orange = imagined futures, green = predicted rewards.
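The imagination training loop can be sketched structurally. The fixed linear maps below are stand-ins for Dreamer's neural networks (the real agent uses an RSSM and trains everything by gradient descent); the sketch shows only the data flow: the actor acts, the world model predicts the next latent and reward, the critic estimates values, and λ-returns provide learning targets, all without a single environment step.

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT, HORIZON, GAMMA, LAMBDA = 8, 15, 0.99, 0.95

# Stand-ins for the three learned components (illustrative random weights).
W_dyn = rng.normal(scale=0.3, size=(LATENT + 1, LATENT))  # world model
W_rew = rng.normal(scale=0.3, size=LATENT)                # reward head
W_actor = rng.normal(scale=0.3, size=LATENT)              # actor (policy)
W_critic = rng.normal(scale=0.3, size=LATENT)             # critic (value)

def imagine(z0):
    """Roll out entirely inside the world model: no environment calls."""
    z, rewards, values = z0, [], []
    for _ in range(HORIZON):
        a = np.tanh(z @ W_actor)               # actor picks an action
        z = np.tanh(np.append(z, a) @ W_dyn)   # world model predicts next latent
        rewards.append(z @ W_rew)              # predicted reward
        values.append(z @ W_critic)            # critic's value estimate
    return np.array(rewards), np.array(values)

def lambda_returns(rewards, values):
    """TD(lambda) targets: the actor maximizes them, the critic regresses to them."""
    R = values[-1]
    out = np.empty_like(rewards)
    for t in reversed(range(len(rewards))):
        R = rewards[t] + GAMMA * ((1 - LAMBDA) * values[t] + LAMBDA * R)
        out[t] = R
    return out

z0 = rng.normal(size=LATENT)
rewards, values = imagine(z0)
targets = lambda_returns(rewards, values)
print(targets.shape)  # (15,) -- one learning target per imagined step
```

Because the rollout is fully differentiable, Dreamer can backpropagate the actor's objective through the imagined trajectory, which is the "backprop through model" planning entry in the table further below.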
Yann LeCun proposed JEPA (Joint-Embedding Predictive Architecture) as an alternative to generative world models. Instead of predicting what the next observation looks like (pixel reconstruction), JEPA predicts the abstract representation of the next state. This avoids wasting capacity on irrelevant details.
The key difference from autoencoders: JEPA never reconstructs pixels. Both the target and prediction live in embedding space. A VICReg or similar loss prevents the embeddings from collapsing to trivial solutions.
Compare: generative models predict pixels (expensive, noisy), JEPA predicts embeddings (cheap, abstract).
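The structural point is easy to show in code. The sketch below uses random matrices as stand-ins for the trained networks, and an EMA target encoder (the collapse-prevention mechanism used in I-JEPA; VICReg-style regularization is an alternative). The thing to notice is where the loss lives: it compares 8-dimensional embeddings, never 64-dimensional observations.

```python
import numpy as np

rng = np.random.default_rng(0)
D_OBS, D_EMB = 64, 8

# Online encoder and a slowly-updated target copy -- random matrices
# here, stand-ins for the networks JEPA actually trains.
W_online = rng.normal(size=(D_OBS, D_EMB))
W_target = W_online.copy()
W_pred = rng.normal(size=(D_EMB, D_EMB))   # predictor in embedding space

def jepa_loss(obs_now, obs_next):
    z = obs_now @ W_online            # context embedding
    z_hat = z @ W_pred                # predicted next embedding
    target = obs_next @ W_target      # target embedding (no gradient flows here)
    return np.mean((z_hat - target) ** 2)   # loss lives in embedding space

def ema_update(tau=0.01):
    """Target encoder trails the online one -- helps avoid collapse."""
    global W_target
    W_target = (1 - tau) * W_target + tau * W_online

obs_a, obs_b = rng.normal(size=D_OBS), rng.normal(size=D_OBS)
loss = jepa_loss(obs_a, obs_b)
```

Contrast with an autoencoder, whose loss would be `np.mean((decode(z) - obs_next) ** 2)` over all 64 observation dimensions: JEPA has no decoder at all.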
The latest world models don't just predict latent states — they generate entire videos of what will happen next. This is world modeling at scale: train on millions of internet videos and learn a general-purpose simulator of the visual world.
| Model | Key Idea | Training Data |
|---|---|---|
| Genie (DeepMind) | Learn actions from unlabeled video, playable worlds | Internet gameplay videos |
| UniSim (Google) | Universal simulator of visual experience | Internet video + images |
| Sora (OpenAI) | Diffusion transformer for video, implicit physics | Internet video |
| Cosmos (NVIDIA) | World foundation model for physical AI | Driving + robotics video |
Given the current frame and an action, the model predicts future frames. Watch how prediction quality degrades over longer horizons.
Having a world model is only useful if you can use it to make decisions. Planning algorithms search through imagined futures to find the best action sequence. Major approaches:
| Method | How It Plans | Used In |
|---|---|---|
| Random Shooting | Sample many action sequences, pick best | PETS |
| CEM (Cross-Entropy Method) | Iteratively refine action distribution | PETS, TD-MPC |
| MCTS (Tree Search) | Build a search tree of states | MuZero, EfficientZero |
| Backprop through model | Gradient-based trajectory optimization | Dreamer |
The agent imagines many possible futures (gray) and picks the one with the highest reward (green). Click to re-plan.
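CEM is simple enough to sketch end-to-end. Below, a CEM planner searches through a known toy model (a 1-D point whose action is a clipped velocity) for an action sequence that reaches a target; the environment, reward, and all hyperparameters are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Known toy dynamics the planner searches through.
def model_step(x, a):
    return x + 0.1 * np.clip(a, -1, 1)

def rollout_return(x0, actions, target=1.0):
    """Score an imagined trajectory: negative squared distance to the target."""
    x, total = x0, 0.0
    for a in actions:
        x = model_step(x, a)
        total -= (x - target) ** 2
    return total

def cem_plan(x0, horizon=10, pop=200, elites=20, iters=5):
    """Cross-entropy method: sample action sequences, refit to the elites."""
    mu, sigma = np.zeros(horizon), np.ones(horizon)
    for _ in range(iters):
        cands = rng.normal(mu, sigma, size=(pop, horizon))
        scores = np.array([rollout_return(x0, c) for c in cands])
        best = cands[np.argsort(scores)[-elites:]]   # keep the top sequences
        mu, sigma = best.mean(0), best.std(0) + 1e-3  # refit the distribution
    return mu

plan = cem_plan(0.0)
x = 0.0
for a in plan:
    x = model_step(x, a)
print(round(x, 2))  # the planner drives x toward the target at 1.0
```

Random shooting is the degenerate case `iters=1`: sample once, keep the best. The iterative refit is what lets CEM concentrate its samples on promising regions of action space.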
World models are powerful but far from solved. Key challenges remain:
| Problem | Why It's Hard | Current Approaches |
|---|---|---|
| Compounding errors | Prediction errors accumulate over long rollouts | Shorter horizons, latent space, ensembles |
| Partial observability | The agent can't see everything | Recurrent state (RSSM), memory |
| Stochastic environments | Multiple futures are possible | Stochastic latents, discrete codes |
| Generalization | Transfer between environments | Foundation world models (Genie, Cosmos) |
| Computational cost | Planning is expensive at test time | Amortized policies, model distillation |
Watch how prediction error grows with each imagined step. The red band shows the growing uncertainty.
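The compounding-error problem in the table above is easy to demonstrate numerically: give a one-step model a ~1% parameter error and roll it out open-loop, so each step's error feeds into the next prediction. The dynamics here are an arbitrary illustrative example.

```python
import numpy as np

rng = np.random.default_rng(0)

# True dynamics vs. a model that is slightly wrong at every step.
A_true = np.array([[1.0, 0.1], [0.0, 1.0]])
A_model = A_true + rng.normal(scale=0.01, size=(2, 2))   # ~1% parameter error

x_true = x_model = np.array([1.0, 1.0])
errors = []
for _ in range(50):
    x_true = A_true @ x_true
    x_model = A_model @ x_model     # open-loop rollout: errors feed back in
    errors.append(np.linalg.norm(x_model - x_true))

# One-step error is tiny, but it compounds over the imagined rollout.
print(errors[0], errors[-1])
```

This is why the mitigations in the table keep imagination horizons short, roll out in latent space, or use ensembles to detect when the model is no longer trustworthy.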
World models sit at the intersection of reinforcement learning, generative modeling, and representation learning. They're the foundation for agents that can reason about consequences before acting — a capability that separates reactive systems from truly intelligent ones.
You now understand how AI learns to dream. The ability to simulate the future in imagination — to ask "what if?" before committing to action — may be the most important capability an intelligent agent can have.