AI Dictionary

World Model

An internal simulator inside the model

A model that builds an internal simulation of the real world — predicting physics, action consequences, and how objects move.

Figure: World model, a simulator inside the model. An internal world simulation (t = 0 to t = 1 s, under gravity) lets the model predict object motion, action outcomes, and multiple steps ahead. Examples: Sora, Genie, DreamerV3 (video, games, robotics); a stepping stone toward AI that "behaves like the real world."
Definition

A conventional model maps an input directly to an output. A world model goes one step further: it contains an internal simulation of the world (a learned model of physics, causality, and how things change over time) that it can run forward to predict what happens next.

In practice, this means an AI that can predict where a thrown ball will land, what is behind a door once it is opened, or what will happen if an agent presses a button. It moves beyond language into reasoning about the physical world.
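To make "a simulation the model learned" concrete, here is a minimal sketch under heavy simplifying assumptions: the "world" is a 2D ball under gravity, the model only sees (state, next state) pairs, and a linear least-squares fit stands in for the large neural networks real world models use. Once fitted, the model is rolled forward on its own to predict where a thrown ball lands. All names, the time step, and the data ranges are hypothetical choices for illustration.

```python
import random

DT = 0.05  # time step in seconds (hypothetical choice)
G = 9.81   # gravitational acceleration in m/s^2


def true_step(s):
    """Ground-truth physics; the model only ever sees its input/output pairs."""
    x, y, vx, vy = s
    return (x + vx * DT, y + vy * DT, vx, vy - G * DT)


def solve(a, b):
    """Solve the square system a @ w = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [rv - f * cv for rv, cv in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_world_model(transitions):
    """Least-squares fit of next_state ~ W @ [state, 1], one output dimension at a time."""
    feats = [list(s) + [1.0] for s, _ in transitions]
    k = len(feats[0])
    xtx = [[sum(f[i] * f[j] for f in feats) for j in range(k)] for i in range(k)]
    weights = []
    for d in range(4):
        xty = [sum(f[i] * nxt[d] for f, (_, nxt) in zip(feats, transitions)) for i in range(k)]
        weights.append(solve(xtx, xty))
    return weights


def model_step(weights, s):
    """Predict the next state with the learned linear model."""
    f = list(s) + [1.0]
    return tuple(sum(w * v for w, v in zip(row, f)) for row in weights)


def landing_x(step, s, max_steps=10_000):
    """Roll a dynamics function forward until the ball drops below y = 0."""
    for _ in range(max_steps):
        s = step(s)
        if s[1] < 0:
            return s[0]
    raise RuntimeError("ball never landed")


# Collect experience from random states, then learn the dynamics from it.
random.seed(0)
data = []
for _ in range(500):
    s = (random.uniform(0, 10), random.uniform(0, 10),
         random.uniform(-5, 5), random.uniform(-5, 5))
    data.append((s, true_step(s)))
W = fit_world_model(data)

throw = (0.0, 1.5, 6.0, 8.0)  # x, y, vx, vy
print("model predicts landing at x =", landing_x(lambda s: model_step(W, s), throw))
print("true physics lands at x     =", landing_x(true_step, throw))
```

Because this toy physics happens to be exactly linear in the state, the fitted model reproduces it almost perfectly; real environments are not linear, which is why production world models need far larger function approximators and far more data.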

Leading examples (2024-2025):
  • OpenAI Sora: text-to-video, but more than a video generator; OpenAI presents it as a world simulator that keeps objects consistent and approximates physics
  • Google DeepMind Genie 2: generates an interactive, playable 3D game world from a single image
  • DeepMind DreamerV3: a reinforcement learning agent that learns a world model and trains its behavior inside it, demonstrated on games and other control tasks
  • NVIDIA Cosmos: a world-model platform for physical AI such as robotics
  • Wayve / Tesla FSD: internal world simulation for self-driving
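DreamerV3's core loop (gather real experience, learn a model of the environment, then improve behavior inside that model) can be illustrated in a far simpler tabular setting. The sketch below is a swapped-in, Dyna-style technique rather than DreamerV3's actual algorithm, which uses neural networks and latent imagination: it counts transitions to estimate a model, then runs value iteration purely inside the learned model. The six-state corridor, the 20% slip chance, and all function names are hypothetical.

```python
import random
from collections import defaultdict

N_STATES = 6        # hypothetical corridor: states 0..5, reward at state 5
ACTIONS = (-1, 1)   # step left or step right
GAMMA = 0.9         # discount factor


def env_step(s, a):
    """The real environment, unknown to the agent: a move fails (slips) 20% of the time."""
    if random.random() < 0.2:
        a = 0
    s2 = min(max(s + a, 0), N_STATES - 1)
    return s2, (1.0 if s2 == N_STATES - 1 else 0.0)


def learn_model(experience):
    """Estimate P(s2 | s, a) and the expected reward E[r | s, a] by counting."""
    counts = defaultdict(lambda: defaultdict(int))
    reward_sum = defaultdict(float)
    for s, a, r, s2 in experience:
        counts[(s, a)][s2] += 1
        reward_sum[(s, a)] += r
    model = {}
    for sa, nexts in counts.items():
        total = sum(nexts.values())
        model[sa] = ({s2: c / total for s2, c in nexts.items()}, reward_sum[sa] / total)
    return model


def q_value(model, v, s, a):
    """Imagined one-step lookahead using only the learned model."""
    if (s, a) not in model:
        return 0.0  # never tried: no information, assume zero
    probs, r = model[(s, a)]
    return r + GAMMA * sum(p * v[s2] for s2, p in probs.items())


def plan(model, iters=200):
    """Value iteration inside the learned model; the real environment is not touched."""
    v = [0.0] * N_STATES
    for _ in range(iters):
        v = [max(q_value(model, v, s, a) for a in ACTIONS) for s in range(N_STATES)]
    policy = [max(ACTIONS, key=lambda a: q_value(model, v, s, a)) for s in range(N_STATES)]
    return v, policy


# 1) Gather real experience with random actions. 2) Learn a model. 3) Plan in it.
random.seed(1)
experience = []
for _ in range(2000):
    s, a = random.randrange(N_STATES), random.choice(ACTIONS)
    s2, r = env_step(s, a)
    experience.append((s, a, r, s2))

model = learn_model(experience)
values, policy = plan(model)
print(policy)  # expected: [1, 1, 1, 1, 1, 1], i.e. always move toward the goal
```

The agent never sees the slip probability directly; it recovers an estimate from counts, and all planning happens "in imagination" against that estimate.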

Many researchers see world models as a critical milestone toward AGI, on the argument that truly understanding the world requires being able to simulate it.

Analogy

A chess master thinks ahead: "if I make this move, my opponent will probably reply with that one, and then I play this." That is an internal simulation. World models apply the same idea to the real world: if I open this door, what will I see? If it rains, how will the ground behave? If I throw this ball, what path will it follow?
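The chess master's "if I do this, then that happens" loop is planning with a model: imagine each candidate action, score the imagined outcome, and act on the best one. A minimal sketch, assuming a hand-written physics function stands in for a learned world model and an exhaustive search over launch angles stands in for a real planner; the throwing speed, time step, and function names are hypothetical.

```python
import math

DT = 0.01     # integration step in seconds (hypothetical)
G = 9.81      # m/s^2
SPEED = 10.0  # hypothetical fixed throwing speed in m/s


def imagine_throw(angle_deg):
    """Stand-in world model: simulate a throw and return where it lands (first x with y < 0)."""
    a = math.radians(angle_deg)
    x, y = 0.0, 0.0
    vx, vy = SPEED * math.cos(a), SPEED * math.sin(a)
    while True:
        x, y = x + vx * DT, y + vy * DT
        vy -= G * DT
        if y < 0:
            return x


def plan_throw(target_x, candidate_angles=range(5, 86)):
    """Imagine every candidate action and keep the one whose predicted outcome is closest to the goal."""
    return min(candidate_angles, key=lambda ang: abs(imagine_throw(ang) - target_x))


best = plan_throw(8.0)
print(f"throw at {best} degrees; imagined landing at x = {imagine_throw(best):.2f} m")
```

Real systems replace brute-force enumeration with smarter search (model-predictive control, tree search, or a policy trained on imagined rollouts), but the structure is the same: no ball is actually thrown until the model has been consulted.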

Real-world example

OpenAI Sora (2024): given the prompt "rainy night in Tokyo, woman walking on a reflective street," Sora produces a 60-second video. What matters is not the visual quality but the consistency: her outfit stays the same from second to second, reflections shift with the lighting, and raindrops splash on impact. None of this comes from a physics engine; it is a simulation of the world that the model learned from data.

Google DeepMind Genie 2 (late 2024): the user provides a single image, Genie 2 infers what kind of physically plausible 3D game world it could belong to, and turns it into a playable environment. DeepMind reports that the generated worlds stay consistent for up to about a minute.

Tesla FSD: before each action, an internal world model predicts "what happens 5 seconds from now?" Defensive driving depends on this kind of look-ahead.
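The "look ahead before acting" loop can be sketched with a deliberately crude forecaster: assume every other road user keeps its current velocity, roll the scene forward over a 5-second horizon, and reject any ego speed whose imagined future comes too close to someone. This is a hypothetical illustration, not how Tesla FSD works internally; the constant-velocity model, the 2 m safety margin, the candidate speeds, and all names are assumptions made for the example.

```python
import math
from dataclasses import dataclass

HORIZON_S = 5.0   # look-ahead window, matching the "5 seconds from now" example
STEPS = 50
DT = HORIZON_S / STEPS
SAFETY_M = 2.0    # hypothetical minimum acceptable gap in meters


@dataclass
class Agent:
    x: float   # position along the road (m)
    y: float   # lateral position (m); the ego car drives along y = 0
    vx: float  # velocity components (m/s)
    vy: float


def forecast(agent, t):
    """Crude stand-in for a learned world model: assume constant velocity."""
    return agent.x + agent.vx * t, agent.y + agent.vy * t


def min_gap(ego_speed, others):
    """Smallest imagined distance between the ego car and any other agent over the horizon."""
    gap = float("inf")
    for i in range(STEPS + 1):
        t = i * DT
        ego_x = ego_speed * t
        for agent in others:
            ox, oy = forecast(agent, t)
            gap = min(gap, math.hypot(ego_x - ox, oy))
    return gap


def choose_speed(others, candidates=(15.0, 10.0, 5.0, 0.0)):
    """Return the fastest speed whose imagined future keeps every gap above the margin."""
    for v in candidates:
        if min_gap(v, others) > SAFETY_M:
            return v
    return 0.0  # nothing is safe in imagination: stop


# A pedestrian 30 m ahead, 3 m to the side, stepping into the lane at 1.5 m/s.
pedestrian = Agent(x=30.0, y=-3.0, vx=0.0, vy=1.5)
print(choose_speed([pedestrian]))  # 5.0
print(choose_speed([]))            # 15.0
```

At 15 m/s or 10 m/s the imagined trajectories pass within the margin of the pedestrian, so the planner slows down before anything has actually happened.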

When to use
  • Video generation (consistent objects, physics) — Sora, Runway, Veo
  • Robot training — simulate before physical experience
  • Autonomous vehicles — future prediction is mandatory
  • Game generation (Genie-style procedural)
  • RL (reinforcement learning) environments
When not to use
  • Today's production AI products — APIs aren't mature yet
  • Single-frame image generation (diffusion is enough)
  • Pure text tasks
  • Tight budget — world models are massively expensive to train
Common pitfalls

Hallucinated physics

World models 'learn' physics but don't model it perfectly. In Sora videos, objects sometimes vanish, hands deform. It's a statistical approximation, not a physics engine.
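One reason small approximation errors matter: a model that predicts the future step by step feeds its own predictions back in, so per-step errors compound over the rollout. A minimal sketch, assuming a "learned" model whose only flaw is a gravity estimate 3% too low (a made-up number); the height error after n steps grows as n(n-1)/2, roughly with the square of the horizon. This illustrates the "statistical approximation" point in general and is not a model of Sora's specific failure modes.

```python
DT = 0.05
G_TRUE = 9.81
G_LEARNED = 9.81 * 0.97  # hypothetical: the model learned gravity 3% too weak


def step(state, g):
    """One Euler step of a falling ball; state is (height in m, vertical velocity in m/s)."""
    y, vy = state
    return y + vy * DT, vy - g * DT


def rollout_error(horizon, start=(100.0, 0.0)):
    """Height gap between true physics and the learned model after `horizon` self-fed steps."""
    true_s = learned_s = start
    for _ in range(horizon):
        true_s = step(true_s, G_TRUE)
        learned_s = step(learned_s, G_LEARNED)  # the model consumes its own prediction
    return abs(true_s[0] - learned_s[0])


for h in (1, 10, 40, 80):
    print(f"after {h:2d} steps: height error {rollout_error(h):.3f} m")
```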

Overhyped AGI claims

World model ≠ AGI. Important step but not sufficient alone. The industry stretches this term for marketing — read critically.

Compute is brutally expensive

Training a model like Sora is estimated to cost $100M or more; exact figures are not public. Inference is also expensive: generating even a few seconds of high-resolution video takes substantial GPU time. Consumer-grade world models are still far off.