Stage 1: Meet the Cast
Concept 2 of 10
C1.2

Reward Prediction Error

Dopamine, formalized. The signal is the difference between expected and actual — not the reward itself.

The gambler at the moment of the pull — leaning forward, eyes on the reels. What keeps the lever moving is not the win. It is the unpredictability of when the win will come.

Concept C1.1 introduced the idea. This concept makes it precise.

The phrase reward prediction error is a piece of computational vocabulary borrowed from reinforcement learning, a branch of machine learning. It refers to a single quantity: the reward an animal actually received minus the reward it predicted. If you predicted nothing and got food, the prediction error is large and positive. If you predicted food and got food, the prediction error is zero. If you predicted food and got nothing, the prediction error is negative.
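As a minimal sketch, the three cases can be written as a one-line subtraction. The function name and the 0-to-1 reward scale are illustrative assumptions, not part of any established notation.

```python
def prediction_error(predicted: float, received: float) -> float:
    """Reward prediction error: what arrived minus what was expected."""
    return received - predicted


# Reward is measured in arbitrary units (assumption: 0 = nothing, 1 = food).
surprise_food = prediction_error(predicted=0.0, received=1.0)  # positive
expected_food = prediction_error(predicted=1.0, received=1.0)  # zero
omitted_food = prediction_error(predicted=1.0, received=0.0)   # negative

print(surprise_food, expected_food, omitted_food)
```

The sign carries the meaning: better than expected is positive, exactly as expected is zero, worse than expected is negative.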

In the 1990s, Wolfram Schultz placed recording electrodes among the midbrain dopamine neurons of monkeys and tracked what made those neurons fire. Early in training, the neurons fired when a juice reward arrived. Later, once the monkeys had learned that a light predicted the juice, the neurons stopped responding to the juice and started responding to the light. If the light appeared and no juice came, firing dropped below baseline at exactly the moment the juice should have arrived.

That last detail is what convinced the field. The neurons were not coding for the reward itself. They were coding for the discrepancy between what was predicted and what arrived. Positive surprise: spike. Predicted outcome: no change from baseline. Negative surprise: dip. The same three states we sketched in the last concept, now grounded in a specific experimental record.
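One common way to formalize this pattern is temporal-difference (TD) learning. The sketch below is a toy illustration under stated assumptions, not a reconstruction of the original analysis: each trial is reduced to three time steps (cue onset, delay, outcome), the state before the cue is assumed to have a fixed value of 0 because the light arrives without warning, and the learning rate `alpha`, discount `gamma`, and function name `run_trial` are hypothetical choices. The prediction error stands in for firing relative to baseline.

```python
def run_trial(values, reward, alpha=0.2, gamma=1.0):
    """Run one cue -> delay -> outcome trial of TD(0) learning.

    values[i] is the learned prediction of future reward i steps after cue
    onset; it is updated in place. Returns the prediction error at cue onset
    and at the moment the outcome is due.
    """
    # The cue arrives without warning from a baseline state whose value is
    # assumed to stay at 0, so the error at cue onset is the cue's value.
    cue_error = gamma * values[0] - 0.0
    outcome_error = 0.0
    last = len(values) - 1
    for i in range(len(values)):
        r = reward if i == last else 0.0
        next_value = 0.0 if i == last else values[i + 1]
        delta = r + gamma * next_value - values[i]  # TD prediction error
        values[i] += alpha * delta
        if i == last:
            outcome_error = delta
    return cue_error, outcome_error


values = [0.0, 0.0, 0.0]  # cue onset, delay, outcome step

early = run_trial(values, reward=1.0)      # first pairing of light and juice
for _ in range(300):                       # extended training
    late = run_trial(values, reward=1.0)
omitted = run_trial(values, reward=0.0)    # light appears, juice withheld

for label, (cue, outcome) in [("early", early), ("late", late), ("omitted", omitted)]:
    print(f"{label:8s} cue error {cue:+.2f}   outcome error {outcome:+.2f}")
```

On the first trial the error sits at the outcome. After training it has moved to the cue and the outcome error is close to zero. On the omission trial the outcome error is negative, the counterpart of the dip below baseline.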

Why does this matter at the bedside? Because addiction is a disease of learned cues — and learned cues are exactly what a prediction-error signal builds. The patient who has spent a thousand evenings buying at the same corner has trained their dopamine system to fire on the sight of that corner. The corner is the light. The drug is the juice. The dopamine spike on cue encounter is the craving the patient cannot explain to themselves. It is not weakness. It is a circuit doing exactly what a prediction-error system is supposed to do.

Treatment that ignores this tends to stall. Treatment that respects it (contingency management, cue exposure, medications that blunt the dopamine response) can actually move the patient. The framework is not merely academic.

Hold onto two ideas. First, the dopamine signal is a difference, not a feeling. Second, what trains the system is unpredictability, which is why intermittent reinforcement (slot machines, social media notifications, drug use after a period of abstinence) is among the most compelling reinforcement schedules known. The system is built to learn from surprise. Keep every reward surprising, and the system never stops learning.
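The second idea can be illustrated with a toy simulation. This is a sketch under simple assumptions (a single learned prediction updated by a delta rule, a hypothetical learning rate of 0.1, reward of 0 or 1), not a model of any real patient or schedule. It compares a reward that always arrives with one that arrives half the time and reports the average size of the prediction error late in training.

```python
import random


def mean_abs_error(reward_probability, trials=500, alpha=0.1, seed=0):
    """Average |prediction error| over the final 100 trials of simple
    delta-rule learning, with reward delivered at the given probability."""
    rng = random.Random(seed)
    value = 0.0  # current reward prediction
    errors = []
    for _ in range(trials):
        reward = 1.0 if rng.random() < reward_probability else 0.0
        delta = reward - value   # prediction error on this trial
        value += alpha * delta   # nudge the prediction toward the outcome
        errors.append(abs(delta))
    tail = errors[-100:]
    return sum(tail) / len(tail)


certain = mean_abs_error(reward_probability=1.0)       # reward every time
intermittent = mean_abs_error(reward_probability=0.5)  # reward half the time
print(f"certain: {certain:.3f}   intermittent: {intermittent:.3f}")
```

With a guaranteed reward the error shrinks toward zero and learning stops. With an unpredictable reward the prediction hovers near the average payoff, so every outcome still lands as a surprise in one direction or the other.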

Wolfram Schultz's monkeys, simplified. Three firing patterns: unexpected reward (spike), predicted reward (flat), expected reward omitted (dip). The signature of a prediction-error code, not a pleasure code.

The anchor

Dopamine is not a pleasure signal; it is a prediction-error signal — the difference between expected and actual reward.

A street corner at dusk. To anyone else, unremarkable. To the patient who has learned this place as a cue, dopamine has already fired before the conscious thought of craving has formed.

Prove it

A patient with addiction sees the corner where they used to buy and feels craving even without using. Which mechanism is firing?
