Study: Two Kinds of Expectations That Shape Pleasure from Music
A rigorous study of cognitive and sensory expectations in musical pleasure.
Authors: Vincent K. M. Cheung, Peter M. C. Harrison, Stefan Koelsch, Marcus T. Pearce, Angela D. Friederici, Lars Meyer
Journal: Philosophical Transactions of the Royal Society B (2023)
Citation: Phil Trans R Soc B. 2023;379(1895):20220420. doi:10.1098/rstb.2022.0420
Why does a simple chord change sometimes feel like a plot twist, while a far more complex sound can pass by with almost no emotional gravity? If musical pleasure is partly about expectation - about what we think will happen next - then a deeper question appears behind every cadence and every wrong note:
Are we enjoying music because our brain recognises patterns it already knows OR because our senses are being tickled by what the sound is doing right now?
Cheung and colleagues’ paper argues that the answer is both at once, through partially independent mechanisms, and that this independence matters for anyone who writes music, produces records, mixes, masters, teaches composition, or simply loves listening.
If you hang around musicians or devoted listeners long enough, you’ll notice a recurring theme in how they describe what works. A songwriter will say a chorus lands even though the harmony is familiar. A jazz player will say a substitution shouldn’t work, but it feels right. A producer will complain that a progression is fine on MIDI piano, but when the voicing changes - or the guitar tone changes - the whole emotional read flips. Music lovers often describe it too: that turn hurt, that ending felt cheap. These descriptions are circling around two different kinds of prediction: a cognitive one, rooted in long-term knowledge of how a style tends to behave, and a sensory one, rooted in how the current sound relates to what the ear has just heard.
The authors extracted 30 chord progressions, and to avoid familiarity effects they kept only the chord sequences (no melody, no lyrics), transposed them to C major, and rendered them with a blended synthetic timbre (marimba + jazz guitar + acoustic guitar), plus a simple repeating rhythmic bed to keep momentum. Then they asked listeners to rate continuously, chord by chord, using a physical slider: how surprising each chord felt in Experiment 1, and how pleasant it felt in Experiment 2.
To separate sensory and cognitive expectation, they compared listener ratings to four computational models that sit along a sensory-cognitive continuum.
Spectral Distance (SD) treats expectation as spectral similarity between adjacent chords, based on overtone structure.
Periodicity Pitch (PP) is a psychoacoustic model that simulates aspects of peripheral auditory processing and short-term auditory memory. It compares a “global” pitch image (context held in echoic memory) with a “local” pitch image (the current chord). The less they match, the higher the tonal dissimilarity - a proxy for sensory surprise (a toy sketch of this comparison appears after the model descriptions below).
Tonal Expectation (TE) extends PP by adding representations that resemble more abstract tonal structure, but still remains relatively constrained compared to a full symbolic harmonic model.
IDyOM (Information Dynamics of Music) learns statistical regularities from a corpus and predicts the probability of upcoming events. Its surprise is information content (IC): the negative log probability of the chord given the preceding context. It also computes entropy as a measure of uncertainty - how many continuations are plausible at that moment.
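To make the two ends of that continuum concrete, here is a minimal Python sketch. The `chord_image` and `sensory_dissimilarity` helpers, the overtone weights, the leak rate, and the probability table are all invented for illustration - the real PP model simulates peripheral auditory processing rather than counting overtones, and a real IDyOM run learns its probabilities from a corpus. Only the information-content and entropy formulas are the quantities the paper actually uses.

```python
import numpy as np

# --- Sensory side: a crude stand-in for PP's global/local pitch-image comparison ---

def chord_image(midi_notes, size=128):
    """Very rough 'pitch image' of a chord: each fundamental plus three partials
    (approximated to the nearest semitone: octave, octave+fifth, two octaves)."""
    img = np.zeros(size)
    for note in midi_notes:
        for offset, weight in [(0, 1.0), (12, 0.5), (19, 0.33), (24, 0.25)]:
            if note + offset < size:
                img[note + offset] += weight
    return img

def sensory_dissimilarity(chords, leak=0.8):
    """1 - correlation between a leaky 'global' context image (echoic-memory
    stand-in) and the 'local' image of each incoming chord."""
    context = np.zeros(128)
    scores = []
    for chord in chords:
        local = chord_image(chord)
        if context.any():
            scores.append(1.0 - np.corrcoef(context, local)[0, 1])
        else:
            scores.append(0.0)  # no context yet for the very first chord
        context = leak * context + local
    return scores

# --- Cognitive side: the two IDyOM quantities, computed from an invented distribution ---

def information_content(p):
    """Surprise of the chord that actually occurred: IC = -log2 P(chord | context)."""
    return -np.log2(p)

def entropy(dist):
    """Uncertainty before the chord arrives: H = -sum_i p_i * log2 p_i."""
    p = np.array(list(dist.values()))
    return float(-(p * np.log2(p)).sum())

# C - Am - F - G in MIDI, then a hypothetical next-chord distribution
progression = [[60, 64, 67], [57, 60, 64], [53, 57, 60], [55, 59, 62]]
print("sensory dissimilarity per chord:", np.round(sensory_dissimilarity(progression), 2))

next_chord_probs = {"C": 0.55, "Am": 0.20, "F": 0.10, "Fm": 0.05, "Ab": 0.05, "B": 0.05}
print("entropy at this moment:", round(entropy(next_chord_probs), 2), "bits")
print("IC if C follows (expected):", round(information_content(next_chord_probs["C"]), 2), "bits")
print("IC if Ab follows (surprising):", round(information_content(next_chord_probs["Ab"]), 2), "bits")
```

The point of the toy is the contrast: the sensory score only ever looks at the sound held in short-term memory, while information content and entropy only ever look at a probability distribution over symbols learned from a style.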
When listeners rated how surprising each chord felt, two results stand out.
First, only PP and IDyOM reliably tracked human surprise ratings. SD and TE didn’t generalise well to this more realistic setup - likely because they were tuned on different task paradigms, timbres, and context lengths. The authors are careful here: they don’t claim those mechanisms are absent in humans, only that those models didn’t generalise under these conditions.
Second, the big one: cognitive surprise (IDyOM information content) explained substantially more variance than sensory surprise (PP tonal dissimilarity) - roughly twice as much overall, and even more strongly in musicians. In Experiment 1, musicians’ surprise ratings were more tightly coupled to IDyOM’s predictions than non-musicians’, consistent with the idea that training strengthens long-term structural expectations.
Here’s the deeper claim, though, and it’s the title of the paper in action: IDyOM and PP contributed independently. When both predictors were put into the same statistical model, each kept its effect size. Adding an interaction didn’t help much; the best explanation was additive. In other words, cognitive and sensory expectation behaved like two different axes rather than one blended mechanism.
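As a sketch of what “additive rather than interactive” means in practice, here is a small simulation - not the authors’ analysis pipeline; the predictors, coefficients, and noise level are all made up - comparing single-predictor, additive, and interaction regressions:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 300  # hypothetical chord-by-chord observations

# Simulated stand-ins for the two surprise measures
ic = rng.normal(size=n)  # cognitive surprise (IDyOM information content)
pp = rng.normal(size=n)  # sensory surprise (PP tonal dissimilarity)

# Ratings generated from an additive ground truth (by construction), plus noise
surprise = 1.0 * ic + 0.5 * pp + rng.normal(scale=0.5, size=n)

def r_squared(predictors, y):
    """Ordinary least squares with an intercept; returns in-sample R^2."""
    X = np.column_stack([np.ones(len(y))] + predictors)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

print("IC only:          ", round(r_squared([ic], surprise), 3))
print("PP only:          ", round(r_squared([pp], surprise), 3))
print("IC + PP:          ", round(r_squared([ic, pp], surprise), 3))
print("IC + PP + IC*PP:  ", round(r_squared([ic, pp, ic * pp], surprise), 3))
```

With an additive ground truth, the combined model clearly beats either predictor alone, while the interaction term buys essentially nothing - which is the pattern the paper reports for the real ratings.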
If you translate that into musician language: there’s a grammar engine and a signal engine. They often point in the same direction, but they can also disagree - and when they disagree, your experience can become especially interesting.
Pleasure is trickier than surprise. Sometimes we love being surprised; sometimes we hate it. Predictive coding theories of music propose that pleasure relates not only to prediction error but also to precision/uncertainty - how confident the brain was about what would come next.
In Experiment 2, the authors modelled pleasantness using IDyOM’s information content (surprise) and entropy (uncertainty), and added PP’s tonal dissimilarity as a sensory expectancy predictor. The headline result mirrors Experiment 1: pleasantness was best predicted when both cognitive and sensory expectations were included, again largely additively. The model that combined IDyOM and PP generalized better to unseen chords than models using either alone. Interaction terms didn’t meaningfully improve prediction and likely overfit.
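A hedged sketch of how that generalization claim can be tested: fit each candidate model on part of the data and score it on held-out chords. Everything below is simulated - the coefficients and noise level are invented - so it only illustrates the logic of the comparison, not the paper’s numbers.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n = 300  # hypothetical chord-by-chord observations

ic = rng.normal(size=n)   # cognitive surprise (IDyOM information content)
ent = rng.normal(size=n)  # cognitive uncertainty (IDyOM entropy)
pp = rng.normal(size=n)   # sensory expectancy (PP tonal dissimilarity)

# Invented ground truth in which all three predictors genuinely matter
pleasantness = 0.6 * ic + 0.3 * ent + 0.4 * pp + rng.normal(scale=0.5, size=n)

def held_out_r2(X):
    """Mean R^2 on held-out folds: a proxy for 'generalises to unseen chords'."""
    return cross_val_score(LinearRegression(), X, pleasantness, cv=5, scoring="r2").mean()

print("cognitive only (IC + entropy):", round(held_out_r2(np.column_stack([ic, ent])), 3))
print("sensory only (PP):            ", round(held_out_r2(pp.reshape(-1, 1)), 3))
print("cognitive + sensory:          ", round(held_out_r2(np.column_stack([ic, ent, pp])), 3))
```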
Then comes a detail that should make producers sit up: when you hold cognitive uncertainty roughly constant, cognitive surprise (high IC) tended to reduce pleasantness, while sensory tonal dissimilarity tended to increase pleasantness.
This sounds paradoxical until you think like a listener. A cognitively surprising chord can feel like the floor moved - especially if the style context made you confident something else would happen. That can register as wrong, cheap, or too clever, depending on context.
But a sensory change (in the psychoacoustic relation between chords) can feel like freshness - like a pleasing shift in color, tension, or brightness - even if the harmonic function is perfectly normal. Many listeners describe this as “the chord progression is basic, but it feels amazing”, and what they’re often reacting to is voice leading, spacing, timbral blend, and how those things update the auditory system’s short-term model.
The paper gives a grounded way to think about why some musical moments feel powerful even when nothing happens harmonically, and why some advanced harmonic moves fail emotionally. It suggests you can compose expectation at (at least) two levels: at the cognitive level, through chord choices that play with what the style has taught listeners to expect next, and at the sensory level, through voicing, spacing, timbre, and production choices that decide how each new sound relates to what the ear just heard.
So: are we enjoying music because our brain recognises learned structure, or because our senses react to the sound itself?
Musical pleasure is shaped by at least two predictive systems. One is anchored in the auditory signal and short-term sensory memory. The other is anchored in long-term learned structure - statistical knowledge of a style. They each contribute something distinct to what we feel as surprise and pleasure, and their contributions add rather than collapse into a single mechanism.
Music feels meaningful when it lets the listener’s brain keep two promises at once: the promise that the world is coherent (cognitive structure), and the promise that the world is alive (sensory change).
When composers and producers learn to write for both prediction engines - sometimes aligning them, sometimes making them argue - music stops being only a chord progression or only a vibe, and becomes what listeners keep trying to describe in ordinary language: something that surprises us, but still makes sense.