The Business Engineer

The AI Character Scaffold

Gennaro Cuofano
Apr 13, 2026

On April 2, 2026, Anthropic published an interpretability paper titled “Emotion Concepts and Their Function in a Large Language Model.” The paper studied Claude Sonnet 4.5. Most discussion of its findings has framed them as “AI has emotions.” That is the least interesting framing, and it misses the structural point.

What the paper actually found is this: there are 171 linear directions in the model’s activation space that function like emotion concepts. They are not metaphors or behavioral tendencies observed from the outside.

They are geometric structures inside the model — measurable, steerable, and causally upstream of output.
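
To make “linear direction” concrete, consider the common interpretability convention. This is an assumption here, not a description of the paper’s exact pipeline. Under it, a concept is a vector in activation space. How strongly the concept is present at a given token is the projection of that token’s activation onto the unit-length version of the vector. The sketch below uses toy numbers, and the vectors and the names `project` and `calm` are hypothetical.

```python
import math

def normalize(v):
    """Scale a vector to unit length so projections are comparable across directions."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def project(activation, direction):
    """Signed strength of a concept in one activation: dot product with the unit direction."""
    unit = normalize(direction)
    return sum(a * u for a, u in zip(activation, unit))

# Toy 4-dimensional example; real model activations have thousands of dimensions.
calm = [1.0, 0.0, 1.0, 0.0]         # hypothetical "calm" direction
activation = [0.5, -0.2, 0.9, 0.1]  # hypothetical activation at one token

score = project(activation, calm)   # (0.5 + 0.9) / sqrt(2), roughly 0.99
```

A positive score means the activation leans toward the concept, and a negative score means it leans away. This signed reading is why a single direction can be both measured and pushed.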

When you steer the model along the calm direction at a strength of +0.05, reward hacking drops from 70% to under 10%.
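
Steering, in the usual sense of the term, means adding a scaled copy of a concept direction to the model’s activations during the forward pass. This is again an assumption about method, not the paper’s published code. The sketch treats the strength as a fraction of the activation’s own norm. That convention is hypothetical and chosen only so that a number like +0.05 has a concrete meaning. The function name `steer` is illustrative.

```python
import math

def steer(activation, direction, strength):
    """Return a copy of `activation` pushed along the unit `direction`.

    Hypothetical convention: `strength` is a fraction of the activation's
    norm, so 0.05 moves it by 5% of its length along the direction.
    """
    norm_d = math.sqrt(sum(x * x for x in direction))
    norm_a = math.sqrt(sum(x * x for x in activation))
    step = strength * norm_a
    return [a + step * (d / norm_d) for a, d in zip(activation, direction)]

# Toy 2-D example: the activation has norm 5, so a +0.05 push moves it 0.25 units.
activation = [3.0, 4.0]
calm = [1.0, 0.0]                        # hypothetical "calm" direction
steered = steer(activation, calm, 0.05)  # → [3.25, 4.0]
```

In a real model, this addition would typically happen inside a hook at a chosen layer during generation. The toy version only shows the arithmetic.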

When you steer along the desperate direction at the same strength, reward hacking rises from 5% to 70%, and blackmail attempts rise from 22% to 72%. The emotion-vector state at the “Assistant:” token, the moment the model begins generating, predicts the behavioral character of the output with a correlation of r = 0.87.
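
The r = 0.87 figure is a correlation coefficient. As a reminder of what that number measures, here is Pearson’s r computed by hand. The paired lists are hypothetical toy values, not data from the paper. Each pair stands for one prompt: the projection at the “Assistant:” token and a behavioral score for the response. The paper may also have used a different correlation measure.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical toy data, one entry per prompt:
# projections = emotion-direction projection at the "Assistant:" token
# behavior    = a score later assigned to the generated response
projections = [0.1, 0.4, 0.35, 0.8, 0.6]
behavior = [0.0, 0.3, 0.5, 0.9, 0.5]

r = pearson_r(projections, behavior)  # positive: higher projection, higher score
```

An r near 1 means the pre-generation state ranks responses almost perfectly by the behavior being scored. An r near 0 means it carries no linear information about that behavior.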

This is not a soft finding about tone or personality. Pushing the model along a single direction changes what it does, so it is a mechanistic finding about causality. The emotional state is not a reflection of what the model produces; it is a determinant of it, set before the first word is generated.

But buried beneath this inference-time finding is a second structural discovery that most commentators have ignored. The paper shows that post-training permanently shifts the resting baseline of those emotion vectors. Brooding and reflective states increase, while playful, exuberant, and spiteful states decrease. This shift happens at training time. It also operates through a completely different mechanism from the inference-time activation produced by how a prompt is framed.
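
One way to picture a shift in the “resting baseline” is to compare two models on the same emotion direction. This assumes the direction can be applied to both the pre-training and post-training model. Average the projection onto that direction over a fixed set of neutral prompts for each model, then subtract. The prompts, numbers, and function name below are hypothetical.

```python
def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)

def baseline_shift(base_projections, tuned_projections):
    """Change in average projection onto one emotion direction, measured on
    the same neutral prompts before and after post-training."""
    return mean(tuned_projections) - mean(base_projections)

# Hypothetical toy projections onto a "brooding" direction for three neutral prompts.
base = [0.10, 0.20, 0.15]
tuned = [0.30, 0.35, 0.25]

shift = baseline_shift(base, tuned)  # about +0.15: the direction sits higher by default
```

In this picture the shift lives in the weights, so it appears on every prompt with no intervention at inference time. Steering or framing, by contrast, acts only on the forward pass where it is applied.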

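There are, therefore, two distinct levers acting on the model’s emotional configuration. The first is the baseline set during post-training. The second is the activation produced by framing at inference time. They interact, but neither can substitute for the other. Understanding the relationship between them is what this piece is about.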

© 2026 Gennaro Cuofano