The Agentic Harness War

Jul 01, 2026

Two frontier labs published usage data within weeks of each other. OpenAI’s “The Shift to Agentic AI: Evidence from Codex” and Anthropic’s latest Economic Index read like different projects: one obsessed with delegated workflows, the other with the economic footprint of artifacts. But place them on the same table and they collapse into a single finding, arrived at from opposite directions:

The surface, not the model, governs autonomy. The agentic surface (the harness) is becoming the layer of the AI stack where value pools. And we can now read, from real usage data, exactly what stage of that transition we are in.

Most commentary misses this. These reports aren’t just describing two products. Read together, they are a calibrated instrument for measuring how far agentic adoption has actually progressed — because the three populations in the Codex data (individuals, organizations, and OpenAI’s own workforce) are not three customer segments. They are three points on a single timeline, separated only by how much adoption friction has been removed. OpenAI’s workforce is the future running early. Organizations are the present. Individuals are the recent past, still visible in the rearview.

This piece does four things: establishes the stage of adoption the data actually reveals, reads the convergence both reports refuse to state out loud, contrasts the two opposing business models built on top of it, and cascades the consequence through the Map of AI to show where value migrates next.

Subscribe Now at 75% Discount, Expiring!

Follow me also on The AI Supercycle!

The Conversation Was a Transitional Form

The deepest tell in both reports is the same, and neither lab frames it as the headline: their own measurement apparatus broke.

Anthropic says it plainly: “a chat transcript no longer fully captures how people are using AI.” They moved to higher-frequency sampling and an output classifier. When the interface stops being the right unit of measurement, it’s because AI stopped being something you consult and became something you deploy.
OpenAI says the same thing structurally: the unit of analysis is no longer the conversation but the delegated workflow — one instruction that triggers an agent to inspect files, run commands, call tools, and produce artifacts autonomously. They warn outright that active users, chats, and message volume “may become less informative as agentic systems diffuse.”

Two independent measurement systems, both built around the conversation, both breaking the same way at the same time. That is not coincidence — it’s a phase change registering on two different instruments.

And both reports show the same proof that the surface, not the model, is what changed:

OpenAI’s proof is internal cannibalization. Among OpenAI’s own workforce, Codex accounts for 99.8% of the work-related output the company’s employees generate across Codex and ChatGPT combined — meaning the conversational interface didn’t merely decline inside OpenAI, it was almost entirely replaced as the way work gets done.
Anthropic’s proof is an autonomy gap that holds even when the model is identical. Across 26 of 31 output types, Claude Code runs at higher autonomy than chat or Cowork — and that gap persists when you hold the underlying model constant, which is the whole point: it isn’t the model behaving differently, it’s the surface permitting different behavior. The starkest illustration: the same blog-post task that takes 13 rounds of back-and-forth in chat collapses to a single prompt in Claude Code. Same model, same task, wildly different delegation. The only variable is the surface.

The abstraction: the conversational interface was scaffolding. The harness is the building. Everything downstream is a consequence of value migrating from the thing you talk to toward the thing you delegate to.

Reading the Three-Population Clock

Here is the analytical engine the rest of the piece runs on. OpenAI’s three populations form a natural time-lapse of agentic diffusion, and the spread between them tells you precisely how early we are.

The gradient, measured two ways

Look first at how many users touch the agentic surface at all (active users in the trailing 28 days):

OpenAI workforce: ~98% — almost everyone.
Organizations: 17.3% — fewer than one in five employees.
Individuals: under 1% (0.7%) — effectively nobody.

Now look at how much of total work output flows through the agentic surface rather than the chat interface (share of output tokens, the rough unit of AI-generated work):

OpenAI workforce: 99.8%
Organizations: 63.3%
Individuals: 16.5%

The two measures disagree violently, and that disagreement is the most important signal in either report. Among individuals, 16.5% of all output already runs through the agentic surface even though under 1% of users have adopted it, meaning the tiny minority who adopt use it so intensely that they account for a share of output far out of proportion to their numbers. The same logic plays out at larger scale in organizations: a surface used by 17.3% of users carries 63.3% of output. Adoption is not a thin film spreading evenly across many users; it’s a deep pool concentrating among a few intensive adopters.
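One crude way to put a number on that concentration is to divide each population’s output share by its user share. The sketch below does this with the figures quoted above; the “concentration ratio” is a hypothetical heuristic introduced here, not a metric from either report, and it assumes all agentic output comes from the adopting users, which is only an approximation.

```python
# Adoption figures as quoted in the text (percent, trailing 28 days).
# Each entry: (share of users active on the agentic surface,
#              share of output tokens flowing through it)
POPULATIONS = {
    "OpenAI workforce": (98.0, 99.8),
    "Organizations": (17.3, 63.3),
    "Individuals": (0.7, 16.5),
}


def concentration_ratio(user_share_pct: float, output_share_pct: float) -> float:
    """Output share divided by user share (hypothetical heuristic).

    1.0 means the agentic surface carries output in proportion to its users;
    larger values mean a smaller group accounts for a larger slice of output.
    Assumes all agentic output is attributable to the adopting users.
    """
    if user_share_pct <= 0:
        raise ValueError("user share must be positive")
    return output_share_pct / user_share_pct


if __name__ == "__main__":
    for name, (users, output) in POPULATIONS.items():
        ratio = concentration_ratio(users, output)
        print(f"{name:<18} users {users:5.1f}%  output {output:5.1f}%  ratio {ratio:5.1f}x")
```

The ratio comes out at about 1.0 for OpenAI’s workforce, 3.7 for organizations, and 23.6 for individuals. By this rough measure, concentration is steepest where adoption is thinnest, and it fades as the surface saturates.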

Three stage-readings fall out:

Stage zero for the mass market. With fewer than one in a hundred individual users on the agentic surface, the conversational interface still overwhelmingly owns the consumer base. Watch only individuals and you’d conclude agentic AI barely exists.
Stage one — early majority — inside organizations. At 17.3% of users but 63.3% of output, organizations have already crossed the line where the majority of business work output flows through the agentic surface, even though most employees haven’t touched it. This is the market’s central misread: adoption looks early by seats and past the halfway mark by output.
Stage three — near-saturation — already visible at OpenAI. Not a forecast but a shipped reality. OpenAI’s workforce isn’t representative — it’s a preview of the destination once cost, buy-in, training, and tooling friction approach zero (inside OpenAI, AI usage is free at the margin and knowledge-sharing is constant).

The intensity trap — why “% of seats” is the wrong clock

The divergence compounds when you zoom into a single job function. Among engineers in organizations:

The agentic surface is just 26.8% of the average engineer’s output — and that figure alone makes adoption look early-stage and tentative.
But it is 88.3% of the total output engineers generate as a group — because the heavy adopters route nearly everything through it, and they dominate the function’s total work.

The same gap appears in legal, a function nobody thinks of as agentic: 1.9% of the average legal user’s output, but 17.6% of the legal function’s total output. Read one number and legal is barely touched; read the other and a sixth of its work output already runs through the agentic surface.

The stage of adoption looks completely different depending on which clock you read. By seats: early. By output: well underway. Anyone reporting “percent of employees active” will conclude the transition is years away when, in output terms, it’s already half-done.
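The gap between the two clocks is the difference between averaging each user’s share and pooling everyone’s output. The sketch below uses an invented ten-person team (the numbers are hypothetical, chosen only to resemble the 26.8% vs 88.3% pattern) to show how a few heavy adopters can make the pooled figure several times larger than the per-user average.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UserOutput:
    total: float    # all output tokens (chat + agentic), arbitrary units
    agentic: float  # the part produced through the agentic surface


def mean_of_user_shares(users: list[UserOutput]) -> float:
    """'Average engineer' clock: each user's agentic share, averaged."""
    return sum(u.agentic / u.total for u in users) / len(users)


def pooled_share(users: list[UserOutput]) -> float:
    """'Function total' clock: all agentic output over all output."""
    return sum(u.agentic for u in users) / sum(u.total for u in users)


# Hypothetical team of ten: two heavy adopters who route 95% of a large
# output through the agent, eight light users who route 5% of a small one.
team = [UserOutput(total=100, agentic=95) for _ in range(2)]
team += [UserOutput(total=5, agentic=0.25) for _ in range(8)]

print(f"mean of user shares: {mean_of_user_shares(team):.0%}")  # 23%
print(f"pooled share:        {pooled_share(team):.0%}")         # 80%
```

Neither number is wrong; they answer different questions. The per-user average tells you how a typical employee works, while the pooled share tells you where the function’s work is actually produced.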

The convergence signal — the laggards move fastest

The clearest evidence of which way time runs is OpenAI’s internal time-lapse. In late 2025, OpenAI’s own usage looked like external organizations look today — engineers leading, everyone else near zero. Then:

Engineers had moved more than 50% of their usage onto the agentic surface by January 2026 and more than 90% by March.
Legal and recruiting sat near zero in January, climbed gradually to about 20% by early April — then jumped from roughly 20% to 75% in a single month.

The technical functions blaze a slow trail; the non-technical functions, once that trail exists, sprint down it. This slow-pioneer / fast-follower signature is exactly what you’d expect if the binding constraint is organizational learning rather than model capability, and it means the laggard functions in every enterprise aren’t permanently behind. They’re pre-inflection. The convergence inside OpenAI coincided with a deliberate internal push of training and feedback loops, which is the tell: the timing points to organizational effort, not a new model, as the complement that unlocked the laggards.
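The “sprint” can be put in percentage points per month using the approximate readings above. This is a back-of-envelope sketch: “near zero” is treated as 0%, and the month counts (January to March as two months, January to early April as three) are assumptions, since the text gives only rough dates.

```python
def pp_per_month(start_pct: float, end_pct: float, months: float) -> float:
    """Average change in agentic share, in percentage points per month."""
    return (end_pct - start_pct) / months


# Approximate readings from the text; month counts are assumptions.
PHASES = [
    ("Engineers, Jan to Mar 2026", 50, 90, 2),
    ("Legal/recruiting, Jan to early Apr", 0, 20, 3),
    ("Legal/recruiting, the jump month", 20, 75, 1),
]

for label, start, end, months in PHASES:
    print(f"{label:<36} {pp_per_month(start, end, months):5.1f} pp/month")
```

On these assumptions, engineers moved about 20 points a month, while legal and recruiting crawled at under 7 points a month and then jumped 55 points in a single month.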

What this fixes the stage at

Synthesizing: we are at early majority for organizations and stage zero for individuals — but with output already concentrated and a convergence dynamic that compresses the remaining timeline. The friction is organizational, not technological: if it were capability, every population running the same models would look the same. They look wildly different. The constraint is access to systems, management expectations, workforce skill, and review processes — the complements, not the engine. That single fact is what makes the rest of the transition predictable.

Two Labs, Opposite Geometries, Same Insight

Both labs believe the harness is the future. They are building opposite go-to-market geometries on top of that shared belief — and the contrast is the strategic content.

OpenAI / Codex — The Consolidation Play

OpenAI is betting on one harness that absorbs everything.

Codex is no longer pitched as a coding tool but as an “agentic coding and work platform.” The product milestones the report annotates — macOS launch, Windows launch, and tellingly a “Knowledge Work App Update” — trace a deliberate march out of the terminal and into general knowledge work.
The systematization engine is the skill-and-plugin ecosystem: a SKILL.md workflow spec, packaged into plugin manifests that bundle app integrations and tool configuration. Skill use among weekly active users climbed from 5.4% to 26.6% in roughly three months — a near-fivefold rise that shows workflows being codified into reusable infrastructure rather than re-typed each time — and inside OpenAI it’s effectively universal at 96.2%.
OpenAI ran the experiment on itself and shipped the result: it let Codex displace ChatGPT until Codex carried 99.8% of internal work tokens. They are not theorizing that the conversational surface dies; they have already killed it internally and published the autopsy.

The bet: pull every workflow into one agent, own the harness, own the skill marketplace on top of it, and the conversational layer becomes a feature, not a product.

Anthropic / Claude Code + Cowork + Embedded Agents — The Proliferation Play

Anthropic is betting on many harnesses, embedded where work already lives.

Rather than one super-surface, Anthropic spreads across surfaces at different autonomy levels: Claude Code (high-delegation technical), Cowork (multi-step knowledge work), and embedded agents inside the tools work already happens in — Excel, PowerPoint, Chrome, Design.
The Economic Index keeps the augmentation framing: in high-value work, Claude produces more per turn (1.34× the output) and users engage more (1.53× the turns) — the two rise together, which is why Anthropic argues “more production from Claude does not mean less from the user.” It’s a deliberately less aggressive posture than OpenAI’s “let it replace the chat.”
But Anthropic flags the expiry date on its own comfort: that augmentation equilibrium is partly an artifact of which surface people happen to use today. Claude Code is named as the leading indicator — the surface where the future autonomy curve is already visible. The human loop stays open mostly where the tooling hasn’t yet closed it.

The bet: embed an agent into every existing workflow rather than dissolve every workflow into one agent; meet the app where it is.

The Contested Question

Strip both strategies to the bone and they reduce to one fork:

Does the harness consolidate into one super-surface (OpenAI’s bet) or proliferate into many context-specific surfaces (Anthropic’s bet)?

Note the asymmetry, and how it maps onto the stage. OpenAI is reporting from stage three — it already ran consolidation on its own near-saturated workforce. Anthropic is reporting from the inflection point — augmentation holding today, with a pointed finger at where it breaks. One lab is telling you what happened after the transition; the other is telling you where to watch as it happens.

Cascading Through the Map of AI

The harness shift doesn’t stay in one place. It propagates through the stack, and each layer reprices. Reading bottom to top — and notice how the stage of adoption determines how far each repricing has gone:

Compute / silicon — the unconditional winner. Agentic use is a demand shock, not a usage tweak. Inside OpenAI, the median worker’s output rose between 10× and 56× across job functions in seven months (Research 56×, Customer Support 32×, Engineering 27×, Legal 13×). Anthropic shows the same force from the cost side: tokens track wages, with about 44% of the wage-to-token relationship explained by the fact that higher-paid work produces more compute-intensive artifacts. Delegation doesn’t consume slightly more compute; it consumes an order of magnitude more, and because output concentrates before users do, the demand shock arrives ahead of the seat-adoption curve. This layer wins regardless of who wins the harness war above it.
Model — commoditizing relative to what sits above it. Both reports independently conclude the surface matters more than the model. Anthropic states it directly (“product matters more than model”) and backs it with an autonomy gap that survives when the model is held constant; OpenAI demonstrates it through internal cannibalization, with Codex displacing ChatGPT as the way work gets done. This is the AGaaS thesis showing up as physics: the model stays necessary, but value capture migrates upward. The engine becomes essential and unglamorous; nobody pays a premium for the engine once the vehicle around it determines the drive.
Harness / orchestration — the NEW value layer, and the battleground. This layer barely existed in the old stack. Now it’s where autonomy is decided, where work gets systematized, and where the moat forms. The SKILL.md is becoming the modular unit of delegated work — the line of code of the agentic era. And the stage reads cleanly here too: skill adoption at 26.6% externally versus 96.2% inside OpenAI is the harness layer’s own adoption clock, running the very same friction gradient as every other measure.
Application — collapsing INTO the harness. This is the AGaaS mutation made literal. Why open a SaaS app when the agent runs the workflow end to end? Codex absorbing documents, spreadsheets, and slide decks as plugins is what applications dissolving into skills looks like. The app stops being a destination and becomes a capability the harness invokes. Anthropic’s embedded-agent bet (Claude in Excel, in PowerPoint) is the explicit hedge: keep the app, put the agent inside it.
Judgment / oversight — where the human relocates. Both reports converge hardest here. As execution delegates, human effort moves to delegation, supervision, review, and integration, and the returns to domain expertise rise. The clearest prototype: 28.6% of OpenAI users run five or more agents at the same time at some point each week, and the most intensive 1% log around 71 hours of agent runtime inside a single day. That is only possible with agents running in parallel: even spread across all 24 hours, it averages out to nearly three agents at once, and more if the runs cluster in working hours. That is no longer a person using a tool; it’s a human managing a team of agents. This is the Judgment Layer, confirmed empirically by two independent datasets, and because it’s barely developed anywhere except the frontier, it’s the layer with the most adoption runway still ahead of it.
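The 71-hour figure implies a hard floor on parallelism: runtime divided by the wall-clock window is the average number of agents running at once, and because agent counts are whole numbers, the peak must be at least that average rounded up. The sketch below computes both; the 10-hour “working day” window is a hypothetical comparison, not a figure from the report.

```python
import math


def min_average_concurrency(agent_hours: float, window_hours: float) -> float:
    """Lower bound on how many agents must run at once, on average,
    to fit `agent_hours` of runtime into `window_hours` of wall-clock time."""
    if window_hours <= 0:
        raise ValueError("window must be positive")
    return agent_hours / window_hours


def min_peak_concurrency(agent_hours: float, window_hours: float) -> int:
    """Agent counts are whole numbers, so at some moment at least
    ceil(average) agents must be running simultaneously."""
    return math.ceil(min_average_concurrency(agent_hours, window_hours))


# 71 agent-hours in one day, as quoted for the most intensive 1%.
# The 10-hour window is a hypothetical working day.
for window in (24, 10):
    avg = min_average_concurrency(71, window)
    print(f"{window:>2}h window: >= {avg:.2f} agents on average, "
          f">= {min_peak_concurrency(71, window)} at peak")
```

Across a full day the floor is about three agents running at once; compressed into a 10-hour day it rises to at least eight at peak.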

The reading rule that falls out:

Value migrates model → harness → skill marketplace. Pricing power leaves the model and lands on whoever controls the surface and its reusable workflows. Build at the harness; meter at the compute layer; defend at the skill layer. And read the stage by output, never by seats.

The Skill Standard and the Hidden Dependency

Three forces decide the next 18 months.

The skill becomes the contested standard

If the SKILL.md is the line of code of the agentic era, then the format war is the platform war.

OpenAI’s SKILL.md-plus-plugin format is a land-grab for the standard — own the authoring format, own the ecosystem.
A cross-harness, portable skill format is the open-web counter-move — skills that run on Codex, Claude Code, and any future surface, owned by no single lab.
The load-bearing question: is the skill portable, or is it locked to a harness? This is the most leverage-rich fork in the space, and it’s being decided now, in authoring-format choices that look like documentation but are actually strategy.
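To see why the authoring format is where portability gets decided, it helps to look at how little a skill file needs. The sketch below assumes the layout described in Anthropic’s Agent Skills documentation: a `---`-delimited YAML frontmatter with `name` and `description`, followed by markdown instructions. It is not a reimplementation of either lab’s loader, it reads only flat `key: value` lines rather than full YAML, and the `quarterly-deck` skill is invented for illustration. Plugin manifests and tool bindings, where lock-in would actually live, are not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Skill:
    name: str
    description: str
    instructions: str  # markdown body a harness hands to its agent


def parse_skill_md(text: str) -> Skill:
    """Split a SKILL.md into '---'-delimited frontmatter and a markdown body.

    Deliberately minimal: it reads flat 'key: value' lines only, not full YAML.
    """
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("expected frontmatter opening with '---'")
    close = next((i for i in range(1, len(lines)) if lines[i].strip() == "---"), None)
    if close is None:
        raise ValueError("frontmatter is never closed with '---'")
    meta = {}
    for line in lines[1:close]:
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    missing = {"name", "description"} - set(meta)
    if missing:
        raise ValueError(f"missing required fields: {sorted(missing)}")
    body = "\n".join(lines[close + 1:]).strip()
    return Skill(meta["name"], meta["description"], body)


# Hypothetical skill, invented for illustration.
SAMPLE = """\
---
name: quarterly-deck
description: Build the quarterly revenue deck from the finance export.
---
1. Load the latest finance export.
2. Summarise revenue by region.
3. Draft one slide per region and flag anomalies for human review.
"""

skill = parse_skill_md(SAMPLE)
print(skill.name, "|", skill.description)
```

Everything in the file itself is plain text that any harness could read. Whether a skill stays portable therefore depends less on this file than on what it is allowed to reference: harness-specific tools, plugin manifests, and integrations.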

The cross-layer dependency nobody is pricing

Here is the argument neither report can see, because it lives between the layers.

The harness’s value depends entirely on diversity preserved in the model beneath it. The cheap training default now sweeping the field — self-distillation, with its “likely-gets-likelier” dynamic — quietly collapses a model’s ability to find multiple solution paths. It optimizes the visible number (average accuracy) while eroding the invisible capability (the ability to route around a blocked path). It’s economic concentration as a training artifact: rich-get-richer at the level of model weights, the Barbelled Distribution reasserting itself inside the model.

Now connect it to the autonomy trend both labs document:

With a human in the loop every turn, a brittle one-path model gets corrected — the dead end is caught and rerouted by the person.
When you hand the full multi-step trajectory to an agent in a single prompt — exactly what Claude Code and Codex make possible — brittleness becomes unrecoverable, because there’s no human turn in which to catch the blocked route.

The conclusion is uncomfortable and original:

The consolidation bet — one harness, full delegation — carries a hidden dependency on a training choice happening two layers down. The Judgment Layer can’t save you if the model beneath it can only find one path. Diversity collapse bites hardest exactly where autonomy is highest. The cheap training method and the agentic deployment trend are on a collision course — and the fix (diversity-aware training) exists but isn’t yet the cheap default.
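The collision can be illustrated with a toy model. Self-distillation is represented here by “power sharpening” (raise each solution path’s probability to a power above 1 and renormalize), which is an assumed stand-in for the likely-gets-likelier dynamic, not any lab’s actual training recipe. A blocked step is recovered either by a human or, when running autonomously, by the model resampling a different path. Every number below (path probabilities, step count, block rate, retries, human recovery rate) is hypothetical.

```python
import math


def sharpen(probs: list[float], alpha: float) -> list[float]:
    """Toy stand-in for one round of self-distillation: raise each path's
    probability to alpha > 1 and renormalise, so likely paths get likelier."""
    weights = [p ** alpha for p in probs]
    total = sum(weights)
    return [w / total for w in weights]


def entropy_bits(probs: list[float]) -> float:
    """Shannon entropy in bits: a rough measure of solution-path diversity."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def reroute_chance(probs: list[float], k: int) -> float:
    """If the model's favourite path turns out to be blocked, the chance that
    at least one of k fresh samples tries a different path."""
    return 1 - max(probs) ** k


def trajectory_success(steps: int, block_rate: float, recover: float) -> float:
    """Each step is blocked with probability block_rate; a blocked step is
    rescued with probability recover. The run succeeds only if every step does."""
    per_step = (1 - block_rate) + block_rate * recover
    return per_step ** steps


# Hypothetical numbers throughout: four valid paths, a 20-step task,
# 10% chance a step's preferred path is blocked, 3 retries per block.
STEPS, BLOCK_RATE, RETRIES, HUMAN_RECOVERY = 20, 0.10, 3, 0.95

diverse = [0.4, 0.3, 0.2, 0.1]
collapsed = diverse
for _ in range(3):
    collapsed = sharpen(collapsed, alpha=2.0)

for label, probs in (("diverse model", diverse), ("after 3 sharpening rounds", collapsed)):
    solo = trajectory_success(STEPS, BLOCK_RATE, reroute_chance(probs, RETRIES))
    print(f"{label:<26} entropy {entropy_bits(probs):.2f} bits, "
          f"autonomous success {solo:.0%}")

with_human = trajectory_success(STEPS, BLOCK_RATE, HUMAN_RECOVERY)
print(f"human in the loop: {with_human:.0%}")
```

With these made-up inputs, autonomous success on the 20-step task falls from about 88% for the diverse model to about 21% after sharpening, while the human-in-the-loop case stays near 90%. The exact figures mean nothing; the shape is the point. The per-turn human hides path collapse, and full delegation exposes it, compounded over every step.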

The barbell resolution

So which bet wins — consolidation or proliferation? The honest answer, consistent with everything above, is that the harness barbells.

Consolidation wins where work is digital, modular, and verifiable — software first, which is precisely the leading edge in both reports. One super-harness absorbing the full software life cycle is already happening inside OpenAI.
Proliferation wins where work is embedded in legacy tools and human review is non-negotiable — regulated industries, relationship-driven roles, judgment-heavy domains. Here the embedded-agent strategy beats the dissolve-everything strategy.

The harness doesn’t pick one winner. It splits — the barbell distribution reasserting itself one layer up the stack. The two labs aren’t wrong about each other; each is right about a different end of the same barbell.

Key Takeaways & Mental Models

The conversation was scaffolding; the harness is the building. Both labs’ measurement systems broke the same way at the same time — the surest sign of a phase change. AI moved from something you consult to something you deploy.
Read the stage by output, never by seats. Organizations sit at 17.3% of users but 63.3% of output; engineers at 26.8% of the average engineer’s output but 88.3% of the function’s total. Adoption looks early by headcount and past the halfway mark by work produced, and the second number is the true one.
The constraint is organizational, not technological. Same models, three wildly different adoption levels across individuals, organizations, and OpenAI. The binding inputs are access, management expectations, skill, and review processes — the complements, not the engine. That’s why the transition is predictable, and why laggard functions are pre-inflection, not permanently behind.
Surface beats model. Proven twice, independently: Codex eating ChatGPT to 99.8% of internal work tokens; the autonomy gap surviving with the model held constant (13 chat turns → 1 Claude Code prompt). Value capture migrates up from the model to the harness.
The SKILL.md is the line of code of the agentic era. Systematization — codified, reusable workflows — is where the moat forms. Skill adoption went 5.4% → 26.6% in a quarter; the authoring format that wins becomes the platform.
Two business models, one fork: consolidate or proliferate. OpenAI pulls every workflow into one harness; Anthropic embeds a harness into every workflow. The layer barbells — consolidation for digital/verifiable work, proliferation for embedded/judgment-heavy work.
The Judgment Layer is now empirical, not theoretical. As execution delegates, humans relocate to delegation, oversight, and integration; returns to domain expertise rise. The 28.6% running five or more concurrent agents are the prototype of the future worker — and it’s the layer with the most runway left.
The hidden cross-layer trap: maximum autonomy on top of one-path models is a collision, not a synergy. The harness’s value is mortgaged to a diversity choice two layers down — currently being written in the cheapest, most brittle way available.

The frontier of AI is no longer about asking systems for answers. It’s about managing systems that act — and the war for who owns that management layer, and on what terms, is the only fight that matters now.

Recap: In This Issue!

The Conversation Was a Transitional Interface

The chatbot was never the final destination for AI adoption.
Both OpenAI and Anthropic show that the interface is shifting from conversation to delegated execution.
The unit of interaction is no longer a prompt, but an autonomous workflow.
AI is evolving from a tool users consult into a system users deploy.

We Are Measuring AI Adoption Incorrectly

User counts and message volume no longer capture how AI is being used.
The meaningful metric is work completed rather than conversations initiated.
A small percentage of users already accounts for the majority of agentic output.
Adoption spreads through intensity before it spreads through ubiquity.

Organizations, Not Models, Are the Bottleneck

The same frontier models produce vastly different adoption curves across different populations.
The limiting factors are organizational readiness, governance, workflow redesign, training, and trust.
Technology is no longer the primary constraint to adoption.
The next wave of AI diffusion depends more on organizational transformation than on model improvements.

The Harness Has Become the New Value Layer

The interface now determines how much autonomy AI can achieve.
Holding the model constant, agentic environments run at higher autonomy than conversational ones across most output types (26 of 31).
Orchestration, memory, tool use, and workflow execution have become the new sources of competitive advantage.
Value is migrating upward from the model toward the harness.

OpenAI and Anthropic Are Converging on the Same Future

OpenAI is building a universal harness that absorbs every workflow into a single operating environment.
Anthropic is embedding autonomous agents into existing software and enterprise workflows.
One strategy emphasizes consolidation, the other proliferation.
Both ultimately point toward the same destination: work executed by autonomous agents.

The Entire AI Stack Is Being Repriced

Compute becomes increasingly valuable as autonomous agents consume significantly more inference.
Foundation models become relatively commoditized as orchestration layers capture differentiation.
Applications evolve from standalone destinations into capabilities invoked by agents.
Human value shifts away from execution toward supervision and decision-making.

Skills Become the Software of the Agentic Era

Reusable workflows are emerging as the fundamental primitive of autonomous work.
Standards such as SKILL.md represent the beginning of a new application layer.
The platform controlling skill creation and distribution gains long-term strategic leverage.
Future ecosystems may compete less on models than on reusable workflows.

The Judgment Layer Becomes Human Work

Humans increasingly define objectives instead of executing individual tasks.
Expertise shifts toward reviewing outputs, coordinating multiple agents, and making strategic decisions.
The future knowledge worker manages autonomous systems rather than operating software directly.
Judgment becomes the highest-value layer of work.

Agentic AI Has a Hidden Dependency

Highly autonomous systems require models capable of exploring multiple reasoning paths.
Training methods optimized for efficiency risk reducing reasoning diversity.
Greater autonomy amplifies the consequences of brittle reasoning.
The future of agentic AI depends not only on better interfaces, but also on preserving model diversity.

The Harness Will Follow a Barbell Distribution

Universal harnesses are likely to dominate highly digital, modular, and verifiable work.
Embedded agents will remain preferable in regulated, relationship-driven, and judgment-intensive environments.
The future is unlikely to converge around a single interface.
Consolidation and proliferation will coexist across different categories of work.

The Industry Is Entering Its Next Phase

The market continues to debate chatbots while the real transformation happens one layer above.
The next competitive battle is no longer about building the smartest model.
It is about controlling the operating layer through which autonomous work is delegated, orchestrated, and governed.
The AI industry is approaching the final organizational bottleneck before mass-scale agentic adoption.

With massive ♥️ Gennaro Cuofano, The Business Engineer
