Why I Ended Up in the Harness
Not as a passing trend or productivity hack, but as a signal of what’s coming next for many of us, whether we’ve realized it yet or not.
What I left out, however, was the most important question: did I choose this transformation? Was it a deliberate decision? A reality I consciously created?
I wish I could say yes.
The truth is that I didn’t arrive here entirely by choice. This shift is largely the result of forces much bigger than me. It is the natural consequence of how the most powerful industries in the world are evolving. And because I’ve spent my career operating at the edge of the AI industry, being “in the harness” wasn’t a tactical choice. It became a strategic constraint.
Am I disappointed by that?
Not at all.
In fact, this is the most exciting professional transformation I’ve experienced in my life.
Today, I can do things I only dreamed about a decade ago. I can fully embrace my hyper-generalist nature. I can reconnect disciplines that were previously separated by time, expertise, and organizational boundaries. I can become the kind of business polymath that modern specialization had almost made impossible.
AI has given me leverage to think broadly again.
And that’s why this moment is both terrifying and exhilarating.
Being in the harness is infinitely more interesting than being locked inside a gaming console. The real world becomes your environment. Every industry becomes your playground. Every inefficiency becomes an opportunity.
The ability to create business arbitrage, to identify opportunities between what is possible and what is currently being done, is becoming one of the most valuable skills of our era.
For anyone working in business, the defining skill of the coming decades will be the ability to transform intelligence into outcomes.
To turn tokens into decisions.
To turn decisions into execution.
To turn execution into business results.
That is the new game.
Welcome to the new world.
The only question is:
Are you ready to embrace it?
I want to tell you how I ended up living inside a harness of agents instead of in front of a computer. To make that story land — and to make clear it wasn’t a personal preference but a forced move — I have to walk you through what happened to AI between late 2022 and now.
Not as a list of models. As a sequence of inflection points, each one breaking the rule the previous era ran on, each one forcing the next. There are four such breaks. They land roughly one a year. Together, they are the reason it is no longer possible to keep working the way I, or anyone, used to.
Read them in order, because the logic only resolves at the end. By the time you reach the fourth break, the personal arc will look like what it actually was: a delayed response to an exponential the rest of the system had already absorbed.
2022 → 2024: pre-training scaling, and the chat era it produced
In November 2022, ChatGPT became public. Within months, the rule that had been quietly running AI since 2020 became visible to everyone: make the model bigger, feed it more data, and it gets smarter.
This was pre-training scaling, the original scaling law. Dario Amodei put it plainly in early 2025: from 2020 to 2023, the main thing being scaled was pre-trained models, trained on increasing amounts of internet text with a tiny bit of other training on top. The recipe was simple, the gains were stunning, and the product was equally simple. It was chat:
One model, one turn, one human. The user prompted; the model answered. The user prompted again. Every step was driven from the outside.
The interface was a window. Operating happened by hand, in real time, at the chat surface.
What it rewarded was operating skill. Knowing how to structure a prompt, work the window, read the output, push it again. Being good at chat was a real edge. It carried a lot of careful, capable people for two years.
Underneath, though, the engine driving the gains was already running into a wall. By 2024, the conversation in papers and at NeurIPS had shifted to “diminishing returns”:
Doubling pre-training compute no longer doubled the improvement. The curve was bending, not breaking — but it was unmistakably bending.
High-quality training data had largely been exhausted. The internet had been scraped. The books had been ingested. Code had been processed. The cheap, dense tokens were gone.
The labs knew it before the public did, and started saying so by the end of 2024.
This is the structural fact everything else is downstream of. Pre-training scaling hit a wall. But the frontier didn’t stop — it never does. It had to find a new place to put compute. The era of “just make it bigger” was ending, and scaling itself had to relocate.
Hold that move in mind: when one axis flattens, the frontier moves outward to the next layer. It happens four times in this story.
September 2024: the frontier moved to test-time, and reasoning began
On September 12, 2024, OpenAI released o1-preview, an early version of o1: a model designed to spend more compute thinking at the moment you used it, not at the moment it was trained. The full version shipped in December as part of the “12 Days of OpenAI” event. Within months, every major lab had reasoning models in production: OpenAI o3-mini (January 2025), o3 and o4-mini (April 2025), Anthropic’s reasoning Claude, Google’s Gemini Thinking, DeepSeek R1.
The technical detail mattered less than the structural fact. Scale had migrated. The new way to make models smarter was not to grow them, but to let them deliberate. Compute that used to be spent in training was now being spent in inference — the budget moved from before deployment to during use.
That single migration kicked off the second era, and it had three immediate consequences:
The unit of work expanded from a turn to a task. A reasoning model could chew on a problem for minutes, not seconds. The human’s role shifted from “operate every step” to “pose the problem, judge the result.”
Smaller models became serious contenders. A small model with a sufficient thinking budget could outperform a much larger model with none. The “biggest model wins” rule of pre-training quietly stopped applying.
Latency stopped being a vice. The market accepted models that took longer to answer, because the answers were structurally different — they showed work, reasoned across steps, caught their own errors mid-stream.
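One simple way to see how a thinking budget can substitute for model size is majority voting over repeated samples. The sketch below is an illustration only, under a strong assumption: each sample is independently correct with a fixed probability. It is not a description of how o1 or any other reasoning model works internally, and the accuracy figures (60% and 75%) are hypothetical.

```python
from math import comb


def majority_accuracy(p: float, n: int) -> float:
    """Probability that a strict majority of n independent samples is correct,
    when each sample is correct with probability p. n must be odd (no ties)."""
    if n % 2 == 0:
        raise ValueError("use an odd n so there are no ties")
    needed = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(needed, n + 1))


# Hypothetical numbers: a "small" model that is right 60% of the time per sample
# but is allowed 25 samples, versus a "large" model that is right 75% of the
# time and answers once.
small_with_budget = majority_accuracy(0.60, 25)
large_single_shot = majority_accuracy(0.75, 1)

print(f"small model, 25 samples: {small_with_budget:.3f}")
print(f"large model, 1 sample:   {large_single_shot:.3f}")
```

Under these assumptions the small model with a budget comes out ahead. Real samples are correlated, so the gain in practice is smaller than this idealized calculation suggests.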
But the real consequence wasn’t visible until you looked one step further. A model that can think step by step can also act step by step. It can call a tool, check the result, decide what to do next, and call another tool. It can run for minutes, not seconds. It can chase a goal across many turns without a human between them.
Reasoning, in other words, was the precondition for agency. The chat era’s defining shape — human in the loop, every turn — was about to break, and o1 was the first crack.
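The loop described above (call a tool, check the result, decide what to do next) can be sketched in a few lines. This is a minimal toy, not any vendor's agent implementation: the tools `search_inventory` and `place_order` are hypothetical, and `decide_next` is a hard-coded stand-in for the step where a real system would ask a reasoning model what to do.

```python
from typing import Callable, Optional


# Hypothetical tools: plain functions standing in for real integrations.
def search_inventory(item: str) -> str:
    stock = {"widget": 3, "gadget": 0}
    return f"{item}: {stock.get(item, 0)} in stock"


def place_order(item: str) -> str:
    return f"ordered {item}"


TOOLS: dict[str, Callable[[str], str]] = {
    "search_inventory": search_inventory,
    "place_order": place_order,
}


def decide_next(goal: str, history: list) -> Optional[tuple]:
    """Stand-in for the model's reasoning step: read the results so far and
    return the next (tool, argument) pair, or None when the goal is settled."""
    if not history:
        return ("search_inventory", goal)
    last_tool, _, last_result = history[-1]
    if last_tool == "search_inventory" and not last_result.endswith(": 0 in stock"):
        return ("place_order", goal)
    return None


def run_agent(goal: str, max_steps: int = 5) -> list:
    """Run the act-observe-decide loop with no human between the steps."""
    history = []
    for _ in range(max_steps):
        action = decide_next(goal, history)
        if action is None:
            break
        tool, arg = action
        result = TOOLS[tool](arg)            # act
        history.append((tool, arg, result))  # observe, then loop to decide again
    return history


print(run_agent("widget"))
print(run_agent("gadget"))
```

The `max_steps` cap is a design choice worth keeping in any real loop: it bounds how long the system can run before a human looks at it.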
2025: the agent era, and the protocol that made it real
The agent era didn’t arrive because models suddenly got smart enough. It arrived because two things landed in the same window:
Reasoning unlocked autonomous step-taking from the inside. A model could now run a chain of thought, evaluate intermediate results, and adapt its plan mid-task.
The Model Context Protocol gave it hands. In November 2024, Anthropic open-sourced MCP — a single standard for how an AI model connects to external tools, data, and applications. The industry called it “the USB-C of AI.”
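To make "a single standard" concrete: as I understand the specification, MCP messages are JSON-RPC 2.0, with tools discovered through a `tools/list` request and invoked through `tools/call`. The sketch below builds those two messages by hand using only the standard library. The tool name `get_weather` and its argument are hypothetical, and a real client would normally use an official SDK, which also handles the initialization handshake and the transport.

```python
import json


def make_tool_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Build a JSON-RPC 2.0 request asking a server to run one tool."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# Ask the server which tools it exposes.
list_tools = json.dumps({"jsonrpc": "2.0", "id": 1, "method": "tools/list"})

# Invoke one of them. "get_weather" is a made-up tool for illustration.
call = make_tool_call(2, "get_weather", {"city": "Rome"})

print(list_tools)
print(call)
```

The point is that every tool, from any vendor, is reached through the same message shape, which is why one integration can serve many models.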
MCP went from announcement to infrastructure faster than any AI standard before it:
March 2025 — OpenAI adopted MCP across the Agents SDK, Responses API, and ChatGPT desktop.
By late 2025 — 97 million monthly SDK downloads, 10,000+ active servers, first-class support in ChatGPT, Cursor, Gemini, Microsoft Copilot, and Visual Studio Code.
December 2025 — Anthropic, OpenAI, and Block jointly formed the Agentic AI Foundation under the Linux Foundation and donated MCP into it, with AWS, Google, Microsoft, Cloudflare, and Bloomberg as supporting members.
The plumbing had become industry infrastructure. What that combination produced was a fundamentally different thing on your computer. Not a chat companion. An agent: a system that takes a goal and runs with it, calls tools on its own, reads its own results, corrects its own errors, and comes back when the work is done.
Through 2025, every major lab shipped autonomous agent products:
OpenAI — ChatGPT Agent mode (operator-style autonomous browsing, scheduling, multi-step tasks).
Anthropic — Computer Use (the model controlling a real desktop, clicking, typing, navigating).
Microsoft, Google, Meta, Amazon — all shipping agentic equivalents through 2025.
The shape of work changed underneath my hands while I was still operating. Where the chat era meant operating every turn, the new way was:
Hand a goal to the system.
Let it run.
Judge the result.
That is not the same job. The most productive people quietly stopped operating. I had not.
This is the period — roughly mid-2025 onward — when I started getting out-produced by people working the same hours with the same tools. The gap widened week by week. I just hadn’t yet identified the layer the gap lived on.
2026: swarms, AGaaS, and the form factor leaves the desk
The shift from chat to a single agent was the visible break. The less visible one — the one happening as I write this — is from a single agent to a swarm.
Once orchestration is standardized over MCP, an agent stops being a thing you launch and becomes a unit you compose:
One agent becomes many specialists. Narrow agents — one for content, one for analytics, one for outreach, one for review — each with a single job.
They run in parallel, on independent schedules. Not a serial pipeline; a swarm.
They share a single source of intelligence. A common memory layer, a cached source of truth, gated approvals where stakes are high.
The unit of work climbed again — from a turn (chat), to a task (agent), to an operation (swarm). And the scaling axis migrated one more rung outward: not the model, not the loop, but the orchestration around them.
Capability is now grown by adding agents, loops, gates, and memory in coordination. The harness is the new scaling layer.
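The three properties above (narrow specialists, running in parallel, sharing one memory with gated approvals) can be sketched with `asyncio`. Everything here is a hypothetical toy: the specialist names come from the list above, but the shared `memory` dictionary, the `approval_gate` rule, and the brief are assumptions made for illustration, not a real orchestration framework.

```python
import asyncio


async def specialist(name: str, memory: dict, delay: float) -> None:
    """One narrow agent: read the shared brief, do its single job, write back."""
    await asyncio.sleep(delay)  # stands in for real work
    memory[name] = f"{name} draft based on brief: {memory['brief']}"


def approval_gate(name: str, output: str) -> bool:
    """Hypothetical gate: anything that reaches customers waits for a human."""
    return name != "outreach"


async def run_swarm(brief: str) -> dict:
    memory = {"brief": brief}  # the common memory layer
    names = ["content", "analytics", "outreach", "review"]
    # Run all specialists concurrently rather than as a serial pipeline.
    await asyncio.gather(*(specialist(n, memory, 0.01) for n in names))
    released = {n: memory[n] for n in names if approval_gate(n, memory[n])}
    held = [n for n in names if n not in released]
    return {"released": released, "held": held}


result = asyncio.run(run_swarm("launch Q3 newsletter"))
print(sorted(result["released"]))
print(result["held"])
```

Growing capability in this picture means adding another name to the list or another rule to the gate, not changing the model underneath.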
The business model mutates to match, because the unit of value travels with the frontier:
SaaS sold a tool and charged for the seat. A human did the work inside it. The pricing assumed human operation as the rate-limiting step.
AGaaS — agents-as-a-service — sells the work itself. You don’t buy the software and operate it. You buy the outcome an agent or a swarm delivers.
The thing being priced shifts from an instrument a person uses to a result a system produces. AWS’s leadership has stated openly that agentic AI applied to e-commerce could be “the next multibillion-dollar business for AWS.” OpenAI has projected that roughly one-third of its 2025 revenue would come from agentic AI tooling. Outcome-based pricing is moving from edge case to default.
And the form factor follows the same vector outward:
Chat needed a screen. You sat at it; the work happened where you were.
Agents tolerate a phone. The work runs in a runtime elsewhere; you steer through a thin handle.
Swarms can be steered from anywhere a goal can be uttered. Voice. Glasses. An ambient surface. The screen shrinks because the work it used to hold has moved off it.
The next surface won’t be a laptop. It will be an AI-native object — closer to an always-on personal-agent appliance than a PC — that you speak goals into and tap approvals on.
That is where we are in June 2026. Not at the end of any cycle, but visibly running the fourth scaling era, after pre-training, test-time, and agency: orchestration as the place where capability is now grown.
The forcing function: one rule, four iterations
Step back from the four breaks and a single move repeats:
When the current scaling axis hits diminishing returns, the frontier doesn’t stop — it relocates outward, one layer further from the model. Each relocation creates a new unit of work, a new business model, and a new form factor, and renders the prior layer’s defining skill a commodity.
The cycle, named: new axis → new unit of work → new business model → new form factor → old layer commoditizes → repeat.
Run that against the actual record:
Pre-training (2020–2024) scaled the model.
Unit: a turn. Business: SaaS. Form: a desktop chat window. Skill rewarded: operating it well.
What broke it: data exhaustion + diminishing returns from naive scaling.
Test-time (Sept 2024 → 2025) scaled the thinking.
Unit: a task. Business: reasoning-tier subscriptions ($200/mo Pro, dedicated reasoning APIs). Form: still a screen, but the human stepped further back. Skill rewarded: framing and judgment.
What broke it: reasoning made agency feasible; the loop wanted to close.
Agency (2025) scaled the loop.
Unit: a goal. Business: outcome delivery. Form: agent + phone. Skill rewarded: setting frames the loop runs inside.
What broke it: once agents work, you want many of them, coordinated.
Orchestration / swarms (2026 →) scales the harness.
Unit: an operation. Business: AGaaS. Form: ambient surface. Skill rewarded: authorship across a swarm.
The eras have compressed from years to roughly one year each:
Pre-training ran ~4 years (late 2020 → late 2024).
Test-time ran ~1 year (Sept 2024 → late 2025).
Agency ran ~1 year (2025).
Orchestration is the era we are inside right now.
The compression isn’t rhetoric or vibe; it is the visible spacing of inflection points since 2020, and it is the part that hits humans hardest. You no longer get years to learn the new layer before it commoditizes.
This is what I mean when I say the transition was forced, not chosen. I didn’t pick a workflow. I got carried by a scaling law that kept relocating capability outward — and the only way to stay near where value was being made was to move with it.
What this did to me
For about eighteen months — while the second and third eras were happening — I worked the way I always had. In the loop, turn by turn, operating well. I was good at it. I had been good at it for two years. I had built habits around it.
And I was being out-produced. At a widening rate. By people working the same hours, on the same models, with the same access.
For a long time, I misdiagnosed it:
First I told myself it was a discipline problem. I tried longer hours. Sharper focus. Better prompts. The gap kept widening.
Then I told myself it was a tooling problem. I tried better models, better wrappers, better workflows. The gap kept widening.
Eventually I realized it was a layer problem. The rung I was standing on — operating — had quietly become the part the machine does for itself. Every hour I spent operating by hand was an hour spent doing something the system now does at scale, for almost nothing.
That is the punchline of the four-year arc seen from inside one person’s career. Each scaling relocation deprecates the previous layer’s defining skill. The people who tripled their output in 2025 had not found a better tool. They had stepped one rung outward — from operating to framing, from doing the work to authoring it — while I was still working harder at the part that no longer mattered.
When the world’s software output roughly tripled while the number of people doing the work stayed flat (GitHub’s own April 2026 figures), the people producing three times as much were not typing three times faster. They had restructured how the work happens — one person directing a system, instead of one person performing the steps.
And the moment enough people did that, the market quietly reset its expectation of what a single person produces. Producing careful, excellent work at the old volume stopped being a stable position; it became a slow exit, because the baseline had moved.
So I moved. Not because the new way was exciting, but because the alternative was to keep doing careful work at a level the market had already priced as not enough.
I stopped operating and started framing.
I stopped working inside a chat window and started directing a harness from wherever I happened to be.
The frame I set in the morning steered a swarm through the day.
The work stopped running at my desk and started running in a remote runtime I held a thin handle to.
I didn’t leave the desk on purpose. The work left, and I followed it.
The bedrock underneath the four eras
A reasonable objection follows immediately, and it deserves an honest answer.
If every rung commoditizes, and the cycle keeps accelerating, what stops the rung I just moved to from sinking next? Won’t framing get automated the way operating did?
Yes — as a craft, it will. Models will draft problems, propose constraints, generate evaluation criteria. The skill of framing will get machine help, and within a year or two it will get a lot of it. So if framing-as-a-skill were what I was relying on, I would just be one rung higher on a sinking ladder, with the floor coming up faster.
But the four-era arc points at something deeper than any single skill. Every rung the frontier has climbed has been a capability — and every capability eventually commoditizes when scale relocates. Operating fell. Framing-as-craft will fall. Orchestration-as-craft will fall after that.
The thing that can never be automated, because it isn’t a capability at all, is authorship.
Wanting a particular outcome. Not the ability to produce a particular outcome — the underlying wanting of it.
Choosing which tradeoffs are acceptable. Not the analytical work of comparing them — the moment of deciding, with the weight of consequence.
Answering for the result when it ships. Not the work of producing it — the liability for what it does in the world.
The machine can take on any competence given enough capability. It cannot take on wanting, or liability. It can draft a thing; it cannot be the one who owns it.
That is the bedrock the churn can’t reach, because it isn’t a skill that can be learned faster by something faster than you. It is a question of whose will, and whose responsibility.
So the deep “why” of my transition isn’t “use agents.” It is a migration of where I stand:
From the person who performs the work
To the person who authors it
That is the only ground the next scaling relocation cannot reach, because it isn’t on the ladder. Everything else I do now — the frames, the gates, the swarm — sits on top of that one position. I stopped trying to be faster than the machine and started being the thing it answers to.
That is the why.
What it actually looks like to live there — what life in the harness feels like day to day — is the next piece.
Key Takeaways & Mental Models
The Scaling Migration — AI’s scaling laws didn’t stop; they keep relocating outward from the model. Pre-training (2020–2024) → test-time reasoning (Sept 2024 → 2025) → agency (2025) → orchestration / swarms (2026). Each relocation grows capability at a new layer.
The Forcing Cycle — Each new scaling axis creates a new unit of work, a new business model, and a new form factor, and turns the prior layer’s defining skill into a commodity. Run it four times: chat → swarms, SaaS → AGaaS, screen → ambient surface.
The Compression — Pre-training ran four years. Test-time ran about one. Agency ran about a year. Orchestration is now. The compression is the visible spacing of inflection points since 2020 — not rhetoric, geometry.
The Operating Trap — Continuing to compete on the rung the frontier just left feels like diligence. It is the fastest way to fall behind, because the system now does that rung for free, at scale, and without rest.
The Output Asymmetry — When output triples on flat headcount, the gap isn’t effort — it’s restructuring. Once the baseline resets, holding the old layer isn’t a preference; it’s a slow exit.
The Accountability Floor — Every rung on the ladder is a capability, and every capability eventually commoditizes. The one position that doesn’t is authorship — wanting, choosing, answering for outcomes. It’s not on the ladder, which is exactly why the churn can’t reach it.
With massive ♥️ Gennaro Cuofano, The Business Engineer