Every regulated industry has written down what great looks like. The hard part was always running it consistently.

Every regulated industry has spent decades writing down what great looks like. Standard operating procedures. Policies. Checklists. Best practices. The written standard was never in short supply, and it was never really the problem.
Running it was. Running it consistently, at volume, on the thousandth item as carefully as on the first.
For most of that history, the standard and the work lived in different places. The standard sat in a binder; the work sat in a queue. Paper does not review anything, so whether any given item actually met the bar depended on who was checking, how loaded they were, and what time of day it was. The moment a rule or a reality changed, the written standard quietly fell behind.
That gap is now closing, and the reason is a shift in how regulated software gets built and run. We call it agentic engineering: the people who own a domain directing AI agents that perform the knowledge work itself, with clear references, citations, and handoffs. When that happens, codified “great” stops being a document and starts being the system’s operating behavior. Here is what that looked like on one real regulated-domain platform we built.
“What great looks like” stops being a document
The aspiration was always there. Decades of written excellence describe the bar that every review and every approval is supposed to meet. What was missing was a way to make that bar hold automatically, every time, without depending on the attention of whoever happened to be on shift.
Now the standard becomes executable. It is grounded in real data and defined rules, so the thousandth item gets the same scrutiny as the first. Work that used to take days, even weeks, now completes in under an hour, to the same written standard, every time. The aspiration and the operation are finally the same thing.
Built for knowledge work, running knowledge work
The most striking part of this build is the symmetry at its core.
Building the platform was itself knowledge work. Workflows were walked through, standards were interpreted, exceptions were reasoned out, and expertise was encoded into a working system. That process consumed more than 1.5 billion tokens, turning standard operating procedures, policies, checklists, and best practices into a standardized workflow that completes the regulated domain’s knowledge tasks. Understanding the work and building the system happened in the same conversation.
In production, the platform now does knowledge work at scale. It has already processed more than 500 million tokens across the tasks a regulated team runs every day: document review, compliance checks, data extraction, cross-referencing, summarization, classification and triage, risk flagging, exception handling, quality and completeness checks, drafting and reporting, and decision support. That is over a decade of full-time human reading, processed in the platform’s first two months, with every output grounded in real data and defined rules.
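As a rough check on the reading-time comparison, the sketch below converts a token count into years of full-time reading. The conversion factors (about 0.75 words per token, 250 words per minute, 2,000 working hours per year) are illustrative assumptions, not figures measured on the platform; different assumptions will move the result.

```python
# Back-of-the-envelope conversion. Every constant here is an assumption.
WORDS_PER_TOKEN = 0.75       # assumed average for English text
WORDS_PER_MINUTE = 250       # assumed adult reading speed
HOURS_PER_WORK_YEAR = 2_000  # assumed full-time year (50 weeks x 40 hours)


def reading_years(tokens: int) -> float:
    """Estimate how many full-time years a person would need to read `tokens`."""
    words = tokens * WORDS_PER_TOKEN
    hours = words / WORDS_PER_MINUTE / 60
    return hours / HOURS_PER_WORK_YEAR


print(reading_years(500_000_000))  # 12.5
```

Under these assumptions, 500 million tokens comes to roughly twelve and a half years of full-time reading, which is in line with "over a decade".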

The token count is not a vanity metric. It stands for reasoning that used to require a person: a document read, a rule applied, an exception escalated, a draft produced. Counting tokens is simply a way of measuring how much knowledge work the system now carries so that people no longer have to.
The point is the symmetry. Intelligence is now both the tool and the throughput: the system that does knowledge work was itself built through knowledge work. And when thorough checking stops being scarce, reviews get deeper and approvals stop queuing.
Shipped, not promised
Systems like this used to be multi-year programs. We took this one from blank page to production in four months: one month to blueprint the workflows with the people who run them, three months to build.
The receipts from the build are concrete. Eight modules in production. More than 200 APIs. More than 50 unique pages, each shaped to the task in front of the worker. Decision rights, controls, and audit trails were designed in from the start, not bolted on afterward.

In a regulated setting, that ordering matters. When decision rights and audit trails are part of the blueprint, compliance becomes a property of how the system works rather than a layer added under deadline pressure. It is far cheaper to design for scrutiny than to retrofit for it.
But the build statistics are not the headline. What happened after launch is. In its first two months live, the platform shipped nine post-launch releases and more than sixty enhancements, with user feedback reaching production in days rather than the quarters a change-request cycle usually takes. Most enterprise software slows down after go-live. This one shipped faster after launch than before it. Velocity that holds after go-live is the real proof that the method works.
The hard part: production AI for regulated knowledge work
None of this is easy, and it would be dishonest to pretend otherwise. Demos are easy. Production is hard, and regulated knowledge work is the highest altitude of all. Most AI initiatives never make that climb, and capability is rarely the reason. The conditions that break systems only appear at altitude: real volumes, real exceptions, and real scrutiny, where every output is examinable and every miss is expensive.
It is worth being clear about why this is the hard part and not the easy one. The models themselves are commoditizing quickly, and broadly similar capability is now available to everyone. What cannot be downloaded is knowing where a specific regulated workflow bends under real conditions, and having an engineering loop fast enough to fix it the same day the problem appears. That knowledge lives in the domain, not in the model.
Production becomes possible when three things work as one instead of in sequence:
- Domain knows where workflows truly bend: the exceptions, the judgment calls, and the audit moments that no generic model has seen.
- Technology makes the written standard executable and keeps every output grounded in real data and defined rules.
- UX is the bridge. Every flag is explained, every source is visible, and every decision is one screen away. That is what keeps a human confidently in the loop instead of rubber-stamping a black box.
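One way to picture "every flag is explained, every source is visible" is a flag record that cannot be created without a rule reference, an explanation, and at least one source. This is a minimal sketch only: the `Flag` type, its field names, and the sample values are hypothetical and do not describe the platform's actual schema.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Flag:
    rule_id: str              # hypothetical identifier of the written rule applied
    explanation: str          # plain-language reason shown to the reviewer
    sources: Tuple[str, ...]  # hypothetical references to the supporting documents

    def __post_init__(self) -> None:
        # Refuse any flag a reviewer could not trace back to a rule and a source.
        if not self.rule_id or not self.explanation.strip():
            raise ValueError("a flag needs a rule reference and an explanation")
        if not self.sources:
            raise ValueError("a flag needs at least one visible source")


# Illustrative values only.
flag = Flag(
    rule_id="SOP-EXAMPLE-1",
    explanation="Signature date precedes the effective date.",
    sources=("example_contract.pdf, page 3",),
)
```

Enforcing this at construction time means an unexplained or unsourced flag fails loudly before it can reach a reviewer's screen.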

And the loop between them is fast. The goal is not to digitize the old process but to redraw it around what AI does well and what humans must own. Decision rights are set by risk: some steps run autonomously, and others keep a human in the loop by design. An edge case found in the morning can be absorbed into the standard by evening. That speed is the moat, and it comes from partnership, not procurement.
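Setting decision rights by risk can be sketched as a small routing rule. The example below is an illustration under stated assumptions, not the platform's implementation: the `Step` type, the 0-to-1 risk score, the threshold value, and the step names are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    AUTONOMOUS = "autonomous"
    HUMAN_REVIEW = "human_review"


@dataclass(frozen=True)
class Step:
    name: str
    risk: float  # hypothetical risk score between 0 (low) and 1 (high)


RISK_THRESHOLD = 0.3  # hypothetical cut-off, owned by the domain team


def route(step: Step, threshold: float = RISK_THRESHOLD) -> Route:
    """Send a step to human review when its risk meets or exceeds the threshold."""
    if step.risk >= threshold:
        return Route.HUMAN_REVIEW
    return Route.AUTONOMOUS


for step in (Step("data extraction", 0.1), Step("final approval", 0.9)):
    print(step.name, "->", route(step).value)
```

Keeping the threshold as an explicit, named value is what would let an edge case found in the morning change the routing by evening: the people who own the domain adjust one number rather than rewriting the workflow.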
What great looks like, in action
This is what “AI as a business capability” looks like once it is actually running in production. Not a pilot. Not a demo. Codified great, executing at volume, to standard, every time. For the regulated industries that spent decades writing down what great looks like, knowledge work has entered a genuinely new realm of possibilities.
That is what great looks like, in action.
Building production AI in a regulated domain?
That is exactly the altitude we fly at. If you have a high-stakes, high-volume workflow where the written standard is clear but running it consistently is the bottleneck, let’s talk.