
How to measure a managed AI agent after launch

The operating metrics that tell you whether a managed AI agent is actually improving the business.

An AI agent launch is not the finish line. It is the point where measurement finally becomes possible.

Before launch, you are mostly making educated guesses. After launch, you can see what the agent actually does: what it catches, what it misses, what humans approve, what they edit, and whether the business loop improves.

Measure the workflow, not the demo

A demo can look good while the workflow stays broken.

The useful question is not "did the agent produce a nice answer?" The useful question is "did the repeated operating loop get cleaner?"

For a business-development agent, that loop might be opportunity follow-up. For an operations agent, it might be meeting-to-task conversion. For a knowledge agent, it might be turning delivery experience into reusable firm memory.

Six metrics that matter

Throughput

How many useful units did the agent process?

Examples:

  • opportunities reviewed
  • meeting notes processed
  • follow-ups drafted
  • knowledge entries proposed
  • invoices checked

Throughput alone is not enough, but it tells you whether the agent is being used.
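
A minimal sketch of how this can be counted, assuming each agent action lands in a simple review log; the field names and unit labels are illustrative, not a prescribed schema:

```python
# Sketch: throughput from a review log.
# Assumes each agent action is logged as a record with a "unit" label
# (e.g. "opportunity_reviewed", "follow_up_drafted"); names are illustrative.
from collections import Counter

def throughput(log: list[dict]) -> Counter:
    """Count useful units the agent processed, broken down by type."""
    return Counter(entry["unit"] for entry in log)

log = [
    {"unit": "opportunity_reviewed"},
    {"unit": "follow_up_drafted"},
    {"unit": "opportunity_reviewed"},
]
print(throughput(log))  # Counter({'opportunity_reviewed': 2, 'follow_up_drafted': 1})
```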

Approval rate

How often does a human approve the agent’s output?

A high approval rate can mean quality is good. It can also mean humans are rubber-stamping. Pair it with edit rate and sampling.
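
One way to compute it, assuming each reviewed item records the human decision; the `decision` field and its values are illustrative:

```python
# Sketch: approval rate over reviewed items. Assumes each review record
# carries the human decision ("approved" / "rejected"); the field name and
# values are illustrative, not a prescribed schema.
def approval_rate(reviews: list[dict]) -> float:
    """Share of agent outputs a human approved."""
    if not reviews:
        return 0.0
    approved = sum(1 for r in reviews if r["decision"] == "approved")
    return approved / len(reviews)

reviews = [{"decision": "approved"}, {"decision": "approved"}, {"decision": "rejected"}]
print(f"{approval_rate(reviews):.0%}")  # 67%
```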

Edit rate

How much does the human change before approval?

Small tone edits are fine. Repeated substance edits tell you the agent is missing context or applying the wrong rule.
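
A rough way to quantify this, assuming both the agent draft and the approved version are stored. The similarity ratio from Python's difflib is one possible distance measure; telling tone edits from substance edits still takes a human look at the diffs:

```python
# Sketch: edit rate as the share of the draft a human changed before
# approval, using difflib's similarity ratio. Assumes both the draft and
# the approved version are stored; separating tone edits from substance
# edits still needs a human look at the diffs themselves.
import difflib

def edit_rate(draft: str, approved: str) -> float:
    """Fraction of the draft that changed before approval (0.0 = untouched)."""
    return 1.0 - difflib.SequenceMatcher(None, draft, approved).ratio()

draft = "Follow up next week with the pricing summary."
approved = "Follow up next week with the pricing summary and the case study."
print(f"{edit_rate(draft, approved):.2f}")
```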

Rejection reasons

Every rejection should teach the system something.

Useful rejection categories:

  • wrong timing
  • weak evidence
  • wrong tone
  • missing source
  • low relevance
  • too risky
  • already handled

The categories matter more than the raw count. They show where the agent needs better inputs or narrower scope.
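
A small sketch of the tally, assuming reviewers pick a reason from a fixed list when they reject; the category names mirror the list above:

```python
# Sketch: tallying rejection reasons so the categories, not the raw count,
# drive the fix. Assumes reviewers pick a reason from a fixed list when
# they reject; the category names mirror the list above.
from collections import Counter

REASONS = {
    "wrong timing", "weak evidence", "wrong tone", "missing source",
    "low relevance", "too risky", "already handled",
}

def rejection_breakdown(rejections: list[str]) -> Counter:
    """Count rejections per category, dropping anything outside the list."""
    return Counter(r for r in rejections if r in REASONS)

print(rejection_breakdown(["weak evidence", "weak evidence", "wrong tone"]))
# Counter({'weak evidence': 2, 'wrong tone': 1})
```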

Time to next action

This is often the cleanest business metric.

How long does it take to move from meeting, signal, invoice, or client request to the next useful action? If the agent shortens that loop without reducing quality, it is doing real work.
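
A sketch of the measurement, assuming trigger events and follow-up actions both carry timestamps; the number only means something compared before and after launch:

```python
# Sketch: median time from a trigger (meeting, signal, invoice, request)
# to the next useful action, assuming both events carry timestamps.
# The schema is illustrative; the point is comparing before and after launch.
from datetime import datetime
from statistics import median

def time_to_next_action(pairs: list[tuple[datetime, datetime]]) -> float:
    """Median hours between a trigger event and the next action taken on it."""
    return median((action - trigger).total_seconds() / 3600
                  for trigger, action in pairs)

pairs = [
    (datetime(2024, 5, 2, 9, 0), datetime(2024, 5, 2, 15, 0)),   # 6 hours
    (datetime(2024, 5, 3, 10, 0), datetime(2024, 5, 4, 10, 0)),  # 24 hours
]
print(f"{time_to_next_action(pairs):.1f} hours")  # 15.0 hours
```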

Business outcome

The final metric depends on the role.

Bob might influence pipeline follow-up. EVA might reduce overdue tasks. INO might reduce repeated questions. Mo-Ni might reduce payment delays. The metric should be close enough to the workflow that the team believes it.

Add an evaluator

Any agent touching client-facing output needs a paired evaluator.

The evaluator samples work and scores it against a rubric:

  • factual accuracy
  • source use
  • tone
  • completeness
  • risk handling
  • approval compliance

This does not need to be heavy. A weekly sample is enough at the start. The point is to catch drift before it becomes a trust problem.
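
A minimal sketch of that weekly pass, assuming approved outputs are stored with an id. The sample size, the 1-to-5 scale, and the flagging threshold are assumptions; the scores can come from a human reviewer, a second model, or both:

```python
# Sketch: a weekly evaluator pass that samples approved outputs and scores
# them against the rubric above. The sample size, 1-to-5 scale, and flagging
# threshold are assumptions; the scoring can be done by a human reviewer,
# a second model, or both.
import random

RUBRIC = ["factual accuracy", "source use", "tone",
          "completeness", "risk handling", "approval compliance"]

def weekly_sample(outputs: list[dict], k: int = 5) -> list[dict]:
    """Pick a small random sample of the week's approved outputs to score."""
    return random.sample(outputs, min(k, len(outputs)))

def score(output: dict, scores: dict[str, int]) -> dict:
    """Attach rubric scores (1-5) and flag anything that needs follow-up."""
    assert set(scores) == set(RUBRIC), "score every rubric dimension"
    return {
        "id": output["id"],
        "scores": scores,
        "flagged": any(value <= 2 for value in scores.values()),
    }
```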

Watch for false productivity

An agent can make people busier by producing more things to review. That is not success.

If the agent creates a new queue that nobody trusts, shut it down or narrow the role. If it saves ten minutes but creates twenty minutes of review, the workflow is not ready.

The first managed agent should remove friction from a loop the team already cares about.

The review cadence

Use a simple cadence for the first month:

  • week 1: check whether the agent is using the right inputs
  • week 2: review approval and edit patterns
  • week 3: compare time-to-action before and after
  • week 4: decide whether to narrow, expand, or pause

Do not expand because the demo looked good. Expand because the numbers and the humans agree.

The standard

A managed AI agent earns more scope only when it proves three things: it handles the workflow, humans trust its boundaries, and the measured loop improves.

That is the bar we use inside MAIDA. If you want to choose the first loop to measure, start with the AI Jungle Assessment.
