Quality

Code review for AI-written code, without reading every line

May 7, 2026 · 4 min

The honest version of how most AI-written code gets reviewed today: you scroll the diff, it looks reasonable, you merge. That is not a review. It is a vibe check on output you did not write.

Why eyeballing fails on agent code

Human code review works because the reviewer knows what the change was supposed to be. With an agent, that knowledge is fuzzy: the request lived in a chat, the agent interpreted it, and the diff is the first concrete thing you see. You end up reviewing whether the code looks like code, not whether it is the change you asked for.

Agents also fail differently than people. They rarely write broken syntax. They add an extra helper you did not ask for, install a dependency for a one-line task, or handle the happy path perfectly and return early on the retry path. Plausible code, wrong scope.

Review against the plan, not against taste

The fix is to give the review a reference point. In Nanostack, the planning phase writes plan.json before any code: which files, what risks, what is out of scope. The review phase then has two jobs, in order:

1. Scope drift: compare the diff against the plan. Files changed versus files planned. Anything extra gets flagged before quality is even discussed.
2. Findings: a second pass on correctness, each finding with a file and line, a severity, and a resolution. Findings get fixed in the same sprint, not filed for later.
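
The scope-drift pass can be sketched in a few lines of Python. This is a minimal illustration, not Nanostack's implementation: the `files` field in plan.json and the `scope_drift` helper are assumptions made for the example, and the changed-file list is assumed to come from something like `git diff --name-only`. The file paths are placeholders.

```python
import json


def scope_drift(plan_json: str, changed_files: list[str]) -> dict[str, list[str]]:
    """Compare files touched by the diff against files named in the plan.

    Assumes (hypothetically) that plan.json has a top-level "files" list.
    """
    planned = set(json.loads(plan_json)["files"])
    changed = set(changed_files)
    return {
        "unplanned": sorted(changed - planned),  # touched but never planned
        "untouched": sorted(planned - changed),  # planned but never touched
    }


# Hypothetical plan and diff, for illustration only.
plan = json.dumps({"files": ["app/api/webhook/route.ts", "lib/billing.ts"]})
diff = ["app/api/webhook/route.ts", "lib/billing.ts", "lib/retry-helper.ts"]

drift = scope_drift(plan, diff)
print(drift["unplanned"])
```

Reporting both directions is a design choice: an unplanned file is the classic extra helper, while a planned file that was never touched can mean the agent quietly skipped part of the request.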

The output is review.json, a saved artifact. It is not a thumbs-up in chat that disappears at the next compaction. It is a record that the QA phase and the ship gate read downstream.
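
Because the review is a file rather than a chat message, a later phase can check it mechanically. The sketch below shows one way a ship gate might read it. The field names (`scope_drift`, `findings`, `severity`, `resolution`) are guesses based on the description above, not the actual review.json schema, and `ship_blockers` is a hypothetical helper.

```python
def ship_blockers(review: dict) -> list[str]:
    """Return reasons not to ship; an empty list means the gate passes.

    Assumed shape: "scope_drift" is a list of unplanned file paths and
    "findings" is a list of dicts with file, line, severity, resolution.
    """
    blockers = [f"unplanned file: {path}" for path in review.get("scope_drift", [])]
    for finding in review.get("findings", []):
        # A finding without a recorded resolution is treated as still open.
        if not finding.get("resolution"):
            blockers.append(
                f"open {finding['severity']} at {finding['file']}:{finding['line']}"
            )
    return blockers


# Hypothetical artifacts, for illustration only.
resolved = {
    "scope_drift": [],
    "findings": [
        {"file": "app/api/webhook/route.ts", "line": 21,
         "severity": "should-fix", "resolution": "fixed in-sprint"},
    ],
}
unresolved = {
    "scope_drift": [],
    "findings": [
        {"file": "app/api/webhook/route.ts", "line": 21,
         "severity": "should-fix", "resolution": None},
    ],
}

print(ship_blockers(resolved))
print(ship_blockers(unresolved))
```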

What this looks like in practice

A real review artifact from a Stripe webhook sprint reads like this: scope drift clean, four files changed, four planned. One should-fix: the webhook answered 200 before verifying the signature on the retry path, app/api/webhook/route.ts:21, fixed in-sprint. One note on what was done well: the access gate reads the subscription state server-side only.

That is maybe eight lines. You can read it in twenty seconds and know more about the change than thirty minutes of diff-scrolling would tell you, because the eight lines answer the only questions that matter: is this what we agreed on, and what was wrong with it?
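
To make that concrete, the artifact from the webhook sprint could be held as plain data and rendered into a short summary. Every field name below is hypothetical, as is the `summarize` helper. Only the values (four files, the should-fix at app/api/webhook/route.ts:21, the server-side access gate) come from the sprint described above.

```python
# Hypothetical shape for the review artifact; not the real review.json schema.
review = {
    "files_planned": 4,
    "files_changed": 4,
    "scope_drift": [],  # unplanned files, empty when the diff matches the plan
    "findings": [
        {
            "severity": "should-fix",
            "file": "app/api/webhook/route.ts",
            "line": 21,
            "issue": "answered 200 before verifying the signature on the retry path",
            "resolution": "fixed in-sprint",
        }
    ],
    "done_well": ["access gate reads the subscription state server-side only"],
}


def summarize(review: dict) -> list[str]:
    """Render the artifact as the handful of lines a human reads first."""
    drift = "clean" if not review["scope_drift"] else "drift found"
    lines = [
        f"scope drift: {drift} "
        f"({review['files_changed']} changed, {review['files_planned']} planned)"
    ]
    for f in review["findings"]:
        lines.append(
            f"{f['severity']}: {f['issue']} ({f['file']}:{f['line']}, {f['resolution']})"
        )
    lines += [f"done well: {note}" for note in review["done_well"]]
    return lines


print("\n".join(summarize(review)))
```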

You still look at the code. Just later, and less.

None of this removes the human. It changes what the human reads first: the review artifact, then the PR description, then the diff if something smells off. The expensive attention goes where the machine already found friction, instead of being spread evenly over four hundred green lines.
