Quality

QA that opens the app and tries it

May 30, 2026 · 4 min

The most common failure of AI-built features is also the most boring one: the tests pass and the feature does not work. The button is there, the handler is wired, the unit tests are green, and when a person actually clicks it, nothing happens.

Agents test what they wrote, not what you meant

When an agent writes tests for its own code, it tests its own understanding: the function returns what the function returns. If the agent misunderstood the feature, the tests encode the same misunderstanding, and they pass beautifully. Green tests measure internal consistency. They do not measure whether the thing works.
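Here is a hypothetical illustration of that trap in Python. Nothing in it comes from Nanostack. Suppose the request was "unlock the account when payment succeeds", and the agent read it as "unlock on any checkout event". The function and the test it writes both follow that reading, so the suite is green even though an expired, unpaid session would still unlock the account. The event type strings are Stripe event names, used here only as sample data.

```python
# Hypothetical illustration: the intent was "unlock only when payment succeeded",
# but the agent misread it as "unlock on any checkout event".

def should_unlock(event: dict) -> bool:
    # Agent's (wrong) understanding: any checkout.* event unlocks the account.
    return event.get("type", "").startswith("checkout.")


def test_should_unlock():
    # The test mirrors the same misunderstanding, so it passes.
    assert should_unlock({"type": "checkout.session.completed"})
    # This line encodes the bug as "expected" behavior.
    assert should_unlock({"type": "checkout.session.expired"})


test_should_unlock()
print("tests pass")  # ...while an expired, unpaid session still unlocks the account
```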

The only check that catches this class of failure is the one a user would run: open the app, do the thing, see what happens.

What /qa actually does

The QA phase in Nanostack tests the running software, and it picks its tool based on what you built:

Web app: drives a real browser with Playwright, clicking through the UI, filling forms, and taking screenshots that get attached to the artifact.
API: makes real HTTP requests against the running service, not a mocked client, and checks status codes, body shape, and behavior.
CLI: executes the tool and inspects stdout, exit codes, and what landed on disk.
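For the CLI case, a check in that spirit might look like the minimal Python sketch below. It is not Nanostack's implementation: `check_cli` and the stand-in tool are hypothetical. What it shows is the approach: run a real process with the standard library's `subprocess`, then inspect the exit code, stdout, and the file the tool was supposed to write, instead of calling the tool's functions directly.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def check_cli(argv, cwd, expect_exit=0, expect_in_stdout="", expect_file=None):
    """Hypothetical check: run a CLI as a user would; return failure messages (empty = pass)."""
    result = subprocess.run(argv, cwd=cwd, capture_output=True, text=True, timeout=30)
    failures = []
    if result.returncode != expect_exit:
        failures.append(f"exit code {result.returncode}, expected {expect_exit}")
    if expect_in_stdout not in result.stdout:
        failures.append(f"stdout missing {expect_in_stdout!r}")
    if expect_file is not None and not (Path(cwd) / expect_file).is_file():
        failures.append(f"expected file {expect_file} was not written")
    return failures


# Stand-in "tool" for the demo: a one-liner that writes report.txt and prints a message.
TOOL = [
    sys.executable,
    "-c",
    "from pathlib import Path; Path('report.txt').write_text('ok'); print('wrote report.txt')",
]

with tempfile.TemporaryDirectory() as tmp:
    # Prints [] when every expectation holds.
    print(check_cli(TOOL, tmp, expect_in_stdout="wrote report.txt", expect_file="report.txt"))
```

Returning a list of failures rather than stopping at the first one means a single run reports every way the command misbehaved, which is the kind of detail worth recording in an artifact.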

From the Stripe sprint, the QA artifact reads: checkout completes with a test card, account unlocks exactly once, locked page redirects non-subscribers, replayed webhook rejected for a bad signature. Four checks, eleven seconds, every one of them something a user or an attacker would actually do.
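"Unlocks exactly once" matters because webhook senders retry, and Stripe's documentation notes that an endpoint may receive the same event more than once. The sketch below assumes the app deduplicates on the event ID. `Accounts` and `check_unlocks_exactly_once` are hypothetical names. A real QA check would deliver the duplicate to the running app's webhook endpoint; the in-memory store here only keeps the example self-contained.

```python
from dataclasses import dataclass, field


@dataclass
class Accounts:
    """Hypothetical in-memory store standing in for the app's database."""
    unlocked: dict = field(default_factory=dict)   # customer id -> number of unlocks
    seen_events: set = field(default_factory=set)  # event ids already processed

    def handle_checkout_completed(self, event: dict) -> None:
        # Ignore a repeat delivery of an event we have already processed.
        if event["id"] in self.seen_events:
            return
        self.seen_events.add(event["id"])
        customer = event["data"]["object"]["customer"]
        self.unlocked[customer] = self.unlocked.get(customer, 0) + 1


def check_unlocks_exactly_once(accounts: Accounts) -> bool:
    event = {
        "id": "evt_test_1",
        "type": "checkout.session.completed",
        "data": {"object": {"customer": "cus_test_1"}},
    }
    # Deliver the same event twice, the way a retrying sender might.
    accounts.handle_checkout_completed(event)
    accounts.handle_checkout_completed(event)
    return accounts.unlocked.get("cus_test_1") == 1


print(check_unlocks_exactly_once(Accounts()))  # True
```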

The replay check is the point

Notice the fourth check. Nobody asked for it in the feature request. It exists because QA runs after security in the sprint and reads the upstream artifacts: the plan flagged webhook signature verification as the high-risk item, the security phase audited it, and so QA proves it holds up under attack, not just under normal use. The phases feed each other. That is what an ordered workflow buys.
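The attack-side check can be sketched against Stripe's documented signing scheme, in which the `Stripe-Signature` header carries a timestamp `t` and a `v1` HMAC-SHA256 of `"{t}.{raw body}"` keyed with the endpoint secret. This is a minimal Python illustration, not Nanostack's code or Stripe's library: `sign`, `verify`, the test secret, and the 300-second tolerance (the default in Stripe's official libraries) are assumptions for the example. The asserts at the end play the attacker: a tampered body, a wrong key, and a valid request replayed an hour later must all be rejected.

```python
import hashlib
import hmac
import time
from typing import Optional

TOLERANCE_SECONDS = 300  # assumed 5-minute freshness window


def sign(payload: bytes, secret: str, timestamp: int) -> str:
    """Build a Stripe-style header value: t=<timestamp>,v1=<hex HMAC-SHA256>."""
    signed = f"{timestamp}.".encode() + payload
    digest = hmac.new(secret.encode(), signed, hashlib.sha256).hexdigest()
    return f"t={timestamp},v1={digest}"


def verify(payload: bytes, header: str, secret: str, now: Optional[int] = None) -> bool:
    """Accept only if the timestamp is fresh and some v1 signature matches."""
    now = int(time.time()) if now is None else now
    parts = [p.strip().split("=", 1) for p in header.split(",") if "=" in p]
    timestamp = next((v for k, v in parts if k == "t"), None)
    signatures = [v for k, v in parts if k == "v1"]
    if timestamp is None or not timestamp.isdigit() or not signatures:
        return False
    if abs(now - int(timestamp)) > TOLERANCE_SECONDS:
        return False  # outside the window: treat as a replay
    signed = f"{timestamp}.".encode() + payload
    expected = hmac.new(secret.encode(), signed, hashlib.sha256).hexdigest()
    # Constant-time comparison against every v1 signature in the header.
    return any(hmac.compare_digest(expected.encode(), s.encode()) for s in signatures)


# Attacker-style checks (hypothetical test secret and payload).
SECRET = "whsec_test_only"
body = b'{"id": "evt_test_1", "type": "checkout.session.completed"}'
t0 = 1_700_000_000
good = sign(body, SECRET, t0)

assert verify(body, good, SECRET, now=t0)                                        # normal delivery
assert not verify(body.replace(b"completed", b"expired"), good, SECRET, now=t0)  # tampered body
assert not verify(body, sign(body, "wrong_secret", t0), SECRET, now=t0)          # bad signature
assert not verify(body, good, SECRET, now=t0 + 3600)                             # replayed later
print("all webhook checks behave as expected")
```

In a real app you would use the verification helper in Stripe's official library rather than hand-rolling this; the QA check's job is to confirm that whatever the app uses actually rejects these requests.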

Failures stay in the sprint

When a QA check fails, it does not become a ticket for next week. The sprint is still open, the agent still has full context, and the fix happens now, followed by a re-run. The artifact records both: what failed, what was fixed, what passed on the second pass. By the time the phase gate lets the commit through, "works like a user expects" is a recorded fact, not a hope.
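One way to keep both the failure and the fix on record is to append every run instead of overwriting results. The Python sketch below is hypothetical (`QAArtifact`, `record`, and `gate_open` are not Nanostack names, and the notes are placeholder text): the gate opens only when the latest result for every check is a pass, while earlier failures stay in the history.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CheckRun:
    name: str
    passed: bool
    note: str = ""


@dataclass
class QAArtifact:
    """Hypothetical append-only record of every QA attempt."""
    runs: List[CheckRun] = field(default_factory=list)

    def record(self, name: str, passed: bool, note: str = "") -> None:
        self.runs.append(CheckRun(name, passed, note))

    def latest(self) -> Dict[str, bool]:
        # The most recent result per check wins; earlier attempts remain in `runs`.
        result: Dict[str, bool] = {}
        for run in self.runs:
            result[run.name] = run.passed
        return result

    def gate_open(self) -> bool:
        # No recorded checks means nothing was verified, so the gate stays closed.
        latest = self.latest()
        return bool(latest) and all(latest.values())


artifact = QAArtifact()
artifact.record("locked page redirects non-subscribers", False, "first run failed (example)")
print(artifact.gate_open())  # False: the commit is blocked
artifact.record("locked page redirects non-subscribers", True, "passed after fix (example)")
print(artifact.gate_open())  # True
print(len(artifact.runs))    # 2: the failure is still on record
```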

