/qa -- Test like a real user

Click everything. Fill every form. Check every state. Fix bugs with atomic commits.

Testing modes

The agent picks the testing mode based on what you built. You can override with the --mode flag.

  • browser -- Uses Playwright to test web applications. Navigates pages, fills forms, clicks buttons, checks rendered output.
  • native -- Uses computer vision to test desktop or mobile apps. Takes screenshots, identifies UI elements, simulates interaction.
  • api -- Sends HTTP requests to API endpoints. Validates response codes, body shapes, headers, and error handling.
  • cli -- Runs command-line tools with various arguments. Checks exit codes, stdout, stderr, and file system side effects.
  • debug -- Root-cause investigation mode. The agent does not run a test suite. Instead it reproduces a specific bug, traces the cause, and proposes a fix.
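
The override rule can be sketched in a few lines. This is only an illustration: `pick_mode` and the `detected` argument are hypothetical names, and how the agent actually infers a mode from the project is not described here.

```python
from typing import Optional

# The five modes named in the list above.
MODES = ("browser", "native", "api", "cli", "debug")


def pick_mode(detected: str, override: Optional[str] = None) -> str:
    """Return the --mode override if one was given, else the detected mode.

    `detected` stands in for whatever the agent infers from the project
    (an assumption for this sketch).
    """
    chosen = override if override is not None else detected
    if chosen not in MODES:
        raise ValueError(f"unknown mode: {chosen!r}")
    return chosen
```

For example, `pick_mode("browser", override="api")` returns `"api"`, while `pick_mode("browser")` keeps the detected mode.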

Coverage order

The agent tests states in a fixed order. Each layer depends on the previous one working.

  • Happy path -- The feature works as designed with valid input. If this fails, everything else is meaningless.
  • Error states -- Invalid input, network failures, permission denied. The UI should show a useful message, not a blank screen or stack trace.
  • Empty states -- No data, first-time user, cleared cache. These are the states most often forgotten and most often seen by new users.
  • Edge cases -- Boundary values, very long strings, special characters, concurrent operations. The inputs nobody thinks of during development.
  • Loading states -- Slow network, large datasets, spinner behavior, skeleton screens. Tested last because they require simulating latency.
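
One way to picture the fixed order is a loop that runs each layer in sequence. The sketch below assumes the run stops at the first failing layer, which the text implies but does not state outright. The layer keys and the `run_in_order` helper are hypothetical names.

```python
from typing import Callable, Dict, List

# Layer names follow the list above; the identifiers are illustrative.
COVERAGE_ORDER = [
    "happy_path",
    "error_states",
    "empty_states",
    "edge_cases",
    "loading_states",
]


def run_in_order(checks: Dict[str, Callable[[], bool]]) -> List[str]:
    """Run every layer in the fixed order and return the layers that passed.

    Assumption: a failing layer ends the run, because later layers
    depend on earlier ones working.
    """
    passed: List[str] = []
    for layer in COVERAGE_ORDER:
        if not checks[layer]():
            break  # later layers are not attempted
        passed.append(layer)
    return passed
```

If the error-state checks fail, only `["happy_path"]` is returned and the remaining layers are never attempted.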

The WTF heuristic

If more than 20% of tested interactions produce unexpected behavior, the agent stops testing and reports. The reasoning: when more than one in five interactions is broken, testing further states wastes effort. The code needs structural fixes first, not more bug reports.

# qa.json when WTF threshold is hit
{
  "status": "halted",
  "reason": "wtf_threshold",
  "tested": 15,
  "unexpected": 4,
  "rate": 0.27,
  "recommendation": "Return to /review. Too many issues for QA to be productive."
}
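
The arithmetic behind that report can be sketched as follows. The field names are copied from the example above. The function name `wtf_report` and returning `None` when the threshold is not hit are assumptions made for illustration.

```python
from typing import Optional

WTF_THRESHOLD = 0.20  # halt when MORE than 20% of interactions are unexpected


def wtf_report(tested: int, unexpected: int) -> Optional[dict]:
    """Return a halt report if the unexpected rate exceeds the threshold.

    Returns None when testing may continue (a choice made for this sketch).
    """
    if tested <= 0:
        raise ValueError("tested must be positive")
    rate = unexpected / tested
    if rate <= WTF_THRESHOLD:
        return None
    return {
        "status": "halted",
        "reason": "wtf_threshold",
        "tested": tested,
        "unexpected": unexpected,
        "rate": round(rate, 2),
        "recommendation": "Return to /review. Too many issues for QA to be productive.",
    }
```

With 15 interactions tested and 4 unexpected, the rate is 4/15, which rounds to 0.27 and triggers the halt. With 3 unexpected out of 15, the rate is exactly 20%, which is not more than the threshold, so testing continues.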

Visual QA

In browser and native modes, the agent takes screenshots at each test step. It compares the visual output against expectations: correct layout, readable text, no overlapping elements, proper alignment. Screenshots are saved to .nanostack/screenshots/ and referenced in the QA report.

Bug fixing

When the agent finds a bug, it classifies the fix:

  • Mechanical -- The fix is obvious and has no tradeoffs. Wrong CSS property, missing null check, incorrect string. The agent fixes it immediately with an atomic commit.
  • Judgment -- The fix involves a design decision. Should a missing field return null or throw? Should the UI hide or disable the button? The agent reports the bug with options and waits for your decision.

Each mechanical fix is a separate commit with a message that references the QA finding. This makes it easy to revert a specific fix without losing the rest.

fix(qa): handle empty array in notification list (#QA-003)

Previously rendered "undefined" when notifications array was empty.
Now shows empty state component.
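
The message shape above can be reproduced with a small formatter. This is a sketch that mirrors the example only: `qa_commit_message` is a hypothetical helper, and the format is inferred from that single commit rather than from a documented specification.

```python
def qa_commit_message(summary: str, finding_id: str, body: str = "") -> str:
    """Build a commit message in the shape of the example above.

    The subject line references the QA finding. An optional body is
    separated from the subject by a blank line.
    """
    subject = f"fix(qa): {summary} (#{finding_id})"
    return f"{subject}\n\n{body}" if body else subject
```

Calling `qa_commit_message("handle empty array in notification list", "QA-003")` yields the subject line shown in the example.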

Prompt injection boundary

When testing web applications, the agent fills forms with normal test data. It does not inject prompts, scripts, or payloads into input fields. That is the job of /security.

This separation matters. QA tests whether the application works correctly for legitimate users. Security tests whether the application can be abused by attackers. Mixing the two produces unreliable results for both.
