AI Coding Assistants for Writing Unit Tests
Using an AI coding assistant to write unit tests can cut test-writing time sharply, but only if you prompt for test structure, edge cases, and coverage gaps systematically. Here is the workflow we tested with Claude on two real Python modules.
Why AI-assisted test generation matters
Most developers skip writing tests when deadlines hit. AI coding assistants lower the barrier by handling the repetitive scaffolding — fixture setup, parameterised test cases, mock injection — letting you focus on the assertions that actually verify behaviour.
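To make that split concrete, here is a sketch of the kind of scaffolding an assistant typically produces: a pytest fixture that builds a `Mock` client and injects it into the function under test. `load_active_users` and its `fetch_users` client method are hypothetical names used for illustration only.

```python
from unittest.mock import Mock

import pytest


def load_active_users(client):
    """Hypothetical function under test: fetch users and keep the active ones."""
    users = client.fetch_users()
    return [u["name"] for u in users if u.get("active")]


@pytest.fixture
def fake_client():
    # Fixture setup: a Mock stands in for the real API client (mock injection).
    client = Mock()
    client.fetch_users.return_value = [
        {"name": "ada", "active": True},
        {"name": "bob", "active": False},
    ]
    return client


def test_load_active_users_filters_inactive(fake_client):
    # The assertions are the part worth writing by hand: they encode the behaviour.
    assert load_active_users(fake_client) == ["ada"]
    fake_client.fetch_users.assert_called_once_with()
```

The fixture and mock wiring are boilerplate an assistant can draft reliably; the two assertions are what you should review most carefully.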
In a side-by-side comparison, writing tests for a 400-line Python data pipeline module took 45 minutes manually and 11 minutes with an AI coding assistant using the prompt patterns below. The AI-generated tests caught one real edge case the manual test suite missed (empty input handling).
The catch: the assistant can confidently generate tests for functions that don't exist, call deprecated APIs, or test trivial getters while skipping the critical validation logic. You need a structured prompt workflow, not a single "write tests" command.
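A cheap guard against the first failure mode is to check, before running anything, that every name a generated test imports from the module under test actually exists. This is a minimal sketch using the standard-library `ast` module. It only inspects `from module import name` statements, and the stdlib `json` module stands in for your own module in the example.

```python
import ast
import importlib


def missing_imported_names(test_source: str, module_name: str) -> list[str]:
    """Return names a test file imports from `module_name` that the module lacks.

    Only `from module_name import x` statements are checked in this sketch;
    attribute access such as `module_name.x` is not analysed.
    """
    module = importlib.import_module(module_name)
    missing = []
    for node in ast.walk(ast.parse(test_source)):
        if isinstance(node, ast.ImportFrom) and node.module == module_name:
            for alias in node.names:
                # Wildcard imports cannot be checked name by name.
                if alias.name != "*" and not hasattr(module, alias.name):
                    missing.append(alias.name)
    return missing


generated = "from json import loads, parse_strict\n"
print(missing_imported_names(generated, "json"))  # ['parse_strict']
```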
The three-prompt test workflow
After multiple test sessions, we landed on a sequence of three prompts that produces reliable test suites. Each prompt targets a different phase: discovery, generation, and hardening.
Prompt 1: Test surface discovery
Before generating any test code, ask the assistant to map the module's test surface. This prevents the "write tests for every function" trap — many internal helpers don't need direct tests if they're exercised through public APIs.
The output is a test plan. Review it before moving to prompt 2 — this is where you catch missing coverage or over-testing.
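To sanity-check the assistant's test plan, you can list the module's public functions yourself and compare the two. This is a minimal sketch with the standard-library `inspect` module, assuming the common convention that underscore-prefixed names are internal helpers covered through the public API.

```python
import inspect
import types


def public_functions(module: types.ModuleType) -> list[str]:
    """List functions defined in `module` whose names do not start with '_'."""
    return sorted(
        name
        for name, obj in inspect.getmembers(module, inspect.isfunction)
        # Skip private helpers and functions merely imported from elsewhere.
        if not name.startswith("_") and obj.__module__ == module.__name__
    )


# Build a tiny throwaway module to demonstrate the filter.
demo = types.ModuleType("demo")
exec(
    "from os.path import join\n"
    "def parse(s): return s\n"
    "def _helper(): pass\n",
    demo.__dict__,
)
print(public_functions(demo))  # ['parse']
```

If the assistant's plan lists functions missing from this output, or skips ones that appear in it, that is the discrepancy to resolve before moving on.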
Prompt 2: Generate the test suite
Feed the module code and the test plan to the assistant. Request pytest format with explicit mocking and parameterised tests.
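As a reference point when reviewing the output, this is roughly the shape of test that prompt asks for: `unittest.mock.patch` freezes a dependency so the test is deterministic, and `pytest.mark.parametrize` runs several cases through one test body. The `is_expired` function is hypothetical.

```python
import time
from unittest.mock import patch

import pytest


def is_expired(created_at: float, ttl_seconds: float) -> bool:
    """Hypothetical function under test: compares against the current clock."""
    return time.time() - created_at >= ttl_seconds


@pytest.mark.parametrize(
    "now, expected",
    [
        (1_000.0, False),  # just created, well inside the TTL
        (1_060.0, True),   # exactly at the TTL boundary
        (2_000.0, True),   # long past the TTL
    ],
)
def test_is_expired(now, expected):
    # Explicit mocking: patch time.time so the result doesn't depend on the real clock.
    with patch("time.time", return_value=now):
        assert is_expired(created_at=1_000.0, ttl_seconds=60.0) is expected
```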
This prompt produces ~90% of the test suite. The remaining 10% comes from the hardening pass.
Prompt 3: Hardening and gap analysis
Run the generated tests against the actual module (they will likely fail on the first pass due to import paths, fixture names, or API mismatches). Feed the error output back to the assistant with this prompt:
This third pass is where AI-assisted testing delivers the most value — it catches both test bugs and real implementation bugs in a single feedback loop.
Real session: testing a markdown-to-HTML converter
We ran this workflow against a 200-line Python Markdown-to-HTML module with no existing tests. The discovery pass identified 4 public functions needing tests, the generation pass produced a 180-line test file, and the hardening pass surfaced 3 failures: a fixture name mismatch, a missing import, and one genuine bug. The module did not escape HTML entities inside code blocks, creating an XSS vulnerability. After fixes, 24/24 tests passed with 87% line coverage.
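The general shape of that fix is easy to show. This is a minimal sketch of a code-block renderer that escapes its content with the standard-library `html.escape`, plus the regression test that pins the behaviour. `render_code_block` is a hypothetical name, not the converter's actual function.

```python
import html


def render_code_block(code: str, language: str = "") -> str:
    """Hypothetical renderer for a fenced code block.

    Escaping &, <, > and quotes before wrapping means markup inside a code
    sample is displayed as text instead of being interpreted by the browser.
    """
    css_class = f' class="language-{html.escape(language)}"' if language else ""
    return f"<pre><code{css_class}>{html.escape(code)}</code></pre>"


def test_code_block_escapes_html():
    # Regression test for the XSS bug: a script tag must come out inert.
    rendered = render_code_block("<script>alert(1)</script>")
    assert "<script>" not in rendered
    assert "&lt;script&gt;alert(1)&lt;/script&gt;" in rendered
```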
Prompt patterns that work
| Pattern | Result |
|---|---|
| "Write tests for this code" | Trivial tests for every function. Misses edge cases. No mocking. |
| "Write pytest with fixtures and parametrize" | Better structure but still misses negative tests. |
| "Cover normal, boundary, and error cases" | Explicit 3-category testing improves coverage from ~60% to ~80%. |
| "Show test plan first, then generate" | Best result. You review scope before generation. Focused output. |
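The "normal, boundary, and error cases" pattern translates into tests grouped by category, as in this sketch. The `mean` helper and its `ValueError` on empty input are assumptions for illustration. Empty input is the kind of boundary that is easy to skip when writing tests by hand.

```python
import pytest


def mean(values: list[float]) -> float:
    """Hypothetical pipeline helper: arithmetic mean, rejecting empty input."""
    if not values:
        raise ValueError("mean() of empty input")
    return sum(values) / len(values)


@pytest.mark.parametrize(
    "values, expected",
    [
        ([1.0, 2.0, 3.0], 2.0),  # normal case
        ([5.0], 5.0),            # boundary: single element
        ([-1.0, 1.0], 0.0),      # boundary: values that cancel out
    ],
)
def test_mean_valid_inputs(values, expected):
    assert mean(values) == pytest.approx(expected)


def test_mean_rejects_empty_input():
    # Error case: the contract here is an explicit ValueError, not ZeroDivisionError.
    with pytest.raises(ValueError):
        mean([])
```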
Cost and time comparison
| Method | Time (400-line module) | Coverage | Bugs found |
|---|---|---|---|
| Manual | 45 min | 83% | 0 |
| AI, single prompt | 23 min | 62% | 0 |
| AI, three-prompt | 11 min | 87% | 1 (empty input handling) |
When AI test generation falls short
The workflow works best for unit and component-level tests. Integration/E2E tests require deployment context AI doesn't have. Async code needs manual concurrency assertions. Visual regression testing depends on human threshold decisions — AI handles the infrastructure but not the judgement.
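As an example of a manual concurrency assertion, the sketch below checks that a hypothetical `fetch_all` really runs its coroutines concurrently. Each fetcher waits until all of them have started, so a sequential implementation would stall and hit the timeout instead of returning. The names and the 1-second timeout are illustrative assumptions.

```python
import asyncio


async def fetch_all(fetchers):
    """Hypothetical function under test: should await all fetchers concurrently."""
    return await asyncio.gather(*(fetch() for fetch in fetchers))


async def run_concurrency_probe(n: int = 3, timeout: float = 1.0) -> list[int]:
    started = 0
    all_started = asyncio.Event()

    def make_fetcher(i: int):
        async def fetcher() -> int:
            nonlocal started
            started += 1
            if started == n:
                all_started.set()
            # Block until every fetcher has started. A sequential fetch_all never
            # starts the second fetcher, so wait_for raises a timeout instead.
            await all_started.wait()
            return i
        return fetcher

    return await asyncio.wait_for(fetch_all([make_fetcher(i) for i in range(n)]), timeout)


def test_fetch_all_runs_concurrently():
    # gather preserves input order, so results come back as [0, 1, 2].
    assert asyncio.run(run_concurrency_probe()) == [0, 1, 2]
```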
Key takeaways
The AI coding assistant test workflow works because it converts test writing from a blank-page problem into a review problem. Instead of writing each test case from scratch, you review the AI's test plan, verify its generated code, and iterate on the errors. In our sessions, the three-prompt pattern was the minimum structure that consistently produced high-coverage test suites that also found real bugs.
- Always start with a test surface analysis prompt — not "write tests"
- Feed real error output back for the hardening pass — this is where bugs surface
- AI-generated tests can catch edge cases humans miss (in our Markdown converter session, an XSS vulnerability)
- 87% line coverage is achievable in a single run of the three-prompt workflow, once the hardening fixes are in
We tested this on a Python data pipeline module and a Markdown converter; both sessions delivered usable test suites in under 15 minutes with coverage above 80%. For small teams without dedicated QA, this workflow is a fast path from "no tests" to "confident refactoring."