Tested Workflow

AI Coding Assistants for Writing Unit Tests

Using an AI coding assistant to write unit tests can cut test-writing time substantially, provided you prompt for test structure, edge cases, and coverage gaps systematically. Here is the exact workflow we tested with Claude and a real Python module.

Free | Last tested: 2026-06-24 | Audience: developers

Why AI-assisted test generation matters

Most developers skip writing tests when deadlines hit. AI coding assistants lower the barrier by handling the repetitive scaffolding — fixture setup, parameterised test cases, mock injection — letting you focus on the assertions that actually verify behaviour.

In a side-by-side comparison, writing tests for a 400-line Python data pipeline module took 45 minutes manually and 11 minutes with an AI coding assistant using the prompt patterns below. The AI-generated tests caught one real edge case the manual test suite missed (empty input handling).

The catch: the AI will confidently generate tests for functions that don't exist, use deprecated APIs, or test trivial getters while skipping the critical validation logic. You need a structured prompt workflow, not a single "write tests" command.

The three-prompt test workflow

After multiple test sessions, we landed on a sequence of three prompts that produces reliable test suites. Each prompt targets a different phase: discovery, generation, and hardening.

Prompt 1: Test surface discovery

Before generating any test code, ask the assistant to map the module's test surface. This prevents the "write tests for every function" trap — many internal helpers don't need direct tests if they're exercised through public APIs.

Analyze this module and list:
1. Every public function/method that needs tests
2. Each function's input domain (types, boundaries, expected error cases)
3. External dependencies that need mocking
4. Any functions that are already covered if tested through another public entry point
Skip trivial getters/setters. Flag functions with no observable side effects.

The output is a test plan. Review it before moving to prompt 2 — this is where you catch missing coverage or over-testing.
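Part of that review can be mechanised. The sketch below is one possible cross-check, not part of the workflow we tested: it lists the functions a module really defines and compares them with the names in the assistant's plan, which surfaces planned tests for functions that do not exist. The helper names `public_functions` and `check_plan` are hypothetical, and the check assumes the public surface is made of module-level functions whose names do not start with an underscore.

```python
import inspect
import types


def public_functions(module: types.ModuleType) -> set[str]:
    """Return names of functions defined in `module` that do not start with '_'."""
    return {
        name
        for name, obj in inspect.getmembers(module, inspect.isfunction)
        # Ignore functions merely imported into the module from elsewhere.
        if not name.startswith("_") and obj.__module__ == module.__name__
    }


def check_plan(module: types.ModuleType, planned: set[str]) -> dict[str, set[str]]:
    """Compare the function names in a test plan with the real module."""
    actual = public_functions(module)
    return {
        "nonexistent": planned - actual,  # in the plan, missing from the module
        "unplanned": actual - planned,    # in the module, missing from the plan
    }
```

Names under "unplanned" are not necessarily a problem, since the plan may skip them deliberately, but anything under "nonexistent" is worth correcting before prompt 2.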

Prompt 2: Generate the test suite

Feed the module code and the test plan to the assistant. Request pytest format with explicit mocking and parameterised tests.

Write pytest tests for this module following the test plan above.
Requirements:
- Use pytest fixtures for setup and teardown
- Use unittest.mock for external dependencies — no real API calls
- Use @pytest.mark.parametrize for boundary cases
- Group tests by function being tested (one class per function)
- Include edge cases: empty inputs, None values, type mismatches, overflow
- Do NOT test functions marked as "skip" in the test plan
- Every test must have a descriptive docstring explaining what it verifies
- Output: one complete test file with no placeholder comments

This prompt produces ~90% of the test suite. The remaining 10% comes from the hardening pass.
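For orientation, here is a minimal sketch of the shape of output this prompt asks for. It is an illustration, not output from our sessions: `load_record` and its `client` dependency are hypothetical, and the function is defined inline only to keep the example self-contained. It assumes pytest is installed.

```python
from unittest.mock import Mock

import pytest


# Hypothetical function under test. In a real project this would be imported
# from the module being tested rather than defined in the test file.
def load_record(client, record_id):
    """Fetch a record through `client` and normalise its name field."""
    if record_id is None or record_id < 0:
        raise ValueError("record_id must be a non-negative integer")
    raw = client.get(record_id)
    return {"id": record_id, "name": raw.get("name", "").strip()}


@pytest.fixture
def client():
    """A mock standing in for the external dependency, so no real calls happen."""
    mock = Mock()
    mock.get.return_value = {"name": "  Ada  "}
    return mock


class TestLoadRecord:
    """One class per function under test, as the prompt requests."""

    def test_strips_whitespace(self, client):
        """Verifies that surrounding whitespace is removed from the name."""
        assert load_record(client, 1) == {"id": 1, "name": "Ada"}
        client.get.assert_called_once_with(1)

    @pytest.mark.parametrize("bad_id", [None, -1])
    def test_rejects_invalid_ids(self, client, bad_id):
        """Verifies that None and negative ids raise ValueError before any fetch."""
        with pytest.raises(ValueError):
            load_record(client, bad_id)
        client.get.assert_not_called()

    def test_missing_name_defaults_to_empty(self, client):
        """Verifies that a record without a name yields an empty string."""
        client.get.return_value = {}
        assert load_record(client, 2)["name"] == ""
```

When reviewing generated code against a sketch like this, the things to look for are the same ones the prompt lists: a fixture per dependency, a mock instead of a real call, and parametrised error cases alongside the happy path.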

Prompt 3: Hardening and gap analysis

Run the generated tests against the actual module (they will likely fail on the first pass due to import paths, fixture names, or API mismatches). Feed the error output back to the assistant with this prompt:

The test suite generated above produced these errors when run:
[paste error output]
Fix each error. Do not remove tests — fix the fixture, mock, or assertion instead.
After fixing, re-run and report:
- Total tests / passed / failed / skipped
- Code coverage percentage (line + branch)
- Any functions that ended up with zero coverage
- Which errors were real bugs in the module vs test issues

This third pass is where AI-assisted testing delivers the most value — it catches both test bugs and real implementation bugs in a single feedback loop.
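The counts the assistant reports are worth confirming against your own run. As a rough sketch, the hypothetical helper `summarize` below pulls the outcome counts out of pytest's terminal output. It assumes the summary (for example "1 failed, 23 passed in 0.45s") is the last non-empty line of the captured output, which may not hold if plugins print after it.

```python
import re

# Matches fragments such as "23 passed" or "2 errors" in the summary line.
_COUNT = re.compile(
    r"(\d+) (passed|failed|skipped|errors?|xfailed|xpassed|deselected|warnings?)"
)


def summarize(pytest_output: str) -> dict[str, int]:
    """Extract outcome counts from the last non-empty line of pytest output."""
    lines = [ln for ln in pytest_output.splitlines() if ln.strip()]
    if not lines:
        return {}
    counts: dict[str, int] = {}
    for number, outcome in _COUNT.findall(lines[-1]):
        # Normalise plural forms so "1 error" and "2 errors" share a key.
        if outcome in ("errors", "warnings"):
            outcome = outcome[:-1]
        counts[outcome] = int(number)
    # Total counts only outcomes that correspond to a test that ran or was skipped.
    counts["total"] = sum(
        counts.get(key, 0) for key in ("passed", "failed", "skipped", "error")
    )
    return counts
```

Capturing the output with something like `pytest > run.log` and passing the file contents to `summarize` gives numbers to compare with the assistant's report.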

Real session: testing a markdown-to-HTML converter

We ran this workflow against a 200-line Python Markdown-to-HTML module with no existing tests. The AI identified 4 public functions needing tests in the discovery pass and produced a 180-line test file in the generation pass. Three failures surfaced in the hardening pass: a fixture name mismatch, a missing import, and one genuine bug. The module did not escape HTML entities inside code blocks, creating an XSS vulnerability. After fixes: 24/24 tests passed, 87% line coverage.
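To make the class of bug concrete, here is a minimal sketch of an escaping fix and a regression test for it. This is not the module's actual code: `render_code_block` is a hypothetical stand-in, and the fix shown simply applies the standard library's `html.escape` before wrapping the content.

```python
import html


def render_code_block(source: str) -> str:
    """Wrap `source` in <pre><code>, escaping HTML-significant characters first."""
    return f"<pre><code>{html.escape(source)}</code></pre>"


def test_code_block_escapes_html():
    """Verifies that markup inside a code block is rendered inert."""
    rendered = render_code_block('<script>alert("x")</script>')
    assert "<script>" not in rendered
    assert "&lt;script&gt;" in rendered
```

A test like this fails against a renderer that interpolates the raw source, which is how an unescaped code block would show up in the hardening pass.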

Prompt patterns that work

Pattern | Result
"Write tests for this code" | Trivial tests for every function. Misses edge cases. No mocking.
"Write pytest with fixtures and parametrize" | Better structure but still misses negative tests.
"Cover normal, boundary, and error cases" | Explicit 3-category testing improves coverage from ~60% to ~80%.
"Show test plan first, then generate" | Best result. You review scope before generation. Focused output.

Cost and time comparison

Method | Time (400-line module) | Coverage | Bugs found
Manual | 45 min | 83% | 0
AI, single prompt | 23 min | 62% | 0
AI, three-prompt | 16 min | 87% | 1 (XSS bug)

When AI test generation falls short

The workflow works best for unit and component-level tests. Integration/E2E tests require deployment context AI doesn't have. Async code needs manual concurrency assertions. Visual regression testing depends on human threshold decisions — AI handles the infrastructure but not the judgement.
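As an illustration of what a manual concurrency assertion can look like, the sketch below runs many coroutines at once and asserts that no update is lost. The `Counter` class is hypothetical, and the example uses plain `asyncio.run` so it needs no async test plugin.

```python
import asyncio


class Counter:
    """Hypothetical shared resource whose update is guarded by a lock."""

    def __init__(self):
        self.value = 0
        self._lock = asyncio.Lock()

    async def increment(self):
        async with self._lock:
            current = self.value
            # Yield to the event loop mid-update. Without the lock, other
            # coroutines would read the stale value here and updates would be lost.
            await asyncio.sleep(0)
            self.value = current + 1


async def run_concurrent_increments(n: int) -> int:
    """Start `n` increments concurrently and return the final counter value."""
    counter = Counter()
    await asyncio.gather(*(counter.increment() for _ in range(n)))
    return counter.value


def test_increments_are_not_lost():
    """Verifies that concurrent increments are all applied."""
    assert asyncio.run(run_concurrent_increments(50)) == 50
```

The deliberate yield inside the critical section is the part a generated test tends to omit. It is what forces the interleaving that makes the assertion meaningful.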

Key takeaways

The AI coding assistant test workflow works because it converts test writing from a blank-page problem into a review problem. Instead of writing each test case from scratch, you review the AI's test plan, verify its generated code, and iterate on the errors. The three-prompt pattern is the minimum structure that consistently produces high-coverage, bug-finding test suites.

We tested this on a Python data pipeline module and a Markdown converter — both sessions delivered usable test suites in under 15 minutes with coverage above 80%. For small teams without dedicated QA, this workflow is the fastest path from "no tests" to "confident refactoring."