
PROMPT ENGINEERING TECHNIQUES FOR DEVELOPERS

Most developers treat prompts as chat messages. That's the wrong mental model. Prompts are code — they need structure, versioning, and testing. This guide covers the techniques that turn fragile prompts into reliable production assets.

Free | Last tested: 2026-06-18 | Audience: developers / engineers

Why prompts break in production

During development, prompts work because you're in the same context as the model. You know what you meant. In production, the model sees only what you wrote — and small ambiguities compound fast.

The three most common failure modes:

The fix is not better models — it's better prompt structure. See AI Content Workflow Template for a practical example of structured prompt design in a production pipeline.

Technique 1: Structured prompt templates

Never write prompts as free-form text. Use a template with labeled sections:

ROLE: You are a senior developer reviewing pull requests.
TASK: Review the code diff and identify security issues, performance problems, and style violations.
OUTPUT FORMAT: Return a JSON array with keys: severity, category, line, description, suggestion.
CONSTRAINTS: Only flag issues that are real problems. Do not flag style preferences. Max 10 issues.
INPUT: {{code_diff}}

This structure works because:

Store these templates in code files, not in your chat history. Version them with your codebase.
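One way to keep a template in code is to store it as a module-level constant next to a version marker, with a small render function that fills the placeholders. The sketch below uses only the standard library. The names `REVIEW_TEMPLATE`, `REVIEW_TEMPLATE_VERSION`, and `render_prompt` are hypothetical, and the `{{name}}` placeholder syntax is taken from the template above rather than from any particular templating library.

```python
import re

# Hypothetical versioned template, kept in a source file and tracked in git.
REVIEW_TEMPLATE_VERSION = "2"
REVIEW_TEMPLATE = """\
ROLE: You are a senior developer reviewing pull requests.
TASK: Review the code diff and identify security issues, performance problems, and style violations.
OUTPUT FORMAT: Return a JSON array with keys: severity, category, line, description, suggestion.
CONSTRAINTS: Only flag issues that are real problems. Do not flag style preferences. Max 10 issues.
INPUT: {{code_diff}}"""

# Matches {{name}} placeholders, allowing optional spaces inside the braces.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_prompt(template: str, **values: str) -> str:
    """Fill every {{name}} placeholder, and fail loudly if a value is missing."""
    missing = set(_PLACEHOLDER.findall(template)) - set(values)
    if missing:
        raise KeyError(f"missing template values: {sorted(missing)}")
    # A function replacement inserts values literally, so backslashes are not
    # treated as regex escapes.
    return _PLACEHOLDER.sub(lambda m: values[m.group(1)], template)
```

Raising on a missing value means a renamed placeholder surfaces as an error at call time. Without the check, a literal `{{code_diff}}` would be sent to the model.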

Technique 2: Few-shot prompting

When the task is complex, give the model examples of correct outputs. This is called few-shot prompting, and it's often more effective than longer instructions.

TASK: Classify the sentiment, extract the main topic, and rate the severity.
Example 1:
Input: "The API response time increased from 200ms to 2s after the last deploy."
Output: {"sentiment": "negative", "topic": "performance", "severity": "high"}
Example 2:
Input: "Added a new endpoint for user authentication with rate limiting."
Output: {"sentiment": "neutral", "topic": "feature", "severity": "none"}
Input: "{{user_input}}"
Output:
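Storing the examples as data, rather than as hand-edited prose, lets you add, remove, or test them without touching the rest of the prompt. Below is a minimal sketch that assembles a prompt in the same shape as the one above. `EXAMPLES` and `build_few_shot_prompt` are hypothetical names. Serializing with `json.dumps` is a design choice: it keeps every example output in exactly the JSON form you will later parse, and it escapes any quotes inside user input.

```python
import json

# Hypothetical labeled examples (the two pairs shown above).
EXAMPLES = [
    ("The API response time increased from 200ms to 2s after the last deploy.",
     {"sentiment": "negative", "topic": "performance", "severity": "high"}),
    ("Added a new endpoint for user authentication with rate limiting.",
     {"sentiment": "neutral", "topic": "feature", "severity": "none"}),
]


def build_few_shot_prompt(user_input: str, examples=EXAMPLES) -> str:
    """Assemble the task line, numbered examples, then the open-ended query."""
    lines = ["TASK: Classify the sentiment, extract the main topic, and rate the severity."]
    for i, (text, label) in enumerate(examples, start=1):
        lines.append(f"Example {i}:")
        lines.append(f"Input: {json.dumps(text)}")   # quoted and escaped
        lines.append(f"Output: {json.dumps(label)}")  # same JSON shape we parse
    # End with an empty "Output:" so the model completes in the example format.
    lines.append(f"Input: {json.dumps(user_input)}")
    lines.append("Output:")
    return "\n".join(lines)
```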

Key rules for few-shot prompts:

Technique 3: Chain-of-thought reasoning

For complex reasoning tasks, ask the model to work through its reasoning before giving the final answer. This often improves accuracy on math, logic, and multi-step tasks.

TASK: Determine whether this user request is a bug report, feature request, or question.
Instructions:
1. First, analyze the request step by step. Consider: Does it describe broken behavior? Does it ask for something new? Is it asking for clarification?
2. Then, assign a category based on your analysis.
3. Finally, on the last line, output only the category name.
Request: "The dashboard loads but the charts show no data even though the API returns results."
Analysis:

The model writes out its reasoning first and then the category. Your application can parse the category from the final line.
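Parsing that final line is where chain-of-thought output most often breaks downstream code. Here is a minimal sketch that takes the last non-empty line, normalizes it, and rejects anything outside the allowed set instead of guessing. The allowed labels mirror the three categories in the prompt above. `parse_category` is a hypothetical name. Stripping an optional `Category:` prefix and a trailing period is an assumption about how models sometimes format that line.

```python
ALLOWED = {"bug report", "feature request", "question"}


def parse_category(model_output: str) -> str:
    """Return the category from the last non-empty line, or raise ValueError."""
    lines = [ln.strip() for ln in model_output.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty model output")
    # Normalize: lowercase, drop a trailing period and an optional "Category:" prefix.
    last = lines[-1].lower().rstrip(".")
    if last.startswith("category:"):
        last = last[len("category:"):].strip()
    if last not in ALLOWED:
        raise ValueError(f"unexpected category: {lines[-1]!r}")
    return last
```

Raising on unexpected output lets the caller retry or route the request for review, rather than passing a malformed label into downstream logic.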

For developers building AI-powered tools, chain-of-thought is essential when the output affects downstream logic. See AI Coding Assistant Scope for how to apply structured reasoning in coding workflows.

Technique 4: Output schema enforcement

Don't trust the model to return valid JSON. Use one of these approaches:

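# Example: Pydantic validation with retry
# Assumes openai>=1.0 and pydantic>=2; the client reads OPENAI_API_KEY from the environment.
from openai import OpenAI
from pydantic import BaseModel, ValidationError

client = OpenAI()

class ReviewResult(BaseModel):
    severity: str
    category: str
    line: int
    description: str

def get_review(code_diff: str, max_retries: int = 3) -> ReviewResult:
    # JSON mode requires the prompt itself to ask for JSON, so spell out the schema.
    prompt = (
        "Review this code diff. Return a JSON object with keys "
        "severity, category, line (integer), description.\n\n"
        f"{code_diff}"
    )
    for attempt in range(max_retries):
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": prompt}],
            response_format={"type": "json_object"},
        )
        try:
            return ReviewResult.model_validate_json(response.choices[0].message.content)
        except ValidationError:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the validation error
    raise ValueError("max_retries must be at least 1")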
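The retry loop is easier to test if the model call is passed in as a function, because you can then feed it canned responses without any network access. The sketch below follows the same validate-then-retry pattern but swaps Pydantic for a small standard-library check, so it runs anywhere. `SCHEMA`, `validate_review`, `get_validated`, and the `generate` callable are hypothetical names. In production, `generate` would wrap the chat completion request shown above.

```python
import json
from typing import Callable

# Expected keys and types, mirroring the ReviewResult model above.
SCHEMA = {"severity": str, "category": str, "line": int, "description": str}


def validate_review(raw: str) -> dict:
    """Parse JSON and check required keys and types; raise ValueError on mismatch."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for key, expected in SCHEMA.items():
        value = data.get(key)
        # bool is a subclass of int, so reject it explicitly.
        if not isinstance(value, expected) or isinstance(value, bool):
            raise ValueError(f"bad or missing field: {key}")
    return data


def get_validated(generate: Callable[[], str], max_retries: int = 3) -> dict:
    """Call generate() until its output validates, up to max_retries attempts."""
    last_error = None
    for _ in range(max_retries):
        try:
            return validate_review(generate())
        except ValueError as err:
            last_error = err  # try again with a fresh generation
    raise ValueError(f"no valid output after {max_retries} attempts") from last_error
```

For example, a fake `generate` that returns invalid JSON first, then an incomplete object, then a valid one should succeed on the third attempt.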

Technique 5: Prompt evaluation

Every prompt should have a test suite. Here's a minimal evaluation pattern:

# Each case pairs an input with the category the prompt should produce.
test_cases = [
    {"input": "API is slow", "expected_category": "performance"},
    {"input": "Add dark mode", "expected_category": "feature"},
    {"input": "How do I reset my password?", "expected_category": "question"},
    {"input": "Login fails with 500 error", "expected_category": "bug"},
]

def evaluate_prompt(prompt_template, test_cases):
    # run_prompt(template, text) is your own wrapper around the model call;
    # it should return the parsed output as a dict.
    passed = 0
    for case in test_cases:
        result = run_prompt(prompt_template, case["input"])
        if result["category"] == case["expected_category"]:
            passed += 1
    return f"Accuracy: {passed}/{len(test_cases)}"

# PROMPT_TEMPLATE is the versioned template under test.
print(evaluate_prompt(PROMPT_TEMPLATE, test_cases))
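To run the suite automatically, for example in CI on every template change, it helps to return a number and the failing cases rather than a formatted string. The sketch below does that and adds a gate that raises when accuracy falls below a threshold. `evaluate`, `gate`, and the `classify` callable are hypothetical names, and the 0.9 default is an arbitrary assumption to tune for your task.

```python
from typing import Callable


def evaluate(classify: Callable[[str], str], cases: list) -> tuple:
    """Score a classifier on labeled cases and return (accuracy, failures)."""
    if not cases:
        raise ValueError("no test cases")
    failures = []
    for case in cases:
        got = classify(case["input"])
        if got != case["expected_category"]:
            # Keep input, expected, and actual so regressions are easy to read.
            failures.append((case["input"], case["expected_category"], got))
    accuracy = (len(cases) - len(failures)) / len(cases)
    return accuracy, failures


def gate(classify: Callable[[str], str], cases: list, min_accuracy: float = 0.9) -> float:
    """Raise AssertionError if accuracy is below min_accuracy (e.g. as a CI step)."""
    accuracy, failures = evaluate(classify, cases)
    if accuracy < min_accuracy:
        detail = "\n".join(f"  {i!r}: expected {e!r}, got {g!r}" for i, e, g in failures)
        raise AssertionError(f"accuracy {accuracy:.0%} is below {min_accuracy:.0%}\n{detail}")
    return accuracy
```

In a real suite, `classify` would call `run_prompt(PROMPT_TEMPLATE, text)` and return its `"category"` field. Listing the failures, not just a count, shows exactly which inputs regressed after a template change.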

Run this evaluation:

For a complete system that combines prompts with automated evaluation, see How to Package an AI Workflow as a Digital Product.

Limits and notes

Prompt engineering is not about finding the perfect prompt. It's about building a system where prompts are testable, versioned, and replaceable. The techniques above are starting points — adapt them to your use case and measure results.