Model Family | AI Training Hub

Claude Track

Module 08

Picking the Right Size: ShopMate uses three different models depending on the task: Haiku for quick sentiment checks and order status lookups (fast, cheap), Sonnet for product descriptions and email replies (balanced), and Opus for generating the seasonal campaign copy that Maya reviews personally (highest quality). The right model for each job keeps costs low without sacrificing quality where it matters.

The Claude 4 Model Family

Anthropic offers three models optimised for different speed, capability, and cost trade-offs. Choosing the right model for each task is the single most impactful decision for cost efficiency and output quality in production systems.

Claude Opus 4.6

Most capable model. Best for complex reasoning, nuanced analysis, and agentic tasks that require deep judgment. Highest cost, slowest speed. Use when the quality of the output directly impacts a high-stakes decision — architecture reviews, legal analysis, campaign copy that will be seen by thousands.

claude-opus-4-6

Claude Sonnet 4.6

Best balance of intelligence and speed. Ideal for most production workloads: coding, data extraction, document analysis, customer support. Sonnet handles the large majority of real-world tasks at a fraction of Opus's cost. It is the default choice: upgrade to Opus only when you can measure the quality difference.

claude-sonnet-4-6

Claude Haiku 4.5

Fastest and most cost-efficient. Best for high-throughput applications, simple classification, quick summaries, and latency-sensitive tasks. Haiku is not a "dumb" model — it is surprisingly capable for structured tasks. Use it wherever speed matters more than depth.

claude-haiku-4-5-20251001

Detailed Comparison

Dimension	Opus 4.6	Sonnet 4.6	Haiku 4.5
Context window	200K tokens	200K tokens	200K tokens
Max output	32K tokens	16K tokens	8K tokens
Input price (per 1M tokens)	$15.00	$3.00	$0.80
Output price (per 1M tokens)	$75.00	$15.00	$4.00
Latency (time to first token)	~2-5 seconds	~0.5-2 seconds	~0.2-0.5 seconds
Extended thinking	Yes	Yes	No
Tool use	Yes	Yes	Yes
Vision (image input)	Yes	Yes	Yes
Streaming	Yes	Yes	Yes
Prompt caching	Yes (90% discount)	Yes (90% discount)	Yes (90% discount)
Batch API	Yes (50% discount)	Yes (50% discount)	Yes (50% discount)

Model Selection Rule

Start development with Sonnet. Upgrade to Opus only for tasks where quality materially differs. Use Haiku for pre-processing, routing, classification, and any latency-sensitive path. At the prices above, Opus costs roughly 19x as much as Haiku on both input and output (18.75x in each case).
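The price gap can be checked directly from the comparison table. The sketch below is only illustrative: the dictionary keys and the `price_ratio` helper are made-up names, and the prices are copied from the table above, so re-verify them before reusing the numbers.

```python
# Prices per 1M tokens as (input, output), copied from the comparison table.
PRICES_PER_MTOK = {
    "opus": (15.00, 75.00),
    "sonnet": (3.00, 15.00),
    "haiku": (0.80, 4.00),
}


def price_ratio(expensive: str, cheap: str) -> tuple[float, float]:
    """Return (input_ratio, output_ratio) between two models."""
    exp_in, exp_out = PRICES_PER_MTOK[expensive]
    cheap_in, cheap_out = PRICES_PER_MTOK[cheap]
    return exp_in / cheap_in, exp_out / cheap_out


in_ratio, out_ratio = price_ratio("opus", "haiku")
print(f"Opus vs Haiku: {in_ratio:.2f}x input, {out_ratio:.2f}x output")
```

The same helper shows that Opus is 5x Sonnet on both sides, which is why an unnecessary upgrade from Sonnet to Opus multiplies that task's bill by five.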

Pricing Disclaimer

All prices shown are per 1 million tokens (input / output) and were current as of April 2026. Verify the latest rates at anthropic.com/pricing before building cost models.

When to Use Each Model — Decision Framework

The most common mistake is using Opus for everything. The second most common mistake is using Haiku for everything. Here is a practical decision framework:

Use Case	Model	Rationale
Email classification (complaint/question/praise)	Haiku	Simple classification — Haiku matches Sonnet's accuracy, 5x faster
Content moderation / safety screening	Haiku	Binary or low-cardinality decisions at high throughput
Routing (which department handles this ticket?)	Haiku	Fast routing enables downstream processing without delay
Quick summaries (1-2 sentences)	Haiku	Short outputs don't benefit from Opus's depth
Data extraction from structured documents	Haiku or Sonnet	Haiku for simple forms; Sonnet for complex tables
Product descriptions, email drafts	Sonnet	Needs nuance and brand voice but not deep reasoning
Code generation and editing	Sonnet	Strong coding capability at reasonable cost
Code review (routine)	Sonnet	Catches most issues; Opus for security-critical reviews
Customer support responses	Sonnet	Needs empathy, accuracy, and context awareness
Document analysis (contracts, reports)	Sonnet	Good comprehension of long documents at lower cost
Architecture and design decisions	Opus	Multi-factor reasoning with long-term implications
Security audits and vulnerability analysis	Opus	Subtle issues require deep, systematic reasoning
Complex mathematical or logical proofs	Opus	Extended thinking enables multi-step reasoning chains
Campaign copy (high visibility)	Opus	Quality ceiling matters when thousands will read the output
Legal document review	Opus	Nuanced interpretation where missed details have real consequences

Model Selection Flowchart

Decision Tree — Which Model?

Is this a simple classification, routing, or yes/no decision?
  YES → Use Haiku
  NO ↓

Does this task require deep multi-step reasoning or extended thinking?
  YES → Use Opus
  NO ↓

Is this high-stakes? (customer-facing, security, legal, or visible to many people)
  YES → Run with Sonnet first. Compare quality with Opus on 5 samples.
         If Opus is measurably better → Use Opus
         If quality is similar → Stay with Sonnet
  NO ↓

Use Sonnet (default for everything else)
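
The decision tree above can be sketched as a small function. This is one possible encoding, not a prescribed implementation: the function name and its boolean parameters are hypothetical, and `opus_measurably_better` stands for the result of your own side-by-side comparison on sample outputs.

```python
HAIKU = "claude-haiku-4-5-20251001"
SONNET = "claude-sonnet-4-6"
OPUS = "claude-opus-4-6"


def choose_model(
    simple_decision: bool,
    needs_deep_reasoning: bool,
    high_stakes: bool,
    opus_measurably_better: bool = False,
) -> str:
    """Walk the decision tree top to bottom and return a model ID."""
    if simple_decision:            # classification, routing, yes/no
        return HAIKU
    if needs_deep_reasoning:       # multi-step reasoning or extended thinking
        return OPUS
    if high_stakes and opus_measurably_better:
        return OPUS                # only after comparing against Sonnet
    return SONNET                  # default for everything else


print(choose_model(simple_decision=False, needs_deep_reasoning=False, high_stakes=True))
```

Because the checks run in the same order as the flowchart, a task that is both simple and high-stakes still lands on Haiku, exactly as the tree reads.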

Cost Modelling — Real Numbers

Here is what ThreadCo's monthly Claude usage looks like with intelligent model routing vs. using Sonnet for everything:

Task	Volume/Month	Avg Tokens (In/Out)	Routed Model	Routed Cost	All-Sonnet Cost
Email classification	2,000	400 / 30	Haiku	$0.88	$3.30
Product descriptions	500	600 / 150	Sonnet	$2.03	$2.03
Code reviews	200	2000 / 500	Sonnet	$2.70	$2.70
Customer replies	500	800 / 200	Sonnet	$2.70	$2.70
Campaign copy	8	1000 / 800	Opus	$0.60	$0.12
Architecture reviews	4	5000 / 2000	Opus	$0.90	$0.18
Monthly Total				$9.81	$11.03

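At ThreadCo's scale (small team), the savings are modest: $1.22 per month, about 11%. At 100x volume (a medium company processing 200,000 emails/month) the same mix saves about $122 per month against all-Sonnet. The larger risk is the opposite default: running everything on Opus costs five times the all-Sonnet figure, about $5,500 per month at that volume, so routing saves roughly $4,500. The principle matters more than the absolute numbers: never pay for reasoning you don't need.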
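
The table can be reproduced with a few lines of arithmetic. The sketch below uses the volumes, token counts, and prices from the tables above; the names `WORKLOAD` and `monthly_cost` are illustrative. `Decimal` is used so each row rounds to the cent the same way the table does.

```python
from decimal import Decimal, ROUND_HALF_UP

# (input, output) USD per 1M tokens, from the comparison table
PRICES = {
    "haiku": (Decimal("0.80"), Decimal("4.00")),
    "sonnet": (Decimal("3.00"), Decimal("15.00")),
    "opus": (Decimal("15.00"), Decimal("75.00")),
}

# (task, requests per month, avg input tokens, avg output tokens, routed model)
WORKLOAD = [
    ("Email classification", 2000, 400, 30, "haiku"),
    ("Product descriptions", 500, 600, 150, "sonnet"),
    ("Code reviews", 200, 2000, 500, "sonnet"),
    ("Customer replies", 500, 800, 200, "sonnet"),
    ("Campaign copy", 8, 1000, 800, "opus"),
    ("Architecture reviews", 4, 5000, 2000, "opus"),
]


def monthly_cost(volume: int, tokens_in: int, tokens_out: int, model: str) -> Decimal:
    """Monthly cost of one task on one model, rounded to the cent."""
    price_in, price_out = PRICES[model]
    cost = (volume * tokens_in * price_in + volume * tokens_out * price_out) / Decimal(1_000_000)
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


routed_total = sum(monthly_cost(v, i, o, m) for _, v, i, o, m in WORKLOAD)
all_sonnet_total = sum(monthly_cost(v, i, o, "sonnet") for _, v, i, o, _ in WORKLOAD)

print(f"Routed:     ${routed_total}")
print(f"All-Sonnet: ${all_sonnet_total}")
print(f"Saved:      ${all_sonnet_total - routed_total}")
```

Swapping `"sonnet"` for `"opus"` in the second sum gives the all-Opus baseline, which is a useful third column for Exercise 2.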

Latency Considerations

Cost is not the only reason to choose the right model. Latency directly impacts user experience:

Real-Time Chat (< 1s TTFT)

If your application is a live chatbot where users are waiting for a response, you need Haiku or Sonnet. Opus's 2-5 second time to first token feels sluggish in a chat interface. Haiku's sub-500ms TTFT feels instant.

Background Processing (latency irrelevant)

For batch jobs, nightly reports, or async workflows, latency does not matter. Use the best model for quality and optimise cost with the Batch API (50% discount). There is no reason to sacrifice quality for speed on a task nobody is waiting for.

IDE Assistant (1-3s acceptable)

In Claude Code, users expect a brief pause before responses appear. Sonnet hits this sweet spot. Opus is acceptable for complex tasks where users understand the model is "thinking." Haiku is useful for quick inline completions.

Streaming Mitigates Latency

Streaming returns tokens as they are generated, so the user sees the response building in real time. It does not shorten time to first token, but it removes the wait for the full response: with streaming, even Opus feels responsive once the first words appear, although the complete answer takes longer to finish.
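
To see the difference between time to first token and total response time, it helps to measure both on a stream. The sketch below works on any iterator of text chunks; `fake_stream` is a hypothetical stand-in that simulates a model so the example runs offline. With the Anthropic Python SDK you would, as far as I know, pass the `text_stream` iterator exposed by `client.messages.stream(...)` instead.

```python
import time
from typing import Iterable, Iterator


def measure_stream(chunks: Iterable[str]) -> tuple[float, float, str]:
    """Consume text chunks; return (ttft_seconds, total_seconds, full_text)."""
    start = time.perf_counter()
    ttft = None
    parts = []
    for chunk in chunks:
        if ttft is None:
            ttft = time.perf_counter() - start  # first chunk has arrived
        parts.append(chunk)
    total = time.perf_counter() - start
    return (ttft if ttft is not None else total), total, "".join(parts)


def fake_stream(first_delay: float, per_chunk_delay: float, words: list[str]) -> Iterator[str]:
    """Simulated stream: wait for the first token, then emit words one by one."""
    time.sleep(first_delay)
    for word in words:
        yield word + " "
        time.sleep(per_chunk_delay)


ttft, total, text = measure_stream(
    fake_stream(0.05, 0.01, "the response builds up word by word".split())
)
print(f"TTFT {ttft:.3f}s, total {total:.3f}s")
```

The gap between the two numbers is the time streaming hides from the user; the first number is the part it cannot hide.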

ShopMate -- Model Router

Python -- shopmate/router.py

# shopmate/router.py -- Right model for each ShopMate task
from enum import Enum


class Task(Enum):
    CLASSIFY = "classify"  # sentiment check, category label
    WRITE = "write"        # product descriptions, email replies
    CAMPAIGN = "campaign"  # seasonal campaign copy, Maya reviews


# Model choice + max output tokens per task
ROUTING = {
    Task.CLASSIFY: ("claude-haiku-4-5-20251001", 30),   # fastest, cheapest
    Task.WRITE:    ("claude-sonnet-4-6",         250),  # good quality, balanced cost
    Task.CAMPAIGN: ("claude-opus-4-6",           800),  # best quality for Maya
}

# Cost per 1 million tokens (input, output) -- verify at anthropic.com/pricing
PRICES = {
    "claude-haiku-4-5-20251001": (0.80, 4.00),
    "claude-sonnet-4-6":         (3.00, 15.00),
    "claude-opus-4-6":           (15.00, 75.00),
}

# Rough average input size, applied to every task in the estimate below
AVG_INPUT_TOKENS = 400


def route(task: Task) -> tuple[str, int]:
    """Return (model ID, max output tokens) for a task."""
    return ROUTING[task]


def monthly_cost_estimate() -> None:
    """Print a worst-case monthly cost per task and the overall total."""
    volumes = {Task.CLASSIFY: 2000, Task.WRITE: 500, Task.CAMPAIGN: 8}
    total = 0.0
    print(f"{'Task':<15} {'Model':<35} {'Vol':>5} {'Cost':>8}")
    print("-" * 68)
    for task, vol in volumes.items():
        model, max_out = route(task)
        cin, cout = PRICES[model]
        # Worst case: assumes every response uses its full max_tokens budget
        cost = (AVG_INPUT_TOKENS * cin + max_out * cout) / 1e6 * vol
        total += cost
        print(f"{task.value:<15} {model:<35} {vol:>5} ${cost:>7.2f}")
    print(f"\nEstimated monthly total: ${total:.2f}")


if __name__ == "__main__":
    monthly_cost_estimate()

Advanced — Dynamic Model Routing

For production systems, you can build a dynamic router that uses Haiku to classify the complexity of incoming requests and routes them to the appropriate model automatically:

Python -- shopmate/dynamic_router.py

# shopmate/dynamic_router.py -- Haiku classifies, then routes to the right model
import anthropic
client = anthropic.Anthropic()

def classify_complexity(user_message: str) -> str:
    """Use Haiku to classify whether a request needs Haiku, Sonnet, or Opus."""
    resp = client.messages.create(
        model="claude-haiku-4-5-20251001", max_tokens=10,
        system="""Classify the complexity of this user request.
Reply with ONLY one word: haiku, sonnet, or opus.
- haiku: simple lookup, yes/no, classification
- sonnet: writing, coding, analysis, most tasks
- opus: complex reasoning, architecture, security, multi-step logic""",
        messages=[{"role":"user", "content": user_message}]
    )
    return resp.content[0].text.strip().lower()

MODEL_MAP = {
    "haiku":  "claude-haiku-4-5-20251001",
    "sonnet": "claude-sonnet-4-6",
    "opus":   "claude-opus-4-6",
}

def smart_route(user_message: str) -> str:
    """Route a message to the appropriate model based on complexity."""
    complexity = classify_complexity(user_message)
    model = MODEL_MAP.get(complexity, "claude-sonnet-4-6")  # default to Sonnet
    print(f"Routing to {model} (classified as: {complexity})")
    resp = client.messages.create(
        model=model, max_tokens=1000,
        messages=[{"role":"user", "content": user_message}]
    )
    return resp.content[0].text

# The Haiku classification call costs fractions of a cent
# but saves dollars when it routes simple requests away from Opus
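
One practical gap in a router like this is that the classifier may not reply with a clean one-word label (for example "Opus." or "sonnet, probably"). The sketch below shows one way to normalise the reply before the lookup; `normalise_label` is a hypothetical helper, not part of any SDK, and the fallback to Sonnet mirrors the default used in the router.

```python
import re

VALID_LABELS = ("haiku", "sonnet", "opus")


def normalise_label(raw: str, default: str = "sonnet") -> str:
    """Extract a single routing label from the classifier's reply.

    Falls back to `default` when the reply names no label or more than one.
    """
    words = re.findall(r"[a-z]+", raw.lower())
    found = {word for word in words if word in VALID_LABELS}
    return found.pop() if len(found) == 1 else default


print(normalise_label("Opus."))            # punctuation and case are ignored
print(normalise_label("haiku or sonnet"))  # ambiguous, so the default is used
```

Falling back to the mid-tier model on ambiguous replies keeps a confused classifier from silently sending hard requests to Haiku or cheap ones to Opus.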

The Router Pattern in Production

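At scale, the Haiku classification call adds roughly 200-500 ms of latency (Haiku's time to first token) but can reduce total cost by 40-60%, depending on the traffic mix. The key insight: the call is nearly free. Ten output tokens at Haiku's $4.00 per 1M rate cost $0.00004, and 150 input tokens (system prompt plus a short message) add about $0.00012. Even if Haiku "over-routes" 20% of requests to a more expensive model, the savings on the other 80% more than compensate.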

Context Window and Output Length Differences

While all Claude 4 models share the same 200K input context window, their maximum output lengths differ — and this affects which tasks each model can handle:

Opus: 32K Output

Opus can generate up to 32,000 tokens in a single response — roughly 24,000 words. This is enough for entire documents, comprehensive analysis reports, or large code files. Use Opus when you need long, detailed output that maintains quality throughout.

Sonnet: 16K Output

Sonnet generates up to 16,000 tokens — roughly 12,000 words. This is sufficient for most tasks: code files, email drafts, analysis summaries, and documentation. If you need longer output, split the request into sequential parts.

Haiku: 8K Output

Haiku generates up to 8,000 tokens — roughly 6,000 words. This is intentionally limited to match its use case: quick classifications, short summaries, and concise responses. If your task routinely needs more than 8K tokens of output, upgrade to Sonnet.

Output Truncation

If Claude's response reaches the max_tokens limit, it is cut off mid-sentence and the response's stop_reason is "max_tokens". This is a common source of broken JSON and incomplete code. Set max_tokens comfortably above your expected output length. A higher limit does not increase cost, because you pay only for tokens actually generated, but it cannot exceed the model's maximum output. If output is truncated, ask Claude to continue from where it stopped.
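
Truncation is easiest to handle if it is detected before the output is parsed. Messages API responses carry a `stop_reason` field that is `"max_tokens"` when the limit was hit. The sketch below is a minimal guard around JSON parsing; `parse_json_reply` is a hypothetical helper, and it takes plain strings so it can be tried without an API call.

```python
import json
from typing import Any, Optional


def parse_json_reply(text: str, stop_reason: str) -> Optional[Any]:
    """Return parsed JSON, or None if the reply was truncated or is invalid."""
    if stop_reason == "max_tokens":
        # The model ran out of budget; the JSON is almost certainly incomplete.
        return None
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return None


complete = parse_json_reply('{"label": "complaint"}', "end_turn")
truncated = parse_json_reply('{"label": "compl', "max_tokens")
print(complete, truncated)
```

With a real response object you would pass `resp.content[0].text` and `resp.stop_reason`, and on a `None` result either retry with a larger max_tokens or ask the model to continue.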

Benchmarks vs Real-World Performance

Published benchmarks (MMLU, HumanEval, MATH, etc.) are useful for comparing models at a high level, but they do not always predict performance on your specific tasks. Here is how to think about benchmarks practically:

Benchmark	What It Measures	Real-World Relevance
HumanEval / SWE-bench	Code generation and bug fixing	High — directly maps to coding assistance tasks
MMLU	Broad knowledge across 57 subjects	Moderate — shows general knowledge but not task-specific skill
MATH	Mathematical problem solving	Moderate — relevant for quantitative analysis tasks
Agentic benchmarks (SWE-bench Verified, TAU-bench)	Multi-step autonomous task completion	High — directly maps to Claude Code's agentic capabilities
Long context benchmarks	Recall and reasoning over long documents	High — relevant for document analysis and codebase review

The most reliable way to choose a model for your use case is to test it yourself: run 20 representative examples through each model, score the outputs, and compare. Your specific data, prompts, and quality standards matter more than any benchmark.
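
Once the outputs are scored, the comparison itself is simple to automate. The sketch below picks the cheapest model whose average score is within a tolerance of the best one. The function name, the tolerance of 0.5 points, and the sample scores are all made up for illustration; in practice you would use around 20 scored examples per model.

```python
from statistics import mean


def pick_model(
    scores: dict[str, list[float]],
    cheapest_first: list[str],
    tolerance: float = 0.5,
) -> str:
    """Return the cheapest model whose mean score is within `tolerance` of the best."""
    means = {model: mean(values) for model, values in scores.items()}
    best = max(means.values())
    for model in cheapest_first:
        if means[model] >= best - tolerance:
            return model
    raise ValueError("cheapest_first must list every scored model")


# Hypothetical 1-10 quality scores from a small evaluation run
sample_scores = {
    "haiku": [6, 7, 6, 7],
    "sonnet": [8, 9, 8, 8],
    "opus": [9, 8, 9, 8],
}
choice = pick_model(sample_scores, cheapest_first=["haiku", "sonnet", "opus"])
print(choice)
```

Making the tolerance explicit forces the team to decide how much quality a 5x price difference has to buy before Opus is worth it.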

Model Versioning and Updates

Anthropic releases new model versions periodically. Understanding the versioning scheme prevents surprises in production:

Pinned vs. Latest

Model IDs like claude-sonnet-4-6 point to the latest release. For production stability, use the date-pinned version (e.g., claude-haiku-4-5-20251001) so your system behaviour doesn't change when Anthropic releases an update.
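
A small check at start-up can catch unpinned IDs before they reach production. The sketch below assumes the convention shown in this module, where a pinned ID ends in an eight-digit date (YYYYMMDD); `is_date_pinned` is a hypothetical helper and the pattern is a heuristic, not an official rule.

```python
import re

# Assumption: pinned model IDs end in "-YYYYMMDD", as in claude-haiku-4-5-20251001
_DATE_SUFFIX = re.compile(r"-\d{8}$")


def is_date_pinned(model_id: str) -> bool:
    """Return True if the model ID ends in an eight-digit date suffix."""
    return bool(_DATE_SUFFIX.search(model_id))


for model_id in ("claude-haiku-4-5-20251001", "claude-sonnet-4-6"):
    status = "pinned" if is_date_pinned(model_id) else "NOT pinned"
    print(f"{model_id}: {status}")
```

Running this over a routing table in CI makes an accidental switch to an unpinned ID visible in review rather than in production behaviour.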

Testing New Versions

When a new model version is released, test it against your existing prompts before switching. Run your test suite with both the old and new model versions and compare output quality. Some prompts may need adjustment for the new version.

Deprecation Notices

Anthropic provides advance notice before deprecating model versions. Subscribe to the Anthropic changelog and set calendar reminders to migrate before the retirement date. A deprecated model keeps working until that date; once it is retired, your API calls to it will fail.

A/B Testing Models

For critical workflows, run A/B tests between model versions. Route 10% of traffic to the new model, compare quality metrics, then gradually increase if results are good. This is the safest way to upgrade in production.
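
A percentage rollout is usually done with a deterministic hash, so the same user always sees the same model. The sketch below is one common way to do it, not a prescribed method: `assign_variant` and the bucket labels are hypothetical names, and the user ID is assumed to be a stable string.

```python
import hashlib


def assign_variant(user_id: str, rollout_percent: int) -> str:
    """Deterministically place a user in 'candidate' or 'control'.

    The user ID is hashed into one of 100 buckets; buckets below
    `rollout_percent` get the new model.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100
    return "candidate" if bucket < rollout_percent else "control"


MODELS = {
    "control": "claude-haiku-4-5-20251001",  # current pinned version
    "candidate": "claude-sonnet-4-6",        # placeholder for the version under test
}

variant = assign_variant("user-1234", rollout_percent=10)
print(variant, MODELS[variant])
```

Raising `rollout_percent` from 10 to 25 to 50 only moves users from control to candidate, never back, so quality metrics stay comparable across rollout stages.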

Hands-On Exercises

Exercise 1 — Model Taste Test

Pick a task your team does daily (e.g., writing a customer email reply). Run the same prompt through Haiku, Sonnet, and Opus. Score each output 1-10 for quality. Is the quality difference between Sonnet and Opus worth the cost difference? Document your findings for your team's model routing table.

Exercise 2 — Cost Calculator

List your team's top 10 Claude use cases. For each, estimate: requests per month, average input tokens, and average output tokens. Calculate the monthly cost using (a) Sonnet for everything, (b) intelligent routing with all three models. What is the percentage savings?

Exercise 3 — Build a Routing Table

Create a simple routing table for your project: a spreadsheet or markdown table mapping each task type to a model. Include columns for: task name, model, max_tokens, expected quality (1-10), and cost per request. Review it monthly and adjust based on actual usage data.

Exercise 4 — Latency Measurement

Using the Claude Code chat panel, time how long each model takes to respond to the same prompt. Use a stopwatch or the API's response headers. Create a simple benchmark: (a) short classification task, (b) medium writing task, (c) complex reasoning task. Plot the results. How does latency change with output length?

Exercise 5 — Dynamic Router Implementation

Using the dynamic_router.py code above as a starting point, test it with 20 different user messages ranging from simple ("Is this a complaint?") to complex ("Review this microservices architecture for single points of failure"). Check whether Haiku classifies the complexity correctly at least 80% of the time. Adjust the classification prompt if needed.
