Reasoning & Analysis

Claude Track

Module 12

Review Intelligence: ThreadCo receives 200 reviews per month. ShopMate now reads them all and produces a monthly report: top complaints, most praised features, sizing issues by product line, suggested product improvements. What took Maya 4 hours now takes 3 minutes.

Claude excels at multi-step reasoning, comparative analysis, and synthesis. This module covers how to get Claude to think deeply about problems, from simple analysis tasks to long reasoning chains of the kind a human expert would work through. Claude Opus and Sonnet models support extended thinking, an API feature where Claude reasons internally before responding.

Comparative Analysis

Ask Claude to compare options across defined criteria with a rubric. Works for technical decisions, business cases, and research synthesis. Provide scoring criteria explicitly — Claude's comparisons are only as good as the framework you give it. Without criteria, Claude defaults to generic pros/cons lists.
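A rubric prompt of this kind can be assembled programmatically. The sketch below is illustrative only: `build_comparison_prompt`, the option names, and the criteria wording are assumptions, not part of ShopMate.

```python
def build_comparison_prompt(options: list[str], criteria: dict[str, str]) -> str:
    """Build a comparison prompt that spells out the scoring rubric.

    `criteria` maps each criterion name to a description of what a high score means.
    """
    if len(options) < 2:
        raise ValueError("need at least two options to compare")
    if not criteria:
        raise ValueError("need at least one scoring criterion")
    # One bullet per criterion so the rubric is explicit in the prompt
    rubric = "\n".join(f"- {name}: {meaning}" for name, meaning in criteria.items())
    return (
        f"Compare these options: {', '.join(options)}.\n\n"
        "Score each option from 1 to 10 on every criterion below, "
        "with a one-sentence justification per score.\n\n"
        f"Criteria:\n{rubric}\n\n"
        "Finish with a table of scores and a recommendation."
    )


# Hypothetical options and criteria, for illustration
prompt = build_comparison_prompt(
    ["Royal Mail", "DPD"],
    {
        "Cost": "10 = lowest cost per parcel at 500 parcels/week",
        "Tracking": "10 = real-time tracking with narrow delivery windows",
    },
)
print(prompt)
```

Stating what a high score means for each criterion gives Claude a fixed framework to score against rather than leaving it to pick its own.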

Root Cause Analysis

Describe a system failure and ask Claude to identify potential root causes ranked by likelihood. Include logs, stack traces, and what you expected vs what happened. The more diagnostic data you provide, the more accurate Claude's root cause identification becomes.
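One way to make sure every request carries the same diagnostic sections is a small prompt builder. This is a minimal sketch: the function name, the section labels, and the sample log line are all hypothetical.

```python
def build_root_cause_prompt(expected: str, actual: str, logs: str, max_causes: int = 5) -> str:
    """Assemble a root cause analysis prompt from the diagnostic data at hand."""
    if not logs.strip():
        raise ValueError("include logs or a stack trace; without them the ranking is guesswork")
    sections = [
        "A system failure occurred. Identify the most likely root causes.",
        f"Expected behaviour:\n{expected.strip()}",
        f"Actual behaviour:\n{actual.strip()}",
        f"Logs and stack traces:\n{logs.strip()}",
        (
            f"List up to {max_causes} candidate root causes, ranked from most to "
            "least likely. For each, cite the log lines that support it and "
            "state what extra evidence would confirm or rule it out."
        ),
    ]
    return "\n\n".join(sections)


# Made-up incident data, for illustration only
rca_prompt = build_root_cause_prompt(
    expected="Order confirmation email is sent within one minute of payment.",
    actual="No email is sent for roughly one order in ten.",
    logs="ERROR mailer: connection to SMTP relay timed out after 30s",
)
print(rca_prompt)
```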

Structured Decomposition

For large ambiguous problems, ask Claude to first decompose the problem into sub-problems, then solve each. On complex tasks this often improves accuracy over single-shot prompting. The decomposition itself is valuable: it reveals hidden complexity and dependencies.

Extended Thinking

Claude Opus 4 and Sonnet 4 support extended thinking mode, in which Claude reasons internally before responding. It is available via the Anthropic API and Claude Code, and suits math, logic puzzles, multi-constraint optimisation, and any task requiring 10+ reasoning steps. Thinking tokens are billed as output tokens, so a larger thinking budget raises both cost and latency.

Chain-of-Thought Reasoning

Chain-of-thought (CoT) prompting asks Claude to show its reasoning steps before giving a final answer. This is one of the most well-studied techniques in LLM research, and it consistently improves accuracy on reasoning-heavy tasks.

Chat Panel — Without Chain-of-Thought

You:
Should ThreadCo switch from Royal Mail to DPD for UK deliveries?

Claude (without CoT):
Yes, DPD is generally faster and offers better tracking.
— Shallow. Missing cost analysis, volume considerations, customer preferences.

Chat Panel — With Chain-of-Thought

You:
Should ThreadCo switch from Royal Mail to DPD for UK deliveries?
Think through this step by step. Consider: cost per parcel at our volume
(500/week), delivery speed, tracking quality, customer complaints about
each carrier, weekend delivery options, and environmental impact.

Claude (with CoT):
Let me work through each factor:

1. COST: At 500 parcels/week, Royal Mail 2nd class is ~£3.20/parcel.
   DPD standard is ~£4.50/parcel. Annual difference: £33,800.

2. SPEED: Royal Mail 2nd class: 2-3 days. DPD: next-day.
   But only 12% of ThreadCo's customer complaints mention speed.

3. TRACKING: This is the key issue. 34% of complaints are "where is
   my order?" Royal Mail tracking is basic. DPD provides real-time
   GPS tracking with 1-hour delivery windows...

[continues through all factors, then gives a reasoned recommendation]
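The annual cost difference quoted in step 1 of the transcript can be checked in a few lines. The per-parcel prices are the approximate figures from the transcript, and 52 shipping weeks per year is an assumption.

```python
PARCELS_PER_WEEK = 500
WEEKS_PER_YEAR = 52          # assumption: shipping every week of the year
ROYAL_MAIL_PENCE = 320       # ~£3.20 per parcel, from the transcript
DPD_PENCE = 450              # ~£4.50 per parcel, from the transcript

# Work in whole pence so the arithmetic is exact
extra_per_parcel = DPD_PENCE - ROYAL_MAIL_PENCE
annual_difference_pence = extra_per_parcel * PARCELS_PER_WEEK * WEEKS_PER_YEAR
annual_difference_pounds = annual_difference_pence / 100

print(f"£{annual_difference_pounds:,.0f}")  # £33,800
```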

When Chain-of-Thought Helps Most

CoT provides the biggest accuracy improvement on: mathematical reasoning, multi-factor decisions, code debugging, logical deductions, and any task with more than 3 variables to consider. It provides little benefit for simple classification, creative writing, or translation — those tasks don't have complex reasoning chains.

Extended Thinking — Deep Reasoning Mode

Extended thinking is a Claude feature where the model reasons internally before producing its visible response. Unlike chain-of-thought (where the reasoning is part of the answer itself), extended thinking happens in a separate "thinking" block that the API returns alongside, but apart from, the final text. Claude can think for hundreds or thousands of tokens before writing a single word of response.

Python — Extended Thinking via API

import anthropic
client = anthropic.Anthropic()

# Extended thinking is enabled by setting a thinking budget
resp = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000  # Claude can use up to 10K tokens for thinking
    },
    messages=[{
        "role": "user",
        "content": """ThreadCo is considering three pricing strategies for their
new premium line. Analyse each strategy considering:
- Impact on existing customer base (2,000 active buyers)
- Margin implications (current margin: 42%)
- Brand perception in the sustainable fashion market
- Competitor pricing (£25-£45 range)

Strategy A: Premium pricing at £55 with free shipping
Strategy B: Value pricing at £35 with £3.50 shipping
Strategy C: Tiered pricing — £40 standard, £55 for organic-certified"""
    }]
)

# Access the thinking and response separately
for block in resp.content:
    if block.type == "thinking":
        print("THINKING:", block.thinking[:200], "...")
    elif block.type == "text":
        print("RESPONSE:", block.text)
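The thinking budget and `max_tokens` interact, so it can help to validate them before sending a request. The helper below is a sketch, not part of the SDK: `thinking_request_kwargs` is a hypothetical name, and the two limits it checks (a 1,024-token minimum budget, and a budget smaller than `max_tokens`) reflect Anthropic's documentation as understood at the time of writing and should be confirmed against the current docs.

```python
MIN_THINKING_BUDGET = 1024  # assumption: documented minimum; verify against current docs


def thinking_request_kwargs(max_tokens: int, budget_tokens: int) -> dict:
    """Return keyword arguments for messages.create() with extended thinking enabled."""
    if budget_tokens < MIN_THINKING_BUDGET:
        raise ValueError(f"budget_tokens must be at least {MIN_THINKING_BUDGET}")
    if budget_tokens >= max_tokens:
        # The budget is spent inside max_tokens, so leave room for the visible answer
        raise ValueError("budget_tokens must be smaller than max_tokens")
    return {
        "max_tokens": max_tokens,
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
    }


kwargs = thinking_request_kwargs(max_tokens=16000, budget_tokens=10000)
print(kwargs)
```

The returned dictionary can be unpacked into the call shown above, for example `client.messages.create(model=..., messages=..., **kwargs)`.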

Feature	Chain-of-Thought (CoT)	Extended Thinking
Where reasoning happens	Visible in the response	Separate thinking block, returned apart from the final answer
Control	Prompted via "think step by step"	Enabled via API parameter
Token usage	Reasoning counts as output tokens	Thinking tokens are also billed as output tokens
Best for	Transparency — you want to see the reasoning	Accuracy — you want the best answer, don't need to see the work
Available on	All models, all interfaces	Opus and Sonnet, via API and Claude Code
Quality impact	Significant improvement over no reasoning	Further improvement over visible CoT for hard problems

Mathematical and Quantitative Reasoning

Claude is capable of substantial mathematical reasoning, but it has specific strengths and weaknesses you should know:

Strengths

Algebra, statistics, probability, financial calculations, optimisation problems, and word problems that require translating English into math. Claude Opus with extended thinking can solve multi-step problems that require 10+ calculation steps.

Limitations

Claude performs arithmetic by predicting tokens, not by running a calculator. It can make errors on large multiplications, long division, or calculations with many decimal places. For precision-critical math, ask Claude to write code that computes the answer rather than computing it directly.

Best Practice: Code for Computation

For any calculation that matters, ask Claude to write a Python script that computes the answer. The code is verifiable, reproducible, and exact. Claude's mathematical reasoning (setting up the problem) is excellent; its arithmetic is good but not perfect.

Statistical Analysis

Claude can interpret statistical results, explain p-values, identify appropriate tests, and spot flaws in experimental design. Feed it raw data and it can suggest analyses, identify outliers, and explain findings in plain language.

Chat Panel — Math via Code

You:
ThreadCo sells 500 tees/week at £28 each. We're considering a 10% price
increase. Historical data suggests a 10% price increase causes a 6% volume
drop for our market segment.

Write a Python script that calculates:
1. Current weekly revenue
2. Projected weekly revenue after the increase
3. Break-even volume drop (at what % volume drop do we lose money?)
4. Annual impact assuming the volume drop stabilises after 4 weeks

Claude writes the code (and runs it, where a code execution environment is available),
giving exact numbers rather than approximate mental arithmetic.
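A script answering the four questions in the prompt above might look like the following sketch. It makes two assumptions the prompt leaves open: "lose money" is read as weekly revenue falling below today's level (no cost data is given), and the volume drop is assumed to phase in linearly over the first 4 weeks before holding at 6%.

```python
WEEKLY_UNITS = 500
PRICE = 28.00
PRICE_INCREASE = 0.10
VOLUME_DROP = 0.06
RAMP_WEEKS = 4        # assumption: the drop phases in linearly over 4 weeks
WEEKS_PER_YEAR = 52


def analyse_price_increase() -> dict:
    current_weekly = PRICE * WEEKLY_UNITS
    new_price = PRICE * (1 + PRICE_INCREASE)
    projected_weekly = new_price * WEEKLY_UNITS * (1 - VOLUME_DROP)

    # Revenue breaks even when (1 + increase) * (1 - drop) == 1
    break_even_drop = 1 - 1 / (1 + PRICE_INCREASE)

    # Sum 52 weeks of revenue, ramping the drop up to its full value by week 4
    annual_new = 0.0
    for week in range(1, WEEKS_PER_YEAR + 1):
        drop = VOLUME_DROP * min(week, RAMP_WEEKS) / RAMP_WEEKS
        annual_new += new_price * WEEKLY_UNITS * (1 - drop)
    annual_impact = annual_new - current_weekly * WEEKS_PER_YEAR

    return {
        "current_weekly_revenue": round(current_weekly, 2),
        "projected_weekly_revenue": round(projected_weekly, 2),
        "break_even_volume_drop_pct": round(break_even_drop * 100, 2),
        "annual_revenue_impact": round(annual_impact, 2),
    }


result = analyse_price_increase()
for name, value in result.items():
    print(f"{name}: {value:,.2f}")
```

Under these assumptions the increase raises weekly revenue from £14,000 to £14,476, and revenue only falls below today's level if volume drops by more than about 9.09%.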

Code Reasoning and Debugging

Claude's reasoning capabilities shine when applied to code. Here are the most effective patterns:

Trace Execution

Ask Claude to trace through code with specific input values: "Walk through @checkout.ts with these inputs: cart = [{id: 1, qty: 2}], discount = 'SAVE10'. Show me the value of each variable at each step." This catches logical errors that static analysis misses.

Identify Edge Cases

"What inputs to @validateOrder() would cause unexpected behaviour? Consider: null, undefined, empty arrays, negative numbers, extremely large numbers, Unicode strings, concurrent calls." Claude systematically generates edge cases you haven't thought of.

Architecture Reasoning

"Given @schema.prisma and @api-routes/, identify any N+1 query problems, missing indexes, or scalability bottlenecks. Assume we'll go from 500 orders/day to 50,000 orders/day." Claude reasons about system behaviour at scale.

Root Cause from Symptoms

Describe the symptoms (error message, unexpected behaviour, performance degradation) and provide the relevant code. Ask Claude to hypothesise root causes ranked by likelihood, then verify each hypothesis by tracing the code. This mirrors how experienced engineers debug.
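The edge cases Claude proposes under "Identify Edge Cases" above are most useful once they become a table-driven test. The sketch below uses a hypothetical Python stand-in for `validateOrder()`; the validation rules and the case list are assumptions made for illustration, not ThreadCo's actual logic.

```python
def validate_order(order) -> bool:
    """Hypothetical validator: an order is a dict holding a non-empty list of
    items, each a dict with a positive integer quantity."""
    if not isinstance(order, dict):
        return False
    items = order.get("items")
    if not isinstance(items, list) or not items:
        return False
    for item in items:
        if not isinstance(item, dict):
            return False
        qty = item.get("qty")
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(qty, bool) or not isinstance(qty, int) or qty <= 0:
            return False
    return True


# (description, input, expected result)
EDGE_CASES = [
    ("None instead of an order", None, False),
    ("empty dict", {}, False),
    ("empty items list", {"items": []}, False),
    ("item is not a dict", {"items": ["tee"]}, False),
    ("negative quantity", {"items": [{"id": 1, "qty": -2}]}, False),
    ("zero quantity", {"items": [{"id": 1, "qty": 0}]}, False),
    ("boolean quantity", {"items": [{"id": 1, "qty": True}]}, False),
    ("string quantity", {"items": [{"id": 1, "qty": "2"}]}, False),
    ("extremely large quantity", {"items": [{"id": 1, "qty": 10**12}]}, True),
    ("ordinary valid order", {"items": [{"id": 1, "qty": 2}]}, True),
]


def failing_cases() -> list[str]:
    """Return the descriptions of every case whose result differs from expectation."""
    return [name for name, order, expected in EDGE_CASES
            if validate_order(order) is not expected]


print(failing_cases())
```

The "extremely large quantity" case passes this validator, which is itself a finding: there is no upper bound, and whether that is acceptable is a business decision.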

Complex Problem Solving — The Decomposition Method

For problems that are too complex for a single prompt, use structured decomposition:

Chat Panel — Decomposition Method

Turn 1 — Decompose:
ThreadCo wants to expand from UK-only to EU shipping. Break this
problem down into all the sub-problems we need to solve. Don't
solve anything yet — just list the sub-problems and their dependencies.

Claude produces a dependency graph of 8-12 sub-problems.

Turn 2 — Prioritise:
Which of these sub-problems are blockers (must be solved first)?
Which are independent and can be done in parallel?
Create a phased implementation plan.

Turn 3 — Solve each sub-problem:
Let's tackle "VAT compliance for EU cross-border sales" first.
@returns-policy.md @pricing-model.ts
What are the specific requirements and how should we implement them?

Turn 4 — Integrate:
Now that we've solved the first three sub-problems, are there any
interactions or conflicts between the solutions? What changes when
we put them together?
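Once Claude has listed sub-problems and their dependencies, the phased plan from Turn 2 can also be derived mechanically. The sketch below uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+). The sub-problem names and dependencies are invented for illustration, not Claude's actual output.

```python
from graphlib import TopologicalSorter


def phased_plan(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group sub-problems into phases; everything in a phase can run in parallel.

    `dependencies` maps each sub-problem to the set of sub-problems it depends on.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    phases = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all sub-problems whose blockers are done
        phases.append(ready)
        sorter.done(*ready)
    return phases


# Hypothetical sub-problems for the EU expansion
SUBPROBLEMS = {
    "VAT compliance": set(),
    "Carrier selection": set(),
    "EU pricing": {"VAT compliance", "Carrier selection"},
    "Returns policy": {"Carrier selection"},
    "Checkout changes": {"EU pricing", "Returns policy"},
}

for number, phase in enumerate(phased_plan(SUBPROBLEMS), start=1):
    print(f"Phase {number}: {', '.join(phase)}")
```

Sub-problems in the first phase are the blockers. A `CycleError` means the decomposition contains a circular dependency and needs another look.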

When Decomposition Backfires

Over-decomposition can be as harmful as under-decomposition. If you break a simple problem into 10 sub-steps, each step loses the holistic view and you get a fragmented, inconsistent result. Use decomposition for genuinely complex problems (5+ interacting factors). For simpler problems, a single well-structured prompt with chain-of-thought works better.

Analysis Prompt Patterns

Analysis Type	Prompt Pattern	Output Format
SWOT Analysis	"Perform a SWOT analysis of [X]. For each quadrant, provide 3-5 specific, actionable points — not generic observations."	4-quadrant table with specifics
Decision Matrix	"Compare [options] across [criteria]. Score each 1-10 with justification. Weight the criteria by importance."	Weighted scoring table
Risk Assessment	"Identify risks for [project/decision]. For each: likelihood (1-5), impact (1-5), mitigation strategy, early warning signs."	Risk register table
Trend Analysis	"Analyse [data] for patterns. Identify: upward/downward trends, seasonal patterns, anomalies, and what might be causing them."	Findings with supporting evidence
Gap Analysis	"Compare our current state [describe] vs desired state [describe]. Identify each gap, its severity, and the effort required to close it."	Gap-by-gap breakdown with effort estimates
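For the Decision Matrix pattern, it is safer to have Claude supply the scores and justifications and to do the weighting in code, in line with the earlier advice to use code for computation. This is a minimal sketch; the carriers, scores, and weights are made-up illustrations.

```python
def weighted_scores(scores: dict[str, dict[str, int]], weights: dict[str, int]) -> dict[str, float]:
    """Combine per-criterion scores (1-10) into one weighted average per option."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    result = {}
    for option, by_criterion in scores.items():
        missing = set(weights) - set(by_criterion)
        if missing:
            raise ValueError(f"{option} has no score for: {sorted(missing)}")
        result[option] = sum(weights[c] * by_criterion[c] for c in weights) / total_weight
    return result


# Hypothetical weights (importance) and scores (as Claude might return them)
weights = {"cost": 3, "speed": 1, "tracking": 4}
scores = {
    "Royal Mail": {"cost": 8, "speed": 5, "tracking": 4},
    "DPD": {"cost": 5, "speed": 9, "tracking": 9},
}

totals = weighted_scores(scores, weights)
for option, total in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{option}: {total:.2f}")
```

Keeping the weights outside the prompt also makes it easy to rerun the matrix with different priorities without asking Claude to rescore.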

ShopMate -- Monthly Review Report

Python -- shopmate/reviews/monthly_report.py

# shopmate/reviews/monthly_report.py -- 200 reviews in, 3-minute report out
import json

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def generate_review_report(reviews: list[dict]) -> dict:
    """Analyse all ThreadCo reviews for the month and extract actionable insights."""
    # One line per review: "[rating/5] product: text"
    reviews_text = "\n".join(
        f"[{r['rating']}/5] {r['product']}: {r['text']}"
        for r in reviews
    )
    resp = client.messages.create(
        model="claude-sonnet-4-6", max_tokens=1500,
        system="""You are analysing customer reviews for ThreadCo, a T-shirt brand.
Produce an actionable monthly report for the founder.
Focus on patterns, not individual reviews.
Return ONLY valid JSON.""",
        messages=[{"role":"user","content":
            f"Analyse these {len(reviews)} reviews:\n\n{reviews_text}\n\n"
            'Return JSON: {"top_complaints":["..."],"top_praises":["..."],'
            '"sizing_issues":["..."],"product_suggestions":["..."],'
            '"average_rating":0.0,"summary":"2-3 sentences for the founder"}'
        }]
    )
    # json.loads raises json.JSONDecodeError if the reply contains anything but JSON
    return json.loads(resp.content[0].text)

# Sample reviews
sample_reviews = [
    {"rating":5,"product":"Sunset Gradient Tee","text":"Softest tee I own, the colour is exactly as shown"},
    {"rating":2,"product":"Midnight Pocket Tee","text":"Runs small, ordered my usual size and it was tight"},
    {"rating":4,"product":"Wave Print Crop Tee","text":"Love the print but shipping took 2 weeks"},
    {"rating":3,"product":"Midnight Pocket Tee","text":"Nice quality but definitely size up"},
]
report = generate_review_report(sample_reviews)
print(json.dumps(report, indent=2))
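The report asks the model for an `average_rating`, but that figure is plain arithmetic and can be computed locally, either as a cross-check or as a replacement. The helper below is a sketch under that assumption; `rating_summary` is a hypothetical name and not part of the module above.

```python
from collections import defaultdict
from statistics import mean


def rating_summary(reviews: list[dict]) -> dict:
    """Compute the overall and per-product average ratings without calling the model."""
    if not reviews:
        raise ValueError("no reviews to summarise")
    by_product = defaultdict(list)
    for review in reviews:
        by_product[review["product"]].append(review["rating"])
    return {
        "average_rating": round(mean(r["rating"] for r in reviews), 2),
        "by_product": {product: round(mean(ratings), 2)
                       for product, ratings in by_product.items()},
    }


# Same sample reviews as in the listing above
reviews = [
    {"rating": 5, "product": "Sunset Gradient Tee", "text": "Softest tee I own, the colour is exactly as shown"},
    {"rating": 2, "product": "Midnight Pocket Tee", "text": "Runs small, ordered my usual size and it was tight"},
    {"rating": 4, "product": "Wave Print Crop Tee", "text": "Love the print but shipping took 2 weeks"},
    {"rating": 3, "product": "Midnight Pocket Tee", "text": "Nice quality but definitely size up"},
]

summary = rating_summary(reviews)
print(summary["average_rating"])                     # 3.5
print(summary["by_product"]["Midnight Pocket Tee"])  # 2.5
```

If the locally computed average and the model's `average_rating` disagree, trust the local figure and treat the mismatch as a sign that the model miscounted or skipped reviews.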

Tips for Better Reasoning Output

Give Claude All the Data

Reasoning quality depends heavily on the data you supply. A vague question gets a vague analysis. Attach the actual data, not a description of the data. Include numbers, dates, names, and specific constraints. Claude cannot reason about information it does not have.

Ask for Confidence Levels

Add "For each conclusion, rate your confidence: high (strong evidence), medium (some evidence), or low (inference/assumption)." This forces Claude to distinguish between what it knows and what it is guessing, which is invaluable for decision-making.

Challenge the Reasoning

After Claude gives an analysis, push back: "What is the strongest argument against your conclusion?" or "What assumptions are you making that might be wrong?" Claude is good at adversarial self-review when explicitly asked.

Use Opus for High-Stakes Analysis

For decisions with significant consequences — pricing changes, architecture decisions, security reviews — use Opus with extended thinking. The cost difference is negligible for one-off analyses, and the quality difference on complex reasoning tasks is real and measurable.

Hands-On Exercises

Exercise 1 — Chain-of-Thought vs Direct

Take a business decision your team is facing (or use: "Should ThreadCo offer free returns?"). Ask Claude the question directly, then ask it again with "Think through this step by step, considering at least 5 factors." Compare the depth and quality. Count the number of factors each approach considers.

Exercise 2 — Root Cause Analysis

Take a recent bug or production issue your team resolved. Give Claude the symptoms (error messages, logs, user reports) but NOT the solution. Ask Claude to identify the root cause. Compare Claude's analysis with the actual root cause your team found. How close was it? What additional information would have helped?

Exercise 3 — Decision Matrix

Identify a decision your team needs to make (tool selection, vendor choice, architecture approach). Define 5 criteria and ask Claude to build a weighted decision matrix. Then challenge Claude: "What criteria am I missing?" and "Argue the case for the option you ranked lowest." Does the analysis change?

Exercise 4 — Math via Code

Take a real business calculation (pricing analysis, inventory forecasting, or cost projection). Ask Claude to solve it twice: (a) reasoning directly in text, (b) writing a Python script. Compare the accuracy of the two approaches. For which types of calculations is the code approach significantly more accurate?

Exercise 5 — Decomposition Practice

Take the most complex project your team is working on. Ask Claude to decompose it into sub-problems with dependencies. Review the decomposition: Did Claude identify sub-problems you hadn't considered? Are the dependency relationships correct? Use the decomposition as the basis for your actual project plan.
