Reasoning & Analysis
Claude excels at multi-step reasoning, comparative analysis, and synthesis. This module covers how to get Claude to think deeply about problems, from simple analysis tasks to complex multi-step reasoning chains. Claude Opus and Sonnet models support extended thinking, an API feature where Claude reasons internally before responding.
Comparative Analysis
Ask Claude to compare options across defined criteria with a rubric. Works for technical decisions, business cases, and research synthesis. Provide scoring criteria explicitly — Claude's comparisons are only as good as the framework you give it. Without criteria, Claude defaults to generic pros/cons lists.
Root Cause Analysis
Describe a system failure and ask Claude to identify potential root causes ranked by likelihood. Include logs, stack traces, and what you expected vs what happened. The more diagnostic data you provide, the more accurate Claude's root cause identification becomes.
Structured Decomposition
For large ambiguous problems, ask Claude to first decompose into sub-problems, then solve each. This dramatically improves accuracy over single-shot prompting for complex tasks. The decomposition itself is valuable — it reveals hidden complexity and dependencies.
Extended Thinking
Claude Opus 4 and Sonnet 4 support extended thinking mode: Claude reasons internally before responding. Available via the Anthropic API and Claude Code. Ideal for math, logic puzzles, multi-constraint optimisation, and any task requiring 10+ reasoning steps. Thinking tokens are billed as output tokens, so a larger thinking budget raises the cost of a request.
Chain-of-Thought Reasoning
Chain-of-thought (CoT) prompting asks Claude to show its reasoning steps before giving a final answer. This is one of the most well-studied techniques in LLM research, and it consistently improves accuracy on reasoning-heavy tasks.
You: Should ThreadCo switch from Royal Mail to DPD for UK deliveries?
Claude (without CoT): Yes, DPD is generally faster and offers better tracking.
Verdict: shallow. Missing cost analysis, volume considerations, customer preferences.

You: Should ThreadCo switch from Royal Mail to DPD for UK deliveries? Think through this step by step. Consider: cost per parcel at our volume (500/week), delivery speed, tracking quality, customer complaints about each carrier, weekend delivery options, and environmental impact.
Claude (with CoT): Let me work through each factor:
1. COST: At 500 parcels/week, Royal Mail 2nd class is ~£3.20/parcel. DPD standard is ~£4.50/parcel. Annual difference: £33,800.
2. SPEED: Royal Mail 2nd class: 2-3 days. DPD: next-day. But only 12% of ThreadCo's customer complaints mention speed.
3. TRACKING: This is the key issue. 34% of complaints are "where is my order?" Royal Mail tracking is basic. DPD provides real-time GPS tracking with 1-hour delivery windows...
[continues through all factors, then gives a reasoned recommendation]
CoT provides the biggest accuracy improvement on: mathematical reasoning, multi-factor decisions, code debugging, logical deductions, and any task with more than 3 variables to consider. It provides little benefit for simple classification, creative writing, or translation — those tasks don't have complex reasoning chains.
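The difference between the two ThreadCo prompts above is mechanical: a step-by-step instruction plus an explicit list of factors. A minimal sketch of a helper that adds both to any question follows. The function name `build_cot_prompt` and the closing instruction are illustrative assumptions, not part of any Anthropic API.

```python
def build_cot_prompt(question: str, factors: list[str]) -> str:
    """Append a step-by-step instruction and an explicit factor list to a question."""
    if not factors:
        # No factors supplied: fall back to the bare step-by-step instruction
        return f"{question}\n\nThink through this step by step before answering."
    factor_lines = "\n".join(f"- {factor}" for factor in factors)
    return (
        f"{question}\n\n"
        "Think through this step by step. Consider each factor in turn:\n"
        f"{factor_lines}\n\n"
        "Finish with a recommendation and the main reason for it."
    )


prompt = build_cot_prompt(
    "Should ThreadCo switch from Royal Mail to DPD for UK deliveries?",
    [
        "cost per parcel at our volume (500/week)",
        "delivery speed",
        "tracking quality",
    ],
)
print(prompt)
```

Keeping the factor list as data makes it easy to reuse the same structure across decisions and to compare answers with and without the added instruction.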
Extended Thinking — Deep Reasoning Mode
Extended thinking is a Claude feature where the model reasons internally before producing its visible response. Unlike chain-of-thought (where the reasoning is part of the response text), extended thinking happens in a separate "thinking" block that the API returns alongside the final answer. Claude can think for hundreds or thousands of tokens before writing a single word of the response.
```python
import anthropic

client = anthropic.Anthropic()

# Extended thinking is enabled by setting a thinking budget
resp = client.messages.create(
    model="claude-opus-4-6",
    max_tokens=16000,
    thinking={
        "type": "enabled",
        "budget_tokens": 10000,  # Claude can use up to 10K tokens for thinking
    },
    messages=[{
        "role": "user",
        "content": """ThreadCo is considering three pricing strategies for their new premium line.
Analyse each strategy considering:
- Impact on existing customer base (2,000 active buyers)
- Margin implications (current margin: 42%)
- Brand perception in the sustainable fashion market
- Competitor pricing (£25-£45 range)

Strategy A: Premium pricing at £55 with free shipping
Strategy B: Value pricing at £35 with £3.50 shipping
Strategy C: Tiered pricing: £40 standard, £55 for organic-certified""",
    }],
)

# Access the thinking and response separately
for block in resp.content:
    if block.type == "thinking":
        print("THINKING:", block.thinking[:200], "...")
    elif block.type == "text":
        print("RESPONSE:", block.text)
```
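The request above pairs `max_tokens=16000` with `budget_tokens=10000`. A small sketch of a guard that checks the two values before a request is sent is shown below. It assumes, based on the API documentation at the time of writing, that the thinking budget must be at least 1,024 tokens and must be smaller than `max_tokens`; verify both limits against the current documentation. The helper name `thinking_request_kwargs` is hypothetical.

```python
# Assumed minimum thinking budget; check the current API documentation
MIN_THINKING_BUDGET = 1024


def thinking_request_kwargs(max_tokens: int, budget_tokens: int) -> dict:
    """Return keyword arguments for a messages request with extended thinking enabled."""
    if budget_tokens < MIN_THINKING_BUDGET:
        raise ValueError(
            f"budget_tokens must be at least {MIN_THINKING_BUDGET}, got {budget_tokens}"
        )
    if budget_tokens >= max_tokens:
        # The thinking budget is spent out of max_tokens, so it has to leave
        # room for the visible answer
        raise ValueError("budget_tokens must be smaller than max_tokens")
    return {
        "max_tokens": max_tokens,
        "thinking": {"type": "enabled", "budget_tokens": budget_tokens},
    }


kwargs = thinking_request_kwargs(max_tokens=16000, budget_tokens=10000)
print(kwargs)
```

The returned dictionary can be unpacked into `client.messages.create(model=..., messages=..., **kwargs)`.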
| Feature | Chain-of-Thought (CoT) | Extended Thinking |
|---|---|---|
| Where reasoning happens | Visible in the response text | Separate thinking block, returned alongside the answer |
| Control | Prompted via "think step by step" | Enabled via API parameter |
| Token usage | Reasoning counts as output tokens | Thinking tokens also count as output tokens, capped by the thinking budget |
| Best for | Transparency — you want to see the reasoning | Accuracy — you want the best answer, don't need to see the work |
| Available on | All models, all interfaces | Opus and Sonnet, via API and Claude Code |
| Quality impact | Significant improvement over no reasoning | Further improvement over visible CoT for hard problems |
Mathematical and Quantitative Reasoning
Claude is capable of substantial mathematical reasoning, but it has specific strengths and weaknesses you should know:
Strengths
Algebra, statistics, probability, financial calculations, optimisation problems, and word problems that require translating English into math. Claude Opus with extended thinking can solve multi-step problems that require 10+ calculation steps.
Limitations
Claude performs arithmetic by predicting tokens, not by running a calculator. It can make errors on large multiplications, long division, or calculations with many decimal places. For precision-critical math, ask Claude to write code that computes the answer rather than computing it directly.
Best Practice: Code for Computation
For any calculation that matters, ask Claude to write a Python script that computes the answer. The code is verifiable, reproducible, and exact. Claude's mathematical reasoning (setting up the problem) is excellent; its arithmetic is good but not perfect.
Statistical Analysis
Claude can interpret statistical results, explain p-values, identify appropriate tests, and spot flaws in experimental design. Feed it raw data and it can suggest analyses, identify outliers, and explain findings in plain language.
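When Claude flags outliers in a dataset, it is worth checking the claim with a few lines of code. The sketch below uses the common 1.5 × IQR rule, which is one convention among several and an assumption here, as are the weekly order figures.

```python
import statistics


def iqr_outliers(values: list[float], k: float = 1.5) -> list[float]:
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical weekly order counts; the last week is unusually high
weekly_orders = [480, 495, 502, 510, 498, 505, 490, 920]
print(iqr_outliers(weekly_orders))
```

If Claude's list of outliers differs from the script's, ask it which rule it applied; the disagreement is often a difference in convention rather than an error.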
You: ThreadCo sells 500 tees/week at £28 each. We're considering a 10% price increase. Historical data suggests a 10% price increase causes a 6% volume drop for our market segment. Write a Python script that calculates:
1. Current weekly revenue
2. Projected weekly revenue after the increase
3. Break-even volume drop (at what % volume drop do we lose money?)
4. Annual impact assuming the volume drop stabilises after 4 weeks

Claude writes the script (and runs it where a code execution tool is available), so the numbers come from exact computation rather than approximate mental arithmetic.
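For reference, a script answering the prompt above might look like the following sketch. It rests on three labelled assumptions: "lose money" is read as weekly revenue falling below the current level (costs and margin are ignored), the 6% volume drop is applied flat to all 52 weeks rather than modelling a 4-week ramp, and exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction

UNITS_PER_WEEK = 500
PRICE = Fraction(28)                # pounds per tee
PRICE_INCREASE = Fraction(10, 100)  # 10% increase
VOLUME_DROP = Fraction(6, 100)      # 6% fewer units sold
WEEKS_PER_YEAR = 52

# 1. Current weekly revenue
current_revenue = UNITS_PER_WEEK * PRICE

# 2. Projected weekly revenue after the increase
new_price = PRICE * (1 + PRICE_INCREASE)
new_units = UNITS_PER_WEEK * (1 - VOLUME_DROP)
projected_revenue = new_price * new_units

# 3. Break-even volume drop: solve (1 + increase) * (1 - drop) = 1 for drop
break_even_drop = 1 - 1 / (1 + PRICE_INCREASE)

# 4. Annual impact, assuming the drop is constant across the year (simplification)
annual_impact = (projected_revenue - current_revenue) * WEEKS_PER_YEAR

print(f"Current weekly revenue:   £{float(current_revenue):,.2f}")
print(f"Projected weekly revenue: £{float(projected_revenue):,.2f}")
print(f"Break-even volume drop:   {float(break_even_drop) * 100:.2f}%")
print(f"Annual revenue impact:    £{float(annual_impact):,.2f}")
```

Because every assumption is a named constant, a reviewer can change the volume drop or the number of weeks and rerun the script instead of re-prompting.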
Code Reasoning and Debugging
Claude's reasoning capabilities shine when applied to code. Here are the most effective patterns:
Trace Execution
Ask Claude to trace through code with specific input values: "Walk through @checkout.ts with these inputs: cart = [{id: 1, qty: 2}], discount = 'SAVE10'. Show me the value of each variable at each step." This catches logical errors that static analysis misses.
Identify Edge Cases
"What inputs to @validateOrder() would cause unexpected behaviour? Consider: null, undefined, empty arrays, negative numbers, extremely large numbers, Unicode strings, concurrent calls." Claude systematically generates edge cases you haven't thought of.
Architecture Reasoning
"Given @schema.prisma and @api-routes/, identify any N+1 query problems, missing indexes, or scalability bottlenecks. Assume we'll go from 500 orders/day to 50,000 orders/day." Claude reasons about system behaviour at scale.
Root Cause from Symptoms
Describe the symptoms (error message, unexpected behaviour, performance degradation) and provide the relevant code. Ask Claude to hypothesise root causes ranked by likelihood, then verify each hypothesis by tracing the code. This mirrors how experienced engineers debug.
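Edge cases that Claude proposes (see Identify Edge Cases above) are most useful once they are pinned down as executable checks. The sketch below is a table-driven check in Python. `validate_order` is a hypothetical stand-in for the real function, and the expected results are assumptions about how such a validator should behave.

```python
def validate_order(items) -> bool:
    """Hypothetical validator: a non-empty list of dicts with a sane integer qty."""
    if not isinstance(items, list) or not items:
        return False
    for item in items:
        if not isinstance(item, dict):
            return False
        qty = item.get("qty")
        # bool is a subclass of int in Python, so reject it explicitly
        if isinstance(qty, bool) or not isinstance(qty, int):
            return False
        if qty <= 0 or qty > 10_000:
            return False
    return True


# (label, input, expected result) for each edge case
EDGE_CASES = [
    ("None instead of a list", None, False),
    ("empty list", [], False),
    ("negative quantity", [{"id": 1, "qty": -2}], False),
    ("extremely large quantity", [{"id": 1, "qty": 10**12}], False),
    ("fractional quantity", [{"id": 1, "qty": 1.5}], False),
    ("missing quantity", [{"id": 1}], False),
    ("valid order", [{"id": 1, "qty": 2}], True),
]

failures = [
    label
    for label, payload, expected in EDGE_CASES
    if validate_order(payload) != expected
]
print(failures or "all edge cases behave as expected")
```

Each row Claude suggests becomes one tuple in `EDGE_CASES`, so the list doubles as a regression suite for the validator.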
Complex Problem Solving — The Decomposition Method
For problems that are too complex for a single prompt, use structured decomposition:
Turn 1 (Decompose):
ThreadCo wants to expand from UK-only to EU shipping. Break this problem down into all the sub-problems we need to solve. Don't solve anything yet, just list the sub-problems and their dependencies.
(Claude produces a dependency graph of 8-12 sub-problems.)

Turn 2 (Prioritise):
Which of these sub-problems are blockers (must be solved first)? Which are independent and can be done in parallel? Create a phased implementation plan.

Turn 3 (Solve each sub-problem):
Let's tackle "VAT compliance for EU cross-border sales" first. @returns-policy.md @pricing-model.ts What are the specific requirements and how should we implement them?

Turn 4 (Integrate):
Now that we've solved the first three sub-problems, are there any interactions or conflicts between the solutions? What changes when we put them together?
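The phased plan from Turn 2 can be cross-checked in code once the dependencies are written down. The sketch below uses Python's standard `graphlib` module to group sub-problems into phases whose members can run in parallel. The sub-problem names and their dependencies are hypothetical examples, not output from Claude.

```python
from graphlib import TopologicalSorter

# Hypothetical sub-problems; each maps to the set of sub-problems it depends on
dependencies = {
    "VAT compliance": set(),
    "Carrier selection": set(),
    "EU pricing": {"VAT compliance", "Carrier selection"},
    "Returns policy": {"Carrier selection"},
    "Checkout changes": {"EU pricing", "Returns policy"},
}


def plan_phases(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group sub-problems into phases; items in one phase are independent."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    phases = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose blockers are done
        phases.append(ready)
        sorter.done(*ready)
    return phases


phases = plan_phases(dependencies)
for number, phase in enumerate(phases, start=1):
    print(f"Phase {number}: {', '.join(phase)}")
```

A `CycleError` here is a useful signal in its own right: it means the decomposition contains a circular dependency that should be resolved before planning continues.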
Over-decomposition can be as harmful as under-decomposition. If you break a simple problem into 10 sub-steps, each step loses the holistic view and you get a fragmented, inconsistent result. Use decomposition for genuinely complex problems (5+ interacting factors). For simpler problems, a single well-structured prompt with chain-of-thought works better.
Analysis Prompt Patterns
| Analysis Type | Prompt Pattern | Output Format |
|---|---|---|
| SWOT Analysis | "Perform a SWOT analysis of [X]. For each quadrant, provide 3-5 specific, actionable points — not generic observations." | 4-quadrant table with specifics |
| Decision Matrix | "Compare [options] across [criteria]. Score each 1-10 with justification. Weight the criteria by importance." | Weighted scoring table |
| Risk Assessment | "Identify risks for [project/decision]. For each: likelihood (1-5), impact (1-5), mitigation strategy, early warning signs." | Risk register table |
| Trend Analysis | "Analyse [data] for patterns. Identify: upward/downward trends, seasonal patterns, anomalies, and what might be causing them." | Findings with supporting evidence |
| Gap Analysis | "Compare our current state [describe] vs desired state [describe]. Identify each gap, its severity, and the effort required to close it." | Gap-by-gap breakdown with effort estimates |
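For the Decision Matrix pattern, the weighted totals are simple enough to recompute yourself, which guards against arithmetic slips in Claude's table. A minimal sketch follows; the criteria, weights, and 1-10 scores are hypothetical placeholders.

```python
def weighted_scores(
    weights: dict[str, float], scores: dict[str, dict[str, float]]
) -> dict[str, float]:
    """Return each option's weighted average score across all criteria."""
    total_weight = sum(weights.values())
    return {
        option: sum(weights[c] * option_scores[c] for c in weights) / total_weight
        for option, option_scores in scores.items()
    }


# Hypothetical weights (relative importance) and 1-10 scores per carrier
weights = {"cost": 4, "tracking": 4, "speed": 2}
scores = {
    "Royal Mail": {"cost": 8, "tracking": 4, "speed": 5},
    "DPD": {"cost": 5, "tracking": 9, "speed": 9},
}

totals = weighted_scores(weights, scores)
for option, total in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{option}: {total:.1f}")
```

Changing a weight and rerunning shows quickly whether the ranking is robust or depends on one contested criterion.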
ShopMate -- Monthly Review Report
```python
# shopmate/reviews/monthly_report.py -- 200 reviews in, 3-minute report out
import json

import anthropic

client = anthropic.Anthropic()


def generate_review_report(reviews: list[dict]) -> dict:
    """Analyse all ThreadCo reviews for the month and extract actionable insights."""
    # One review per line so individual reviews stay distinguishable in the prompt
    reviews_text = "\n".join(
        f"[{r['rating']}/5] {r['product']}: {r['text']}" for r in reviews
    )
    resp = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1500,
        system="""You are analysing customer reviews for ThreadCo, a T-shirt brand.
Produce an actionable monthly report for the founder.
Focus on patterns, not individual reviews. Return ONLY valid JSON.""",
        messages=[{
            "role": "user",
            "content": (
                f"Analyse these {len(reviews)} reviews:\n\n{reviews_text}\n\n"
                'Return JSON: {"top_complaints":["..."],"top_praises":["..."],'
                '"sizing_issues":["..."],"product_suggestions":["..."],'
                '"average_rating":0.0,"summary":"2-3 sentences for the founder"}'
            ),
        }],
    )
    # json.loads raises json.JSONDecodeError if the reply is not bare JSON
    return json.loads(resp.content[0].text)


# Sample reviews
sample_reviews = [
    {"rating": 5, "product": "Sunset Gradient Tee", "text": "Softest tee I own, the colour is exactly as shown"},
    {"rating": 2, "product": "Midnight Pocket Tee", "text": "Runs small, ordered my usual size and it was tight"},
    {"rating": 4, "product": "Wave Print Crop Tee", "text": "Love the print but shipping took 2 weeks"},
    {"rating": 3, "product": "Midnight Pocket Tee", "text": "Nice quality but definitely size up"},
]

report = generate_review_report(sample_reviews)
print(json.dumps(report, indent=2))
```
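`generate_review_report` passes the model's reply straight to `json.loads`, which fails if the reply is wrapped in a markdown code fence or omits a field. A sketch of a more defensive parser is shown below. The helper name `parse_report` is hypothetical, and the required keys are taken from the JSON shape requested in the prompt above.

```python
import json

FENCE = "`" * 3  # markdown code-fence marker
REQUIRED_KEYS = {
    "top_complaints", "top_praises", "sizing_issues",
    "product_suggestions", "average_rating", "summary",
}


def parse_report(raw: str) -> dict:
    """Parse a model reply into a report dict, tolerating a surrounding code fence."""
    text = raw.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line (with or without a language tag)
        text = text.split("\n", 1)[1] if "\n" in text else ""
        # Drop the closing fence and anything after it
        text = text.rsplit(FENCE, 1)[0]
    report = json.loads(text)
    if not isinstance(report, dict):
        raise ValueError("Expected a JSON object at the top level")
    missing = REQUIRED_KEYS - report.keys()
    if missing:
        raise ValueError(f"Report is missing keys: {sorted(missing)}")
    return report


# Simulated reply wrapped in a fence, as models sometimes produce
sample_reply = FENCE + "json\n" + json.dumps({
    "top_complaints": ["Midnight Pocket Tee runs small"],
    "top_praises": ["Soft fabric"],
    "sizing_issues": ["Size up on Midnight Pocket Tee"],
    "product_suggestions": [],
    "average_rating": 3.5,
    "summary": "Sizing is the main issue this month.",
}) + "\n" + FENCE

parsed = parse_report(sample_reply)
print(parsed["summary"])
```

Swapping `json.loads(resp.content[0].text)` for `parse_report(resp.content[0].text)` would turn a silent shape mismatch into an explicit error the caller can handle or retry.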
Tips for Better Reasoning Output
Give Claude All the Data
Reasoning quality is directly proportional to data quality. A vague question gets a vague analysis. Attach the actual data, not a description of the data. Include numbers, dates, names, and specific constraints. Claude cannot reason about information it does not have.
Ask for Confidence Levels
Add "For each conclusion, rate your confidence: high (strong evidence), medium (some evidence), or low (inference/assumption)." This forces Claude to distinguish between what it knows and what it is guessing, which is invaluable for decision-making.
Challenge the Reasoning
After Claude gives an analysis, push back: "What is the strongest argument against your conclusion?" or "What assumptions are you making that might be wrong?" Claude is good at adversarial self-review when explicitly asked.
Use Opus for High-Stakes Analysis
For decisions with significant consequences — pricing changes, architecture decisions, security reviews — use Opus with extended thinking. The cost difference is negligible for one-off analyses, and the quality difference on complex reasoning tasks is real and measurable.
Hands-On Exercises
Take a business decision your team is facing (or use: "Should ThreadCo offer free returns?"). Ask Claude the question directly, then ask it again with "Think through this step by step, considering at least 5 factors." Compare the depth and quality. Count the number of factors each approach considers.
Take a recent bug or production issue your team resolved. Give Claude the symptoms (error messages, logs, user reports) but NOT the solution. Ask Claude to identify the root cause. Compare Claude's analysis with the actual root cause your team found. How close was it? What additional information would have helped?
Identify a decision your team needs to make (tool selection, vendor choice, architecture approach). Define 5 criteria and ask Claude to build a weighted decision matrix. Then challenge Claude: "What criteria am I missing?" and "Argue the case for the option you ranked lowest." Does the analysis change?
Take a real business calculation (pricing analysis, inventory forecasting, or cost projection). Ask Claude to solve it twice: (a) reasoning directly in text, (b) writing a Python script. Compare the accuracy of the two approaches. For which types of calculations is the code approach significantly more accurate?
Take the most complex project your team is working on. Ask Claude to decompose it into sub-problems with dependencies. Review the decomposition: Did Claude identify sub-problems you hadn't considered? Are the dependency relationships correct? Use the decomposition as the basis for your actual project plan.