Debugging | AI Training Hub

Windsurf Track

Module 29


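Debugging a ShopMate Bug: Customers report that ShopMate occasionally replies with the wrong store's return policy: PetThreads policy shows up in ThreadCo chat sessions. The bug is intermittent and hard to reproduce. It is a tenant isolation bug, meaning one brand's configuration leaks into another brand's requests. This module shows how Cascade's codebase awareness can track down a bug of this kind in about 20 minutes. The ShopMate exercise later in the module works through a related symptom: ThreadCo's brand voice appearing in PetThreads product descriptions.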

AI-Assisted Debugging Workflows

Debugging is where Windsurf's terminal integration and codebase awareness deliver the most dramatic productivity gains. Traditional debugging requires you to form a hypothesis, find the relevant code, set breakpoints, inspect state, and iterate. Cascade can compress this cycle from hours to minutes by reading errors, tracing call stacks, understanding your codebase structure, and proposing targeted fixes -- all while you watch and guide. This module covers systematic debugging flows, common error pattern recognition, log analysis, step-through debugging techniques, and strategies for tackling production issues.

The Four Debugging Modes

Error-First Debugging

Paste the full error and stack trace into a Flow prompt: "Fix this error -- include the full context of why it occurs." Cascade reads the stack trace, traverses the call chain across files, identifies the root cause, and proposes a fix with an explanation.

Test-Driven Debugging

Ask Cascade to write a failing test that reproduces the bug, then fix the implementation to make it pass. This creates a regression guard and forces the fix to be precise rather than masking the symptom.

Terminal-Loop Debugging

Start a Flow, let Cascade make changes, run the test suite in the terminal, read the output, and iterate. Cascade can loop autonomously through run-fail-fix-run cycles until the tests pass. You watch, and you step in only when it gets stuck.

Differential Debugging

Describe what changed and when the bug started: "This worked last week. Since we upgraded SQLModel to 0.0.14 the relationship loading breaks." Cascade narrows the search space dramatically with this context.

Error-First Debugging in Depth

The most common debugging workflow: something breaks, you get an error, and you need to fix it. Cascade excels here because it can read the entire stack trace and navigate your codebase to trace the root cause. The key is giving Cascade complete information.

The Complete Error Prompt

A good debugging prompt has four parts:

What you expected: "The /checkout endpoint should return a 200 with the order confirmation."
What actually happened: "It returns a 500 with an IntegrityError."
The full error output: Paste the complete stack trace, not a summary.
Recent changes: "We added the discount feature yesterday. The checkout endpoint was not modified."

Always Provide the Full Stack Trace

Truncated error messages dramatically reduce Cascade's ability to locate the root cause. Always paste the complete stack trace, including the outermost caller. If the trace is very long, include the first 20 lines and the last 20 lines -- the entry point and the failure point are what matter. Never summarise the error in your own words -- the exact error message contains information (line numbers, variable names, exception types) that Cascade uses to navigate directly to the problem.
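If you capture exceptions in code (in a script or a test harness, for example), you can produce the exact text Python would print instead of a paraphrase. The sketch below is one way to do that with the standard library's `traceback` module. The helper names `traceback_for_prompt` and `trim_middle` are hypothetical, and the 20/20 head/tail split simply mirrors the advice above. In a Python traceback the outermost caller is printed first and the frame that raised is printed last, so the head and tail are exactly the entry point and the failure point.

```python
import traceback


def trim_middle(text: str, head: int = 20, tail: int = 20) -> str:
    """Keep the first `head` and last `tail` lines of a long trace and
    replace the middle with a marker saying how much was cut."""
    lines = text.splitlines()
    if len(lines) <= head + tail:
        return text
    omitted = len(lines) - head - tail
    marker = f"... [{omitted} lines omitted] ..."
    return "\n".join(lines[:head] + [marker] + lines[-tail:])


def traceback_for_prompt(exc: BaseException, head: int = 20, tail: int = 20) -> str:
    """Render the traceback exactly as Python prints it, including chained
    exceptions ("The above exception was the direct cause..."), and trim it
    only if it is long."""
    full = "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
    return trim_middle(full.rstrip("\n"), head, tail)
```

Because `format_exception` follows exception chains by default, the original cause of a re-raised error stays in the text you paste, which is often where the real root cause is.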

Debugging Flow Template

Prompt Template -- Systematic Debug Flow

Bug report:
[Describe what you expected vs what actually happened]

Error output:
[Full error message and stack trace]

Steps to reproduce:
[Minimal reproduction steps]

Context:
[Recent changes, version upgrades, environment differences]

Task:
1. Identify the root cause -- explain your reasoning
2. Write a failing test that reproduces the bug
3. Fix the implementation to make the test pass
4. Verify no existing tests regress
5. Add a comment explaining why the bug occurred

Common Error Patterns and How to Debug Them

Certain error categories appear repeatedly in web development. Knowing how to prompt Cascade for each type accelerates debugging significantly:

Error Category	Typical Symptoms	Best Cascade Approach
Import/Module errors	ModuleNotFoundError, ImportError, circular imports	Paste the error. Ask Cascade to trace the import chain and identify the circular dependency or missing package.
Type errors	TypeError, AttributeError, wrong method signatures	Paste the error with the call site. Ask Cascade to check the type contract between caller and callee.
Database errors	IntegrityError, OperationalError, migration conflicts	Paste the error and ask Cascade to check the model definition against the migration history.
Async/concurrency bugs	Race conditions, deadlocks, "await" missing	Ask Cascade to trace the async call chain and identify missing awaits, shared state that is read and written across an await, and synchronous calls that block the event loop.
API contract mismatches	422 Validation Error, unexpected response format	Paste the request payload and the error. Ask Cascade to compare the payload against the Pydantic model.
Environment/config bugs	Works locally, fails in CI/production	Ask Cascade to list all environment variables and config files the code depends on, then compare environments.
State management bugs	Stale data, wrong values, intermittent failures	Ask Cascade to trace where the state is set, read, and mutated. Look for shared mutable state.
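The missing-await case in the async row is worth seeing once, because Python does not fail at the call that lacks the `await`; it fails later, wherever the coroutine object is used as if it were a value. A minimal sketch, with a hypothetical `get_stock()` standing in for an async database call:

```python
import asyncio


async def get_stock(product_id: str) -> int:
    """Hypothetical stand-in for an async database lookup."""
    await asyncio.sleep(0)
    return 3


async def has_stock_buggy(product_id: str) -> bool:
    # BUG: missing `await`. get_stock() returns a coroutine object, so the
    # comparison below raises TypeError instead of the call itself failing.
    return get_stock(product_id) > 0


async def has_stock_fixed(product_id: str) -> bool:
    stock = await get_stock(product_id)
    return stock > 0
```

The resulting TypeError names the comparison, not the call, which is why asking Cascade to trace the async call chain back from the error line works better than patching the line that raised. Python also emits a RuntimeWarning that the coroutine was never awaited; paste that along with the error.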

Test-Driven Debugging in Detail

Test-driven debugging is the most reliable approach because it produces a regression guard along with the fix. The process:

Reproduce: Ask Cascade to write a test that triggers the exact bug. "Write a test that creates two concurrent orders for the same product with quantity=1. The test should fail because one order should be rejected (insufficient stock) but both currently succeed."
Verify reproduction: Run the test and confirm it fails with the expected error.
Fix: Ask Cascade to fix the implementation so the test passes.
Verify fix: Run the full test suite to ensure nothing else broke.
Document: The test itself is the documentation -- it describes the bug and proves it is fixed.

Text -- Test-Driven Debug: Race Condition

# Step 1: Write a test that reproduces the race condition

Write a test in tests/test_inventory.py that reproduces this bug:

Bug: Two simultaneous orders for the last item in stock both succeed,
     resulting in -1 inventory.

Test should:
1. Set product stock to 1
2. Use asyncio.gather to submit two concurrent purchase_product() calls
3. Assert that exactly one succeeds and one raises InsufficientStockError
4. Assert final stock is 0, not -1

Run the test -- it should FAIL (both orders currently succeed).

# Step 2: Fix the implementation

Fix shopmate/services/inventory.py:purchase_product() to prevent the
race condition. Use SELECT FOR UPDATE to lock the row during the
stock check and decrement.

Run: pytest tests/test_inventory.py -v to verify the fix.
Run: pytest tests/ to verify no regressions.
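The prompt above targets a database-backed service, and the fix it asks for (SELECT FOR UPDATE) needs a real database. The sketch below is an in-memory analogue, not ShopMate's code: a hypothetical `Inventory` class reproduces the same check-then-decrement race with `asyncio.gather`, and an `asyncio.Lock` stands in for the row lock. The `asyncio.sleep(0)` marks where a real implementation would await the database between reading and writing the stock.

```python
import asyncio
from typing import List, Tuple


class InsufficientStockError(Exception):
    """Raised when a purchase would take stock below zero."""


class Inventory:
    """Hypothetical in-memory stand-in for a product row with a stock column."""

    def __init__(self, stock: int) -> None:
        self.stock = stock
        self._lock = asyncio.Lock()  # plays the role of the database row lock

    async def purchase_unsafe(self) -> None:
        if self.stock < 1:
            raise InsufficientStockError()
        await asyncio.sleep(0)  # another task can run between check and write
        self.stock -= 1

    async def purchase_locked(self) -> None:
        # Check and decrement form one critical section, as with SELECT FOR UPDATE.
        async with self._lock:
            if self.stock < 1:
                raise InsufficientStockError()
            await asyncio.sleep(0)
            self.stock -= 1


async def two_concurrent_purchases(locked: bool) -> Tuple[List[object], int]:
    """Reproduce the bug: stock of 1, two simultaneous purchases."""
    inv = Inventory(stock=1)  # created inside the running event loop
    purchase = inv.purchase_locked if locked else inv.purchase_unsafe
    results = await asyncio.gather(purchase(), purchase(), return_exceptions=True)
    return list(results), inv.stock
```

Without the lock both purchases succeed and stock ends at -1; with it, exactly one raises `InsufficientStockError` and stock ends at 0. An in-process lock only protects a single process, though. With several API workers the lock has to live in the database, which is why the prompt asks for SELECT FOR UPDATE; in SQLAlchemy (and SQLModel, which builds on it) that is expressed with `select(...).with_for_update()`.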

Terminal-Loop Debugging

Windsurf's most powerful debugging feature is the terminal loop. Cascade can autonomously:

Make a code change
Run a command in the terminal (test suite, build, linter)
Read the terminal output
Diagnose the failure
Apply a fix
Re-run the command
Repeat until success

This is especially powerful for debugging test failures. Instead of manually reading each error and telling Cascade what to fix, you can say: "Run pytest tests/test_checkout.py -v and fix any failures. Keep running until all tests pass." Cascade will loop through the failures autonomously, often resolving several test failures in a single session.

Set a Loop Limit

For the terminal loop, add a safety limit: "Run the tests and fix failures. If you cannot resolve an issue after 3 attempts, stop and explain what you tried." This prevents Cascade from entering an infinite loop on a genuinely hard problem that requires human insight.
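Cascade's loop is built into the IDE, but the control flow it follows under a loop limit is simple enough to sketch in Python. This is an illustrative model, not Windsurf's implementation: `run_until_green` is a hypothetical helper, and `attempt_fix` is a hypothetical callback standing in for the step where Cascade reads the output and edits code.

```python
import subprocess
from typing import Callable, Sequence, Tuple


def run_until_green(
    cmd: Sequence[str],
    attempt_fix: Callable[[str], None],
    max_attempts: int = 3,
) -> Tuple[bool, int]:
    """Run `cmd`; on failure pass its output to `attempt_fix` and re-run.

    Stops after `max_attempts` fix attempts so a human can take over.
    Returns (passed, fix_attempts_used).
    """
    result = subprocess.run(list(cmd), capture_output=True, text=True)
    attempts = 0
    while result.returncode != 0 and attempts < max_attempts:
        attempt_fix(result.stdout + result.stderr)  # diagnose and apply a fix
        attempts += 1
        result = subprocess.run(list(cmd), capture_output=True, text=True)  # re-run
    return result.returncode == 0, attempts
```

For the prompt above, `cmd` would be something like `[sys.executable, "-m", "pytest", "tests/test_checkout.py", "-v"]`. A `False` result is the "stop and explain what you tried" case.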

Log Analysis with Cascade

For bugs that do not produce clean stack traces -- intermittent failures, performance issues, data corruption -- log analysis is the primary debugging tool. Cascade can parse and interpret logs effectively if you structure the prompt correctly:

Text -- Log Analysis Flow

# Paste structured logs for Cascade to analyse

Here are the last 50 log lines from our ShopMate API during the time
window when customers reported wrong brand voice (14:20-14:35 UTC):

[paste log output here]

Analyse these logs for:
1. Any requests where brand_id does not match the expected brand for the session
2. Any errors or warnings around the time window
3. Any pattern in which requests succeed vs which get the wrong brand voice
4. Whether there is a correlation with specific customer IDs or session IDs

Based on your analysis, identify the likely root cause and suggest
which files to investigate.
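Filtering the window yourself before pasting keeps the prompt focused. A minimal sketch, assuming JSON-lines logs whose records carry an ISO 8601 timestamp with a UTC offset in a `ts` field plus `session_id` and `brand_id` fields (all hypothetical names; adjust them to your log schema), and assuming you can build a session-to-brand mapping from your own data:

```python
import json
from datetime import datetime
from typing import Dict, Iterable, Iterator, List


def parse_ts(value: str) -> datetime:
    # fromisoformat() on older Pythons rejects a trailing "Z", so normalise it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def filter_window(lines: Iterable[str], start: str, end: str) -> Iterator[dict]:
    """Yield JSON log records whose `ts` lies within [start, end]."""
    lo, hi = parse_ts(start), parse_ts(end)
    for line in lines:
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip unstructured noise such as startup banners
        if isinstance(rec, dict) and "ts" in rec and lo <= parse_ts(rec["ts"]) <= hi:
            yield rec


def brand_mismatches(records: Iterable[dict], brand_by_session: Dict[str, str]) -> List[dict]:
    """Records whose brand_id differs from the brand that owns the session."""
    return [
        rec for rec in records
        if rec.get("session_id") in brand_by_session
        and rec.get("brand_id") != brand_by_session[rec["session_id"]]
    ]
```

The mismatching records, plus the surrounding lines that share their request or session ID, are the lines worth pasting into the prompt.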

Tips for effective log analysis with Cascade:

Use structured logging (JSON): Cascade parses structured logs far more accurately than unstructured text. If you are not using structured logging, the first step is to add it -- libraries like structlog (Python) or pino (Node.js) make this easy.
Include timestamps: Cascade can identify timing patterns (requests that take too long, events that happen out of order) when timestamps are present.
Include request IDs: Correlation IDs let Cascade trace a single request across multiple log entries, reconstructing the full request lifecycle.
Filter before pasting: Do not paste 10,000 log lines. Filter to the relevant time window and error level first. Cascade works best with 50-200 focused log lines.
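The tips above recommend structlog or pino. If you would rather not add a dependency, Python's standard `logging` module can emit one JSON object per line through a custom `Formatter`. A minimal sketch; the `request_id`, `session_id`, and `brand_id` field names are hypothetical and are passed through logging's `extra=` argument:

```python
import json
import logging
from datetime import datetime, timezone
from typing import IO, Optional


class JsonFormatter(logging.Formatter):
    """Format each log record as a single JSON line."""

    CONTEXT_FIELDS = ("request_id", "session_id", "brand_id")  # hypothetical names

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "ts": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "event": record.getMessage(),
        }
        # Values supplied via extra={...} become attributes on the record.
        for field in self.CONTEXT_FIELDS:
            payload[field] = getattr(record, field, None)
        return json.dumps(payload)


def make_json_logger(name: str, stream: Optional[IO[str]] = None) -> logging.Logger:
    handler = logging.StreamHandler(stream)  # stream=None means sys.stderr
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False  # avoid duplicate lines via the root logger
    return logger
```

A call such as `log.info("description_generated", extra={"request_id": "req-123", "brand_id": "petthreads"})` then produces a single line carrying the timestamp, the level, and the correlation fields that the log analysis prompt relies on.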

Step-Through Debugging with Cascade

While Cascade cannot directly control a debugger (setting breakpoints, stepping through code), it can guide you through the process and interpret what you find:

Pattern 1: Cascade plans, you execute. Ask Chat: "I need to debug why the discount calculation is wrong. Which functions should I set breakpoints in, and what variable values should I check at each point?" Cascade gives you a debugging plan. You set the breakpoints, run the debugger, and report back what you find.

Pattern 2: Print-based debugging via Flow. For simpler issues, ask a Flow: "Add debug logging to the discount calculation pipeline. Log the input values, intermediate calculations, and final result at each step. Log at debug level with structlog (log.debug(...) on a logger from structlog.get_logger()) so we can remove them later." Run the code, paste the debug output into Chat, and ask Cascade to interpret it.
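A single step of such a pipeline with temporary debug logging might look like the sketch below. It uses the standard library's `logging` at DEBUG level rather than structlog so it runs without extra packages, and `Decimal` so money arithmetic is exact. The logger name, `apply_discount`, and the rounding choice are hypothetical, not ShopMate's pricing code.

```python
import logging
from decimal import ROUND_HALF_UP, Decimal

log = logging.getLogger("shopmate.pricing.debug")  # hypothetical logger name
CENT = Decimal("0.01")


def apply_discount(subtotal: Decimal, discount_pct: Decimal) -> Decimal:
    """Hypothetical pipeline step: subtract a percentage discount."""
    discount = (subtotal * discount_pct / 100).quantize(CENT, rounding=ROUND_HALF_UP)
    result = subtotal - discount
    # TEMP DEBUG: inputs, intermediate value, and output of this step.
    log.debug("apply_discount subtotal=%s pct=%s discount=%s result=%s",
              subtotal, discount_pct, discount, result)
    return result
```

With `logging.basicConfig(level=logging.DEBUG)`, a $50.00 item at 10% logs `discount=5.00 result=45.00`. Repeating the same line in each later step (tax, rounding) shows exactly where the value diverges, and the `TEMP DEBUG` marker makes the lines easy to find and remove.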

Text -- Guided Debugging Plan

# Ask Cascade for a debugging plan (Chat mode)

The discount calculation is returning $42.50 instead of $45.00
for a $50 item with a 10% discount.

Given @shopmate/pricing/discounts.py and @shopmate/pricing/tax.py:

1. What is the correct calculation order? (discount then tax, or tax then discount?)
2. Which functions are involved in this calculation path?
3. Where should I set breakpoints to find where the value diverges?
4. What variable values should I expect at each breakpoint if the calculation is correct?

# After Cascade gives you the plan, set breakpoints, run the debugger,
# and paste the actual variable values back into Chat for analysis.

Debugging Production Issues

Production debugging requires special care because you cannot set breakpoints, cannot easily add debug logging, and cannot reproduce issues freely. Here are Cascade-assisted strategies for production issues:

1. Log analysis (primary approach). Pull production logs for the affected time window and use the log analysis pattern above. This is almost always the starting point for production debugging.

2. Local reproduction. Ask Cascade: "Given this production error [paste], help me create a local reproduction. What test data do I need? What environment configuration should I match?" Cascade can often infer the conditions that trigger the bug from the error details.

3. Hypothesis testing. When you cannot reproduce locally, use Chat to develop hypotheses: "The error only happens between 2am and 4am UTC. What could be time-dependent in @src/services/cache.py? Could the cache TTL be expiring during low-traffic periods?" Then write tests for each hypothesis.

4. Safe production fixes. When you identify the bug, ask Cascade to generate the fix with extra safety constraints: "Fix this bug. The fix must be backward-compatible, must not require a database migration, and must be deployable without downtime. Include a feature flag so we can roll back instantly."
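The feature-flag requirement in strategy 4 can be as small as a per-call switch between the old and the fixed code path. A minimal sketch; `FeatureFlags` and `behind_flag` are hypothetical names, and a real deployment would back the flag store with a flag service or config table rather than an in-memory dict:

```python
from typing import Any, Callable, Dict, Optional


class FeatureFlags:
    """Hypothetical in-memory flag store. In production, back this with your
    flag service or a config table so flags can flip without a deploy."""

    def __init__(self, initial: Optional[Dict[str, bool]] = None) -> None:
        self._flags: Dict[str, bool] = dict(initial or {})

    def is_enabled(self, name: str) -> bool:
        return self._flags.get(name, False)  # unknown flags default to off

    def set(self, name: str, enabled: bool) -> None:
        self._flags[name] = enabled


def behind_flag(flags: FeatureFlags, name: str,
                new_impl: Callable[..., Any],
                old_impl: Callable[..., Any]) -> Callable[..., Any]:
    """Route each call to the fixed code path only while the flag is on.

    The flag is checked on every call, so switching it off rolls back at once.
    """
    def dispatch(*args: Any, **kwargs: Any) -> Any:
        impl = new_impl if flags.is_enabled(name) else old_impl
        return impl(*args, **kwargs)
    return dispatch
```

Defaulting unknown flags to off means a missing or misspelled flag falls back to the old behaviour, which is the safer failure mode for a production fix.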

Never Debug Production by Deploying Experiments

It can be tempting to add debug logging to production "just to see what happens." Every production deployment carries risk. Instead, reproduce locally using production logs as your guide, or use observability tools (distributed tracing, error tracking services) that are already deployed. If you must add production logging, do it behind a feature flag and remove it promptly.

Differential Debugging: "It Worked Last Week"

When a bug appears after a change (library upgrade, new feature, configuration change), differential debugging narrows the search space dramatically. Give Cascade the "before" and "after" context:

Text -- Differential Debugging Flow

# When a bug appears after a known change

The product search endpoint stopped returning results for queries
with special characters (e.g., "men's tees") after we upgraded
SQLModel from 0.0.12 to 0.0.14.

Before (0.0.12): search("men's tees") returned 12 results
After (0.0.14): search("men's tees") returns 0 results
search("mens tees") still works fine -- the apostrophe is the problem

Files involved:
- @shopmate/services/search.py (the search function)
- @shopmate/repositories/product_repository.py (the DB query)

Task:
1. Check the SQLModel 0.0.14 changelog for breaking changes in string handling
2. Identify which line in our code is affected
3. Write a test that fails with the current code
4. Fix it to handle special characters correctly
5. Run: pytest tests/test_search.py -v
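Step 3 of that task, a test that pins the apostrophe case, might look like the sketch below. It uses an in-memory SQLite database and a hypothetical `search_products()` with a bound LIKE parameter rather than ShopMate's SQLModel repository, so it shows the shape of the regression test, not the actual fix for the upgrade. Bound parameters let the driver handle quotes, and that is the property the test checks.

```python
import sqlite3
from typing import List


def search_products(conn: sqlite3.Connection, query: str) -> List[str]:
    """Hypothetical search: case-insensitive substring match on product name.

    The query is passed as a bound parameter, so the driver handles the
    apostrophe; % and _ in the input still act as LIKE wildcards.
    """
    rows = conn.execute(
        "SELECT name FROM products WHERE name LIKE ? ORDER BY name",
        (f"%{query}%",),
    )
    return [name for (name,) in rows]


def make_test_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (name TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO products (name) VALUES (?)",
        [("Men's Tees Classic",), ("Mens Tees Basic",), ("Dog Hoodie",)],
    )
    return conn


def test_search_handles_apostrophes() -> None:
    conn = make_test_db()
    assert search_products(conn, "men's tees") == ["Men's Tees Classic"]
    assert search_products(conn, "mens tees") == ["Mens Tees Basic"]
```

Keeping the apostrophe-free query in the same test records the "still works fine" baseline from the bug report alongside the failing case.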

ShopMate -- Debug the Tenant Isolation Bug

Text -- Debug Flow: Wrong Brand Voice

# The bug: PetThreads product descriptions are using ThreadCo's voice.
# Paste this into Cascade (Cmd+I):

BUG: ShopMate is using ThreadCo's brand voice when generating descriptions
for PetThreads. Customers are complaining the PetThreads site sounds too serious.

Evidence:
- describe_for_brand("petthreads", product) returns a ThreadCo-style description
- No exclamation marks, no playful language -- should have both per brands.yaml
- ThreadCo descriptions seem fine

Reproduction:
  from shopmate.multi_brand import describe_for_brand
  product = {"name": "Paw Print Tee", "material": "organic cotton", "price": 27.99}
  result = describe_for_brand("petthreads", product)
  # Expected: playful, pet-focused, exclamation marks OK
  # Got: serious ThreadCo-style copy

Task:
1. Find the bug in @shopmate/multi_brand.py and @shopmate/config/brands.yaml
2. Write a failing test in tests/test_multi_brand.py that checks brand voice isolation
3. Fix the bug
4. Run: pytest tests/test_multi_brand.py -v to verify
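The module does not show the contents of multi_brand.py, so the root cause Cascade finds may differ. One common shape for this kind of bug, though, is a process-wide cache whose key omits the tenant: whichever brand is requested first after startup wins, which also explains why such bugs look intermittent. The sketch below is hypothetical (`BRANDS`, `get_voice_buggy`, and `get_voice_fixed` are illustrative names, not ShopMate's code) and includes the kind of isolation test step 2 asks for.

```python
from typing import Dict

# Hypothetical, already-parsed stand-in for shopmate/config/brands.yaml.
BRANDS: Dict[str, Dict[str, object]] = {
    "threadco": {"tone": "serious", "exclamations": False},
    "petthreads": {"tone": "playful", "exclamations": True},
}

_shared_cache: Dict[str, Dict[str, object]] = {}


def get_voice_buggy(brand_id: str) -> Dict[str, object]:
    # BUG: the cache key ignores brand_id, so the first brand requested after
    # startup is returned for every brand until the process restarts.
    if "voice" not in _shared_cache:
        _shared_cache["voice"] = BRANDS[brand_id]
    return _shared_cache["voice"]


_voice_by_brand: Dict[str, Dict[str, object]] = {}


def get_voice_fixed(brand_id: str) -> Dict[str, object]:
    # FIX: key the cache by tenant so each brand gets its own entry.
    if brand_id not in _voice_by_brand:
        _voice_by_brand[brand_id] = BRANDS[brand_id]
    return _voice_by_brand[brand_id]


def test_brand_voice_isolation() -> None:
    # Warm the cache with ThreadCo first, mimicking a worker that served a
    # ThreadCo request before a PetThreads one.
    get_voice_fixed("threadco")
    assert get_voice_fixed("petthreads")["tone"] == "playful"
    assert get_voice_fixed("threadco")["tone"] == "serious"
```

Pointed at `get_voice_buggy` instead, the same test fails on its first assertion, which is the failing-test-first step the task asks for.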

Building a Debugging Checklist

Experienced debuggers follow a systematic process. Here is a checklist you can use with Cascade for any bug:

Gather evidence: Collect the error message, stack trace, logs, and reproduction steps. Do not start debugging with incomplete information.
Reproduce: Write a test or script that triggers the bug locally. If you cannot reproduce it, focus on log analysis first.
Isolate: Narrow down to the specific file and function where the bug occurs. Use Chat to trace the call chain if needed.
Understand: Before fixing, understand why the bug exists. Ask Chat: "Why does this code behave this way?" The understanding prevents you from applying a patch that masks the symptom.
Fix: Apply the minimal fix that addresses the root cause. Avoid large refactoring as part of a bug fix -- that should be a separate Flow.
Verify: Run the reproduction test and the full test suite. Both must pass.
Prevent: Add the reproduction test as a permanent regression guard. Update .windsurfrules if the bug was caused by a pattern that could recur.

Hands-On Exercises

Exercise 1: Error-First Debug

Find a real error in one of your projects (check your test suite -- there may be a failing test, or run the app and trigger an error). Paste the complete error and stack trace into a Windsurf Flow using the debug template. Let Cascade trace the root cause and propose a fix. Did it find the correct root cause? Was the fix correct? How does this compare to your usual debugging time for a similar issue?

Exercise 2: Test-Driven Bug Fix

Introduce a deliberate bug into a project (e.g., change a >= to a > in a boundary condition). Then, without telling Cascade what the bug is, give it the symptoms: "The pagination returns 9 items instead of 10 when the total is exactly divisible by the page size." Ask Cascade to write a failing test that reproduces this, then fix it. Did Cascade find your deliberately introduced bug?

Exercise 3: Terminal Loop Debugging

Find a test file with 3+ failing tests. Start a Flow: "Run pytest [file] -v and fix any failures. Keep running until all tests pass. If you cannot resolve an issue after 3 attempts, stop and explain." Watch Cascade work through the failures autonomously. How many did it fix without your intervention? Where did it get stuck?

Exercise 4: Log Analysis

Capture 50-100 lines of structured log output from your application during normal operation. Paste them into Chat and ask: "Analyse these logs for any anomalies, warnings, or patterns that suggest potential issues." Compare Cascade's findings to your own reading of the logs. Did it spot anything you missed? Were there false positives?

Exercise 5: Debugging Plan

Choose a complex bug you are currently facing (or have recently fixed). Use Chat to ask Cascade: "Here are the symptoms of a bug. Give me a step-by-step debugging plan: which files to investigate, where to set breakpoints, what variable values to check." Follow the plan. How does Cascade's systematic approach compare to your intuition-driven debugging? Did the plan help you find the issue faster?
