Agents & Tools
An AI agent is an LLM given the ability to take actions -- calling tools, reading files, browsing the web, executing code -- in pursuit of a goal. Agents represent the shift from AI as a conversational assistant to AI as an autonomous worker.
Chatbot vs Agent: What Makes the Difference
The terms "chatbot" and "agent" are often used interchangeably, but they describe fundamentally different architectures with different capabilities and risks.
Input: User message
Processing: LLM generates a text response from its training data and the conversation history
Output: Text only -- no side effects
Capabilities: Answer questions, write content, brainstorm, summarise -- all from memory and context
Limitations: Cannot look up real-time data, cannot take actions, cannot verify its own claims
Risk profile: Low -- worst case is a wrong or unhelpful answer
Input: User goal (potentially multi-step)
Processing: LLM reasons about the goal, decides which tools to call, executes them, interprets results, and iterates
Output: Text + real-world side effects (database writes, API calls, file changes, emails sent)
Capabilities: Everything a chatbot can do, plus: query databases, search the web, execute code, manage files, call APIs
Limitations: Only as good as its tools, reasoning, and guardrails
Risk profile: Higher -- can take irreversible actions if poorly designed
The Four Components of an Agent
1. Reasoning Model (LLM)
The "brain" of the agent. The LLM decides what to do next based on the user's goal, the current context, and the results of previous actions. More capable models (Opus, GPT-4o) make better agents because they reason more accurately about when and how to use tools. The model does not execute tools directly -- it emits structured instructions that your application interprets.
2. Tool Set
The actions the agent can take. Each tool is defined by a name, description, and parameter schema. The model reads these descriptions to decide which tool to call. Examples: search_orders(order_id), send_email(to, subject, body), run_sql(query). The quality of tool descriptions directly impacts how well the agent uses them -- vague descriptions lead to wrong tool calls.
3. Memory System
Agents need memory to track progress across multiple steps. Short-term memory is the context window -- the conversation so far, including tool results. Long-term memory is an optional external store (database, vector store) that persists across conversations. Without adequate memory, agents repeat themselves, forget instructions, or lose track of multi-step tasks.
4. Execution Loop
The orchestration code that connects everything. It sends the prompt to the model, parses the response for tool calls, executes the tools, feeds results back to the model, and repeats. The loop runs until the model emits a final text response with no tool calls, or until a stop condition is met (max iterations, timeout, error). Your application code controls this loop, not the model.
The Agentic Loop in Detail
Understanding the execution flow is essential for debugging and designing reliable agents.
| Step | Actor | Action | Example (ThreadCo) |
|---|---|---|---|
| 1 | User | Sends a goal or question | "Where is my order #TC-4521?" |
| 2 | LLM | Reasons about the goal and decides to call a tool | Decides to call search_orders(order_id="TC-4521") |
| 3 | Application | Executes the tool and returns the result | Returns: {status: "shipped", tracking: "1Z999...", eta: "Apr 17"} |
| 4 | LLM | Reads the tool result and decides if more tools are needed | Has enough info -- no more tools needed |
| 5 | LLM | Generates a final text response | "Your order #TC-4521 has shipped! Tracking: 1Z999... Expected delivery: April 17." |
More complex goals require multiple loop iterations. If the customer asks "I want to return order #TC-4521 and get a refund," the agent might: (1) call search_orders to get order details, (2) call check_return_eligibility to verify the return window, (3) call create_return_label to generate a shipping label, (4) compose a response with the label and instructions. Each tool call is a separate iteration of the loop.
Tool Use Patterns
Not all agents use tools the same way. These patterns represent increasing levels of sophistication.
Pattern: Direct Tool Call
The simplest pattern. The model maps the user's request directly to a single tool call. Example: "What's the weather in London?" maps to get_weather(city="London"). No reasoning chain needed -- one tool, one result, one response. Good for simple, well-defined tasks.
Pattern: Sequential Tool Chain
The model calls multiple tools in sequence, where each tool's output informs the next call. Example: search for a customer by email, then look up their order history, then check the latest order's shipping status. The model plans the sequence based on what information it needs and what each tool provides.
Pattern: Parallel Tool Calls
When the model needs independent pieces of information, it can request multiple tool calls simultaneously. Example: looking up order status AND checking inventory for a replacement item at the same time. This reduces latency. Most modern APIs support returning multiple tool_use blocks in a single response.
Pattern: Conditional Branching
The model chooses different tools based on intermediate results. If the order status is "delivered," call check_satisfaction. If it's "delayed," call get_new_eta. If it's "cancelled," call initiate_refund. The model's reasoning ability determines which branch to take -- this is where model quality matters most.
The ReAct Pattern: Reasoning + Acting
ReAct (Reason + Act) is the most widely used agent architecture. The model alternates between reasoning about what to do and taking actions via tools.
Thought: "The user wants to know their order status. I need to look up order #TC-4521 in the database."
Action: search_orders(order_id="TC-4521")
Observation: {status: "shipped", carrier: "UPS", tracking: "1Z999...", shipped_date: "2026-04-13"}
Thought: "The order has shipped. I have the tracking number and carrier. I can now give the customer a complete answer."
Answer: "Great news! Your order shipped on April 13 via UPS. Track it here: ..."
The key insight of ReAct is that by making the model's reasoning visible (the "Thought" steps), you can debug why the agent made certain decisions. In production, you typically log these thoughts for monitoring and auditing, even if you don't show them to the end user.
Planning and Multi-Step Reasoning
Advanced agents can plan ahead, decomposing complex goals into sub-tasks before executing them.
| Planning Style | Description | When to Use |
|---|---|---|
| No planning (reactive) | Agent decides one step at a time based on current state | Simple tasks with 1-3 steps; low risk of going off track |
| Upfront planning | Agent creates a full plan before executing any steps | Complex tasks where you want to review the plan before execution begins |
| Adaptive planning | Agent creates an initial plan, then revises it as new information arrives | Tasks where requirements evolve based on intermediate results |
| Hierarchical planning | A "manager" agent decomposes the task, delegates sub-tasks to "worker" agents | Very complex tasks requiring different expertise (e.g., research + code + writing) |
Real-World Agent Examples
Customer Service Agent
Tools: Order lookup, refund processing, FAQ search, escalation to human. Example: ThreadCo's ShopMate handles 80% of customer inquiries without human involvement. It queries the order database, checks return policies, generates return labels, and escalates complex issues to staff. Monthly cost: $50. Hours saved: 60/month.
Coding Agent
Tools: File read/write, terminal execution, web search, git operations. Example: Claude Code (what you may be using right now) reads your codebase, understands the architecture, writes code, runs tests, and iterates until the tests pass. It plans multi-file changes, handles dependencies, and explains its reasoning.
Research Agent
Tools: Web search, document retrieval, citation verification, summarisation. Example: A legal research agent searches case law databases, identifies relevant precedents, extracts key holdings, checks citation accuracy, and produces a structured brief. Reduces research time from hours to minutes.
Data Analysis Agent
Tools: SQL execution, Python code execution, chart generation, report writing. Example: An analyst asks "What were our top 10 products by revenue last quarter?" The agent writes and executes SQL, processes the results in Python, generates a chart, and writes a narrative summary -- all in under a minute.
Agent Safety and Guardrails
Always give agents the minimum permissions needed to complete the task. An agent that can read files does not need write access. An agent that queries a database does not need to be able to delete rows. Scope is everything.
For any action that is irreversible -- sending emails, modifying databases, deploying code, processing payments -- require human confirmation before execution. Agents with unchecked write access are a significant operational risk. Implement approval gates: the agent proposes the action, a human confirms, and only then does the system execute it.
| Guardrail | What It Prevents | Implementation |
|---|---|---|
| Tool allowlisting | Agent calling tools it should not have access to | Only register the specific tools the agent needs |
| Parameter validation | Malformed or dangerous tool inputs | Validate all tool parameters before execution (types, ranges, allowlists) |
| Rate limiting | Runaway loops that burn through API budget | Set max iterations per request and max spend per conversation |
| Output filtering | Sensitive data leaking to users | Scan tool results and agent responses for PII, credentials, etc. |
| Audit logging | Inability to investigate incidents | Log every tool call, parameter, result, and reasoning step |
| Timeout controls | Agents that run forever on ambiguous tasks | Set hard timeouts on individual tool calls and total agent run time |
Common Agent Failure Modes
Understanding how agents fail is as important as understanding how they work. These are the most frequent failure modes in production agent systems.
| Failure Mode | Description | Prevention |
|---|---|---|
| Infinite loops | Agent repeats the same tool calls without making progress | Set maximum iteration limits; detect repeated identical tool calls |
| Wrong tool selection | Agent calls a tool that does not match the user's intent | Write precise tool descriptions; test with diverse inputs; add fallback logic |
| Parameter hallucination | Agent invents plausible but incorrect parameter values | Validate parameters against known ranges; require explicit user confirmation for critical values |
| Goal drift | Agent loses track of the original goal during multi-step execution | Include the original goal in every prompt; implement progress tracking |
| Error cascading | A tool error in step 2 causes incorrect reasoning in steps 3-5 | Implement error handling that resets or retries failed steps; validate intermediate results |
| Context overflow | Tool results fill the context window, causing the agent to lose earlier instructions | Summarise long tool results; implement context management strategies |
Unlike a chatbot that gives a wrong answer (annoying but recoverable), an agent failure can have real-world consequences: incorrect data written to a database, a wrong email sent to a customer, or an expensive runaway API loop. This is why guardrails, human oversight, and thorough testing are non-negotiable for production agents. Always ask: "What is the worst thing that could happen if this agent makes a mistake?"
When to Use Agents (and When Not To)
The task requires accessing external data or systems at runtime
The task involves multiple steps with conditional branching
The user's request cannot be fully specified upfront
Real-time information is needed (order status, live data, etc.)
The task requires executing code or making API calls
All information is already in the prompt (no external lookups needed)
The task is a single transformation (summarise, translate, classify)
No side effects are needed (just generate text)
Latency is critical (agent loops add time)
The cost of errors is low and human review is not needed
Hands-On Exercises
Choose a repetitive task from your work. Design an agent on paper: (a) What is the user's goal? (b) What tools does the agent need? (For each tool, write a name, description, and parameters.) (c) Walk through a typical interaction step by step using the ReAct format (Thought, Action, Observation). (d) What could go wrong? List three failure modes and how you would handle each.
For each scenario, decide whether a chatbot or agent is needed, and justify your answer: (a) Answering FAQs about company policies. (b) Looking up a customer's account balance and recent transactions. (c) Writing a blog post about industry trends. (d) Scheduling a meeting by checking three people's calendars. (e) Generating a monthly sales report from a database.
Write tool descriptions for these three functions that would help an LLM use them correctly: (a) A function that searches a product catalog by keyword, category, and price range. (b) A function that sends an email (to, subject, body). (c) A function that executes a read-only SQL query against a database. For each, write the tool name, a clear description (2-3 sentences), and the parameter schema. Then test: would an LLM know when to use each tool based on your descriptions alone?
ThreadCo's ShopMate agent has these tools: search_orders, create_return_label, send_email, update_customer_record. Design the guardrails: (a) Which tools should require human approval before execution? (b) What parameter validations would you add? (c) What rate limits would you set? (d) What should happen if the agent tries to call a tool with invalid parameters? Write a one-page guardrail specification.
If you have access to Claude Code, Cursor, or another agentic AI tool, give it a multi-step task (e.g., "Find all TODO comments in this codebase and create a summary"). Observe the steps it takes. Write a trace in ReAct format: for each step, record the Thought (what it decided to do), Action (what tool it called), and Observation (what result it got). How many iterations did it take? Were any steps unnecessary?