Agents & Tools

AI Foundations

Module 04

Foundation

The Order Assistant Agent: ThreadCo's most-requested feature: a chatbot that looks up an order number, checks its status, and gives the customer a real answer -- without staff involvement. This requires an agent that can query the order database as a tool. This module builds the pattern.

An AI agent is an LLM given the ability to take actions -- calling tools, reading files, browsing the web, executing code -- in pursuit of a goal. Agents represent the shift from AI as a conversational assistant to AI as an autonomous worker.

Chatbot vs Agent: What Makes the Difference

The terms "chatbot" and "agent" are often used interchangeably, but they describe fundamentally different architectures with different capabilities and risks.

Chatbot

Input: User message

Processing: LLM generates a text response from its training data and the conversation history

Output: Text only -- no side effects

Capabilities: Answer questions, write content, brainstorm, summarise -- all from memory and context

Limitations: Cannot look up real-time data, cannot take actions, cannot verify its own claims

Risk profile: Low -- worst case is a wrong or unhelpful answer

Agent

Input: User goal (potentially multi-step)

Processing: LLM reasons about the goal, decides which tools to call, executes them, interprets results, and iterates

Output: Text + real-world side effects (database writes, API calls, file changes, emails sent)

Capabilities: Everything a chatbot can do, plus: query databases, search the web, execute code, manage files, call APIs

Limitations: Only as good as its tools, reasoning, and guardrails

Risk profile: Higher -- can take irreversible actions if poorly designed

The Four Components of an Agent

1. Reasoning Model (LLM)

The "brain" of the agent. The LLM decides what to do next based on the user's goal, the current context, and the results of previous actions. More capable models (Opus, GPT-4o) make better agents because they reason more accurately about when and how to use tools. The model does not execute tools directly -- it emits structured instructions that your application interprets.

2. Tool Set

The actions the agent can take. Each tool is defined by a name, description, and parameter schema. The model reads these descriptions to decide which tool to call. Examples: search_orders(order_id), send_email(to, subject, body), run_sql(query). The quality of tool descriptions directly impacts how well the agent uses them -- vague descriptions lead to wrong tool calls.

3. Memory System

Agents need memory to track progress across multiple steps. Short-term memory is the context window -- the conversation so far, including tool results. Long-term memory is an optional external store (database, vector store) that persists across conversations. Without adequate memory, agents repeat themselves, forget instructions, or lose track of multi-step tasks.

4. Execution Loop

The orchestration code that connects everything. It sends the prompt to the model, parses the response for tool calls, executes the tools, feeds results back to the model, and repeats. The loop runs until the model emits a final text response with no tool calls, or until a stop condition is met (max iterations, timeout, error). Your application code controls this loop, not the model.

The Agentic Loop in Detail

Understanding the execution flow is essential for debugging and designing reliable agents.

Step	Actor	Action	Example (ThreadCo)
1	User	Sends a goal or question	"Where is my order #TC-4521?"
2	LLM	Reasons about the goal and decides to call a tool	Decides to call `search_orders(order_id="TC-4521")`
3	Application	Executes the tool and returns the result	Returns: {status: "shipped", tracking: "1Z999...", eta: "Apr 17"}
4	LLM	Reads the tool result and decides if more tools are needed	Has enough info -- no more tools needed
5	LLM	Generates a final text response	"Your order #TC-4521 has shipped! Tracking: 1Z999... Expected delivery: April 17."

Multi-Step Example

More complex goals require multiple loop iterations. If the customer asks "I want to return order #TC-4521 and get a refund," the agent might: (1) call search_orders to get order details, (2) call check_return_eligibility to verify the return window, (3) call create_return_label to generate a shipping label, (4) compose a response with the label and instructions. Each tool call is a separate iteration of the loop.

Tool Use Patterns

Not all agents use tools the same way. These patterns represent increasing levels of sophistication.

Pattern: Direct Tool Call

The simplest pattern. The model maps the user's request directly to a single tool call. Example: "What's the weather in London?" maps to get_weather(city="London"). No reasoning chain needed -- one tool, one result, one response. Good for simple, well-defined tasks.

Pattern: Sequential Tool Chain

The model calls multiple tools in sequence, where each tool's output informs the next call. Example: search for a customer by email, then look up their order history, then check the latest order's shipping status. The model plans the sequence based on what information it needs and what each tool provides.

Pattern: Parallel Tool Calls

When the model needs independent pieces of information, it can request multiple tool calls simultaneously. Example: looking up order status AND checking inventory for a replacement item at the same time. This reduces latency. Most modern APIs support returning multiple tool_use blocks in a single response.

Pattern: Conditional Branching

The model chooses different tools based on intermediate results. If the order status is "delivered," call check_satisfaction. If it's "delayed," call get_new_eta. If it's "cancelled," call initiate_refund. The model's reasoning ability determines which branch to take -- this is where model quality matters most.

The ReAct Pattern: Reasoning + Acting

ReAct (Reason + Act) is the most widely used agent architecture. The model alternates between reasoning about what to do and taking actions via tools.

ReAct Flow

Thought: "The user wants to know their order status. I need to look up order #TC-4521 in the database."

Action: search_orders(order_id="TC-4521")

Observation: {status: "shipped", carrier: "UPS", tracking: "1Z999...", shipped_date: "2026-04-13"}

Thought: "The order has shipped. I have the tracking number and carrier. I can now give the customer a complete answer."

Answer: "Great news! Your order shipped on April 13 via UPS. Track it here: ..."

The key insight of ReAct is that by making the model's reasoning visible (the "Thought" steps), you can debug why the agent made certain decisions. In production, you typically log these thoughts for monitoring and auditing, even if you don't show them to the end user.

Planning and Multi-Step Reasoning

Advanced agents can plan ahead, decomposing complex goals into sub-tasks before executing them.

Planning Style	Description	When to Use
No planning (reactive)	Agent decides one step at a time based on current state	Simple tasks with 1-3 steps; low risk of going off track
Upfront planning	Agent creates a full plan before executing any steps	Complex tasks where you want to review the plan before execution begins
Adaptive planning	Agent creates an initial plan, then revises it as new information arrives	Tasks where requirements evolve based on intermediate results
Hierarchical planning	A "manager" agent decomposes the task, delegates sub-tasks to "worker" agents	Very complex tasks requiring different expertise (e.g., research + code + writing)

Real-World Agent Examples

Customer Service Agent

Tools: Order lookup, refund processing, FAQ search, escalation to human. Example: ThreadCo's ShopMate handles 80% of customer inquiries without human involvement. It queries the order database, checks return policies, generates return labels, and escalates complex issues to staff. Monthly cost: $50. Hours saved: 60/month.

Coding Agent

Tools: File read/write, terminal execution, web search, git operations. Example: Claude Code (what you may be using right now) reads your codebase, understands the architecture, writes code, runs tests, and iterates until the tests pass. It plans multi-file changes, handles dependencies, and explains its reasoning.

Research Agent

Tools: Web search, document retrieval, citation verification, summarisation. Example: A legal research agent searches case law databases, identifies relevant precedents, extracts key holdings, checks citation accuracy, and produces a structured brief. Reduces research time from hours to minutes.

Data Analysis Agent

Tools: SQL execution, Python code execution, chart generation, report writing. Example: An analyst asks "What were our top 10 products by revenue last quarter?" The agent writes and executes SQL, processes the results in Python, generates a chart, and writes a narrative summary -- all in under a minute.

Agent Safety and Guardrails

Minimal Footprint Principle

Always give agents the minimum permissions needed to complete the task. An agent that can read files does not need write access. An agent that queries a database does not need to be able to delete rows. Scope is everything.

Human-in-the-Loop for Irreversible Actions

For any action that is irreversible -- sending emails, modifying databases, deploying code, processing payments -- require human confirmation before execution. Agents with unchecked write access are a significant operational risk. Implement approval gates: the agent proposes the action, a human confirms, and only then does the system execute it.

Guardrail	What It Prevents	Implementation
Tool allowlisting	Agent calling tools it should not have access to	Only register the specific tools the agent needs
Parameter validation	Malformed or dangerous tool inputs	Validate all tool parameters before execution (types, ranges, allowlists)
Rate limiting	Runaway loops that burn through API budget	Set max iterations per request and max spend per conversation
Output filtering	Sensitive data leaking to users	Scan tool results and agent responses for PII, credentials, etc.
Audit logging	Inability to investigate incidents	Log every tool call, parameter, result, and reasoning step
Timeout controls	Agents that run forever on ambiguous tasks	Set hard timeouts on individual tool calls and total agent run time

Common Agent Failure Modes

Understanding how agents fail is as important as understanding how they work. These are the most frequent failure modes in production agent systems.

Failure Mode	Description	Prevention
Infinite loops	Agent repeats the same tool calls without making progress	Set maximum iteration limits; detect repeated identical tool calls
Wrong tool selection	Agent calls a tool that does not match the user's intent	Write precise tool descriptions; test with diverse inputs; add fallback logic
Parameter hallucination	Agent invents plausible but incorrect parameter values	Validate parameters against known ranges; require explicit user confirmation for critical values
Goal drift	Agent loses track of the original goal during multi-step execution	Include the original goal in every prompt; implement progress tracking
Error cascading	A tool error in step 2 causes incorrect reasoning in steps 3-5	Implement error handling that resets or retries failed steps; validate intermediate results
Context overflow	Tool results fill the context window, causing the agent to lose earlier instructions	Summarise long tool results; implement context management strategies

The Cost of Agent Failures

Unlike a chatbot that gives a wrong answer (annoying but recoverable), an agent failure can have real-world consequences: incorrect data written to a database, a wrong email sent to a customer, or an expensive runaway API loop. This is why guardrails, human oversight, and thorough testing are non-negotiable for production agents. Always ask: "What is the worst thing that could happen if this agent makes a mistake?"

When to Use Agents (and When Not To)

Use an Agent When

The task requires accessing external data or systems at runtime

The task involves multiple steps with conditional branching

The user's request cannot be fully specified upfront

Real-time information is needed (order status, live data, etc.)

The task requires executing code or making API calls

Use a Simple Prompt When

All information is already in the prompt (no external lookups needed)

The task is a single transformation (summarise, translate, classify)

No side effects are needed (just generate text)

Latency is critical (agent loops add time)

The cost of errors is low and human review is not needed

Hands-On Exercises

Exercise 1: Design an Agent

Choose a repetitive task from your work. Design an agent on paper: (a) What is the user's goal? (b) What tools does the agent need? (For each tool, write a name, description, and parameters.) (c) Walk through a typical interaction step by step using the ReAct format (Thought, Action, Observation). (d) What could go wrong? List three failure modes and how you would handle each.

Exercise 2: Chatbot vs Agent Analysis

For each scenario, decide whether a chatbot or agent is needed, and justify your answer: (a) Answering FAQs about company policies. (b) Looking up a customer's account balance and recent transactions. (c) Writing a blog post about industry trends. (d) Scheduling a meeting by checking three people's calendars. (e) Generating a monthly sales report from a database.

Exercise 3: Tool Description Challenge

Write tool descriptions for these three functions that would help an LLM use them correctly: (a) A function that searches a product catalog by keyword, category, and price range. (b) A function that sends an email (to, subject, body). (c) A function that executes a read-only SQL query against a database. For each, write the tool name, a clear description (2-3 sentences), and the parameter schema. Then test: would an LLM know when to use each tool based on your descriptions alone?

Exercise 4: Guardrail Design

ThreadCo's ShopMate agent has these tools: search_orders, create_return_label, send_email, update_customer_record. Design the guardrails: (a) Which tools should require human approval before execution? (b) What parameter validations would you add? (c) What rate limits would you set? (d) What should happen if the agent tries to call a tool with invalid parameters? Write a one-page guardrail specification.

Exercise 5: Trace an Agent Interaction

If you have access to Claude Code, Cursor, or another agentic AI tool, give it a multi-step task (e.g., "Find all TODO comments in this codebase and create a summary"). Observe the steps it takes. Write a trace in ReAct format: for each step, record the Thought (what it decided to do), Action (what tool it called), and Observation (what result it got). How many iterations did it take? Were any steps unnecessary?

<-- Prompting Principles Next: Safety & Ethics -->