AI Agents vs Chatbots: What's Actually Different and When to Use Each
AI agents and chatbots solve different problems
The terms get used interchangeably in vendor marketing, which is unhelpful when you are trying to budget a project. A chatbot and an AI agent are different categories of system, with different architectures, different operating costs, and different failure modes. Picking the wrong one is how teams end up six months into a build that cannot do what the sponsor expected.
This article walks through the actual technical and operational differences, the conditions under which each makes sense, and the hybrid patterns that most production systems end up adopting. The framing is practical: what you are buying, what it costs to run, and what breaks.
The short definition that matters
A chatbot is a conversational interface. It takes a user message, produces a response, and the loop closes. Modern chatbots use large language models and retrieval (RAG) to ground answers in your documents, but the system is fundamentally reactive: user asks, bot answers, conversation continues. The bot does not take actions in the world beyond replying.
An AI agent is a system that can plan, use tools, and execute multi-step tasks toward a goal. It decides what to do next based on the state of the work, calls APIs or other systems, observes results, and iterates. The conversation is one possible input surface, but the defining feature is autonomous action over multiple steps.
The clean test: if the system's job ends when it sends a message back, it's a chatbot. If the system's job involves doing something - filing a ticket, updating a CRM record, processing an invoice, booking a meeting, querying three databases and reconciling the results - it's an agent.
Architecture: what's actually inside each one
A production chatbot, in the RAG era, typically has five components: an embedding model to index your knowledge base, a vector store (pgvector, Pinecone, Weaviate) for semantic search, a retriever that often combines vector and keyword search, an LLM that takes the retrieved context and the user question and produces an answer, and an evaluation harness to catch hallucinations and regressions. Anthropic's contextual retrieval research and OpenAI's RAG cookbook both document this pattern.
The control flow is linear. Question comes in, retriever pulls context, LLM generates response, response goes out. You can add memory (conversation history), guardrails (refusal patterns for off-topic or sensitive queries), and routing (different prompts for different intents), but the topology is essentially a pipeline.
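To make the pipeline shape concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption rather than a reference implementation: the word-overlap retriever is a toy stand-in for the vector and keyword search described above, stub_llm stands in for a real model call, and the names Document, retrieve, build_prompt and answer are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class Document:
    doc_id: str
    text: str


def tokenize(text: str) -> set[str]:
    """Lowercase word set; a crude stand-in for embeddings."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, docs: list[Document], k: int = 2) -> list[Document]:
    """Rank documents by word overlap with the question (toy retriever)."""
    terms = tokenize(question)
    scored = [(len(terms & tokenize(d.text)), d) for d in docs]
    hits = [(score, d) for score, d in scored if score > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in hits[:k]]


def build_prompt(question: str, context: list[Document]) -> str:
    """Combine the retrieved sources and the question into one prompt."""
    sources = "\n".join(f"[{d.doc_id}] {d.text}" for d in context)
    return f"Answer using only these sources:\n{sources}\n\nQuestion: {question}"


def stub_llm(prompt: str) -> str:
    """Placeholder for a real model call; returns a canned string."""
    return f"(stub answer generated from a {len(prompt)}-character prompt)"


NO_CONTEXT_REPLY = "I could not find that in the knowledge base."


def answer(question: str, docs: list[Document],
           llm: Callable[[str], str] = stub_llm) -> str:
    """Question in, context retrieved, response out: one pass, no loop."""
    context = retrieve(question, docs)
    if not context:
        return NO_CONTEXT_REPLY  # guardrail: refuse rather than guess
    return llm(build_prompt(question, context))
```

The point of the sketch is the topology: answer() runs each stage exactly once and returns. Memory, routing and extra guardrails would be added as further stages in the same straight line.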
An agent system has the same components plus three additions that change the engineering picture significantly. First, a tool layer: typed function definitions the model can call, each wrapping an API, database query, or internal action. Second, a planning loop: the model decides which tool to call, observes the result, decides the next step, and continues until the task is complete or it gives up. Third, state management: the agent needs to track what it has done, what it has learned, and what is still pending.
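The three additions can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: scripted_policy is a hard-coded stand-in for the model's decision about what to do next, and the tools lookup_order and open_ticket are hypothetical functions that return canned data instead of calling real systems.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AgentState:
    """State management: what has been done and what was learned."""
    goal: str
    history: list[tuple[str, dict, dict]] = field(default_factory=list)
    done: bool = False
    result: object = None


# Tool layer: typed functions the model may call (hypothetical examples).
def lookup_order(order_id: str) -> dict:
    return {"order_id": order_id, "status": "delivered"}


def open_ticket(summary: str) -> dict:
    return {"ticket_id": "T-1", "summary": summary}


TOOLS: dict[str, Callable[..., dict]] = {
    "lookup_order": lookup_order,
    "open_ticket": open_ticket,
}


def scripted_policy(state: AgentState) -> tuple[str, dict]:
    """Stand-in for the LLM: pick the next tool from the current state."""
    if not state.history:
        return "lookup_order", {"order_id": "4471"}
    last_tool, _, last_result = state.history[-1]
    if last_tool == "lookup_order":
        summary = f"Order {last_result['order_id']} is {last_result['status']}"
        return "open_ticket", {"summary": summary}
    return "finish", {"answer": last_result["ticket_id"]}


def run_agent(goal: str, policy=scripted_policy, max_steps: int = 5) -> AgentState:
    """Planning loop: decide, act, observe, repeat until done or out of steps."""
    state = AgentState(goal)
    for _ in range(max_steps):
        tool, args = policy(state)
        if tool == "finish":
            state.done = True
            state.result = args["answer"]
            break
        observation = TOOLS[tool](**args)
        state.history.append((tool, args, observation))
    return state
```

Unlike a pipeline, the number of steps is not known in advance: the loop ends when the policy says it is finished or when max_steps runs out, in which case state.done stays False and the caller has to handle an incomplete task.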
Anthropic's Building Effective Agents guide makes a useful distinction between workflows (predetermined sequences of LLM calls) and agents (where the model dynamically directs its own process). Many production "agent" systems are actually workflows with one or two agentic steps, and that is usually the right call - more on this below.
Cost and latency: the operational reality
Chatbots are cheap to run and fast to respond. A typical RAG query with a frontier model costs somewhere between $0.001 and $0.02 per turn depending on context size and model choice, and responds in 1-4 seconds. You can serve thousands of conversations per day on a budget that fits comfortably inside a single mid-market software line item.
Agents are typically one to two orders of magnitude more expensive per task and much slower. A single agent run might involve 5-30 LLM calls as the model plans, reflects, and executes, plus the cost of the tools it calls. Runs of 30 seconds to two minutes are common, and a complex run can cost $0.50 to $5 or more.
This matters for sizing the business case. If you are replacing a Tier-1 support interaction that costs £4-£8 in agent time, a chatbot at 1p per turn is an obvious win. If you are automating an invoice reconciliation that takes a finance analyst 15 minutes at a fully-loaded cost of £8-£12, paying £1 for an agent to do it in 90 seconds is still a strong return - but you cannot run that agent on the same budget logic as a chatbot.
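The arithmetic behind that comparison is short enough to write down. The sketch below uses the figures from the paragraph above and adds one simplifying assumption that is not in the original numbers: when an automated run fails, a human redoes the task at full cost. The function names are hypothetical.

```python
def expected_saving(human_cost: float, automation_cost: float,
                    success_rate: float = 1.0) -> float:
    """Saving per task, assuming failed runs fall back to a human at full cost."""
    if not 0.0 <= success_rate <= 1.0:
        raise ValueError("success_rate must be between 0 and 1")
    expected_cost = automation_cost + (1.0 - success_rate) * human_cost
    return human_cost - expected_cost


def break_even_success_rate(human_cost: float, automation_cost: float) -> float:
    """Success rate at which the expected saving is exactly zero."""
    return automation_cost / human_cost


# Invoice reconciliation: analyst time of 8-12 pounds versus a 1 pound agent run.
print(expected_saving(8.0, 1.0))          # 7.0
print(expected_saving(12.0, 1.0))         # 11.0
print(break_even_success_rate(8.0, 1.0))  # 0.125
```

Under this assumption, a £1 agent run against an £8 task stops paying for itself only if fewer than one run in eight succeeds. The same function applied to a 1p chatbot turn against a £4 support interaction gives a far lower break-even point, which is why the two cannot share budget logic.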
Failure modes are different
When a chatbot fails, it produces a wrong or unhelpful answer. The blast radius is one conversation. The user notices, they escalate to a human, or they leave. You mitigate with grounding (RAG), refusal patterns, evaluation suites, and human-in-the-loop for high-stakes queries.
When an agent fails, it can take wrong actions. It might update the wrong CRM record, send an email to the wrong client, run a database query that exhausts the connection pool, or get stuck in a loop calling the same tool 40 times. The blast radius is whatever its tools can touch. This is why agent deployments require:
- Permission scoping: the agent's tools should have the minimum access required. Read-only by default; write access gated behind explicit approval steps for anything destructive.
- Idempotency: tool calls should be safe to retry. The agent will, occasionally, call the same tool twice.
- Step limits and budgets: hard caps on tokens, tool calls, and wall-clock time per run.
- Human approval gates: for high-stakes actions (sending external communications, financial transactions, data deletion), the agent proposes and a human confirms.
- Observability: full traces of every decision, every tool call, every result. LangSmith, Langfuse, and similar tools exist because debugging an agent without traces is brutal.
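Two of these controls, step limits and idempotency, are simple to sketch. The code below is an illustration under assumptions, not a production design: the default caps are arbitrary placeholders, the class names are hypothetical, and the idempotency cache lives in memory where a real deployment would need a durable store.

```python
import time


class BudgetExceeded(RuntimeError):
    """Raised when a run crosses one of its hard caps."""


class RunBudget:
    """Hard caps on tool calls, tokens and wall-clock time for one run."""

    def __init__(self, max_tool_calls: int = 20, max_tokens: int = 50_000,
                 max_seconds: float = 120.0) -> None:
        self.max_tool_calls = max_tool_calls
        self.max_tokens = max_tokens
        self.max_seconds = max_seconds
        self.tool_calls = 0
        self.tokens = 0
        self.started = time.monotonic()

    def charge(self, tool_calls: int = 0, tokens: int = 0) -> None:
        """Record usage, then stop the run if any cap is exceeded."""
        self.tool_calls += tool_calls
        self.tokens += tokens
        if self.tool_calls > self.max_tool_calls:
            raise BudgetExceeded("tool-call cap exceeded")
        if self.tokens > self.max_tokens:
            raise BudgetExceeded("token cap exceeded")
        if time.monotonic() - self.started > self.max_seconds:
            raise BudgetExceeded("wall-clock cap exceeded")


class IdempotentTool:
    """Wraps a side-effecting function so a repeated key is not re-executed."""

    def __init__(self, fn) -> None:
        self.fn = fn
        self._results: dict[str, object] = {}

    def __call__(self, idempotency_key: str, **kwargs):
        if idempotency_key not in self._results:
            self._results[idempotency_key] = self.fn(**kwargs)
        return self._results[idempotency_key]
```

In an agent loop, budget.charge() would be called before every model call and tool call, so a run stuck calling the same tool fails loudly at the cap. Wrapping a tool such as an email sender in IdempotentTool means a duplicated call with the same key returns the stored result instead of sending twice.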
The UK Information Commissioner's Office has published guidance on AI and data protection that is worth reading before deploying agents that touch personal data, particularly around the requirement for meaningful human oversight of automated decisions under Article 22 of the UK GDPR.
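A human approval gate, in its simplest form, is a check between the agent proposing an action and the system executing it. The sketch below is illustrative only: ProposedAction, execute and the approve callback are hypothetical names, and a gate like this is an engineering control, not a compliance mechanism on its own.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ProposedAction:
    tool: str
    args: dict
    destructive: bool  # set by the tool definition, not by the model


def execute(action: ProposedAction,
            tools: dict[str, Callable[..., object]],
            approve: Callable[[ProposedAction], bool]) -> dict:
    """Run read-only actions directly; destructive ones need a human yes."""
    if action.destructive and not approve(action):
        return {"status": "rejected", "tool": action.tool}
    result = tools[action.tool](**action.args)
    return {"status": "done", "tool": action.tool, "result": result}
```

The approve callback is where a review queue or ticketing step would plug in. Keeping the destructive flag on the tool definition, rather than letting the model set it, means the agent cannot talk its way past the gate.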
When to choose a chatbot
A chatbot is the right answer when:
- The job is information retrieval. Customers asking about your product, employees asking about HR policy, partners asking about integration docs. RAG over a curated knowledge base solves this well.
- You need predictable cost and latency. Customer-facing interfaces need sub-5-second responses and predictable economics per conversation.
- The risk surface is low. A wrong answer is recoverable; a wrong action might not be.
- Volume is high. Tens of thousands of interactions a month make the per-turn cost math matter, and chatbots scale linearly with traffic.
- Your team is new to LLM systems. A chatbot is a tractable first project. You will learn retrieval, evaluation, prompt design, and guardrails - all of which you need before you build agents.
The chatbot deployments we see typically deliver value within 6-12 weeks of kickoff and pay back inside 6-9 months on Tier-1 support deflection alone. McKinsey's State of AI reports consistently show customer service as one of the highest-ROI early use cases.
When to choose an agent (or agentic workflow)
An agent makes sense when:
- The task requires multiple system calls in sequence. Looking up a customer in the CRM, checking their order in the ERP, generating a refund in the payment system, and updating the support ticket - this is agent territory.
- The next step depends on intermediate results. If the path is fixed, a workflow (n8n, Temporal, plain code) is simpler and more reliable. If the path branches based on what the system finds, agentic decision-making earns its cost.
- The task is high-value per execution. The cost per agent run only pencils out if the work being automated is worth meaningfully more than the run.
- You have observability and approval infrastructure. If you cannot watch what the agent is doing and intervene, do not deploy one.
A pragmatic note: most production systems described as "agents" are actually agentic workflows - mostly deterministic pipelines with one or two steps where the LLM is allowed to choose between options or call a tool. This is usually the right design. Pure open-ended agents that plan their own way through a task are appropriate for a narrow set of problems and tend to be harder to operate.
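As a sketch of what an agentic workflow can look like, the Python below keeps every step deterministic except one, where a model picks from a fixed set of options. The names are hypothetical and stub_model is a rule-based placeholder for a real LLM call. Treating any answer outside the allowed set as an escalation is one possible design choice, not the only one.

```python
from typing import Callable

ALLOWED_ROUTES = ("refund", "replacement", "escalate")


def choose_route(case: dict, model: Callable[[dict], str]) -> str:
    """The single agentic step: the model chooses, the code constrains."""
    choice = model(case).strip().lower()
    return choice if choice in ALLOWED_ROUTES else "escalate"


def stub_model(case: dict) -> str:
    """Placeholder for an LLM call that returns one of the route names."""
    return "refund" if case.get("item_returned") else "replacement"


def handle_case(case: dict, model: Callable[[dict], str] = stub_model) -> str:
    # Step 1 (deterministic): validate the input before spending model calls.
    if "order_id" not in case:
        return "escalated: missing order_id"
    # Step 2 (agentic): let the model pick between predefined options.
    route = choose_route(case, model)
    # Step 3 (deterministic): plain code carries out the chosen route.
    if route == "refund":
        return f"refund issued for order {case['order_id']}"
    if route == "replacement":
        return f"replacement shipped for order {case['order_id']}"
    return f"escalated: order {case['order_id']} sent to a human"
```

Because the model can only select among routes the code already knows how to execute, a bad model output degrades to an escalation rather than an unexpected action.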
The hybrid pattern most teams end up with
In practice, the question "agent or chatbot" is often the wrong framing. The systems we deploy most often combine both:
- A chatbot interface handles the conversation, intent classification, and information requests.
- When the user asks for an action ("can you cancel my subscription", "update the address on order 4471"), the chatbot hands off to a narrow, single-purpose agent or workflow that executes the action.
- The agent reports back; the chatbot relays the result.
This decomposes the problem usefully. The chatbot stays cheap and fast for the 70-80% of interactions that are informational. The action-taking happens in scoped agents or deterministic workflows with proper guardrails. Each component is easier to evaluate and improve than a monolithic "do everything" agent.
The same pattern applies internally. A staff-facing assistant that answers policy questions is a chatbot. The same assistant, when asked to "raise a procurement request for X", calls a workflow that opens the ticket in the right system with the right fields. The LLM is doing intent recognition and parameter extraction; the action itself is deterministic code.
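The hand-off can be sketched as a small router. In this illustration the regular-expression classifier is a stand-in for LLM intent recognition and parameter extraction, the workflow functions are hypothetical placeholders for deterministic code that would call real systems, and answer_question represents the chatbot path.

```python
import re
from typing import Callable


def classify(message: str) -> tuple[str, dict]:
    """Stand-in for LLM intent recognition and parameter extraction."""
    text = message.lower()
    match = re.search(r"update the address on order (\d+)", text)
    if match:
        return "update_address", {"order_id": match.group(1)}
    if "cancel my subscription" in text:
        return "cancel_subscription", {}
    return "question", {}


# Deterministic workflows: ordinary code, no model involved.
def update_address(order_id: str) -> str:
    return f"Address change requested for order {order_id}"


def cancel_subscription() -> str:
    return "Subscription cancellation started"


WORKFLOWS: dict[str, Callable[..., str]] = {
    "update_address": update_address,
    "cancel_subscription": cancel_subscription,
}


def handle(message: str, answer_question: Callable[[str], str]) -> str:
    """Route action requests to workflows; everything else stays a chat turn."""
    intent, params = classify(message)
    if intent in WORKFLOWS:
        return WORKFLOWS[intent](**params)
    return answer_question(message)
```

Only intents with a registered workflow can trigger an action, so adding a capability means adding one function and one dictionary entry, each of which can be tested on its own.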
What to ask before you commit to either
Before scoping a build, get clear answers to:
- What is the outcome we are measuring? Deflected tickets, hours saved, conversion lift, cycle-time reduction. If you cannot name the metric, you are not ready to build.
- What is the value per successful interaction? This determines whether you can afford agent-level economics or need chatbot-level economics.
- What is the worst thing this system could do if it misbehaves? The honest answer determines how much guardrail engineering you need.
- Where does the data live? A chatbot needs a clean, retrievable knowledge base. An agent needs API access to the systems it acts on. Both fail without the underlying integrations.
- Who operates it after launch? Both chatbots and agents need ongoing evaluation, prompt updates, and incident response. Budget for it from day one.
FAQs
Is an AI agent just a chatbot with more features?
No, although the marketing often blurs them. A chatbot's job is to respond to messages; an agent's job is to complete tasks, which may or may not involve a conversational surface. The architectural difference is the planning loop and the tool layer: an agent decides what to do next based on intermediate results, calls external systems, and iterates. That distinction drives different cost profiles, different latency characteristics, and different failure modes. A chatbot with one or two function calls is sometimes called "agentic", but a true agent's defining feature is autonomous multi-step execution toward a goal it has been given.
How much does it cost to build each one?
For mid-market UK deployments, a production RAG chatbot with proper evaluation, guardrails, and a single channel typically lands between £25k and £60k for the initial build, with ongoing costs of £1.5k-£5k per month for hosting, model usage, and iteration. A production agent system is more variable: £50k-£150k for an initial build is typical, depending on how many tools and integrations it needs, and running costs range from £3k to £15k per month. The biggest cost driver for agents is integration work - connecting to the systems the agent acts on usually takes longer than the LLM engineering itself.
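For budgeting, it can help to combine the build and running figures into a first-year total. The sketch below does only that arithmetic with the ranges quoted above; the names are hypothetical, and it assumes twelve months of running costs with no ramp-up period.

```python
def first_year_cost(build: int, monthly: int, months: int = 12) -> int:
    """Initial build plus running costs over the given number of months."""
    return build + monthly * months


# Ranges in pounds, taken from the figures above: (low, high).
CHATBOT = {"build": (25_000, 60_000), "monthly": (1_500, 5_000)}
AGENT = {"build": (50_000, 150_000), "monthly": (3_000, 15_000)}


def first_year_range(profile: dict) -> tuple[int, int]:
    """Pair the low build with the low run cost, and the high with the high."""
    low = first_year_cost(profile["build"][0], profile["monthly"][0])
    high = first_year_cost(profile["build"][1], profile["monthly"][1])
    return low, high


print(first_year_range(CHATBOT))  # (43000, 120000)
print(first_year_range(AGENT))    # (86000, 330000)
```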
Can we start with a chatbot and add agent capabilities later?
Yes, and this is usually the right path. Building the chatbot first teaches your team retrieval, evaluation, prompt engineering, and guardrail design - all foundational to agent work. It also gives you a conversational surface that can later route to agentic workflows. The architecture should be designed with this evolution in mind: keep the intent classification layer modular, build your tool layer behind clean interfaces from the start (even if you initially have only one or two tools), and invest in observability infrastructure early. The progression from chatbot to chatbot-plus-workflows to chatbot-plus-agents is the most common production pattern we see.
What about GDPR and the ICO when deploying agents on customer data?
UK GDPR applies fully to both chatbots and agents that process personal data. The ICO's guidance on AI emphasises lawful basis, transparency, data minimisation, and meaningful human oversight, particularly under Article 22 for automated decisions with significant effects. For agents specifically, the risk profile is higher because they take actions: you need to document the decision logic, scope tool permissions tightly, log every action for accountability, and provide human approval gates for consequential decisions. Data processing agreements with model providers (OpenAI, Anthropic, Google) need review, and zero-retention API options should be used where personal data is in scope.
Do we need a vector database for an agent the way we do for a chatbot?
Not necessarily. Chatbots typically lean on RAG because their job is to answer questions grounded in your documents, which requires semantic retrieval. Agents may use retrieval as one tool among many, but their primary capability is calling APIs and executing actions, not fetching text. Some agents have no vector store at all - they query structured databases, call business systems, and operate over typed data. If your agent's job involves searching unstructured text (knowledge bases, contracts, emails), you'll want retrieval. If it operates over structured systems (CRM, ERP, ticketing), you may not need vectors at all.
What's the difference between an agent and a workflow automation tool like n8n?
A workflow automation tool executes a predetermined sequence of steps: when X happens, do A, then B, then C. The path is fixed by the person who built the workflow. An agent decides the sequence itself based on the task and intermediate results. The boundary blurs because modern workflow tools (n8n, Make, Zapier) now include LLM nodes that can branch or generate content, and modern agent frameworks often constrain agents to predetermined steps for reliability. Our default rule: if the logic can be expressed as a flowchart, use a workflow tool. If the logic genuinely requires the model to decide what to do next based on what it finds, use an agent - and even then, keep the agentic surface as narrow as possible.
Which is better for internal employee use cases?
Internal use cases are where agentic patterns earn their cost most readily. Employees can tolerate slightly slower responses, the risk profile is more controllable (you can scope tool permissions per-team), and the value per task is often high (hours of analyst or operations time saved). A common starting point is an internal chatbot that answers policy and process questions via RAG, then adds workflows for the top 5-10 "can you do X for me" requests employees actually make. Customer-facing deployments tend to favour chatbots first because of cost-per-interaction economics and the higher reputational risk of an action gone wrong.
How do we evaluate whether a chatbot or agent is actually working?
For chatbots, the standard evaluation stack includes: an offline test set of representative queries with expected behaviour (correct answer, correct refusal, correct escalation), automated scoring on accuracy and groundedness, online metrics (deflection rate, escalation rate, user satisfaction), and ongoing review of conversation logs to catch new failure modes. For agents, you add: task completion rate (did the agent finish what it started), tool-call accuracy (did it call the right tools with the right arguments), cost per successful task, and intervention rate (how often a human had to step in). Both need continuous evaluation, not a one-off check at launch. Budget 10-20% of build cost annually for evaluation and iteration.
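The agent-side metrics are straightforward to compute once each run is logged in a consistent shape. The sketch below assumes a hypothetical RunRecord with the fields shown, and it defines cost per successful task as total spend across all runs divided by the number of completed runs, which is one reasonable definition among several.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RunRecord:
    completed: bool          # did the agent finish the task?
    tool_calls: int          # tool calls made during the run
    correct_tool_calls: int  # calls judged right tool, right arguments
    cost: float              # model plus tool spend for the run
    human_intervened: bool   # did a person have to step in?


def agent_metrics(runs: list[RunRecord]) -> dict[str, Optional[float]]:
    """Aggregate the four agent metrics over a batch of logged runs."""
    if not runs:
        raise ValueError("need at least one run")
    total = len(runs)
    completed = sum(r.completed for r in runs)
    calls = sum(r.tool_calls for r in runs)
    correct = sum(r.correct_tool_calls for r in runs)
    spend = sum(r.cost for r in runs)
    return {
        "task_completion_rate": completed / total,
        "tool_call_accuracy": correct / calls if calls else None,
        # Failed runs still cost money, so they count towards the numerator.
        "cost_per_successful_task": spend / completed if completed else None,
        "intervention_rate": sum(r.human_intervened for r in runs) / total,
    }
```

Tracking these per release, alongside the offline test set, makes regressions visible as a change in a number rather than as an anecdote from the conversation logs.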
Where to take this next
The choice between a chatbot and an agent is rarely binary in production. Start by being honest about what the system needs to do, not what it needs to be called, then pick the simplest architecture that delivers that outcome. Most teams overestimate how much autonomy they need and underestimate how much engineering goes into operating an agent safely.
If you are scoping a build and want a second opinion on whether your use case is a chatbot, a workflow, an agent, or a hybrid, AI Advisory runs a two-week strategy and readiness engagement that produces a costed roadmap and a clear architecture recommendation. Get in touch to discuss your project.