AI Workflow Agency

When to Build Custom AI: A Decision Framework

A practical framework for deciding when to build custom AI versus buying off-the-shelf

By AI Advisory team

Every quarter, a head of operations or CTO somewhere decides their business needs custom AI. Six months later, half of them wish they had bought something off the shelf, and a quarter of the buyers wish they had built. The decision is genuinely hard because the trade-offs shift quickly: foundation models get cheaper, SaaS vendors add features overnight, and your own data stack matures in ways that change what is feasible.

This article gives you a working framework for the build-versus-buy decision on AI systems specifically. It is not a generic IT procurement checklist. The economics, risks, and capability requirements of AI are different enough to warrant their own logic.

The default should be buy, and here is why

If you have no strong reason to build, buy. This is the opposite of how most engineering-led teams instinctively approach the question, but it is the right starting position for AI in 2026.

The reasons are structural. Foundation model capabilities are commoditising at the API layer. OpenAI, Anthropic, Google, and the open-weights ecosystem (Llama, Mistral, Qwen) are leapfrogging each other every few months. A custom system built on a specific model in Q1 may be technically inferior to a SaaS product running on a newer model by Q4. The SaaS vendor absorbs that upgrade cost; you do not.

The second reason is operational. Custom AI systems need evaluation harnesses, prompt regression testing, drift monitoring, cost controls, fallback logic, and on-call coverage when models change behaviour. According to a Gartner prediction published in July 2024, at least 30% of generative AI projects will be abandoned after proof of concept by the end of 2025, with poor data quality, inadequate risk controls, escalating costs, and unclear business value the primary causes. In our experience, much of that 30% is custom build work that underestimated the operational tail.

The third reason is talent. People who can build production-grade AI systems and stay on top of the model ecosystem are scarce and expensive. UK senior ML engineer salaries sit comfortably above £100k base, and contractor rates for capable AI engineers run £700-£1,200 per day. If you can buy a tool that solves 80% of the problem for £30k per year, the build case has to clear a high bar.

So the default is buy. The rest of this article is about the conditions under which that default flips.

The four tests that justify a custom build

A custom AI build can be justified only when at least one of these four conditions is true. If none is true, buy. If exactly one is true, consider building but stress-test the case. If two or more are true, build is probably correct.
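The rule of thumb above can be written down as a small function. This is only a sketch: the function name and the dictionary keys are hypothetical labels for the four tests, and the output is a prompt for discussion, not a verdict.

```python
def recommend(tests: dict[str, bool]) -> str:
    """Map the number of tests that hold to the rule of thumb in the text."""
    passed = sum(tests.values())
    if passed == 0:
        return "buy"
    if passed == 1:
        return "consider build, but stress-test the case"
    return "build is probably correct"


# Hypothetical assessment for one workload (key names are illustrative).
assessment = {
    "capability_missing_in_market": False,
    "data_or_workflow_is_moat": True,
    "integration_too_custom_for_saas": True,
    "volume_and_unit_economics": False,
}

print(recommend(assessment))
```

Writing the four answers down as explicit true/false values forces the team to commit to each one, which is where weak cases tend to show up.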

Test 1: The capability does not exist in the market

The simplest case. You have looked at the SaaS landscape, run trials with the two or three credible vendors, and concluded that none of them does what you need. Not "does it 70% of the way" - actually does not do it.

This is rarer than buyers initially think. The honest version of this test asks: have I demoed three products and run a two-week pilot with the best of them? If not, you are not finished evaluating. Vendors hide capability behind sales gates; the product you dismissed from the website often has the feature you need on the enterprise tier.

Genuine examples of "capability does not exist": a legal firm needs a system that reasons over their proprietary precedent database in a specific way that no legal AI vendor supports, with audit trails the regulators require. A manufacturer needs an AI scheduler that integrates with a 1990s SAP module via SOAP and adjusts for shift patterns no SaaS vendor models.

Test 2: The data or workflow is your competitive moat

If the AI system processes data that is genuinely proprietary and constitutes part of your competitive advantage, building gives you control over how that data is used, retained, and surfaced. Buying often means sending that data through a vendor's pipeline, even if contractually they cannot train on it.

The test here is sharper than "our data is sensitive." All data is sensitive at some level. The real question is: does the way we apply AI to this data shape what we sell? If yes, build. If the AI is internal productivity (HR queries, IT helpdesk, meeting summaries), it almost certainly does not meet this test, and a vendor is fine.

Test 3: The integration surface is too custom for SaaS to fit

Many mid-market businesses run on a stack that no SaaS AI vendor was designed for: a bespoke ERP from 2008, a CRM with thirty custom objects, a warehouse running on AS/400, a workflow tool that the founder built in PHP. Off-the-shelf AI assumes Salesforce, HubSpot, NetSuite, Zendesk, and modern Postgres. When your stack does not match, the integration cost of buying can exceed the cost of building.

This test should be quantified, not asserted. Get a real integration quote from the SaaS vendor (or their preferred SI partner). Compare it to a custom build estimate. If the integration alone is £150k and the SaaS licence is another £60k per year, and a custom build comes in at £120k with £40k annual operating cost, build wins on five-year TCO: £320k against £450k.
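The comparison in that example is simple enough to check in a few lines. The sketch below uses the figures from the paragraph and assumes flat annual costs with no discounting or price rises; the function name is illustrative.

```python
def five_year_tco(upfront: float, annual: float, years: int = 5) -> float:
    """Total cost of ownership: one-off cost plus a flat annual cost."""
    return upfront + annual * years


# Figures from the example above (GBP).
saas = five_year_tco(upfront=150_000, annual=60_000)    # integration + licence
custom = five_year_tco(upfront=120_000, annual=40_000)  # build + operation

print(f"SaaS:   £{saas:,.0f}")
print(f"Custom: £{custom:,.0f}")
print(f"Build saves £{saas - custom:,.0f} over five years")
```

With real quotes, it is worth re-running the same calculation with pessimistic figures for the build side, since build estimates tend to slip more than licence fees do.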

Test 4: Volume and unit economics

SaaS AI is usually priced per-seat, per-conversation, per-document, or per-token markup. At scale, those unit economics get punishing. A chatbot vendor charging £0.50 per resolved conversation looks cheap until you hit 200,000 conversations per month and the bill is £100k per month. The same workload on direct foundation model APIs with a custom wrapper might cost £8-15k per month, plus engineering overhead.

The threshold is usually somewhere between £100k and £250k per year in SaaS spend on a single AI workload. Below that, building rarely pays back the engineering and operational cost. Above it, the maths starts to favour custom, especially if usage is still growing.
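A rough break-even calculation makes the threshold concrete. In the sketch below, the £0.50 price and the 200,000-conversation volume come from the example above, and the API cost per conversation is derived from the upper end of the £8-15k range. The £10k monthly engineering overhead is a purely hypothetical placeholder; substitute your own estimate.

```python
def monthly_saas_cost(conversations: int, price_per_conversation: float) -> float:
    """Per-conversation SaaS pricing: the bill scales linearly with volume."""
    return conversations * price_per_conversation


def monthly_custom_cost(conversations: int, api_cost_per_conversation: float,
                        fixed_overhead: float) -> float:
    """Direct API usage plus a fixed monthly engineering/operations overhead."""
    return conversations * api_cost_per_conversation + fixed_overhead


def break_even_conversations(price_per_conversation: float,
                             api_cost_per_conversation: float,
                             fixed_overhead: float) -> float:
    """Monthly volume at which the SaaS bill equals the custom cost."""
    margin = price_per_conversation - api_cost_per_conversation
    if margin <= 0:
        raise ValueError("SaaS is cheaper per conversation; no break-even exists")
    return fixed_overhead / margin


SAAS_PRICE = 0.50                 # GBP per resolved conversation (from the text)
API_COST = 15_000 / 200_000       # GBP per conversation, upper end of £8-15k at 200k/month
FIXED_OVERHEAD = 10_000           # GBP per month, HYPOTHETICAL assumption

volume = break_even_conversations(SAAS_PRICE, API_COST, FIXED_OVERHEAD)
print(f"Break-even volume: {volume:,.0f} conversations per month")
print(f"SaaS spend at break-even: £{monthly_saas_cost(round(volume), SAAS_PRICE) * 12:,.0f} per year")
```

Under these assumptions the break-even sits at roughly 23,500 conversations per month, or about £141k per year in SaaS spend. A different overhead figure moves the answer proportionally, which is why the threshold is a range and not a single number.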

Hybrid is usually the right answer

The build-versus-buy framing is a useful simplification, but real architectures are hybrid. Most mid-market AI systems should split the work like this:

  • Buy the model. Use OpenAI, Anthropic, or open-weights via Bedrock, Vertex, or a hosted inference provider. Do not train your own foundation model. Do not fine-tune unless you have run RAG first and proved it insufficient.
  • Buy the platform components that are commoditised. Vector databases (Pinecone, Weaviate, or pgvector), observability (Langfuse, Helicone), evaluation tooling (Braintrust, Promptfoo). The cost to build these properly is high; the cost to buy is low.
  • Build the orchestration and business logic. The bit that knows your data, your workflows, your refusal patterns, your audit requirements. This is where your competitive advantage lives, and where SaaS will always fit poorly.

This pattern means "custom build" rarely means "build everything." It means writing perhaps 5,000-20,000 lines of well-tested orchestration code that sits between bought components. A team of two engineers can deliver this in 8-16 weeks for most mid-market use cases.

The cost reality of custom AI

Buyers consistently underestimate the operational cost of custom AI. The headline build figure is the easy part. The real numbers, based on what mid-market builds actually cost in 2026:

  • Initial build (12-week first version): £60k-£180k depending on integration complexity, evaluation rigour, and security requirements.
  • Annual operating cost: typically 30-50% of the build cost, covering inference, observability, evaluation runs, model upgrades, and on-call support. So a £120k build costs roughly £36-60k per year to operate properly.
  • Model inference: highly variable, but for a customer-facing assistant handling 50,000 interactions per month, expect £2-8k per month in API costs depending on model choice and average context length.
  • Re-evaluation when models change: every time you upgrade a foundation model, you need to re-run your evaluation suite. Budget two engineering days per quarter, minimum.

SaaS comparison is rarely apples-to-apples. The SaaS price includes operational coverage you will otherwise pay for yourself. When comparing, add the operating cost to the build cost, then compare five-year TCO.
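Because the operating cost is quoted as a share of the build cost, the custom side of the comparison is better expressed as a range than a point estimate. A minimal sketch, assuming the 30-50% share above holds flat for every year; the function names are illustrative.

```python
def annual_operating_range(build_cost: float, low_share: float = 0.30,
                           high_share: float = 0.50) -> tuple[float, float]:
    """Low and high annual operating cost as a share of the build cost."""
    return build_cost * low_share, build_cost * high_share


def custom_tco_range(build_cost: float, years: int = 5) -> tuple[float, float]:
    """Low and high total cost: build cost plus operating cost for each year."""
    low_opex, high_opex = annual_operating_range(build_cost)
    return build_cost + low_opex * years, build_cost + high_opex * years


low_opex, high_opex = annual_operating_range(120_000)
low_tco, high_tco = custom_tco_range(120_000)

print(f"Annual operating cost: £{low_opex:,.0f} to £{high_opex:,.0f}")
print(f"Five-year custom TCO:  £{low_tco:,.0f} to £{high_tco:,.0f}")
```

For a £120k build this gives £36k-£60k per year to operate and £300k-£420k over five years. That range, not the £120k headline, is the figure to set against the SaaS quote.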

Regulatory and compliance considerations

For UK businesses, two regimes matter most. UK GDPR (enforced by the ICO, which publishes guidance on AI and data protection) requires you to demonstrate lawful basis, transparency, and appropriate safeguards for any AI system processing personal data. The ICO has published specific guidance on AI auditing, automated decision-making, and explainability. Custom builds give you full control over these safeguards; SaaS requires you to trust and audit the vendor's implementation.

The EU AI Act applies to UK businesses selling into the EU and brings its own classification of high-risk AI systems with concrete obligations. For high-risk classifications (recruitment, credit decisioning, critical infrastructure, education), the auditability and documentation burden is significant, and custom builds often handle this more cleanly than SaaS where the vendor's documentation may not match your risk assessment requirements.

For financial services, the FCA's expectations on model risk management (informed by the PRA's SS1/23 supervisory statement on model risk management for banks) push towards traceability and challenge that is often easier to evidence in a custom system you control end-to-end.

None of this automatically pushes you to build. Well-chosen SaaS vendors handle compliance carefully and provide the documentation you need. The point is that compliance posture should be a deliberate input to the decision, not an afterthought.

A decision checklist you can actually use

Before committing to build, work through this in writing with your team. The act of writing the answers exposes weak cases.

  1. Have we evaluated at least three SaaS alternatives with a paid pilot of the leading one?
  2. What specifically can the leading SaaS not do, and have we asked the vendor's product team if it is on their roadmap?
  3. What is our five-year TCO for SaaS at projected volume? What is it for custom build plus operation?
  4. Who on our team will own this system in year two? Not who will build it, who will run it.
  5. What is our evaluation strategy? How will we know if the system is getting better or worse over time?
  6. What is the fallback when the underlying model changes behaviour? Do we have a rollback path?
  7. If we build, can we credibly hire or contract the skills to maintain it for three years?
  8. What does our compliance team say about the audit trail and data flows in each option?

If you cannot answer six of these eight clearly, you are not ready to make the decision yet. That is a useful finding, not a failure.

FAQs

How long does a custom AI build typically take?

For a well-scoped first version, expect 8-16 weeks from kickoff to production for most mid-market projects. The first two weeks are discovery, data assessment, and specification. Weeks three to ten are build and iteration, with working software demonstrable from week four. Weeks eleven to sixteen cover evaluation hardening, security review, user acceptance testing, and a controlled rollout. Projects that take longer usually have unresolved scope, missing data infrastructure, or organisational change problems rather than technical complexity. If you are quoted 9-12 months for a first version with no interim delivery, push back hard - that is a risk pattern, not a quality signal.

What is the minimum viable budget for a custom AI build?

Realistically, £40k-£60k is the floor for anything you should put into production with proper evaluation, monitoring, and refusal patterns. Below that, you are buying a prototype that will need to be rebuilt to handle real traffic safely. Most mid-market first builds land between £80k and £180k for the initial version, with annual operating costs of 30-50% of the build figure. If your budget is below £40k, the right answer is almost always a no-code or low-code automation using n8n, Make, or Zapier with an LLM API call, which can solve a surprising amount of business logic for £8k-£20k.

Should we fine-tune a model or use RAG?

Start with retrieval-augmented generation. Fine-tuning is the right answer in narrow circumstances: when you need a specific output format the base model resists, when you have a large volume of high-quality labelled examples, or when latency and cost requirements demand a smaller model. For knowledge grounding - making the model answer correctly from your documents - RAG is almost always the better starting point. It is cheaper, easier to debug, easier to update (just re-index your documents), and easier to audit. Fine-tuning a model on your knowledge base often performs worse than RAG and costs more to maintain.

What is the biggest hidden cost in custom AI builds?

Evaluation. Teams budget for the build and the inference, but forget that an AI system needs continuous evaluation to catch regressions when models update, prompts change, or retrieval drifts. A proper evaluation harness requires a labelled test set (50-500 examples depending on use case), regression tests run on every change, periodic re-runs when foundation models update, and human review of edge cases. Building this properly is 15-25% of initial engineering effort and 10-20% of ongoing engineering time. Skipping it is the single most common reason production AI systems silently degrade.
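The core of such a harness is small. The sketch below is illustrative only: `stub_model` stands in for a real model call, the three cases stand in for a labelled set of 50-500, and the substring check is a deliberately crude substitute for a real grader. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Case:
    prompt: str
    must_contain: str  # crude grader: expected substring in the answer


def pass_rate(model: Callable[[str], str], cases: list[Case]) -> float:
    """Fraction of labelled cases whose answer contains the expected text."""
    if not cases:
        raise ValueError("evaluation set is empty")
    passed = sum(
        1 for case in cases
        if case.must_contain.lower() in model(case.prompt).lower()
    )
    return passed / len(cases)


def passes_regression(model: Callable[[str], str], cases: list[Case],
                      baseline: float) -> bool:
    """True if the model scores at least as well as the recorded baseline."""
    return pass_rate(model, cases) >= baseline


def stub_model(prompt: str) -> str:
    """Stand-in for a real model call; returns canned answers."""
    canned = {
        "What is our refund window?": "Refunds are accepted within 30 days.",
        "Which plan includes SSO?": "SSO is part of the Enterprise plan.",
    }
    return canned.get(prompt, "I don't know.")


CASES = [
    Case("What is our refund window?", "30 days"),
    Case("Which plan includes SSO?", "enterprise"),
    Case("Do you ship to Norway?", "yes"),
]

print(f"Pass rate: {pass_rate(stub_model, CASES):.0%}")
print("Regression check passed:", passes_regression(stub_model, CASES, baseline=0.6))
```

Running this on every prompt, retrieval, or model change, and failing the change when the pass rate drops below the recorded baseline, is what turns a test set into a regression suite.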

Can we build this in-house instead of using an agency?

If you have at least one senior engineer with production LLM experience, an ML engineer or strong data engineer, and a product owner who can give 30%+ of their time to it, in-house is viable. If you do not have those three roles already in the building, hiring them takes 4-6 months in the current UK market and costs £350k+ in total compensation. Agencies are usually a sensible bridge - they deliver the first version while transferring knowledge to your team, who then take over operation. The mistake is using an agency to build something with no internal owner identified for year two.

How do we handle vendor lock-in with custom AI?

Design the system so the foundation model is swappable. This means abstracting model calls behind a thin internal interface, keeping prompts in version control rather than vendor-specific tooling, storing your evaluation set in your own systems, and avoiding vendor-specific features (function calling syntax, response formats) where standard alternatives exist. With this discipline, switching from Anthropic to OpenAI to an open-weights model running on Bedrock should take days, not months. The same logic applies to vector databases: pgvector, Pinecone, and Weaviate all support broadly similar operations, so keep your embedding logic portable.
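A thin internal interface can be as small as one method. The sketch below shows the shape and makes no real vendor calls: `ChatModel`, the two stand-in providers, and the `PROVIDERS` registry are hypothetical names. In a real system each provider class would wrap one vendor SDK behind the same method.

```python
from typing import Protocol


class ChatModel(Protocol):
    """The only surface the rest of the application is allowed to depend on."""

    def complete(self, system: str, user: str) -> str: ...


class ProviderA:
    """Stand-in for a wrapper around one vendor's SDK (no real calls made)."""

    def complete(self, system: str, user: str) -> str:
        return f"[provider-a] {user}"


class ProviderB:
    """Stand-in for a second vendor or a self-hosted open-weights model."""

    def complete(self, system: str, user: str) -> str:
        return f"[provider-b] {user}"


# Provider choice lives in configuration, not in business logic.
PROVIDERS: dict[str, ChatModel] = {"a": ProviderA(), "b": ProviderB()}


def summarise_ticket(model: ChatModel, ticket_text: str) -> str:
    """Business logic: knows the prompt, not the vendor."""
    system = "Summarise the support ticket in one sentence."
    return model.complete(system=system, user=ticket_text)


for name in ("a", "b"):
    print(summarise_ticket(PROVIDERS[name], "Customer cannot reset password."))
```

The design choice is that `summarise_ticket` never imports a vendor SDK. Swapping providers means adding one wrapper class and changing one configuration value, then re-running the evaluation suite.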

What happens to our custom AI when foundation models change?

Foundation model updates can change behaviour in ways that affect your application. Sometimes the change is for the better (GPT-4 to GPT-4o reduced cost and latency), sometimes for the worse (subtle changes to refusal behaviour or formatting). The mitigation is your evaluation suite. Run it before adopting any new model version. Keep the previous model version pinned in production until the new one passes evaluation. Most providers offer pinned model versions for 6-12 months, giving you a controlled migration window. Budget two engineering days per quarter for model upgrade evaluation as a baseline operating cost.
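The "keep the old version pinned until the new one passes" rule can be expressed as a simple gate. This is a sketch under assumptions: the version strings are made up, and the 95% floor and zero-regression tolerance are placeholder thresholds that you would set for your own use case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    model_version: str
    pass_rate: float  # share of evaluation cases passed, 0.0 to 1.0


def choose_production_model(pinned: EvalResult, candidate: EvalResult,
                            min_pass_rate: float = 0.95,
                            max_regression: float = 0.0) -> str:
    """Promote the candidate only if it clears the floor and does not regress."""
    meets_floor = candidate.pass_rate >= min_pass_rate
    no_regression = candidate.pass_rate >= pinned.pass_rate - max_regression
    if meets_floor and no_regression:
        return candidate.model_version
    return pinned.model_version


# Hypothetical version names and scores from two evaluation runs.
pinned = EvalResult("model-v1-pinned", pass_rate=0.96)
better = EvalResult("model-v2", pass_rate=0.97)
worse = EvalResult("model-v2-preview", pass_rate=0.91)

print(choose_production_model(pinned, better))
print(choose_production_model(pinned, worse))
```

The first call promotes the candidate and the second keeps the pinned version. The pinned version doubles as the rollback path if a promoted model misbehaves in production.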

When is the build-versus-buy decision genuinely reversible?

If you buy first and later decide to build, you have lost time but learned what you actually need - usually a positive outcome. If you build first and later decide to buy, you have spent more money but you can usually migrate. The truly hard-to-reverse decisions are: training a custom model on proprietary data you cannot easily extract, building deep into a vendor-specific platform like Salesforce Einstein or Microsoft Copilot Studio, and any architecture that bakes one foundation model's quirks into the data layer. Avoid those, and the build-versus-buy decision becomes much more forgiving.

Where to go from here

The build-versus-buy decision should be a structured exercise, not a hunch. Write the answers to the eight-question checklist above with the people who will own the system in year two. Get real pricing from at least three SaaS vendors and a real build estimate from someone who has shipped production AI before. Then make the call with the maths in front of you. AI Advisory runs this exact assessment as part of our two-week AI Strategy & Readiness engagement, and we are happy to talk through your specific situation.
