Why we didn't build a RAG pipeline

By Barry Thomas • 14 April 2026 • 8 min read

If you run a mid-tier organisation in Australia and you’re looking at an internal AI assistant, chances are the proposal came from a vendor or a consultant. In our work advising firms in this position, the proposal almost invariably involves a retrieval pipeline: embeddings, a vector store, a reranker, an evaluation harness. The detail won’t be easy to follow, but the pitch will be confident and plausible enough to suggest that this is what serious implementation looks like. The quote will reflect the engineering.

You are right to be suspicious. This piece is about why.

The decision

One of us, in a separate CIO role at a mid-tier Australian organisation, recently faced exactly that question. The organisation had a few hundred pages of Confluence content covering operations, HR, IT governance, fundraising procedures, AI policy — roughly what you would expect for a small-to-mid-sized service organisation. A Claude-based internal assistant was being built on top. The conventional next move was to commission the pipeline described above. We didn’t.

The architecture is a cascade over platform-native search — primary knowledge base first, then wider Confluence, then SharePoint — loading the selected pages into Claude’s context window and letting the model read them. There is no custom retrieval layer. No embeddings store, no vector database, no reranker.
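In code, the cascade amounts to very little, which is rather the point. The sketch below is illustrative only: the tier functions stand in for platform-native search calls (primary knowledge base, wider Confluence, SharePoint), and all names and the character budget are assumptions, not a real API.

```python
def cascade_search(query, tiers, min_results=1):
    """Try each search tier in order; return pages from the first
    tier that yields enough results. Each tier is a (name, search_fn)
    pair, where search_fn maps a query to a list of pages."""
    for name, search_fn in tiers:
        pages = search_fn(query)
        if len(pages) >= min_results:
            return name, pages
    return None, []

def build_context(pages, max_chars=200_000):
    """Concatenate the selected pages into one context string for the
    model to read, with a rough character budget instead of chunking."""
    parts, used = [], 0
    for title, body in pages:
        snippet = f"## {title}\n{body}\n\n"
        if used + len(snippet) > max_chars:
            break
        parts.append(snippet)
        used += len(snippet)
    return "".join(parts)
```

Everything after this — reading, synthesis, answering — is the model’s job, not the pipeline’s.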

Three reasons for the decision

Scale didn’t justify it. The knowledge base is a few hundred pages. Even at the top end of plausible organic growth for an organisation of its size, the ceiling is somewhere between a thousand and fifteen hundred pages. The complexity cliff that would justify a custom retrieval architecture doesn’t arrive until corpora reach millions of tokens and thousands of documents. That is a different problem at a different scale. Pretending it is the problem in front of you produces the wrong architecture — and, in a vendor context, the wrong invoice.

The platform is improving in our favour. The platforms we depend on — Atlassian’s Confluence, connected to a frontier-model provider — keep maturing. Context windows keep getting larger. Native search, memory, and retrieval features keep landing, and the same is broadly true of comparable platforms. Building a sophisticated architecture today — chunking strategy, embeddings pipeline, vector store, reranking, evaluation harness — means committing significant work that stands a real chance of being stranded as the platform catches up within an acceptable timeframe. Paying a vendor to solve a near-term problem that the platform is on track to solve for you, on its own roadmap, is the archetypal wrapper-obsolescence trap.

Content quality matters more than architecture. This is the rationale that does the most work, and it is the one most easily overlooked. Retrieval accuracy is primarily a function of document quality, not retrieval mechanism. A sophisticated pipeline over badly structured, out-of-date, or contradictory content rarely outperforms simple search over well-structured, current, well-named content by enough to justify the investment. If your knowledge base contains significant unreviewed historical material — and most do — investment in content quality has higher leverage on assistant performance than architectural investment. Cleaning up content is something your own people can do, without specialist tooling. It is also the investment that compounds regardless of what happens to the architecture around it.

Reframing the scaling question

The intuitive question about scale is: when does the knowledge base exceed the model’s context window? That is the wrong question. The architecture doesn’t load the whole knowledge base into every query; the platform’s search selects the relevant pages and those are what the model reads.

The real question is: when does the platform’s search start returning the wrong pages often enough to matter?

That is a function of corpus size and topic density. Full-text search discriminates well when knowledge domains use distinct vocabularies — different topics use different words, and search can tell them apart. It degrades within dense topic clusters, where many pages cover closely related things and a query matches a large number of them. In the case we know, domains were distinct enough that this didn’t bite until the corpus would need to grow several times larger than it was. Whether the same holds for your domains is worth checking before assuming so — and it is worth identifying which of your domains is the densest cluster, because that is where retrieval precision will soften first.
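One cheap way to run that check is to measure how much vocabulary your domains share. The sketch below computes pairwise Jaccard similarity between each domain’s token set; it is a rough heuristic, not a retrieval benchmark, and the whitespace tokenisation and domain names are illustrative assumptions. High overlap between two domains is a hint that queries will match pages from both — which is where full-text precision softens first.

```python
def domain_vocab_overlap(docs_by_domain):
    """Pairwise Jaccard similarity of each domain's vocabulary.
    docs_by_domain maps a domain name to a list of page texts.
    Returns {(domain_a, domain_b): similarity in [0, 1]}."""
    vocab = {
        domain: {word.lower() for text in texts for word in text.split()}
        for domain, texts in docs_by_domain.items()
    }
    domains = sorted(vocab)
    overlap = {}
    for i, a in enumerate(domains):
        for b in domains[i + 1:]:
            union = vocab[a] | vocab[b]
            overlap[(a, b)] = len(vocab[a] & vocab[b]) / len(union) if union else 0.0
    return overlap
```

The pair with the highest score is your densest cluster — the place to watch as the corpus grows.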

The binding constraint isn’t context size. It is retrieval precision. That framing is more useful both for deciding whether to build, and for deciding when to reconsider.

The load-bearing caveat

The content-quality argument is doing a lot of work above, and it is worth naming plainly: if content-quality work doesn’t actually happen, the reason not to build the pipeline evaporates. “We don’t need a pipeline because content quality is the leverage point” only holds while someone is genuinely working the leverage point.

This is not a flaw in the argument. It is the feature that makes it honest. Declining a pipeline puts the unglamorous work — maintaining, rewriting, and retiring knowledge-base content — onto the roadmap. If that work doesn’t end up on the roadmap, the decision to defer the pipeline is not a considered technical stance; it is procrastination with a better wrapper.

The discipline is boring. The discipline is also yours to keep, and it isn’t something a vendor will do for you.

The bigger thing

Until recently, a conversation like the one above would assume there was an engineer in the room — someone who understood retrieval pipelines and the trade-offs involved, and who could be reasoned with technically. Mid-tier organisations rarely have that engineer in-house. They buy her, through a vendor or a consulting engagement. And the engineer, reasonably, proposes work that an engineer is paid to do.

The point worth sitting with is that at the scale most of our clients operate at, the engineer doesn’t need to be in the room any more.

Andrej Karpathy, who has done more than most to explain AI architecture to working engineers, recently described building a personal knowledge base of roughly a hundred articles and four hundred thousand words, with the model itself compiling and maintaining the wiki, and querying it directly. His own framing: he “thought I had to reach for fancy RAG” before finding that at the scale he’s working at, the model’s own context-handling was enough — the only retrieval layer he needs is the one the model maintains for itself.1

That scale is comparable to the knowledge base of a typical mid-tier Australian organisation.

Karpathy’s case is personal use; a mid-tier organisation’s assistant serves many users, with varied queries and a lower tolerance for imperfect answers. The case we’ve described — a shared knowledge base of a few hundred pages, queried by staff — is the organisational analogue, and the same architectural reasoning holds there. The commonality across both is that corpus size and the model’s current capabilities make a custom retrieval pipeline unnecessary.

If a researcher of Karpathy’s sophistication concludes that a pipeline isn’t necessary at that scale, the burden of proof for building one on your corpus sits firmly with the person proposing it — whether that person is an in-house engineer or a vendor writing the quote.

The current generation of AI is genuinely good at what previous generations of software struggled to do well without substantial information-retrieval engineering: reading, summarising, synthesising, connecting, indexing. Our view is that the engineering that used to be necessary to bridge your knowledge and your user is, at mid-tier scale, being steadily absorbed into the models. At the scales our clients operate at, the bridge is now — or is fast becoming — the model.

That does not mean every AI project can dispense with engineering. Projects with strict precision or latency demands, regulatory constraints, or adversarial-data conditions still need real architecture. But for the typical internal AI assistant at the typical mid-tier Australian organisation, the engineer you would have had to buy in from a vendor is increasingly not what you need. What you need is good content, a sensible cascade into the platform’s own search, and a lightweight way to check that the answers are still landing.

When to reconsider

The argument above has conditions. The decision should be revisited if any of the following occur.

An evaluation set — a lightweight collection of canonical question-and-answer pairs — starts failing consistently. This is the earliest and cheapest signal, and the one mid-tier organisations are most likely to overlook. Building and maintaining one is modest work and doesn’t require a specialist.

The corpus approaches the size at which retrieval precision is expected to soften, and the platform’s native search hasn’t materially improved in the meantime.

A scope expansion is proposed — new domains, new organisational units, new types of content. The maths of retrieval precision shifts with scope. Reassess before migration begins, not after.

A specific use case emerges that requires retrieval precision or speed the current approach cannot deliver.

Staff trust in the assistant starts to decline, and the decline is traceable to retrieval rather than to content quality.

In any of these cases, the first task is to work out whether the problem is architecture or content. They are different problems and they need different fixes. Conflating them — and it is easy to conflate them, because content-quality problems show up to the user as “bad answers” — is the fastest route to an infrastructure bill that solves the wrong thing.
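The evaluation set mentioned earlier really is modest work. A minimal sketch, assuming the assistant is any callable from question to answer string: a pair passes if every expected phrase appears in the answer. The phrase-matching rule is a deliberate simplification, and every name here is illustrative.

```python
def run_eval(assistant, eval_set, threshold=0.8):
    """Run canonical question/answer pairs through the assistant.
    eval_set is a list of (question, [expected phrases]) pairs.
    Returns (ok, pass_rate, failed_questions)."""
    failures = []
    for question, expected_phrases in eval_set:
        answer = assistant(question).lower()
        if not all(phrase.lower() in answer for phrase in expected_phrases):
            failures.append(question)
    passed = len(eval_set) - len(failures)
    rate = passed / len(eval_set) if eval_set else 1.0
    return rate >= threshold, rate, failures
```

Run it on a schedule, and the list of failed questions tells you not just that something slipped, but where — which is exactly the architecture-or-content triage described above.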

The broader lesson

The simpler architecture isn’t just a good decision for the case described. It is evidence of a shift that matters for mid-tier organisations being pitched on AI builds — certainly for the ones we advise. The middleware that consultants and vendors sell — the retrieval pipeline, the evaluation harness, the observability layer — is increasingly being absorbed into the platforms themselves, at the scales you actually operate at. Selling it anyway is a reasonable commercial strategy. Buying it is a choice that deserves a second look.

Ask the question plainly: “Could we reach the same outcome with platform-native search, a well-structured knowledge base, and the model reading the selected pages?” If the answer is yes — and more often than you would expect, it is — you probably don’t need the pipeline. You certainly don’t need to be sold one.

Architecture you didn’t build can’t become obsolete. Content you clean up compounds. The platform will keep moving. These three facts, taken together, explain most of what is worth saying about why we didn’t build a RAG pipeline — and most of what you need to know before someone tries to sell you one.


Footnotes

  1. Andrej Karpathy, post on X, 3 April 2026, on using LLMs to build personal knowledge bases from markdown sources.