
Written by Daniel Vallejo | Dec 15, 2025 12:01:02 PM

How RAG Is Reshaping AI: The Real Pros and Cons

Exec TL;DR: Retrieval Augmented Generation (RAG) remains a core approach for grounding LLMs in data, but recent research highlights practical limitations around data quality, retrieval relevance, and reasoning depth. Businesses are now adopting RAG not as a standalone solution but as part of broader, hybrid architectures involving agents, structured retrieval, provenance systems, and workflow orchestration. RAG is still useful — but it is increasingly just one component within a larger AI stack.

The big picture

RAG is moving from “solution” to “infrastructure primitive”. Instead of building a RAG system, businesses are building AI systems that use RAG alongside other tools: agents, APIs, search, knowledge graphs, and workflow engines.

Recent benchmarks and surveys on retrieval-augmented systems — CRAG (Yang et al., 2024), RAGBench (Friel et al., 2024), BERGEN (Rau et al., 2024), and the review by Zhang & Zhang (2025) on hallucination mitigation in retrieval-augmented LLMs — converge on a consistent conclusion: RAG improves grounding and traceability, but is not sufficient on its own for harder reasoning and governance demands.

Typical enterprise RAG architecture used to ground model outputs in internal content.

1) Why RAG became standard in AI solutions

1.1 Grounding model responses in real documents

RAG gives LLMs access to current, internal information. By retrieving relevant snippets from a vector database at query time, the model works with fresher and more specific knowledge than what it saw during pre-training.
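
In code, the query-time flow is simple to sketch. The example below is a minimal illustration, not a production implementation: embed() and llm_complete() are hypothetical placeholders for whichever embedding model and LLM endpoint you use, and the index is an in-memory list searched by cosine similarity.

    import numpy as np

    # Placeholders for whichever embedding model and LLM endpoint you use.
    def embed(text: str) -> np.ndarray: ...
    def llm_complete(prompt: str) -> str: ...

    def retrieve(query: str, index: list[tuple[str, np.ndarray]], k: int = 4) -> list[str]:
        """Return the k chunks whose embeddings are most similar to the query (cosine similarity)."""
        q = embed(query)
        scored = [
            (chunk, float(np.dot(q, vec) / (np.linalg.norm(q) * np.linalg.norm(vec))))
            for chunk, vec in index
        ]
        return [chunk for chunk, _ in sorted(scored, key=lambda s: s[1], reverse=True)[:k]]

    def answer(query: str, index: list[tuple[str, np.ndarray]]) -> str:
        """Ground the model's answer in retrieved snippets rather than parametric memory alone."""
        context = "\n\n".join(retrieve(query, index))
        prompt = (
            "Answer using only the context below, and say so if the context is insufficient.\n\n"
            f"Context:\n{context}\n\nQuestion: {query}"
        )
        return llm_complete(prompt)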

Enterprise and academic case studies — for example Ayala & Béchard (2024) and Song et al. (2024) — together with hallucination corpora such as RAGTruth (Niu et al., 2024), show that retrieval-augmented setups reduce unsupported claims relative to “model-only” prompting, provided retrieval quality is high. That directly supports AI hallucination reduction in settings where incorrect answers carry operational or regulatory risk.

Key points for your business

  • RAG is most effective when documents are reasonably clean and well organised.
  • Grounded citations help reviewers quickly check whether an answer is defensible.
  • Accuracy gains depend heavily on retrieval quality, not only on the underlying model.

1.2 Lower cost than fine-tuning for knowledge updates

RAG updates knowledge via data, not model weights. Findings from both vendors and academic groups increasingly show that, for many knowledge-heavy tasks, keeping embeddings and indexes fresh is cheaper and faster than repeatedly fine-tuning models, especially when the underlying facts change frequently.
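
What "updating knowledge via data" looks like in practice can be sketched as an incremental refresh: re-embed only the documents whose content has changed, rather than retraining anything. The embed() call is again a placeholder, and the in-memory dict stands in for whatever vector store you operate.

    import hashlib

    # Placeholder for whichever embedding model you use.
    def embed(text: str) -> list[float]: ...

    def refresh_index(docs: dict[str, str], index: dict[str, dict]) -> None:
        """Re-embed only the documents whose content hash changed since the last run.

        `docs` maps doc_id -> current text; `index` maps doc_id -> {"hash": ..., "vector": ...}.
        """
        for doc_id, text in docs.items():
            digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
            entry = index.get(doc_id)
            if entry is None or entry["hash"] != digest:
                index[doc_id] = {"hash": digest, "vector": embed(text)}
        # Remove documents deleted from the source corpus so stale facts stop being retrieved.
        for doc_id in index.keys() - docs.keys():
            del index[doc_id]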

Key points for your business

  • Use fine-tuning mainly for behaviour, style, or task specialisation.
  • Use RAG for domain knowledge that changes regularly (policies, product docs, FAQs).
  • Track cost per query, including embedding refresh and retrieval overhead.

1.3 Traceability and auditability

RAG makes it easier to show where an answer came from. As responses are supported by retrieved context, teams can tie outputs back to specific documents. This aligns with the expectations of an AI governance framework and guidance such as the NIST AI Risk Management Framework.
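
A minimal sketch of a source-aware audit record is shown below. It assumes the retriever returns a doc_id and chunk_id with every snippet; the field names are illustrative rather than a standard schema.

    import json
    from datetime import datetime, timezone

    def log_answer(query: str, answer: str, sources: list[dict], log_path: str = "rag_audit.jsonl") -> None:
        """Append a source-aware record so reviewers can trace an answer to specific documents."""
        record = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "query": query,
            "answer": answer,
            # Each source is assumed to carry the document and chunk reference from the retriever.
            "sources": [{"doc_id": s["doc_id"], "chunk_id": s["chunk_id"]} for s in sources],
        }
        with open(log_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")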

Key points for your business

  • Citations support internal sign-off and external audit.
  • Source-aware logs help with incident review and model improvement.
  • Governance becomes a data problem as much as a model problem.

1.4 Fast path to initial production systems

RAG was attractive because it could be implemented with existing assets. Most organisations already have document stores, wikis, knowledge bases, and shared drives. RAG allowed teams to connect those assets to LLMs and deliver early RAG production systems without committing to custom model development.

Key points for your business

  • RAG remains a pragmatic option for early AI adoption.
  • Initial value often comes from relatively narrow, well-curated domains.
  • The same pipeline can later feed agents and more advanced workflows.

2) The limitations: what recent research shows

RAG works within boundaries that are now well documented. Current research focuses less on “does RAG work?” and more on “under which conditions does it work well, and where does it struggle?”. Across CRAG, RAGBench and BERGEN, naïve “retrieve-and-concatenate” architectures consistently fall short on harder questions, long-tail facts, and time-sensitive knowledge, even when strong base models are used.

2.1 RAG inherits all your data-quality issues

Analyses of the RAGTruth corpus and the survey by Zhang & Zhang (2025) demonstrate that retrieval quality is tightly correlated with the structure and freshness of the underlying corpus. If the knowledge base is noisy, out of date, or poorly segmented, RAG outputs degrade accordingly.

  • Outdated or duplicated content creates inconsistent answers.
  • Naive chunking reduces retrieval relevance.
  • Missing or weak metadata breaks permission-aware filtering.

In practice, RAG simply reflects the state of your AI knowledge management. It does not fix underlying content problems; it exposes them.
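
To make the chunking and metadata point concrete, here is a naive fixed-size chunker that at least carries ownership and freshness metadata on every chunk. Real pipelines usually split on headings or paragraphs instead, and the metadata fields shown are assumptions, not a fixed schema.

    def chunk_document(doc_id: str, text: str, metadata: dict,
                       size: int = 800, overlap: int = 100) -> list[dict]:
        """Split a document into overlapping character windows, carrying metadata on every chunk.

        The key idea is that access-control and freshness metadata must travel with each
        chunk, not just with the parent document.
        """
        chunks = []
        step = size - overlap
        for i, start in enumerate(range(0, max(len(text), 1), step)):
            chunks.append({
                "doc_id": doc_id,
                "chunk_id": f"{doc_id}#{i}",
                "text": text[start:start + size],
                **metadata,  # e.g. {"owner_group": "hr", "last_updated": "2025-11-01"}
            })
        return chunks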

2.2 Limited multi-document reasoning

Long-context studies — most prominently “Lost in the Middle” (Liu et al., 2023) and “Long Context RAG Performance of Large Language Models” (Leng et al., 2024) — together with evaluation schemes including the RAG Triad highlight that, even with good retrieval, many models struggle with:

  • Reasoning across multiple documents over long contexts.
  • Reconciling conflicting evidence.
  • Producing step-by-step arguments grounded in the retrieved text.

RAG improves access to information; it does not automatically provide deeper reasoning. For complex decision-support, additional mechanisms are needed (e.g. reasoning chains, verification steps, or agents).

2.3 Compliance, governance, and data residency

Centralising data for RAG can conflict with existing controls. Assessments from industry commentators and vendor security teams in 2024 note that many organisations have had to revisit their initial designs for so-called secure RAG pipelines.

  • Embedding stores require clear data-lineage and deletion policies.
  • Regional residency and sectoral rules (finance, health) can limit centralisation.
  • Access control needs to operate at retrieval time, not only at answer time (see the sketch after this list).
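
A minimal sketch of that last point, filtering at retrieval time rather than after generation, is shown below. The allowed_groups and residency_region fields are illustrative and assume the metadata was attached when the chunks were indexed.

    def permitted_chunks(chunks: list[dict], user_groups: set[str], region: str) -> list[dict]:
        """Filter candidate chunks *before* ranking, using metadata attached at indexing time.

        Filtering happens at retrieval time, so restricted content never reaches the prompt.
        """
        return [
            c for c in chunks
            if user_groups & set(c.get("allowed_groups", []))
            and c.get("residency_region", region) == region
        ]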

2.4 Retrieval performance and operational load

At scale, retrieval is a performance engineering problem. Large indexes require maintenance, hybrid search (BM25 + dense vectors), re-ranking models, and caching strategies. Realistic RAG performance optimisation involves tuning both the retriever and the surrounding infrastructure.

  • Latency targets need to account for retrieval and ranking, not just LLM calls.
  • Index refresh schedules can become a non-trivial operational cost.
  • Monitoring must track retrieval quality, not only response time.
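
The hybrid search mentioned above is commonly combined with reciprocal rank fusion before re-ranking. The sketch below assumes the BM25 and dense retrievers each return a ranked list of chunk IDs; everything else about the retrievers is left out.

    from collections import defaultdict

    def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60, top_n: int = 10) -> list[str]:
        """Merge several ranked lists (e.g. BM25 and dense retrieval) into one.

        Each chunk scores 1 / (k + rank) in every list it appears in; the constant k
        dampens the influence of any single retriever's top positions.
        """
        scores: dict[str, float] = defaultdict(float)
        for ranking in rankings:
            for rank, chunk_id in enumerate(ranking, start=1):
                scores[chunk_id] += 1.0 / (k + rank)
        return sorted(scores, key=scores.get, reverse=True)[:top_n]

    # Usage: reciprocal_rank_fusion([bm25_ids, dense_ids]) before passing the survivors to a re-ranker.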


Diagram of a retrieval-augmented generation pipeline from user query through retrieval to LLM response.

 

3) RAG as a primitive, not the whole architecture

The industry trend is to treat RAG as one tool inside a larger system. Recent frameworks and platform features increasingly assume that AI systems will orchestrate multiple tools, not just a retriever and an LLM.

3.1 Agentic and workflow-aware AI

Agent frameworks model a loop that looks closer to how humans work:

  • Retrieve information when necessary.
  • Reason over it and plan actions.
  • Call tools or APIs.
  • Verify results and iterate.

In these systems, RAG is a capability the agent can call, not the primary architecture. It sits alongside database queries, API integrations, and business-specific tools.
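
A minimal sketch of that loop is shown below, with retrieval registered as just one tool among several. llm_plan is a placeholder for whatever planning call your agent framework makes; the tuple protocol it returns is an assumption for illustration.

    from typing import Callable

    def run_agent(task: str, tools: dict[str, Callable], llm_plan: Callable, max_steps: int = 5) -> str:
        """Minimal agent loop: plan, optionally call a tool, observe, and iterate.

        `llm_plan` is assumed to return either ("final", answer) or ("tool", tool_name, arguments)
        given the task and the observations gathered so far.
        """
        observations: list[str] = []
        for _ in range(max_steps):
            decision = llm_plan(task, observations)
            if decision[0] == "final":
                return decision[1]
            _, tool_name, args = decision
            observations.append(f"{tool_name} -> {tools[tool_name](**args)}")
        return "No confident answer within the step budget."

    # tools might look like:
    # {"retrieve_docs": rag_search, "query_sales_db": run_sql, "create_ticket": ticket_api}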

3.2 From static corpora to dynamic retrieval fabrics

Retrieval is becoming more “native” to AI systems. Instead of “upload documents → embed → hope retrieval works”, teams are moving towards:

  • Live querying of internal systems (SQL, analytics, line-of-business apps).
  • Fusion of structured and unstructured retrieval.
  • Semantic indexing across documents, tables, dashboards, logs, and events.

The result is less like “search + LLM” and more like a semantic access layer over the organisation’s entire data graph. Retrieval augmented generation is still present, but embedded within this wider fabric.
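
At the code level, that fusion can be as simple as routing a question to structured and unstructured retrieval and merging the results into one grounded context. The route(), run_sql(), and vector_search() callables below are placeholders for the classifier, the live system query, and the document retriever.

    from typing import Callable

    def build_context(question: str,
                      route: Callable[[str], str],
                      run_sql: Callable[[str], list[dict]],
                      vector_search: Callable[[str], list[str]]) -> str:
        """Combine structured and unstructured retrieval into a single grounded context."""
        parts: list[str] = []
        target = route(question)                # e.g. "sql", "docs", or "both"
        if target in ("sql", "both"):
            rows = run_sql(question)            # live query against a line-of-business system
            parts.append("Structured data:\n" + "\n".join(str(r) for r in rows))
        if target in ("docs", "both"):
            parts.append("Documents:\n" + "\n\n".join(vector_search(question)))
        return "\n\n".join(parts)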

3.3 Longer context windows and improved models

Modern models with very long context windows reduce the need for aggressive chunking, and multi-step prompting techniques improve reasoning. However, this does not make RAG obsolete. Instead, retrieval shifts from “blindly stuff the prompt with N chunks” to:

  • Planner–executor patterns where the model decides what to retrieve and when.
  • Hierarchical retrieval that narrows down large corpora in stages (sketched after this list).
  • Model-controlled retrieval strategies that adapt to the task.
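
A minimal sketch of the hierarchical pattern from the list above: narrow to candidate documents with coarse, summary-level embeddings first, then search only their chunks. doc_index.search and chunk_index.search are placeholder interfaces, not a specific vector store API.

    def hierarchical_retrieve(query: str, doc_index, chunk_index,
                              top_docs: int = 20, top_chunks: int = 5) -> list[str]:
        """Two-stage retrieval: shortlist documents first, then search only their chunks.

        The staging idea is what matters here, not the specific vector store API.
        """
        candidate_doc_ids = doc_index.search(query, limit=top_docs)
        return chunk_index.search(query, limit=top_chunks, filter={"doc_id": candidate_doc_ids})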

3.4 Provenance, verification, and trust

Governance requirements are pushing RAG to evolve. Many businesses now ask for:

  • Source-linked answers with confidence indicators.
  • Explainable reasoning steps for critical decisions.
  • Audit trails across data access, retrieval, and generation.

This is leading to combined patterns such as retrieval + provenance + verification LLMs, rather than pure RAG alone. The retrieval layer remains important, but as part of a broader assurance pipeline.

Diagram of an agentic AI system with separate layers for tools and retrieval, orchestration, and reasoning.

4) When RAG fits — and when it doesn’t

4.1 Where RAG is a strong fit

RAG remains a natural option for:

  • Internal knowledge bases and policy libraries.
  • Customer support and employee help desks.
  • Document and contract Q&A.
  • Summaries grounded in verifiable documents.

These are the cases where a well-designed RAG architecture and good corpus hygiene can deliver measurable improvements with relatively modest risk.

4.2 Where RAG alone is not enough

RAG, by itself, is usually insufficient for:

  • Complex decision support with strong reasoning requirements.
  • Real-time operational decisions based on streaming data.
  • Workflows that require detailed, step-by-step justifications.
  • Scenarios where centralising data conflicts with regulatory constraints.

In these areas, RAG typically becomes one component within a hybrid system that also uses agents, rules engines, structured queries, or knowledge graphs.

5) What this means for your AI roadmap

RAG is still worth investing in, but as part of a broader design. The direction of travel in both research and industry points to RAG as an infrastructure layer rather than a standalone product.

Practical steps for the next 6–12 months

  • Invest in content hygiene and AI knowledge management: deduplicate, curate, and label the key corpora you plan to expose via RAG.
  • Design RAG pipelines with governance from day one: permission-aware retrieval, data lineage, and integration with your AI governance framework.
  • Plan for hybrid architectures: assume that agents, tools, and structured data access will sit alongside RAG, not behind it.
  • Instrument retrieval: measure retrieval quality, latency, and answer usefulness, not just model metrics (a minimal sketch follows this list).
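
As a starting point for that last item, the helper below tracks hit rate at k and median latency over a labelled query set. The query fields and the retrieve callable are assumptions about your pipeline, and hit rate is a deliberately simple proxy for retrieval quality.

    import time

    def evaluate_retrieval(queries: list[dict], retrieve, k: int = 5) -> dict:
        """Track hit rate at k and median latency over a labelled query set.

        Each query dict is assumed to hold {"text": ..., "relevant_ids": set(...)};
        `retrieve` is whatever function returns ranked chunk IDs in your pipeline.
        """
        hits, latencies = 0, []
        for q in queries:
            start = time.perf_counter()
            retrieved = set(retrieve(q["text"], k=k))
            latencies.append(time.perf_counter() - start)
            if retrieved & q["relevant_ids"]:
                hits += 1
        return {
            "hit_rate_at_k": hits / len(queries),
            "p50_latency_s": sorted(latencies)[len(latencies) // 2],
        }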

Cloud Combinator works with teams on exactly these transitions — from focused generative AI pilots built on RAG to agentic, workflow-native systems that combine retrieval, reasoning, and governance. Through offerings such as Agentic AI Systems, GenAI Production System, Data Foundation, and our Secure & Compliant Cloud Platform, we focus on making RAG a reliable part of a wider, production-grade stack rather than a fragile one-off solution.

The core question is no longer “Should we use RAG?” but “Where does RAG sit in our architecture, and what needs to surround it so that it delivers reliable, governed value over time?”.

References