
A Comprehensive Guide to RAG Implementations

  • chandrasekhar.kallipalli
  • Oct 28, 2024
  • 8 min read

Updated: Oct 29, 2024

Introduction: 

RAG is a framework for improving model performance by augmenting prompts with relevant data from outside the foundational model, grounding LLM responses in real, trustworthy information. Users can easily “drag and drop” their company documents into a vector database, enabling an LLM to answer questions about these documents efficiently. 


Challenge: 

  • Clients often have vast proprietary documents. 

  • Extracting specific information is like finding a needle in a haystack. 


RAG brings the power of LLMs to structured and unstructured data, making enterprise information retrieval more effective and efficient than ever. 


Why is RAG Needed? 

If you have tried using LLMs, you might have encountered situations where, on asking about the latest updates on a topic, you receive a response like ‘Sorry, I cannot provide real-time data up to 2024.’ This is a fundamental limitation of LLMs: their knowledge is frozen at the last training point, and they cannot learn or remember new information unless retrained. 


At this juncture, RAG technology is needed to overcome this limitation. So, what exactly is RAG? Why is it so important? And how does it work? 


To understand by analogy, imagine you are a journalist tasked with reporting the latest developments of an event. How would you proceed? First, you would research the event, collect related articles or reports, and then use this information to craft your news story. For LLMs, RAG employs a similar method. “Retrieval” is the gathering of relevant information, and “Generation” is using this information to compose the news article. 


Composition of the RAG System: 

RAG is not just a single component or program, but a system. The RAG system is a complex assembly composed of multiple components, with the LLM being just one of them. 


How Does RAG Work?

Now that we understand what RAG is and the composition of the RAG system, let’s take a look at the specific workflow of the RAG system. 


Data Collection

First, all data required for a specific application must be collected. For example, for an online store, this would include product information, inventory details, discount information, etc. 


Data Chunking 

Chunking refers to the process of breaking the data source down into smaller, more manageable chunks, each focused on a specific topic. When the RAG system retrieves information from the data source, it is more likely to find data directly relevant to the user’s query, rather than pulling in irrelevant information from across the entire data source. This also improves the efficiency of the system, allowing quick access to the most relevant information instead of processing the entire data set. 
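As an illustration of this step, here is a minimal fixed-size chunker with overlap. The chunk size and overlap values are arbitrary example settings; real pipelines often chunk on sentence or paragraph boundaries instead:

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into fixed-size, overlapping character chunks.

    Overlap keeps a little shared context between neighbouring chunks so
    that a sentence cut at a boundary is still fully present in one chunk.
    """
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
    return chunks

doc = "RAG grounds LLM answers in retrieved context. " * 30
print(len(chunk_text(doc)))  # number of chunks produced
```

Sentence- or paragraph-aware chunking usually retrieves better than raw character windows, but the character version keeps the idea visible in a few lines.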


Embedding 

Now that the data source has been broken down into smaller parts, it needs to be converted into vector representations, which involves transforming text data into embeddings.
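To make the text-to-vector step concrete, here is a toy "hashing trick" embedder, a sketch assuming no ML libraries are available. Production systems use learned embedding models; this only shows the shape of the transformation (text in, fixed-size normalised vector out):

```python
import hashlib
import math

def embed(text, dim=64):
    """Toy embedding: hash each token into one of `dim` buckets, then
    L2-normalise. Illustrative only -- real embeddings come from a
    learned model and capture semantic similarity, which this does not."""
    vec = [0.0] * dim
    for token in text.lower().split():
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

print(len(embed("retrieval augmented generation")))  # fixed dimensionality
```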


Processing User Queries and Generating Answers 

When a user query enters the system, it must also be converted into an embedding or vector representation. To ensure consistency between document and query embeddings, the same model must be used for processing.


Once the query is converted into an embedding, the system compares the query embedding with the document embeddings. It uses measures such as cosine similarity or Euclidean distance to identify and retrieve the data chunks most similar to the query embedding.
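The similarity comparison can be sketched in a few lines. This is plain cosine similarity over Python lists; a real vector database does the same ranking with approximate nearest-neighbour indexes for speed:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec, chunk_vecs, k=2):
    """Return indices of the k stored chunk vectors most similar to the query."""
    ranked = sorted(range(len(chunk_vecs)),
                    key=lambda i: cosine(query_vec, chunk_vecs[i]),
                    reverse=True)
    return ranked[:k]

print(top_k([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]))
```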


Generating Answers Through LLM 

The retrieved text chunks, along with the user’s initial query, are input into the LLM, and the algorithm uses this information to generate a consistent reply to the user’s question through the chat interface. 
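The "retrieved chunks plus initial query" input is typically just an assembled prompt string. The wording of the template below is an illustrative choice, not a standard:

```python
def build_prompt(question, chunks):
    """Assemble the augmented prompt handed to the LLM: retrieved
    context first, then the user's question."""
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

print(build_prompt("When do returns expire?",
                   ["Returns accepted within 30 days."]))
```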


However, not all RAG models are created equal. Let's delve into the eight primary types. 


Types of RAG: 

1.Simple RAG: 

  • Chunking: RAG begins with turning your structured or unstructured dataset into text documents and breaking down text into small pieces (chunks). 

  • Embed documents: A text embedding model steps in, turning each chunk into vectors representing their semantic meaning. 

  • VectorDB: These embeddings are then stored in a vector database, serving as the foundation for data retrieval. 

  • Retrieval: Upon receiving a user query, the vector database helps retrieve chunks relevant to the user's request. 

  • Response Generation: With context, an LLM synthesises these pieces to generate a coherent and informative response. 
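The five steps above can be tied together in a minimal in-memory sketch. The bag-of-words "embedding", the list used as a "vector DB", and the stub `llm` are all placeholders for the real components (a learned embedding model, a vector database, and an actual LLM call):

```python
import math

docs = [
    "Returns are accepted within 30 days of purchase.",
    "Standard shipping takes 5 business days.",
]
VOCAB = sorted(set(" ".join(docs).lower().split()))

def embed(text):
    # placeholder embedding: word counts over the corpus vocabulary
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

index = [(d, embed(d)) for d in docs]  # the "vector DB"

def answer(query, llm=lambda prompt: prompt.splitlines()[0]):
    qv = embed(query)
    best_chunk = max(index, key=lambda item: cosine(qv, item[1]))[0]  # retrieval
    return llm(f"{best_chunk}\nQuestion: {query}")                    # generation

print(answer("how long do returns last"))
```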


2.Self-RAG: 

Source: Self-RAG 

 

SELF-RAG stands for Self-Reflective Retrieval-Augmented Generation, and it offers a fresh approach to how AI retrieves, generates, and critiques information. By incorporating self-reflection, this framework empowers AI to adaptively pull in relevant data, scrutinize its own responses, and ensure each output is backed by solid evidence. 

 

Adaptive Retrieval: The model decides when retrieval is necessary and retrieves relevant passages on-demand. 

Reflection Tokens: Special tokens are used to control the generation process and assess the quality and relevance of the generated content. 

Critique and Generate: The model generates content and critiques its output using reflection tokens to ensure high-quality and factually accurate results. 
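The control flow these three ideas produce can be sketched as below. The reflection-token names follow the Self-RAG paper, but `retrieve`, `generate`, and `critique` are stand-ins for the trained model's behaviour, not a faithful implementation:

```python
def self_rag(question, retrieve, generate, critique):
    """Toy Self-RAG loop: decide whether to retrieve, draft an answer per
    passage, critique each draft with reflection tokens, keep the best."""
    if not critique("[Retrieve]", question):       # adaptive retrieval decision
        return generate(question, None)
    candidates = []
    for passage in retrieve(question):
        draft = generate(question, passage)
        score = (critique("[IsRel]", passage)      # is the passage relevant?
                 + critique("[IsSup]", draft)      # is the draft supported?
                 + critique("[IsUse]", draft))     # is the draft useful?
        candidates.append((score, draft))
    return max(candidates)[1]
```

In the real model these decisions are made by the LM itself emitting the reflection tokens during decoding, rather than by separate callback functions.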

 

3.Adaptive RAG: 

 



Image source: Adaptive RAG 


Adaptive-RAG is an adaptive question answering framework that dynamically adapts its strategy based on query complexity: 

  • No retrieval for the simplest queries 

  • Single-step retrieval for moderate complexity 

  • Multi-step retrieval for the most complex queries 

 

Below is a flowchart demonstrating how we could implement Adaptive RAG: 



 

As you can see, there are multiple components, the most important being Query Analysis (our classifier). Based on this classifier, our model can route the prompt to the appropriate process. For example, if the user prompt is related to our stored data, we can proceed with RAG; otherwise, we would use another process, such as a web search. 
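The routing step can be sketched as follows. Note that the paper trains a small language model as the query-complexity classifier; `classify_complexity` below is a crude word-count placeholder used only to show the control flow:

```python
def classify_complexity(query):
    """Placeholder classifier: real Adaptive-RAG trains a small LM for this."""
    n = len(query.split())
    if n <= 4:
        return "simple"      # answer from the LLM alone
    if n <= 12:
        return "moderate"    # one retrieval round
    return "complex"         # iterative, multi-step retrieval

def route(query, llm_only, single_step_rag, multi_step_rag):
    handler = {"simple": llm_only,
               "moderate": single_step_rag,
               "complex": multi_step_rag}[classify_complexity(query)]
    return handler(query)
```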

 

4.GraphRAG:


Source: Graph RAG


Graph RAG, short for Retrieval-Augmented Generation with Graphs, is a powerful combination of natural language processing (NLP) and knowledge graph technology. It enables you to construct a knowledge graph from your data, allowing your applications to efficiently retrieve and understand complex information, much like a human expert would. 

 

Graph Construction:

How data is represented as a graph: 

  • Nodes and Edges: In a graph, data points are represented as nodes, and the relationships between them are represented as edges. 

  • Example: Imagine you have a collection of text data about various topics. To create a knowledge graph: 

  • Nodes: Each node could represent a key concept or entity mentioned in the text, such as “Artificial Intelligence,” “Machine Learning,” or “Data Science.” 

  • Edges: Edges would represent the relationships between these concepts, like “Machine Learning is a subset of Artificial Intelligence” or “Data Science utilises Machine Learning.” 
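The example above can be stored as a plain adjacency list. The triple format and the `inverse_` label convention here are illustrative choices; real systems use graph databases or RDF stores:

```python
# The example relationships from the text, as (head, relation, tail) triples.
triples = [
    ("Machine Learning", "subset_of", "Artificial Intelligence"),
    ("Data Science", "utilises", "Machine Learning"),
]

graph = {}
for head, relation, tail in triples:
    graph.setdefault(head, []).append((relation, tail))              # outgoing edge
    graph.setdefault(tail, []).append((f"inverse_{relation}", head)) # reverse edge

print(graph["Machine Learning"])
```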


Retrieval Process:

How the model navigates the graph to retrieve relevant information: 

  • Graph Traversal: The AI model traverses the graph using algorithms like breadth-first search (BFS) or depth-first search (DFS) to find relevant nodes and their connected edges. 

  • Relevance Ranking: It evaluates the relevance of nodes and edges based on criteria such as proximity, connection strength, and contextual importance. 
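A minimal BFS traversal over the adjacency-list format sketched earlier, using hop distance as a crude stand-in for the relevance ranking (nearer nodes assumed more relevant):

```python
from collections import deque

def bfs_neighbourhood(graph, start, max_depth=2):
    """Collect nodes within `max_depth` hops of `start`, annotated with
    their distance. `graph` maps a node to (relation, neighbour) pairs."""
    seen = {start}
    queue = deque([(start, 0)])
    out = []
    while queue:
        node, depth = queue.popleft()
        out.append((node, depth))
        if depth < max_depth:
            for _relation, neighbour in graph.get(node, []):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append((neighbour, depth + 1))
    return out
```

Real Graph RAG systems combine traversal with richer scoring (edge weights, contextual relevance to the query), but the walk itself looks like this.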


Augmented Generation: 

How retrieved information is used to generate responses: 

  • Integration of Retrieved Data: The AI integrates the retrieved information from the graph to form a comprehensive understanding of the query. 

  • Contextual Relevance: It ensures that the generated response is coherent and contextually relevant, using the structured knowledge from the graph. 


5.Hybrid RAG: 

HybridRAG is an advanced framework that merges RAG and GraphRAG. This integration aims to enhance the accuracy and contextual relevance of information retrieval. In simple terms, HybridRAG uses context from both retrieval systems (RAG and GraphRAG), and the final output is a mix of both systems. 


HybridRAG operates through a sophisticated two-tiered approach. Initially, VectorRAG retrieves context based on textual similarity, which involves dividing documents into smaller chunks and converting them into vector embeddings stored in a vector database. The system then performs a similarity search within this database to identify and rank the most relevant chunks.


Simultaneously, GraphRAG uses knowledge graphs to extract structured information, representing entities and their relationships within the documents. By merging these two contexts, HybridRAG ensures that the language model generates responses that are contextually accurate and rich in detail.
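The merge step itself is simple: contexts from the two retrievers are combined (here by ordered deduplication) before being handed to the language model. This is one plausible merging strategy, not the only one:

```python
def hybrid_context(vector_chunks, graph_facts):
    """Merge the vector retriever's chunks and the graph retriever's facts,
    keeping the original order and dropping exact duplicates."""
    seen = set()
    merged = []
    for piece in vector_chunks + graph_facts:
        if piece not in seen:
            seen.add(piece)
            merged.append(piece)
    return "\n".join(merged)
```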


6.Corrective RAG: 

Corrective Retrieval-Augmented Generation (CRAG) is a recent technique in natural language processing that aims to correct factual inconsistencies and errors in generated text. CRAG leverages both generative and retrieval-based capabilities to produce more factually aligned outputs. 


Corrective RAG grades the retrieved documents for relevance to the question. If the documents are relevant, the process proceeds to generation; otherwise, the framework seeks additional data sources and uses web search to supplement retrieval.


Image source: Corrective RAG 


Retrieving Relevant Information:

The system starts by retrieving documents that are relevant to the user’s question. This initial step ensures that the AI has access to the most pertinent data available. 

 

Grading Document Relevance:

To ensure the quality of the information, the system grades the relevance of each retrieved document. This step filters out less relevant data, focusing on the most useful information. 

 

Transforming Queries:

If the initial retrieval doesn’t provide sufficient information, the system can transform the user’s query to improve the search process. This helps in refining the question to get better results. 

 

Performing Web Searches:

If necessary, the system performs web searches to supplement the retrieved documents with the latest and most relevant information available online. 

 

Generating Answers:

Using the retrieved documents, the system generates a concise and coherent response. This process leverages powerful language models to produce human-like answers. 
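The five steps above can be sketched as a single control flow. All five components (`retrieve`, `grade`, `rewrite_query`, `web_search`, `generate`) are placeholders for what the framework plugs in; in CRAG the grader is itself a small model:

```python
def corrective_rag(question, retrieve, grade, rewrite_query, web_search, generate):
    """Toy CRAG loop: retrieve, keep only documents graded relevant,
    and fall back to a rewritten-query web search if nothing survives."""
    docs = retrieve(question)
    relevant = [d for d in docs if grade(question, d) == "relevant"]
    if not relevant:                              # retrieval failed: refine + fall back
        relevant = web_search(rewrite_query(question))
    return generate(question, relevant)
```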

 

7.Agentic RAG: 


 

Image source: Agentic RAG


Agentic RAG = Agents + RAG implementation


Agentic RAG is all about injecting intelligence and autonomy into the RAG framework. It’s like giving a regular RAG system a major upgrade, transforming it into an autonomous agent capable of making its own decisions and taking actions to achieve specific goals. 


Agentic RAG transforms how we approach question answering by introducing an innovative agent-based framework. Unlike traditional methods that rely solely on large language models (LLMs), agentic RAG employs intelligent agents to tackle complex questions requiring intricate planning, multi-step reasoning, and utilization of external tools.


These agents act as skilled researchers, adeptly navigating multiple documents, comparing information, generating summaries, and delivering comprehensive and accurate answers. Agentic RAG creates an implementation that easily scales. New documents can be added, and each new set is managed by a sub-agent.
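The sub-agent idea can be sketched as a top-level coordinator that routes the question to one sub-agent per document set and combines their answers. The sub-agents here are plain functions; real implementations wrap LLM calls, tools, and planning:

```python
class AgenticRAG:
    """Toy coordinator: one sub-agent per document set."""

    def __init__(self):
        self.sub_agents = {}

    def add_document_set(self, name, agent):
        # scaling to new documents = registering another sub-agent
        self.sub_agents[name] = agent

    def answer(self, question):
        partial = {name: agent(question)
                   for name, agent in self.sub_agents.items()}
        # naive synthesis; a real system would have an LLM compare and merge
        return " | ".join(f"{name}: {ans}" for name, ans in partial.items())
```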


Features of Agentic RAG: 

  • Orchestrated question answering: Agentic RAG orchestrates the question-answering process by breaking it down into manageable steps, assigning appropriate agents to each task, and ensuring seamless coordination for optimal results. 

  • Goal-driven: These agents can understand and pursue specific goals, allowing for more complex and meaningful interactions. 

  • Planning and reasoning: The agents within the framework are capable of sophisticated planning and multi-step reasoning. They can determine the best strategies for information retrieval, analysis, and synthesis to answer complex questions effectively. 

  • Tool use and adaptability: Agentic RAG agents can leverage external tools and resources, such as search engines, databases, and specialized APIs, to enhance their information-gathering and processing capabilities. 

  • Context-aware: Agentic RAG systems consider the current situation, past interactions, and user preferences to make informed decisions and take appropriate actions. 

  • Learning over time: These intelligent agents are designed to learn and improve over time. As they encounter new challenges and information, their knowledge base expands, and their ability to tackle complex questions grows. 

  • Flexibility and customisation: The Agentic RAG framework provides exceptional flexibility, allowing customisation to suit particular requirements and domains. The agents and their functionalities can be tailored to suit tasks and information environments. 

  • Improved accuracy and efficiency: By leveraging the strengths of LLMs and agent-based systems, Agentic RAG achieves superior accuracy and efficiency in question answering compared to traditional approaches. 

 

8.Speculative RAG: 


Image source: Speculative RAG 


Speculative RAG means that instead of having one model try to do everything, both finding and understanding the documents, the job is split between two models: a Specialist and a Generalist. 


The Specialist is like a super-focused researcher who drafts possible answers from different angles. Then, the Generalist (kind of like your friendly neighbourhood editor) picks the best draft and polishes it up. This tag-team approach not only speeds things up but also improves the accuracy of the AI’s answers. 


Speculative RAG Workflow: 

  • Start with a Question: The AI gets a question that needs answering — like, “What’s the meaning of life?” (Okay, maybe something a bit more specific!) 

  • Retrieve Documents: The AI pulls together a bunch of relevant documents. So far, it’s just like regular RAG. 

  • Drafting Multiple Answers: Here’s where the magic happens! The Specialist AI generates several possible answers by looking at different parts of the documents. Each draft might focus on a different perspective. 

  • Picking the Best Draft: The Generalist AI then steps in, checks each draft, and decides which one makes the most sense based on all the info. It’s like an AI version of “The Voice,” but for answers instead of singers!

  • Final Answer: Voilà! The AI gives you the best possible answer, faster and more accurately than before. 
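The workflow above reduces to a short sketch: the Specialist drafts one answer per document subset, and the Generalist scores the drafts and keeps the best. Both model calls are stand-ins here:

```python
def speculative_rag(question, doc_subsets, specialist_draft, generalist_score):
    """Toy Speculative RAG: draft in parallel from different document
    subsets, then let the generalist verify and pick the best draft."""
    drafts = [specialist_draft(question, subset) for subset in doc_subsets]
    return max(drafts, key=lambda draft: generalist_score(question, draft))
```

In the paper the speedup comes from the drafts being produced in parallel by a small specialist model, with the larger generalist only scoring them; this sequential sketch shows the logic, not the performance.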

 

Conclusion: 

RAG implementations offer a versatile and robust framework for building AI-driven applications. Each pattern serves unique needs and use cases, from simple retrieval and generation to advanced self-corrective strategies. Developers can create more effective, accurate, and reliable generative AI systems by understanding these patterns and their applications.  

 

References: 

  1. Self-RAG: Research Paper 

  2. Adaptive-RAG: Research Paper 

  3. Graph-RAG: Research Paper 

  4. Hybrid-RAG: Research paper 

  5. Corrective-RAG: Research Paper 

  6. Agentic-RAG: Research Paper 

  7. Speculative-RAG: Research Paper

 
 
