Incorporating RAG in LLM-based Applications

Aug 2, 2024   ·  14 min read

Have you ever wished ChatGPT had your personal knowledge and context on hand so that it could help you even more? Kind of like an old friend that knows you from years of experience? Well, this can be made possible through Retrieval-Augmented Generation!

Are you still new to the world of Large Language Models (LLMs), despite hearing about them for two years now? Well, you're not the only one (*sigh of relief*). LLMs are still new to many people, and fewer still have hands-on experience developing LLM-based applications.

What will you get from this post?

This post will walk you through the setup of your own LLM-based application. Not only that, but one with access to a "knowledge base" of your choice, using Retrieval-Augmented Generation (RAG) to give you more accurate responses to your LLM queries for custom use cases.

By the end of this exercise, you'll have the following LLM-based application (or a custom version of your own) running on your computer for you to play with, learn from, and customize.

Let's go! 🚀🚀🚀

First off, what is RAG?

New practices and tools are emerging in the LLM space every day to aid in the creation and maintenance of LLM-based applications. One such innovation is the concept of RAG.

The above diagram comes from the research paper Retrieval-Augmented Generation for Large Language Models: A Survey, which was published on March 7, 2024. The diagram breaks down the individual steps involved behind the scenes when a user queries an LLM and receives a response – one with RAG and one without. In this example, you can see that incorporating RAG greatly improves the LLM's response:


"Retrieval-Augmented Generation (RAG) enhances LLMs by retrieving relevant document chunks from external knowledge base through semantic similarity calculation. By referencing external knowledge, RAG effectively reduces the problem of generating factually incorrect content. Its integration into LLMs has resulted in widespread adoption, establishing RAG as a key technology in advancing chatbots and enhancing the suitability of LLMs for real-world applications."

The above paper shares context and comparisons between naive, advanced, and modular RAG approaches. The application you will set up with this blog post leverages the "Naive RAG" approach.
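
To make that "naive" retrieve-then-generate loop concrete, here is a minimal sketch of what happens on each query. This is not the code from this post's repository; the collection name, paths, and model below are placeholder assumptions:

```python
# Minimal naive-RAG sketch: retrieve relevant chunks, then generate.
# Assumes a Chroma collection of embedded docs already exists and
# that OPENAI_API_KEY is set in the environment.
import chromadb
from openai import OpenAI

chroma = chromadb.PersistentClient(path="rag_db")  # placeholder path
collection = chroma.get_or_create_collection("knowledge_base")
llm = OpenAI()

def answer(question: str) -> str:
    # 1) Retrieval: find the chunks most similar to the question.
    results = collection.query(query_texts=[question], n_results=3)
    context = "\n\n".join(results["documents"][0])

    # 2) Augmentation: place the retrieved chunks into the prompt.
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"

    # 3) Generation: the LLM responds with the relevant context in hand.
    response = llm.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```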

Though RAG greatly improves LLM responses, it is but one tool in the broader LLM ecosystem that you can leverage in your LLM-based applications.

There are other helpful methods you can use alongside RAG to improve your LLM's responses, like prompt engineering or fine-tuning. The application that you will set up with this blog post incorporates CometLLM for you to (optionally) leverage additional prompt engineering over your application's system prompts and prompt templates – you can learn more by visiting Comet's quickstart guide.
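
If you do turn on the CometLLM integration, logging a prompt/response pair is a single call. Here's a minimal sketch, assuming your COMET_API_KEY is already exported (the project name below is a placeholder, not the one used by this app):

```python
import comet_llm

# Log a prompt/response pair so you can compare prompt variants in Comet's UI.
comet_llm.log_prompt(
    project="rag-demo",  # placeholder project name
    prompt="What is RAG?",
    output="Retrieval-Augmented Generation is ...",
)
```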

An LLM with Your Knowledge

Get your local LLM-based application up and running in four easy steps!


Step #1 – Set Up the Repository

To start, clone my llm_hackathon repository from GitHub onto your machine. If you already have GitHub appropriately set up, you can use the following command to clone the repo:
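
Something like the following should work over HTTPS (the handle below is a placeholder – substitute the repository owner's actual GitHub username, or use the SSH URL if that's how your access is set up):

```bash
# Clone the repository and move into it.
git clone https://github.com/<github-username>/llm_hackathon.git
cd llm_hackathon
```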

Once you have the repository on your machine, you will then want to add a .env file to the top level of the repository. I've included a .env-TEMPLATE file in the repo as an example that you can reference.

You will then need to add your API keys from OpenAI and Comet to that file, both of which you can generate after a free user sign-up.
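
The exact variable names are defined in .env-TEMPLATE, so copy that file rather than writing the .env from scratch; the names below are illustrative placeholders only:

```bash
# .env  (illustrative – follow .env-TEMPLATE for the real variable names)
OPENAI_API_KEY=sk-...
COMET_API_KEY=...
```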

Step #2 – Set Up the Dev Container

I love Docker "dev containers." They make it super easy to (1) reproduce a project and (2) develop the project further. For detailed instructions, check out the repository's README file.

Step #3 – Set Up the Vector Database

To set up your local vector database, first find some documents that you would like the LLM to have access to and copy them into a new docs folder under the ./llm_hackathon/knowledge_base directory of the repository. These can be files in any readable format, such as Markdown, PDFs, HTML, etc.
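
For example, something like this from the repository root (the source path is a placeholder for wherever your own files live):

```bash
# Create the docs folder and copy your reference files into it.
mkdir -p ./llm_hackathon/knowledge_base/docs
cp ~/my_notes/*.md ./llm_hackathon/knowledge_base/docs/
```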

For example, I chose to copy the Markdown files from Comet's documentation page into my docs folder. I currently work as an MLOps Engineer at Comet, where I work with Data Science teams from all over the world to help them improve their MLOps tooling and practices. Because of this, I wanted an LLM-based application that uses RAG like this one to help me more easily answer nuanced customer questions about Comet! Kind of cool, right?

... okay enough of this guy.

Once you have your files copied into your new docs folder, you will need to...

1) clean and extract each document's raw data into a uniform plain text format,

2) segment the text into smaller chunks,

3) encode these chunks into vector representations using an embedding model,

4) and then store these embeddings into a vector database.
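
In code, that pipeline boils down to just a few lines. Here's a minimal sketch using Chroma's built-in default embedding model; the repository's load_docs.py does the real work (including proper cleaning and chunking), so treat this as an illustration of the idea rather than its actual contents:

```python
# Minimal indexing sketch: read files, chunk them, and store embeddings in Chroma.
from pathlib import Path
import chromadb

client = chromadb.PersistentClient(path="./llm_hackathon/knowledge_base/rag_db")
collection = client.get_or_create_collection("docs")

chunk_size = 1000  # characters per chunk; a placeholder value
for doc_path in Path("./llm_hackathon/knowledge_base/docs").glob("*.md"):
    text = doc_path.read_text(encoding="utf-8")             # 1) extract plain text
    chunks = [text[i:i + chunk_size]
              for i in range(0, len(text), chunk_size)]      # 2) segment into chunks
    collection.add(                                          # 3) + 4) embed and store
        documents=chunks,
        ids=[f"{doc_path.stem}-{i}" for i in range(len(chunks))],
    )
```

Chroma embeds the chunks with its default embedding function when you call add(), so steps 3 and 4 happen in that single call.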


Does that sound like too much work? Well, I've made it easy for you (so long as your files are in Markdown):
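
The exact command lives in the repository's README; assuming load_docs.py sits in the knowledge_base directory, it presumably looks something like this:

```bash
# Assumed invocation – check the repository's README for the exact command and path.
python ./llm_hackathon/knowledge_base/load_docs.py
```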

If your files are in Markdown format, you can run the above command to index all of your files into a local Chroma vector database that will be stored under ./llm_hackathon/knowledge_base/rag_db. You can still use this command if your docs include other file formats, but you'll just need to revise the load_docs.py file a bit. 

The above steps are crucial for enabling efficient similarity searches in the retrieval phase of your LLM-based application.

Step #4 – Run the Application

If you followed the above steps correctly, all that remains is to run your application with the following command!
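
The app.py entry point and the 8502 port below suggest a Streamlit app, so the command is presumably something along these lines (the repository's README has the authoritative version):

```bash
# Assumed run command – see the repository's README for the exact invocation.
streamlit run app.py --server.port 8502
```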

After doing so, you should be able to visit the local URL http://localhost:8502 in your browser and see the following:

With the app now running on your machine, feel free to customize the interface by editing the app.py file however you would like!

Conclusion

It’s really easy to get an LLM-based application up and running like this. On the other hand, it is really hard to do it well.

Tools and methods like RAG and prompt engineering can greatly improve your LLM's responses. Not only that, but at Comet we're building some incredible tools to help you along the way. Stay tuned!

If you found any of my content helpful, please consider donating using one of the following options. Anything is appreciated!