Building your own RAG chatbot with Upstash
In this post, I talk about how I built an open-source Custom Content RAG Chatbot with Upstash Vector, Upstash Redis, Hugging Face Inference API, Replicate LLAMA-2-70B Chat model, and Vercel. Upstash Vector helped me to insert and query vectors, dynamically creating or updating relevant context for each user message, and Upstash Redis helped me to store the chatbot conversations.
Prerequisites
You'll need the following:
- Node.js 18 or later
- An Upstash account
- A Hugging Face account
- A Replicate account
- A Vercel Account
Tech Stack
| Technology | Description | | --- | --- | | Upstash | Serverless database platform. We're using both Upstash Vector and Upstash Redis for storing vectors and conversations respectively. | | Next.js | The React Framework for the Web. We’re using the populate shadcn/ui for rapid prototyping. | | Replicate | Run and fine-tune open-source models. We're using LLAMA-2-70B Chat model. | | Hugging Face | The platform where the machine learning community collaborates on models, datasets, and applications. We’re using Hugging Face Inference API for creating embeddings. | | LangChain | Framework for developing applications powered by language models. | | TailwindCSS | CSS framework for building custom designs. | | Vercel | A cloud platform for deploying and scaling web applications. |
Setting up Upstash Redis
Once you have created an Upstash account and are logged in you are going to go to the Redis tab and create a database.
After you have created your database, you are then going to the Details tab. Scroll down until you find the Connect your database section. Copy the content and save it somewhere safe.
Also, scroll down until you find the REST API section and select the .env button. Copy the content and save it somewhere safe.
Setting up Upstash Vector
Once you have created an Upstash account and are logged in you are going to go to the Vector tab and create an Index.
Also, scroll down until you find the Connect section and select the .env button. Copy the content and save it somewhere safe.
Setting up the project
To set up, just clone the app repo and follow this tutorial to learn everything that's in it. To fork the project, run:
Once you have cloned the repo, you are going to create a .env
file. You are going to add the items we saved from the above sections.
It should look something like this:
After these steps, you should be able to start the local environment using the following command:
Repository Structure
This is the main folder structure for the project. I have marked in red the files that will be discussed further in this post that deals with creating API Routes for chatting with AI trained on your custom context, and updating the context by upsert
-ing vectors into the existing index.
Setup Chat Route in Next.js App Router
In this section, we talk about how we’ve setup the route: app/api/chat/route.js
to sync the conversation in our serverless database, dynamically create embeddings of strings, query relevant vectors from a given index to create context, and requesting relevant predictions using LLAMA-2-70B Chat model. To simplify things, we’ll break this into further parts:
Storing Conversations
To cache the conversation taking place with Upstash Redis, we’ll make use of Redis Lists. As soon as a message comes in from a user to respond to, we conditionally push the the response from the chatbot (earlier) to the list. Then, we save the latest message from the user by pushing it to the list as well, and proceed to respond on it.
Create embedding of the latest message
To reply to the user’s latest message effectively in all the given context (i.e. the custom content user supplied), we’re going to create an embedding which’ll help us retrieve the relevant context (aka similar vectors) from the existing index. We’ll use Hugging Face Inference API with LangChain to create embeddings with just an API call on the edge and slice the obtained vector to the length we configured while spinning up the Upstash Vector Index (here, 256
).
Retrieve relevant context vectors based on the latest message
Dynamically fetching all the context supplied by the user per message is an expensive operation. We want to use only the context relevant to the user’s latest message, and pass it to the LLAMA-2-70B Chat model as the system prompt. To fetch only the relevant context, we query the existing set of vectors to obtain the 2 most relevant vectors including their metadata and filter the results where confidence scores are greater than 70%.
Prompt LLAMA-2-70B Chat model with context for predictions
Now that we’ve obtained the relevant context as a string, the final step is to prompt the llama-2-70B chat model for responding to the user’s latest message. We use the Vercel AI SDK’s experimental_buildLlama2Prompt
method which takes care of creating the suitable prompt format for llama-2-70B chat model.
Setup Train Route in Next.js App Router
In this section, we talk about how we’ve setup the route: app/api/train/route.js
to dynamically create embeddings of the strings passed in the request object, and add them into the Upstash Vector Index. To simplify things, we’ll break this into further parts:
Create embeddings of the strings
We’re going to create embeddings of the strings which’ll help us set or update the existing index. Doing so allows us to keep the context for chatbot’s future responses up to date. We’ll use Hugging Face Inference API with LangChain to create embeddings with just an API call on the edge.
Store vectors for relevance search
To add the generated embeddings to the vector index, we slice the obtained vectors to the length we configured while spinning up the Upstash Vector Index (here, 256
) and use the upsert
method to insert the embedding with the metadata, i.e. the strings themselves. This allows us to retrieve the strings when similar vectors are searched and therefore, set the knowledge base of the conversation while we call the LLAMA-2-70B Chat model to generate responses.
That was a lot of learning! You’re all done now ✨
Deploy to Vercel
The repository, is now ready to deploy to Vercel. Use the following steps to deploy 👇🏻
- Start by creating a GitHub repository containing your app's code.
- Then, navigate to the Vercel Dashboard and create a New Project.
- Link the new project to the GitHub repository you just created.
- In Settings, update the
Environment Variables
to match those in your local.env
file. - Deploy! 🚀
More Information
For more detailed insights, explore the references cited in this post.
| Resource | Link | | --- | --- | | GitHub Repo | https://github.com/rishi-raj-jain/custom-rag-chatbot-upstash-vector | | Hugging Face Inference API | https://huggingface.co/docs/api-inference/usage | | Replicate LLAMA-2-70B Chat | https://replicate.com/meta/llama-2-70b-chat | | LangChain & Hugging Face | https://js.langchain.com/docs/integrations/text_embedding/hugging_face_inference | | Vercel AI SDK for LLAMA-2-70B Context Creation | https://sdk.vercel.ai/docs/api-reference/prompts#experimental_buildllama2prompt |
Conclusion
In conclusion, this project has provided valuable experience in learning how to create embeddings, query from existing set of vectors, and use context to create relevant predictions using LLAMA-2-70B Chat model while using a service that scales with your need, i.e. Upstash.