RAG Chatbot – MoviesGPT

What is RAG?

Retrieval-Augmented Generation (RAG) is a technique that improves the output of a Large Language Model (LLM) by supplying the model with additional context alongside a user’s input. The model combines its text-generation ability with that extra context to answer users’ questions more accurately.

Why is RAG useful?

  • Cost-effective: new knowledge is supplied at query time, with no need to fine-tune or retrain the model.
  • Models have training cut-off dates, after which their knowledge is not updated; RAG can supply more recent information.
  • Fills gaps where the model’s training data never contained the required information.

What is Vector Embedding?

Vector embedding is a popular technique for representing information as numeric vectors that algorithms, especially deep learning models, can easily process. This ‘information’ can be text, images, video, or audio.

Step-by-step workflow of MoviesGPT

Data Collection (Wikipedia Scraping)

  • The project uses Puppeteer (via LangChain) to scrape Wikipedia pages containing lists of movies in various Indian languages for the year 2025.
  • Each Wikipedia page’s content is fetched and cleaned of HTML tags.
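The cleaning step can be sketched as a small helper that strips residual HTML tags and collapses whitespace. (In MoviesGPT the raw HTML would come from LangChain’s PuppeteerWebBaseLoader, shown only in comments here since it needs a running browser.)

```typescript
// Sketch of the cleaning step applied to scraped Wikipedia content.
// The raw HTML would come from something like:
//
//   const loader = new PuppeteerWebBaseLoader(url);
//   const docs = await loader.load();
//
function cleanHtml(raw: string): string {
  return raw
    .replace(/<[^>]*>/g, " ") // drop HTML tags
    .replace(/\s+/g, " ")     // collapse runs of whitespace
    .trim();
}
```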

Text Chunking

  • The scraped content is split into manageable chunks using a text splitter (RecursiveCharacterTextSplitter).
  • This ensures each chunk is of optimal size for embedding and storage.
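A simplified sketch of what the splitter does: fixed-size chunks with overlap, so context is not lost at chunk boundaries. (LangChain’s RecursiveCharacterTextSplitter additionally tries to split on natural boundaries such as paragraphs and sentences first; the sizes below are illustrative.)

```typescript
// Simplified fixed-size chunking with overlap, similar in spirit to
// RecursiveCharacterTextSplitter. Each chunk shares `chunkOverlap`
// characters with the previous one.
function chunkText(text: string, chunkSize = 512, chunkOverlap = 100): string[] {
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last chunk reached
    start += chunkSize - chunkOverlap;
  }
  return chunks;
}
```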

Embedding Generation

  • Each text chunk is sent to NVIDIA’s embedding API (nvidia/nv-embedqa-e5-v5 model) to generate a high-dimensional vector representation.
  • These embeddings capture the semantic meaning of each chunk.
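A hedged sketch of that call: NVIDIA exposes an OpenAI-compatible embeddings endpoint; the endpoint path and the `input_type` field (which distinguishes stored passages from queries for the e5 model family) are assumptions based on NVIDIA’s API docs and should be verified.

```typescript
// Builds the request body for an OpenAI-compatible embeddings endpoint.
// `input_type` ("passage" vs "query") is specific to NVIDIA's e5 models.
interface EmbeddingRequest {
  model: string;
  input: string[];
  input_type: "passage" | "query";
}

function buildEmbeddingRequest(chunks: string[], forQuery = false): EmbeddingRequest {
  return {
    model: "nvidia/nv-embedqa-e5-v5",
    input: chunks,
    input_type: forQuery ? "query" : "passage",
  };
}

// Sends the chunks to the (assumed) NVIDIA endpoint and returns one
// vector per chunk. Requires a valid API key; not executed here.
async function embed(chunks: string[], apiKey: string): Promise<number[][]> {
  const res = await fetch("https://integrate.api.nvidia.com/v1/embeddings", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
    body: JSON.stringify(buildEmbeddingRequest(chunks)),
  });
  const json: any = await res.json();
  return json.data.map((d: { embedding: number[] }) => d.embedding);
}
```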

Database Storage (AstraDB)

  • The vector embeddings and their corresponding text chunks are stored in AstraDB, a vector database.
  • The database is set up to support efficient similarity search using the chosen metric (e.g., dot product).
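A hedged sketch of the storage step. The `$vector` document field is what Astra’s Data API uses for vector values; the client calls in the comments are based on DataStax’s astra-db-ts package, and the collection name and dimension (nv-embedqa-e5-v5 produces 1024-dimensional vectors) should be verified against your setup.

```typescript
// Shapes a chunk and its embedding into the document form Astra's
// Data API expects ($vector holds the embedding).
function toAstraDoc(text: string, vector: number[]) {
  return { text, $vector: vector };
}

// Assumed usage with the astra-db-ts client (not executed here):
//
//   import { DataAPIClient } from "@datastax/astra-db-ts";
//   const db = new DataAPIClient(token).db(endpoint);
//   const coll = await db.createCollection("movies", {
//     vector: { dimension: 1024, metric: "dot_product" },
//   });
//   await coll.insertOne(toAstraDoc(chunk, embedding));
```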

User Interaction (Frontend)

  • Users interact with a chat interface built with Next.js.
  • When a user submits a question, it is sent to the backend API.
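A minimal sketch of the request the chat UI might send. The `/api/chat` route name and the message shape are assumptions, not the project’s confirmed API.

```typescript
// Shape of one chat message and the fetch options the frontend
// would use to POST the conversation to the backend route.
interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

function buildChatRequest(messages: ChatMessage[]) {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ messages }),
  };
}

// Assumed usage in the frontend:
//   fetch("/api/chat", buildChatRequest(messages));
```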

Query Embedding & Context Retrieval

  • The backend generates an embedding for the user’s question using the same NVIDIA model.
  • It then queries AstraDB for the most similar text chunks (context) based on vector similarity to the question embedding.
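Conceptually, the vector search ranks stored chunks by dot-product similarity to the question embedding. A local sketch of that ranking (AstraDB does this server-side over its index):

```typescript
// Dot product of two equal-length vectors.
function dot(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

// Returns the texts of the k chunks most similar to the query embedding,
// ranked by descending dot-product score.
function topK(
  query: number[],
  docs: { text: string; vector: number[] }[],
  k: number
): string[] {
  return [...docs]
    .sort((p, q) => dot(query, q.vector) - dot(query, p.vector))
    .slice(0, k)
    .map((d) => d.text);
}
```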

Prompt Construction

  • The retrieved context is formatted and combined with the user’s question to create a system prompt.
  • This prompt instructs the AI to use the provided context to answer the question, but to fall back on its own knowledge if needed.
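A sketch of what that prompt assembly might look like; the exact wording is illustrative, not the project’s actual template.

```typescript
// Combines retrieved context chunks and the user's question into a
// single system prompt, with a fallback instruction for missing context.
function buildSystemPrompt(context: string[], question: string): string {
  return [
    "You are MoviesGPT. Use the context below to answer the user's question.",
    "If the context does not contain the answer, fall back on your own knowledge.",
    "",
    "CONTEXT:",
    context.join("\n---\n"),
    "",
    `QUESTION: ${question}`,
  ].join("\n");
}
```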

AI Response Generation

  • The prompt and chat history are sent to OpenRouter’s chat API (using a model like deepseek/deepseek-chat).
  • The AI generates a streaming response, which is sent back to the frontend in real time.
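OpenRouter exposes an OpenAI-compatible chat completions endpoint; with `stream: true` it returns server-sent events whose `data:` lines carry JSON deltas. A hedged sketch of parsing one such line (the endpoint URL in the comment and the delta shape follow the OpenAI streaming format OpenRouter mirrors):

```typescript
// Extracts the token text from one SSE line of a streamed chat
// completion. Returns "" for non-data lines and the [DONE] sentinel.
//
// The stream itself would come from a request such as:
//   fetch("https://openrouter.ai/api/v1/chat/completions", {
//     method: "POST",
//     headers: { Authorization: `Bearer ${key}` },
//     body: JSON.stringify({ model: "deepseek/deepseek-chat",
//                            messages, stream: true }),
//   });
function parseDelta(sseLine: string): string {
  if (!sseLine.startsWith("data: ") || sseLine === "data: [DONE]") return "";
  const payload = JSON.parse(sseLine.slice(6));
  return payload.choices?.[0]?.delta?.content ?? "";
}
```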

User Receives Answer

  • The user sees the AI’s answer in the chat interface, formatted in markdown for readability.

Workflow Diagram (Textual)

Wikipedia Pages
  ↓
[Scraping & Cleaning]
  ↓
[Text Chunking]
  ↓
[Embedding Generation]
  ↓
[AstraDB Storage]

(User asks a question)
  ↓
[Question Embedding]
  ↓
[Vector Search in AstraDB]
  ↓
[Relevant Context Retrieved]
  ↓
[Prompt Construction]
  ↓
[OpenRouter AI Chat Completion]
  ↓
[Streaming Response to User]

Summary

  • Backend: Handles scraping, embedding, storage, and retrieval.
  • Frontend: Provides a chat interface for users, built with Next.js and TypeScript.
  • AI Models: NVIDIA for embeddings, OpenRouter for chat.
  • Database: AstraDB for vector search and storage.

This workflow ensures that MoviesGPT can answer movie-related questions with up-to-date, contextually relevant information, providing a seamless and intelligent user experience.

Links

GitHub: GitHub
Demo: Demo
