RAG Chatbot – MoviesGPT
What is RAG?
Retrieval-Augmented Generation (RAG) is a technique that improves the output of a Large Language Model (LLM) by supplying extra context alongside a user's input. The model combines its text-generation ability with that retrieved context to give more accurate answers to users' questions.
Why is RAG useful?
- Cost effective: grounding an existing model with retrieved context is far cheaper than fine-tuning or retraining it.
- Models have training cut-off dates, after which their knowledge is not updated; RAG can supply newer information.
- It compensates for information the model never saw during training.
What is Vector Embedding?
Vector embedding is a popular technique for representing information in a format that algorithms, especially deep learning models, can easily process. This information can be text, images, video, or audio.
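For intuition, here is a toy TypeScript sketch (the numbers are made up) showing that an embedding is just an array of floats, and that "semantic similarity" reduces to a vector operation such as the dot product:

```ts
// Toy example with made-up numbers: an embedding is an array of floats,
// and comparing meanings becomes comparing vectors.
const a: number[] = [0.12, -0.48, 0.33]; // e.g. embedding of "sci-fi movie"
const b: number[] = [0.10, -0.51, 0.30]; // e.g. embedding of "space adventure film"

const dot = (x: number[], y: number[]): number =>
  x.reduce((sum, xi, i) => sum + xi * y[i], 0);

console.log(dot(a, b)); // higher score => closer meaning
```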
Step-by-step workflow of MoviesGPT
Data Collection (Wikipedia Scraping)
- The project uses Puppeteer (via LangChain) to scrape Wikipedia pages containing lists of movies in various Indian languages for the year 2025.
- Each Wikipedia page’s content is fetched and cleaned of HTML tags.
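A minimal sketch of loading one such page with LangChain's PuppeteerWebBaseLoader (the import path varies across LangChain versions, and the URL here is illustrative):

```ts
import { PuppeteerWebBaseLoader } from "@langchain/community/document_loaders/web/puppeteer";

// Illustrative URL; the project scrapes one such list page per language.
const loader = new PuppeteerWebBaseLoader(
  "https://en.wikipedia.org/wiki/List_of_Hindi_films_of_2025",
  {
    launchOptions: { headless: true },
    gotoOptions: { waitUntil: "domcontentloaded" },
    // Return the rendered plain text, so HTML tags are already stripped.
    evaluate: async (page) => page.evaluate(() => document.body.innerText),
  }
);

const pageText = await loader.scrape(); // raw page text, free of HTML tags
```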
Text Chunking
- The scraped content is split into manageable chunks using a text splitter (RecursiveCharacterTextSplitter).
- This ensures each chunk is of optimal size for embedding and storage.
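A sketch of the splitting step, continuing from the scraped `pageText` above (the chunk sizes are illustrative, not the project's exact settings):

```ts
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";

// Chunk sizes are illustrative; tune them for the embedding model.
const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 512,
  chunkOverlap: 100,
});

const chunks: string[] = await splitter.splitText(pageText);
```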
Embedding Generation
- Each text chunk is sent to NVIDIA's embedding API (the `nvidia/nv-embedqa-e5-v5` model) to generate a high-dimensional vector representation, as sketched below.
- These embeddings capture the semantic meaning of each chunk.
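A sketch of that call, assuming NVIDIA's OpenAI-compatible `integrate.api.nvidia.com` endpoint; the `input_type` field is how NVIDIA's retrieval models distinguish indexed passages from user queries:

```ts
// Sketch of an embedding call against NVIDIA's OpenAI-compatible
// endpoint (assumption: endpoint URL and input_type field match the
// API version in use).
async function embed(
  text: string,
  inputType: "passage" | "query"
): Promise<number[]> {
  const res = await fetch("https://integrate.api.nvidia.com/v1/embeddings", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.NVIDIA_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "nvidia/nv-embedqa-e5-v5",
      input: [text],
      input_type: inputType, // "passage" when indexing, "query" at question time
    }),
  });
  const json = await res.json();
  return json.data[0].embedding; // the high-dimensional vector
}
```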
Database Storage (AstraDB)
- The vector embeddings and their corresponding text chunks are stored in AstraDB, a vector database.
- The database is set up to support efficient similarity search using the chosen metric (e.g., dot product).
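A sketch of the storage side using the `@datastax/astra-db-ts` client (collection name and vector dimension are assumptions; the dimension must match the embedding model's output):

```ts
import { DataAPIClient } from "@datastax/astra-db-ts";

const client = new DataAPIClient(process.env.ASTRA_DB_APPLICATION_TOKEN!);
const db = client.db(process.env.ASTRA_DB_API_ENDPOINT!);

// Dimension must match the embedding model's output (assumed 1024 here);
// the metric mirrors the dot-product choice mentioned above.
const collection = await db.createCollection("movies", {
  vector: { dimension: 1024, metric: "dot_product" },
});

// One document per chunk: the raw text plus its vector.
await collection.insertOne({ text: chunk, $vector: await embed(chunk, "passage") });
```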
User Interaction (Frontend)
- Users interact with a chat interface built with Next.js.
- When a user submits a question, it is sent to the backend API.
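For example, the submit handler might POST the running message list to the chat API (the `/api/chat` route name and payload shape are assumptions, not the project's confirmed interface):

```ts
// Hypothetical route name and payload shape: the chat UI sends the
// conversation so far plus the new question to the backend.
async function sendQuestion(
  history: { role: string; content: string }[],
  question: string
): Promise<Response> {
  return fetch("/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      messages: [...history, { role: "user", content: question }],
    }),
  });
}
```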
Query Embedding & Context Retrieval
- The backend generates an embedding for the user’s question using the same NVIDIA model.
- It then queries AstraDB for the most similar text chunks (context) based on vector similarity to the question embedding.
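A sketch of the retrieval query, reusing the `collection` and `embed` helpers from the earlier steps (the `limit` of 5 is an assumption):

```ts
// Embed the question the same way the passages were embedded, then
// sort by vector similarity to pull the closest chunks.
const questionEmbedding = await embed(question, "query");

const cursor = collection.find(
  {},
  { sort: { $vector: questionEmbedding }, limit: 5, projection: { text: 1 } }
);
const contextChunks = (await cursor.toArray()).map((doc) => doc.text as string);
```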
Prompt Construction
- The retrieved context is formatted and combined with the user’s question to create a system prompt.
- This prompt instructs the AI to use the provided context to answer the question, but to fall back on its own knowledge if needed.
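A minimal sketch of what that assembly might look like (the exact wording is hypothetical, not the project's actual prompt):

```ts
// Hypothetical prompt wording; the real system prompt may differ.
const systemPrompt = `You are MoviesGPT, an assistant for questions about movies.
Answer using the context below. If the context does not contain the answer,
fall back on your own knowledge.

CONTEXT:
${contextChunks.join("\n---\n")}`;
```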
AI Response Generation
- The prompt and chat history are sent to OpenRouter's chat API (using a model like `deepseek/deepseek-chat`), as sketched below.
- The AI generates a streaming response, which is sent back to the frontend in real time.
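A sketch of the streaming completion call against OpenRouter's OpenAI-compatible endpoint, continuing from the `systemPrompt` above (`chatHistory` and `question` stand in for the request payload):

```ts
// OpenRouter exposes an OpenAI-compatible chat completions endpoint;
// stream: true returns the reply incrementally as server-sent events.
const res = await fetch("https://openrouter.ai/api/v1/chat/completions", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "deepseek/deepseek-chat",
    stream: true,
    messages: [
      { role: "system", content: systemPrompt },
      ...chatHistory,
      { role: "user", content: question },
    ],
  }),
});
// res.body is a stream of "data: {...}" lines; each JSON chunk's
// choices[0].delta.content carries the next token(s) to forward.
```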
User Receives Answer
- The user sees the AI’s answer in the chat interface, formatted in markdown for readability.
Workflow Diagram (Textual)
Wikipedia Pages
↓
[Scraping & Cleaning]
↓
[Text Chunking]
↓
[Embedding Generation]
↓
[AstraDB Storage]
↓
(User asks a question)
↓
[Question Embedding]
↓
[Vector Search in AstraDB]
↓
[Relevant Context Retrieved]
↓
[Prompt Construction]
↓
[OpenRouter AI Chat Completion]
↓
[Streaming Response to User]
Summary
- Backend: Handles scraping, embedding, storage, and retrieval.
- Frontend: Provides a chat interface for users, built with Next.js and TypeScript.
- AI Models: NVIDIA for embeddings, OpenRouter for chat.
- Database: AstraDB for vector search and storage.
This workflow ensures that MoviesGPT can answer movie-related questions with up-to-date, contextually relevant information, providing a seamless and intelligent user experience.
Links