Multi-Modal RAG ChatBot: Your AI-Powered Knowledge Assistant (Streamlit + MindsDB + LangChain + FAISS)

Guna_Sekhar_Venkata · July 9, 2024, 12:51pm

My Quira Quest 14 submission
The Multi-Modal RAG ChatBot is an application for retrieving knowledge from PDFs and YouTube videos. Given a query, the chatbot returns the relevant text, images, and video frames from those sources.

Features:
Multi-Modal Retrieval: Instantly fetches text, images, and video frames from static PDFs and YouTube videos to answer your queries.
Nice UI for User Interaction: A user-friendly interface that makes interacting with the chatbot smooth and intuitive.
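As a rough illustration of the multi-modal retrieval idea, here is a minimal sketch that keeps text chunks, PDF images, and video frames in one index and ranks them together against a query. It is not the app's actual implementation: the app uses FAISS and LangChain, while this sketch uses a toy bag-of-words embedding and brute-force cosine similarity so it runs with only the standard library. The `Item` and `MultiModalIndex` names are hypothetical, and representing images and frames by text captions is an assumption; a real system would embed them with a multimodal embedding model.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Item:
    """One retrievable unit: a text chunk, a PDF image, or a video frame (hypothetical schema)."""
    modality: str                      # "text", "image", or "video_frame"
    source: str                        # e.g. a PDF file name or a YouTube video id
    description: str                   # chunk text, or a caption standing in for the image/frame
    timestamp: Optional[float] = None  # seconds into the video, for frames only
    vector: Counter = field(default_factory=Counter, repr=False)


def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding" (assumption); a real system would call a
    # learned multimodal embedding model here instead.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class MultiModalIndex:
    """Brute-force index that ranks items of every modality in one list."""

    def __init__(self) -> None:
        self.items: list[Item] = []

    def add(self, item: Item) -> None:
        item.vector = embed(item.description)
        self.items.append(item)

    def search(self, query: str, k: int = 3) -> list[tuple[float, Item]]:
        # Score every item, keep the top k, and drop items with no overlap at all.
        q = embed(query)
        scored = [(cosine(q, it.vector), it) for it in self.items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [pair for pair in scored[:k] if pair[0] > 0]


index = MultiModalIndex()
index.add(Item("text", "report.pdf", "Quarterly revenue grew 12 percent year over year"))
index.add(Item("image", "report.pdf", "Table of quarterly revenue by region"))
index.add(Item("video_frame", "abc123", "Slide showing the model architecture diagram", timestamp=42.0))

for score, item in index.search("quarterly revenue table", k=2):
    print(item.modality, item.source, item.timestamp)
# prints "image report.pdf None", then "text report.pdf None"
```

Because all modalities share one ranking, a query about a table can surface the table image ahead of the text chunk, and a frame hit carries its timestamp so the UI can jump to that point in the video.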

Future Enhancements:
Dynamic Multi-Modal RAG: Addressing the high computational cost of building a multi-modal vector database over dynamic (frequently changing) data.
On-Device Privacy: Ensuring data never leaves your device, for complete privacy and security.
Knowledge Graph Without LLMs: Moving towards a knowledge-graph-based approach that does not rely on LLMs.
Open Source Collaboration: Encouraging contributions that push the boundaries of machine learning and privacy-centric technology.
On-Device GPU Access: Building advanced knowledge graphs on local GPUs without relying on cloud services.

For those interested in using their own resources, we have provided comprehensive Colab tutorials to get you started.
Links:
GitHub Repo (give it a star!)

Quira Voting Link (valuable feedback and suggestions for future improvements are welcome!)

Detailed YouTube Video (if you like the demo, please like, share, and subscribe!)

Live WebApp: https://mutli-modal-rag-chabot.streamlit.app/

@SiddhantSadangi

I'd appreciate everyone's feedback and suggestions on my work.

asehmi · July 10, 2024, 9:24am

I think this needs function calling (tools) for internet access. I asked a question and got a wild, non-contextual guess as the answer… it didn’t know about RAG.

Q: What is zero-shot prompting and how can it be used for RAG?
A: Zero-shot prompting is the simplest technique used with chatbots where they are asked to answer a question or perform a task without being provided with any examples. This technique does not include any examples in the prompt, relying on the chatbot’s knowledge to generate accurate responses. Zero-shot prompting is advantageous as it requires minimal effort, is accessible to everyone without technical expertise, and can be applied to various tasks with versatility. However, it has limitations, such as limited performance compared to other prompting techniques and a reliance on carefully crafted prompts for optimal results.

In terms of using zero-shot prompting for RAG (Red, Amber, Green) analysis, chatbots can be prompted with questions related to assessing the status of a project, task, or any other relevant metric using the RAG rating system without providing specific examples. The chatbot would then generate responses indicating whether the status is red (critical issues), amber (caution), or green (on track). This allows for quick assessments without the need for extensive input examples, making it a convenient method for analyzing and monitoring various indicators in a straightforward manner.

Guna_Sekhar_Venkata · July 10, 2024, 9:36am

@asehmi But we can’t get the relevant images through function calling, right? My intention is to retrieve the relevant frames from the video to improve the quality of the answer. For example, I asked a question about a PDF and the text answer was about 90 percent correct and 10 percent wrong. The interesting part was that it also returned the exact image of the table relevant to the question, which gave me a fully satisfactory answer. So we can’t always trust the text answer alone. As I mentioned, dynamic multi-modal RAG needs a strategic approach to building robust multi-modal vector databases for better retrieval.

Here is proof of why we can’t always trust the text answer.

asehmi · July 10, 2024, 10:00am

I guess one can ask more precise questions if one knows the content uploaded. I’ve recently explored Agentic AI techniques and the problem of quality assurance and accuracy is often cited as a reason to use agents; some doing data extraction, others doing understanding, and others doing review and verification, before compiling the final report… perhaps with yet another agent. The agents can be given tools to assist in these tasks. (Caveat: Unless you’re using a local LLM, Agentic AI will get expensive pretty quickly because of the multiple agent collaborations and potentially many LLM calls as the intermediate results are refined.)
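The extract, review, and compile pattern described above can be sketched as a loop over role-specific prompts. This is a minimal illustration, not any agent framework's API: `run_pipeline`, `CountingLLM`, and `fake_llm` are hypothetical names, and `fake_llm` is a deterministic stand-in for a real model call so the sketch runs without an API key. Counting the calls makes the cost caveat concrete.

```python
from typing import Callable

LLM = Callable[[str], str]  # hypothetical: any function mapping a prompt to a completion


class CountingLLM:
    """Wraps an LLM callable and counts how many calls the pipeline makes."""

    def __init__(self, llm: LLM) -> None:
        self.llm = llm
        self.calls = 0

    def __call__(self, prompt: str) -> str:
        self.calls += 1
        return self.llm(prompt)


def run_pipeline(document: str, question: str, llm: LLM, max_rounds: int = 3) -> str:
    # Extraction agent: pull out facts relevant to the question.
    facts = llm(f"[extractor] Extract facts answering: {question}\n\n{document}")
    for _ in range(max_rounds):
        # Review agent: check the facts against the source.
        verdict = llm(f"[reviewer] Reply APPROVED if these facts are supported by the source, "
                      f"otherwise list the problems.\n\nFacts:\n{facts}\n\nSource:\n{document}")
        if verdict.strip().startswith("APPROVED"):
            break
        # Revision agent: fix the facts using the reviewer's feedback.
        facts = llm(f"[reviser] Fix these facts using the feedback.\n\nFacts:\n{facts}\n\nFeedback:\n{verdict}")
    # Compiling agent: write the final report from the (possibly revised) facts.
    return llm(f"[compiler] Write a short answer to '{question}' using only:\n{facts}")


def fake_llm(prompt: str) -> str:
    # Deterministic stand-in: the reviewer rejects the first draft and approves the revision.
    if prompt.startswith("[extractor]"):
        return "draft facts"
    if prompt.startswith("[reviewer]"):
        return "APPROVED" if "revised facts" in prompt else "Fact 2 is unsupported"
    if prompt.startswith("[reviser]"):
        return "revised facts"
    return "final report"


counting = CountingLLM(fake_llm)
answer = run_pipeline("Revenue grew 12% in Q2.", "How did revenue change?", counting)
print(answer, counting.calls)  # prints "final report 5"
```

With a reviewer that never approves, the loop makes `2 + 2 * max_rounds` calls (8 for the default of 3), compared with a single call for plain RAG, which is where the cost adds up on a paid API.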

Guna_Sekhar_Venkata · July 10, 2024, 10:42am

Yes, you’re right @asehmi. I recently developed a project that gives a live summary of the top headline news article for a country the user is interested in. Getting a summary of the top live headline normally takes more than one application, so I used two agents: one crawls the web and returns the URL of the current top headline article for the specified country, and the other takes that URL and summarizes the article. I built the entire solution with the @fetch.ai agents framework. I highly recommend checking out the DeltaV website.
I’m also attaching the demo. Have a look and let me know any suggestions or feedback.

Demo

asehmi · July 11, 2024, 9:13am

I think many use cases, like in the deltav demo, are not good use cases for agentic AI. Sure, you might be able to solve the problem with agents, but they are much more easily solved with less end-user friction using standard techniques. Agents are ideally suited to non-deterministic, fuzzy and unstructured problem solving use cases. Call me a purist, but it’s early days in this space and so many demo apps are really only proving their frameworks are capable of building agent apps, but aren’t actually doing this by solving a difficult fuzzy problem in an innovative way.

Guna_Sekhar_Venkata · July 11, 2024, 9:37am

Yes, I agree with you, @asehmi.

SiddhantSadangi · July 15, 2024, 4:05am

Not sure why I am tagged here

Guna_Sekhar_Venkata · July 15, 2024, 5:40am

I tagged you so that you could check out the project and suggest any improvements in your free time.

system · January 11, 2025, 5:40am

This topic was automatically closed 180 days after the last reply. New replies are no longer allowed.
