Build a Multimodal RAG with Gemma 3, LangChain, and Streamlit

In this video, we will build a Multimodal RAG (Retrieval-Augmented Generation) system using Google's Gemma 3, LangChain, and Streamlit to chat with PDFs and answer complex questions about your local documents — even about their images and tables! I will guide you step by step: setting up the Gemma 3 model through Ollama, integrating it into a LangChain-powered RAG pipeline, and wiring up a simple Streamlit interface so you can query your PDFs in real time. If you're curious about the new Gemma 3 model, or about building RAG systems that handle images and tables, this tutorial is for you.
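For readers who want a feel for what the pipeline does before watching, the core retrieve-then-generate loop of a RAG system can be sketched without any external services. The sketch below is a toy, dependency-free illustration only: a simple word-overlap scorer stands in for the vector-store similarity search that LangChain performs in the actual tutorial, and the assembled prompt is what a model like Gemma 3 would receive. All function names here are illustrative, not part of the tutorial's code.

```python
# Toy sketch of the retrieve-then-generate loop behind a RAG system.
# Word-overlap scoring stands in for real embedding similarity search;
# the returned prompt is what an LLM such as Gemma 3 would consume.

def chunk(text: str, size: int = 10) -> list[str]:
    """Split a document into fixed-size word chunks."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def score(query: str, passage: str) -> int:
    """Count query words appearing in the passage (toy similarity)."""
    q = set(query.lower().split())
    return sum(1 for w in passage.lower().split() if w in q)

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most relevant to the query."""
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:k]

def build_prompt(query: str, context: list[str]) -> str:
    """Assemble the augmented prompt the LLM would receive."""
    joined = "\n---\n".join(context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"

if __name__ == "__main__":
    doc = ("Gemma 3 is a family of open models from Google. "
           "Streamlit lets you build simple web interfaces in Python. "
           "LangChain wires retrievers and language models into chains.")
    pieces = chunk(doc)
    query = "Streamlit web interfaces"
    print(build_prompt(query, retrieve(query, pieces, k=1)))
```

In the real system, `chunk` is replaced by a PDF loader and text splitter, `retrieve` by an embedding model plus vector store, and the prompt is sent to Gemma 3 via Ollama, with Streamlit providing the chat UI on top.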

You can watch it here: https://youtu.be/hBDNv47KCKo

You can find the source code here: https://github.com/NarimanN2/ollama-playground