Question about RAG Langchain Tutorial: isn’t embedding repeated?

Hi everyone,

I’m rather new to Streamlit and I’m trying to understand this tutorial: https://blog.streamlit.io/langchain-tutorial-4-build-an-ask-the-doc-app/

Do I understand correctly that the generate_response method is run on every form submission?

Wouldn’t that mean the text is embedded and stored in a vector store for each new question the user enters? That seems very inefficient; isn’t the point of the vector store that I can store the embeddings once and avoid repeating a process that costs both time and (potentially) money every time?

I’m probably missing something, and I would appreciate it if anyone could help me on my learning journey here…

Thank you!
Chris

You’re absolutely right about the efficiency issue. The tutorial was written to introduce beginners to building a simple app with as little code as possible, so that they can then build on top of that basic app.

With the current tutorial code, the generate_response function is indeed called on every form submission. So for each new question a user asks, the entire document is re-ingested: it is split into chunks and every chunk is re-embedded and stored in Chroma, even if nothing has changed.
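To make that concrete, the core of generate_response in the tutorial looks roughly like this (reproduced from memory, so exact parameters may differ slightly); every single call re-runs the whole pipeline, including the embedding step:

```python
from langchain.llms import OpenAI
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA

def generate_response(uploaded_file, openai_api_key, query_text):
    # Read the uploaded document.
    documents = [uploaded_file.read().decode()]
    # Split it into chunks -- happens again on every submission.
    text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
    texts = text_splitter.create_documents(documents)
    # Re-embed every chunk and rebuild the Chroma store -- again on every submission.
    embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)
    db = Chroma.from_documents(texts, embeddings)
    # Build the retrieval QA chain and answer the question.
    qa = RetrievalQA.from_chain_type(
        llm=OpenAI(openai_api_key=openai_api_key),
        retriever=db.as_retriever(),
    )
    return qa.run(query_text)
```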

There are ways to mitigate this, though they introduce more code to handle the added complexity. The main idea is to decouple the document ingestion pipeline from the Q&A/retrieval process. For example, the app could assign an ID to each uploaded document; before each run it checks whether that document has been ingested before, and if so, it reuses the existing vector store for retrieval and passes only the retrieved chunks to the LLM. If the document is not present, the full ingestion pipeline is run end-to-end. This is just one of many ways to solve it, as sketched below.
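Here is a minimal sketch of that idea, assuming Streamlit’s st.cache_resource serves as the “has this document been ingested before?” check and a content hash serves as the document ID. The function name build_vector_store and the overall wiring are my own illustration, not taken from the tutorial:

```python
import hashlib
import streamlit as st
from langchain.llms import OpenAI
from langchain.text_splitter import CharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA

@st.cache_resource
def build_vector_store(doc_id: str, _text: str, _openai_api_key: str):
    # Cached on doc_id only (underscore-prefixed args are skipped by the hasher),
    # so a given document is split and embedded exactly once per process.
    splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
    chunks = splitter.create_documents([_text])
    embeddings = OpenAIEmbeddings(openai_api_key=_openai_api_key)
    return Chroma.from_documents(chunks, embeddings)

uploaded_file = st.file_uploader("Upload a document", type="txt")
openai_api_key = st.text_input("OpenAI API key", type="password")

with st.form("qa_form"):
    query_text = st.text_input("Ask a question about the document")
    submitted = st.form_submit_button("Submit")

if submitted and uploaded_file and openai_api_key and query_text:
    text = uploaded_file.read().decode()
    # A content hash acts as the document ID: re-submitting a question
    # (or re-uploading the same file) reuses the cached vector store.
    doc_id = hashlib.sha256(text.encode()).hexdigest()
    db = build_vector_store(doc_id, text, openai_api_key)
    # Only the retrieval + LLM call happens per question.
    qa = RetrievalQA.from_chain_type(
        llm=OpenAI(openai_api_key=openai_api_key),
        retriever=db.as_retriever(),
    )
    st.write(qa.run(query_text))
```

If you want the embeddings to survive app restarts rather than just reruns, you could additionally persist the Chroma collection to disk (e.g. via its persist_directory option) and look it up by the same document ID.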