Undesirable behavior where all actions trigger an app refresh and the question and answer process is rerun

Summary

Dear community,

I am encountering an issue with my Streamlit app where every action on the app causes the user's question to be resent and a new response to be generated. I have provided more details on the expected and actual behavior in the respective sections. As an overview, the app allows users to upload documents to a vector store and ask questions about the documents in the vector store, leveraging ChatGPT.

Steps to reproduce

Code snippet:

import streamlit as st
from dotenv import load_dotenv
from PyPDF2 import PdfReader
from langchain.text_splitter import CharacterTextSplitter
from langchain.memory import ConversationBufferMemory
from langchain.embeddings import OpenAIEmbeddings
from langchain.chains import ConversationalRetrievalChain
from langchain.chat_models import AzureChatOpenAI
import openai
from langchain.vectorstores import Qdrant
import qdrant_client
import os
from htmlTemplates import css, bot_template, user_template

openai.log = "debug"

# Load .env before the client reads QDRANT_HOST / QDRANT_API_KEY at import time
load_dotenv()
client = qdrant_client.QdrantClient(os.getenv("QDRANT_HOST"), api_key=os.getenv("QDRANT_API_KEY"))

def get_pdf_text(pdf_docs):
    text = ""
    for pdf in pdf_docs:
        pdf_reader = PdfReader(pdf)
        for page in pdf_reader.pages:
            text += page.extract_text()
    return text

def get_text_chunks(text):
    text_splitter = CharacterTextSplitter(
        separator="\n",
        chunk_size=1000,
        chunk_overlap=200,
        length_function=len
    )
    chunks = text_splitter.split_text(text)
    return chunks

def get_vectorstore(text_chunks):
    embeddings = OpenAIEmbeddings(
        chunk_size=1,
        deployment="embeddings-ada",
        model="text-embedding-ada-002",
    )
    vectorstore = Qdrant(client=client, collection_name=os.getenv("QDRANT_COLLECTION_NAME"), embeddings=embeddings)
    return vectorstore

def process_pdfs(pdf_docs, vectorstore):
    raw_text = get_pdf_text(pdf_docs)
    text_chunks = get_text_chunks(raw_text)
    vectorstore.add_texts(text_chunks)

def initialize_conversation_chain(vectorstore):
    llm = AzureChatOpenAI(
        deployment_name="gpt35turbo",
        model_name="gpt-35-turbo"
    )
    memory = ConversationBufferMemory(
        memory_key='chat_history', return_messages=True)
    conversation_chain = ConversationalRetrievalChain.from_llm(
        llm=llm,
        retriever=vectorstore.as_retriever(),
        memory=memory
    )
    return conversation_chain

def handle_userinput(user_question):
    response = st.session_state.conversation({'question': user_question})
    st.session_state.chat_history = response['chat_history']

    for i, message in enumerate(st.session_state.chat_history):
        if i % 2 == 0:
            st.write(user_template.replace(
                "{{MSG}}", message.content), unsafe_allow_html=True)
        else:
            st.write(bot_template.replace(
                "{{MSG}}", message.content), unsafe_allow_html=True)

def main():
    load_dotenv()
    st.set_page_config(page_title="AML Policies and Procedures",
                       page_icon=":books:")
    st.write(css, unsafe_allow_html=True)

    vectorstore = get_vectorstore([])  # Initialize with an empty list
    if "conversation" not in st.session_state:
        st.session_state.conversation = initialize_conversation_chain(vectorstore)

    if "chat_history" not in st.session_state:
        st.session_state.chat_history = None

    st.header("AML Policies and Procedures  :books:")
    user_question = st.text_input("Ask a question about your documents:")
    if user_question:
        handle_userinput(user_question)

    with st.sidebar:
        st.subheader("Your documents")
        pdf_docs = st.file_uploader(
            "Upload your PDFs here and click on 'Process'", accept_multiple_files=True)
        if st.button("Process"):
            with st.spinner("Processing"):
                process_pdfs(pdf_docs, vectorstore)

if __name__ == '__main__':
    main()


Expected behavior:

What I am expecting is for the question to be sent only when the user hits the enter key, at which point a response is displayed.

Actual behavior:

What is actually happening is that every action on the app causes the Q&A sequence to run again. For example, if I click on the button to browse for documents, the question in the textbox is sent again and an answer is regenerated. The same thing happens when I click on the Process button, which vectorizes the PDFs and saves them in my cloud DB. Every action on the app makes the Q&A process run, which is not ideal: every API call has a cost, and users will see duplicate question-and-answer pairs. My hunch is that I am handling the session state wrong. I simply want the conversation history to be displayed on the app.

Debug info

  • Streamlit version: 1.24.1
  • Python version: 3.10.9
  • OS version: Windows 11
  • Browser version: Edge

Requirements file


langchain==0.0.228
PyPDF2==3.0.1
python-dotenv==1.0.0
streamlit==1.24.1
openai==0.27.8
qdrant-client==1.4.0
tiktoken==0.4.0



One quick fix is to put the user input and the PDF-processing widgets into forms, so that they don’t rerun unless you hit the “submit” button.

import streamlit as st
from datetime import datetime
import time


def handle_userinput(user_question):
    st.write(f"Your question: {user_question}, {datetime.now()}")


def process_pdfs(pdf_docs):
    for pdf_doc in pdf_docs:
        st.write(pdf_doc.name)
        time.sleep(1)


st.header("AML Policies and Procedures  :books:")
with st.form("Question"):
    user_question = st.text_input("Ask a question about your documents:")
    if st.form_submit_button("Ask"):
        handle_userinput(user_question)

with st.sidebar:
    st.subheader("Your documents")
    with st.form("Documents"):
        pdf_docs = st.file_uploader(
            "Upload your PDFs here and click on 'Process'", accept_multiple_files=True
        )
        if st.form_submit_button("Process"):
            with st.spinner("Processing"):
                process_pdfs(pdf_docs)

The main script is run from top to bottom each time the user interacts with an input widget.

In handle_userinput() you are mixing things that you want to do on each rerun (display the history) with things that you want to do only when there is a new question (get a response, update the history).

So if you call handle_userinput() on each rerun, like you are doing now, you are doing it wrong: you are requesting a response and updating the history even when there is no new question.

But if you call handle_userinput() only when there is a new question, you are doing it wrong too: the history won’t be displayed when there are other interactions.

So don’t put it all together in that function. Getting the response and updating the history should be done only when there is a new question (in a callback). Displaying the history should be done on each rerun, pretty much like you are doing it now.
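The split can be sketched without Streamlit at all. Below, a plain dict stands in for `st.session_state`, and `FakeChain` is a hypothetical stand-in for the real `ConversationalRetrievalChain`: the callback step mutates the history exactly once per new question, while the display step is read-only and therefore safe to run on every rerun.

```python
class FakeChain:
    """Stand-in for a chain with conversation memory; echoes each question."""
    def __init__(self):
        self.history = []

    def __call__(self, inputs):
        self.history += [inputs["question"], "echo: " + inputs["question"]]
        return {"chat_history": list(self.history)}

def answer_question(state, question, chain):
    # Callback step: run ONLY when a genuinely new question arrives.
    state["chat_history"] = chain({"question": question})["chat_history"]

def render_history(state):
    # Display step: run on EVERY rerun; repeating it changes nothing.
    return [("user" if i % 2 == 0 else "bot", msg)
            for i, msg in enumerate(state.get("chat_history", []))]

state, chain = {}, FakeChain()
answer_question(state, "What is AML?", chain)

# Three extra "reruns" (e.g. clicking Browse or Process) only re-render:
for _ in range(3):
    rendered = render_history(state)

print(rendered)  # [('user', 'What is AML?'), ('bot', 'echo: What is AML?')]
```

Because `render_history` never calls the chain, extra reruns cost nothing and produce no duplicate Q&A pairs.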


@blackary

Thank you, this is already very helpful, as I am no longer getting duplicate Q&A pairs. The only slight issue is that the Q&A history disappears once the Process button is hit. But when I ask a new question, the previous history reappears. I don’t think this is a big deal, though, as I am planning to split the document-processing and chat functionalities into separate app sections, and users will be expected to upload docs before they go to chat.

Ideally, I would like the Q&A sequence to happen once the user hits the enter key, versus clicking on a button. Do you know if this is possible?

@Goyo makes some good suggestions — here’s a slightly modified version where displaying the questions is isolated from the text input itself, and the text input uses on_change to submit a new question when it’s changed, rather than using a submit button.

import streamlit as st
import time

if "questions" not in st.session_state:
    st.session_state.questions = []


def handle_userinput():
    user_question = st.session_state.question
    st.session_state.questions.append(user_question)


def process_pdfs(pdf_docs):
    for pdf_doc in pdf_docs:
        st.write(pdf_doc.name)
        time.sleep(1)


st.header("AML Policies and Procedures  :books:")
user_question = st.text_input(
    "Ask a question about your documents:", on_change=handle_userinput, key="question"
)

st.write("Questions")
st.write(st.session_state.questions)

with st.sidebar:
    st.subheader("Your documents")
    with st.form("Documents"):
        pdf_docs = st.file_uploader(
            "Upload your PDFs here and click on 'Process'", accept_multiple_files=True
        )
        if st.form_submit_button("Process"):
            with st.spinner("Processing"):
                process_pdfs(pdf_docs)
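One refinement worth considering, sketched here without Streamlit (the names mirror the snippet above; a plain dict stands in for `st.session_state`, and `set_input` is a hypothetical stand-in for the text input firing its `on_change` callback): clearing the widget's key inside the callback empties the box after each submission, and a truthiness guard ignores the callback fired by that clearing. Streamlit permits assigning to a widget's session-state key from within its own callback.

```python
state = {"questions": [], "question": ""}

def handle_userinput():
    q = state["question"]
    if q:                       # skip the callback triggered by clearing
        state["questions"].append(q)
        state["question"] = ""  # empty the box for the next question

def set_input(value):
    # Stand-in for the text_input widget: a changed value fires on_change.
    state["question"] = value
    handle_userinput()

set_input("What is AML?")
set_input("")                   # user clears the box: nothing is appended
set_input("Who approves SARs?")
print(state["questions"])       # ['What is AML?', 'Who approves SARs?']
```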

Many thanks @Goyo and @blackary!

I will work towards implementing the recommended fixes on Sunday and let you know how it goes.

I tried implementing changes to split the new question and conversation history, but I was not getting the desired results. I will use the button solution now and probably revisit things in the future.

Thanks for the help!

This topic was automatically closed 2 days after the last reply. New replies are no longer allowed.