I have an LLM for a summarization task. I provide a URL and a question, and the model returns a summary of the text at that URL. I get weird behaviour when I provide a second URL: Streamlit somehow seems to remember my past questions/inputs and include them (I am speculating, because I cannot reproduce the error when running the program via the CLI).
First run: URL “Hannah Arendt - Wikipedia”, question: “who was she?”
Returns summary: Hannah Arendt was … (I'm happy with the result.)
Second run: URL “Francisco Goya - Wikipedia”, question: “who was she?”
Returns summary: Hannah Arendt was… (I'm NOT happy, since the text is about Francisco Goya and his wife, so “she” should refer to his wife. If I run this query first instead, everything works as expected: “she” is then understood as Goya's wife.)
I do not understand what causes the problem. I have tried clearing the caches and resources, as seen in the code, but nothing works. (I have also tried doing it via the menu bar on the site; the only thing that works for me is to shut down the server and start it again.)
```python
import streamlit as st

from summarizer import llama_summarizer

# title of the app
st.title("Open source Llama for text summarization")

with st.sidebar:
    retriever_opt = st.selectbox("Retriever", ("default", "SVM", "MultiQuery"))
    device_opt = st.selectbox("Device", ("mps", "cpu", "cuda"))
    model_opt = st.selectbox("Llama", ("summarizev2", "summarizev"))
    embedding_opt = st.selectbox("Embedding model", ("large", "small"))

#with col1:
url = st.text_input(
    "Enter URL to the text you want to summarize",
    placeholder="https://andersen.sdu.dk/vaerk/hersholt/TheUglyDuckling_e.html",
)
question = st.text_input(
    "Enter a question to ask the model",
    placeholder="Summarize this text:",
)

if st.button("Summarize"):  # button trigger summarization
    with st.spinner("Summarizing..."):
        summarizer = llama_summarizer(
            url=url,
            question=question,
            retriever=retriever_opt,
            device=device_opt,
            model=model_opt,
            embedding_model=embedding_opt,
        )
        summarizer = (
            summarizer.scrape_text()
            .split_text()
            .instantiate_embeddings()
            .instantiate_llm()
            .instantiate_retriever()
            .instantiate_qa_chain()
            .generate()
        )
        summary = summarizer.answ['result'].strip()
        box_height = int(len(summary) * 0.55)
        st.text_area("Summary", value=summary, height=box_height, max_chars=None, key=None)

if st.button("Show text"):
    with st.spinner("showing text"):
        try:
            text = url
        except:
            text = "No text scraped yet."
        box_height = int(len(text) * 0.55)
        st.text_area("Text", value=text, height=box_height, max_chars=None, key=None)

# Button to rerun the app (start from fresh)
if st.button("Start from Fresh"):
    st.cache_data.clear()
    st.cache_resource.clear()
    st.experimental_rerun()
```
- Streamlit version: 1.26.0
- Python version: 3.11.4
- OS version: macOS