Summary
I have an LLM for a summarization task. I provide a URL and a question, and the model returns a summary of the text at that URL. I get weird behaviour when I provide a second URL: Streamlit somehow seems to remember my past questions/inputs and include them (this is speculation on my part, because I cannot reproduce the error when running the program via the CLI).
Steps to reproduce
First inquiry:
URL “Hannah Arendt - Wikipedia”, question: “who was she?”
Returns summary: Hannah Arendt was … (I'm happy with the result.)
Second inquiry:
URL “Francisco Goya - Wikipedia”, question: “who was she?”
Returns summary: Hannah Arendt was… (I'm NOT happy, since the text is about Francisco Goya and his wife, so “she” should refer to his wife. When I run this inquiry first instead, everything works as expected: “she” is then understood as Goya's wife.)
I do not understand what causes the problem. I have tried clearing the caches and resources, as seen in the code, but nothing works. (I have also tried doing it via the menu on the site; the only thing that works for me is to shut down the server and start it again.)
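Since only a server restart helps, one possible explanation (an assumption, not something I have confirmed) is that the state lives in the Python process rather than in Streamlit's caches: Streamlit reruns the script on every interaction, but imported modules such as `summarizer` stay loaded between reruns. The sketch below uses a hypothetical `LeakySummarizer` class, not the real `llama_summarizer`, to show how a class-level mutable attribute could produce this kind of symptom.

```python
class LeakySummarizer:
    """Hypothetical stand-in for llama_summarizer; NOT the real class."""

    # Class-level list: created once when the module is imported and
    # shared by every instance for as long as the process lives.
    documents = []

    def __init__(self, url, question):
        self.url = url
        self.question = question

    def scrape_text(self):
        # Appends to the shared class-level list instead of per-instance state.
        self.documents.append(f"text scraped from {self.url}")
        return self

    def generate(self):
        # A retriever built over `documents` would see every page scraped so far.
        return list(self.documents)


first = LeakySummarizer("url-about-arendt", "who was she?").scrape_text().generate()
second = LeakySummarizer("url-about-goya", "who was she?").scrape_text().generate()

print(first)   # only the first page
print(second)  # the first page is still present next to the second one
```

If the real class (or a vector store it writes to) keeps state in a similar way, a fresh CLI process would never show the problem, while the long-running Streamlit server would.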
Code snippet:
import streamlit as st
from summarizer import llama_summarizer

# title of the app
st.title("Open source Llama for text summarization")

with st.sidebar:
    retriever_opt = st.selectbox("Retriever", ("default", "SVM", "MultiQuery"))
    device_opt = st.selectbox("Device", ("mps", "cpu", "cuda"))
    model_opt = st.selectbox("Llama", ("summarizev2", "summarizev"))
    embedding_opt = st.selectbox("Embedding model", ("large", "small"))

#with col1:
url = st.text_input("Enter URL to the text you want to summarize", placeholder="https://andersen.sdu.dk/vaerk/hersholt/TheUglyDuckling_e.html")
question = st.text_input("Enter a question to ask the model", placeholder="Summarize this text:")

if st.button("Summarize"):
    # button trigger summarization
    with st.spinner("Summarizing..."):
        summarizer = llama_summarizer(url=url,
                                      question=question,
                                      retriever=retriever_opt,
                                      device=device_opt,
                                      model=model_opt,
                                      embedding_model=embedding_opt
                                      )
        summarizer = (summarizer.scrape_text()
                      .split_text()
                      .instantiate_embeddings()
                      .instantiate_llm()
                      .instantiate_retriever()
                      .instantiate_qa_chain()
                      .generate()
                      )
        summary = summarizer.answ['result'].strip()
        box_height = int(len(summary) * 0.55)
        st.text_area("Summary", value=summary, height=box_height, max_chars=None, key=None)

if st.button("Show text"):
    with st.spinner("showing text"):
        try:
            text = url
        except:
            text = "No text scraped yet."
        box_height = int(len(text) * 0.55)
        st.text_area("Text", value=text, height=box_height, max_chars=None, key=None)

# Button to rerun the app (start from fresh)
if st.button("Start from Fresh"):
    st.cache_data.clear()
    st.cache_resource.clear()
    st.experimental_rerun()
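To test from the CLI whether the carry-over comes from the summarizer rather than from Streamlit, both inquiries could be run inside one Python process, which is closer to what the long-running server does. This is only a sketch: `run_in_one_process`, `leaked_markers` and `stub_inquiry` are hypothetical helpers, and the stub simply imitates the answers reported above so the example runs on its own.

```python
def run_in_one_process(run_inquiry, inquiries):
    """Call the pipeline once per (url, question) pair inside a single interpreter."""
    return [run_inquiry(url, question) for url, question in inquiries]


def leaked_markers(answers, markers):
    """Return (answer_index, marker) pairs where an answer mentions another inquiry's marker."""
    leaks = []
    for i, answer in enumerate(answers):
        for j, marker in enumerate(markers):
            if i != j and marker.lower() in answer.lower():
                leaks.append((i, marker))
    return leaks


def stub_inquiry(url, question):
    # Placeholder that imitates the reported behaviour; swap in a function
    # that builds llama_summarizer and returns summarizer.answ['result'].
    return "Hannah Arendt was ..."


inquiries = [
    ("Hannah Arendt - Wikipedia", "who was she?"),
    ("Francisco Goya - Wikipedia", "who was she?"),
]
answers = run_in_one_process(stub_inquiry, inquiries)
leaks = leaked_markers(answers, ["Arendt", "Goya"])
print(leaks)
```

If the real pipeline also reports a leak here, the problem would be reproducible without Streamlit; if it does not, the cause is more likely on the Streamlit side.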
Debug info
- Streamlit version: 1.26.0
- Python version: 3.11.4
- OS version: macOS