Hi,
I’m building a Streamlit app that combines LangChain and Hugging Face for retrieval-augmented generation (RAG). I’m using `@st.cache_resource` to cache expensive operations like setting up a retriever, but I’m running into the following error:
```
streamlit.runtime.caching.cache_errors.UnhashableParamError: Cannot hash argument 'documents' (of type builtins.list) in 'setup_retriever'.
```
This error occurs because documents is a list of LangChain Document objects, which are unhashable. Streamlit seems to have trouble caching functions that take such arguments.
Here’s the relevant part of my code:
```python
@st.cache_resource
def setup_retriever(documents):
    embeddings_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
    db = Chroma.from_documents(
        documents=documents,
        embedding=embeddings_model,
        persist_directory="chroma_storage"
    )
    return db.as_retriever()

# Call the function
retriever = setup_retriever(documents)
```
I know that Streamlit doesn’t support caching unhashable types like lists. However, I still want to use caching for performance reasons. How can I resolve this?
Streamlit can actually cache functions that take lists, as long as the elements of the list are themselves hashable:

```python
import streamlit as st
from time import sleep

@st.cache_resource
def process_data(data: list[str]):
    sleep(2)  # <-- Mimic some long process
    return [x.upper() for x in data]

st.title("Caching")

my_data = ["one", "two", "three"]
processed_data = process_data(my_data)
st.write(processed_data)
```
In your case, the problem is really about defining a hash for the LangChain Document objects rather than for lists as such. The same pattern fails as soon as the list contains objects that cannot be hashed:
```python
import streamlit as st
from time import sleep

class Unhashable:
    def __init__(self, name: str):
        self.name = name

    def __hash__(self):
        raise NotImplementedError("Oops, no hashing")

@st.cache_resource
def fn_complex_data(data: list[Unhashable]):
    sleep(2)  # <-- Mimic some long process
    return [x.name.upper() for x in data]

st.title("Caching")

complex_data = [Unhashable("one"), Unhashable("two"), Unhashable("three")]
processed_complex_data = fn_complex_data(complex_data)
st.write(processed_complex_data)
```
Defining a custom hash function for lists of Unhashable objects does work:

```python
hash_funcs = {
    list: lambda x: hash(tuple(getattr(obj, "name", obj) for obj in x))
}

@st.cache_resource(hash_funcs=hash_funcs)
def fn_complex_data(data: list[Unhashable]):
    sleep(2)
    return [x.name.upper() for x in data]
```