How to Fix "UnhashableParamError" with Streamlit Cache When Passing a List of Documents?

Hi,
I’m building a Streamlit app that combines LangChain and Hugging Face for retrieval-augmented generation (RAG). I’m using @st.cache_resource to cache expensive operations like setting up a retriever, but I’m running into the following error:

streamlit.runtime.caching.cache_errors.UnhashableParamError: Cannot hash argument 'documents' (of type builtins.list) in 'setup_retriever'.

This error occurs because documents is a list of LangChain Document objects, which are unhashable. Streamlit seems to have trouble caching functions that take such arguments.
Here’s the relevant part of my code:

import streamlit as st
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma

@st.cache_resource
def setup_retriever(documents):
    embeddings_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
    db = Chroma.from_documents(
        documents=documents,
        embedding=embeddings_model,
        persist_directory="chroma_storage"
    )
    return db.as_retriever()

# Call the function
retriever = setup_retriever(documents)

I know that Streamlit doesn’t support caching unhashable types like lists. However, I still want to use caching for performance reasons. How can I resolve this?

You can create your own hashing function and pass it to cache_resource(); see st.cache_resource - Streamlit Docs.
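To sketch the shape (MyType here is a hypothetical stand-in, not from the question): hash_funcs maps a type, or its fully qualified name as a string, to a function that returns a hashable stand-in for values of that type.

import streamlit as st

class MyType:
    def __init__(self, value: str):
        self.value = value

# Map the otherwise-unhashable type to a function returning
# a hashable stand-in that identifies the value for caching.
@st.cache_resource(hash_funcs={MyType: lambda obj: obj.value})
def load_resource(item: MyType):
    return item.value.upper()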

Arguments need to be hashable to generate a cache key; however, lists holding “simple” objects are cast to tuples, which are immutable and hashable. For example:

Works fine:

import streamlit as st
from time import sleep

@st.cache_resource
def process_data(data: list[str]):
    sleep(2)  # <-- Mimic some long process
    return [x.upper() for x in data]


st.title("Caching")

my_data = ["one", "two", "three"]
processed_data = process_data(my_data)
st.write(processed_data)

In your case, it’s more about defining a hash for the LangChain Document objects than a hash for lists (see the Document-specific sketch after the last example).

Completing the example:

Does not work:

import streamlit as st
from time import sleep


class Unhashable:
    def __init__(self, name: str):
        self.name = name

    def __hash__(self):
        raise NotImplementedError("Oops, no hashing")

@st.cache_resource
def fn_complex_data(data: list[Unhashable]):
    sleep(2)  # <-- Mimic some long process
    return [x.name.upper() for x in data]


st.title("Caching")
complex_data = [Unhashable("one"), Unhashable("two"), Unhashable("three")]
processed_complex_data = fn_complex_data(complex_data)
st.write(processed_complex_data)

Defining a custom hashing function to deal with lists of Unhashable objects:

Does work:

# Hash a list by hashing a tuple built from each element's "name"
# attribute, falling back to the element itself for plain values.
hash_funcs = {
    list: lambda x: hash(tuple(getattr(obj, "name", obj) for obj in x))
}

@st.cache_resource(hash_funcs=hash_funcs)
def fn_complex_data(data: list[Unhashable]):
    sleep(2)
    return [x.name.upper() for x in data]
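
Applying the same idea to the original question, a sketch might look like this (assuming the documents are langchain_core.documents.Document objects with page_content and metadata attributes, and that the metadata values are themselves hashable):

import streamlit as st
from langchain_core.documents import Document

# Hash each Document by its content plus its (sorted) metadata items.
hash_funcs = {
    Document: lambda doc: (doc.page_content, tuple(sorted(doc.metadata.items())))
}

@st.cache_resource(hash_funcs=hash_funcs)
def setup_retriever(documents: list[Document]):
    ...  # same body as in the question

If importing the class is inconvenient, the hash_funcs key can also be the type's fully qualified name as a string, per the docs linked above.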