Spacy span class object not hashable

Hi Guys,

Can anyone help me with `st.cache` and a spaCy `Span` object?

I get the following error:

**UnhashableTypeError** : Cannot hash object of type  `spacy.tokens.span.Span` , found in the arguments of  `xyz()` .

While caching the arguments of  `xyz()` , Streamlit encountered an object of type  `spacy.tokens.span.Span` , which it does not know how to hash.

To address this, please try helping Streamlit understand how to hash that type by passing the  `hash_funcs`  argument into  `@st.cache` . For example:

@st.cache(hash_funcs={spacy.tokens.span.Span: my_hash_func})
def my_func(...):

I have no idea how hashing works, so I do not know how to implement this function.
Help is appreciated.
Cheers Chris

When caching, Streamlit hashes the input arguments as well as the code that makes up the function itself. (Hashing is a fancy way of taking an input of arbitrary length and producing an output of fixed length that very rarely collides.) If the arguments and the function code are the same as the last time the cached function was run, the hashes match and Streamlit assumes that the function's output has not changed. It also keeps an eye on the cached return value and gets unhappy when you change it from outside the function.

Unfortunately, I’m not very familiar with spaCy. Could you describe what the type `spacy.tokens.span.Span` is?

If you just want a couple of options that might make the error go away, you could try `@st.cache(allow_output_mutation=True)`, which completely disables hashing of the output (this is really only helpful when you are trying to expose global variables by reference to the entire Streamlit runtime for multiple connected users). The function is still cached and still reruns if the inputs change. Keep in mind that this setting only affects the return value; since your error mentions the arguments of `xyz()`, it may not be enough on its own.

An alternative would be to simply not hash this type of object: `@st.cache(hash_funcs={spacy.tokens.span.Span: lambda _: None})`. Note that this means that if the only thing that changes is the `spacy.tokens.span.Span` object, your cached function will not rerun. The object seems complex enough, though, that there are probably other, simpler variables changing along with it.
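If you would rather have the cache notice when the span changes, a hash function can return a few plain values instead of `None`. This is only a sketch: `hash_span` is a hypothetical helper, the choice of `text`, `start_char` and `end_char` (attributes of spaCy's `Span`) is my assumption about what identifies a span in your app, and the `SimpleNamespace` objects are stand-ins so the snippet runs without spaCy installed.

```python
from types import SimpleNamespace


def hash_span(span):
    """Reduce a Span-like object to plain values that are easy to hash."""
    return (span.text, span.start_char, span.end_char)


# Wiring in the app (shown as a comment, not executed here):
# @st.cache(hash_funcs={spacy.tokens.span.Span: hash_span})
# def xyz(span): ...

# Stand-ins for real spans, for illustration only.
a = SimpleNamespace(text="New York", start_char=10, end_char=18)
b = SimpleNamespace(text="New York", start_char=10, end_char=18)
c = SimpleNamespace(text="Berlin", start_char=0, end_char=6)

same = hash_span(a) == hash_span(b)       # equal content gives an equal key
different = hash_span(a) != hash_span(c)  # different content gives a different key
```

One caveat: two spans from different documents that happen to share text and character offsets would be treated as the same input, so add more fields if that matters for your function.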

You could also extract a simpler property from the `spacy.tokens.span.Span` object and assign it to a variable that Streamlit does know how to hash. When that property changes, the cached function will miss and rerun.
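A rough sketch of that last option: the cached function only ever receives a plain string, so no `hash_funcs` is needed. The names `count_words` and `count_words_for_span` are hypothetical, the word count is just a placeholder for whatever `xyz()` really does, and the `SimpleNamespace` object stands in for a real span so the snippet runs without spaCy.

```python
from types import SimpleNamespace


# In the app this function would carry @st.cache; a str argument hashes fine by default.
def count_words(span_text: str) -> int:
    return len(span_text.split())


def count_words_for_span(span) -> int:
    # Hand only the plain string to the cached function, not the Span itself.
    return count_words(span.text)


fake_span = SimpleNamespace(text="the quick brown fox")  # stand-in for a spaCy Span
result = count_words_for_span(fake_span)
```

The trade-off is that the cached function can no longer use anything on the span beyond what you pass in, so this works best when it only needs the text or a few simple attributes.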

Below is a good reference in the documentation (where all of my information actually came from).