Alternative to hash_funcs in caching

Hello all

I have been updating my apps to recent Streamlit versions, where the old st.cache has been deprecated in favor of st.cache_data and st.cache_resource. Unfortunately, this seems to imply that hash_funcs are deprecated as well. I find that very unfortunate, and I wonder whether an alternative is even possible or whether the deprecation means I won't be able to use caching for my function in the future.

import streamlit as st

class Predictor:
    def __init__(self, model_name):
        self.model_name = model_name
        self.model = get_model(model_name)

@st.cache_data(show_spinner=False)
def predict(query: str, predictor: Predictor):
    # ... some code that uses predictor to run a prediction on query and returns the result.
    # While predictor might not be hashable, a hash_func would be very useful here,
    # because then we could simply specify {Predictor: lambda predictor: predictor.model_name},
    # since model_name is the distinguishing factor.
    ...

@st.cache_resource(show_spinner=False)
def get_model(model_name: str, no_cuda: bool = False):
    # ... some PyTorch code that returns a model specific to model_name
    ...
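
For context, this is roughly what I could do with the old, now-deprecated st.cache decorator (a sketch; the body is a placeholder like above):

@st.cache(show_spinner=False, hash_funcs={Predictor: lambda predictor: predictor.model_name})
def predict(query: str, predictor: Predictor):
    # The old API hashed predictor via its model_name, so different models
    # got their own cache entries.
    ...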

I do not see how I can use the cache in predict with the current implementation, because I cannot use hash_funcs. I cannot simply ignore the predictor argument, since different predictors/models should of course return different results. With hash_funcs I could easily distinguish between them by their model_name property.

What would be the currently recommended way of using the cache in the predict function above? Note that Predictor actually has a lot of input arguments, so I would rather not move the Predictor init inside the predict function.

I don't know if this is the official solution; it's more of a hack that came to mind:

@st.cache_data(show_spinner=False)
def predict(query: str, _predictor: Predictor, model_name: str):
    # _predictor: the leading underscore disables hashing for this argument
    # model_name: pass predictor.model_name explicitly to make the cache key unique per model
    ...
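
A call site would then look something like this (the names are placeholders):

predictor = Predictor("my-model-name")
result = predict("some query", predictor, predictor.model_name)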

I see what you mean, thanks. That would work, but it is of course not optimal. (And linters won't like the unused argument.) I was hoping there was a better solution, but thanks for the suggestion.

If others have a more optimal/official solution, I'd be glad to hear it.

Hey @BramVanroy, just wanted to share this thread with you where the team discussed a similar issue, in case it's useful.

Hey @BramVanroy, I did a writeup on this here:

It describes a few ways to solve this without hash_funcs; please feel free to add your use case there and upvote.
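
One hash_funcs-free pattern, for example, is to cache on the hashable model_name only and fetch the model inside the cached function. A sketch, assuming get_model is cached with st.cache_resource as in your snippet (the prediction call itself is a placeholder):

@st.cache_data(show_spinner=False)
def predict(query: str, model_name: str):
    # get_model is cached with st.cache_resource, so looking it up here is cheap,
    # and the cache key depends only on hashable arguments.
    model = get_model(model_name)
    return model(query)  # placeholder for the real prediction code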


Thank you @Caroline and @jcarroll. I really hope that hash_funcs will be reborn.

We just merged a fix to add hash_funcs to st.cache_data and st.cache_resource. It should be in the next release: hash_funcs for st.cache_data and st.cache_resource by kajarenc · Pull Request #6502 · streamlit/streamlit · GitHub
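
For reference, once that release is out, the original example should work roughly like this (the function body is still a placeholder):

@st.cache_data(show_spinner=False, hash_funcs={Predictor: lambda p: p.model_name})
def predict(query: str, predictor: Predictor):
    # predictor is hashed via its model_name, so each model gets its own cache entry.
    ...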



1.24.0 has been released with the fix; let us know how it goes!
