Alternative to hash_funcs in caching

Hello all

I have been updating my apps to recent Streamlit versions, where the old st.cache has been deprecated in favor of st.cache_data and st.cache_resource. Unfortunately, this seems to imply that hash_funcs are deprecated as well. I find that very unfortunate, and I wonder whether an alternative is even possible or whether the deprecation means I won't be able to use caching for my function in the future.

import streamlit as st

class Predictor:
    def __init__(self, model_name):
        self.model_name = model_name
        self.model = get_model(model_name)

@st.cache_data(show_spinner=False)
def predict(query: str, predictor: Predictor):
    # ... some code that uses predictor to run a prediction on query and returns the result.
    # While predictor might not be hashable, a hash_func would be very useful here,
    # because then we could simply specify {Predictor: lambda predictor: predictor.model_name},
    # since model_name is the distinguishing factor.
    ...

@st.cache_resource(show_spinner=False)
def get_model(model_name: str, no_cuda: bool = False):
    # ... some PyTorch code that returns a model specific to model_name
    ...
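
For context, this is roughly what I could do with the old, now-deprecated st.cache decorator (a sketch; the body is a placeholder like above):

@st.cache(show_spinner=False, hash_funcs={Predictor: lambda predictor: predictor.model_name})
def predict(query: str, predictor: Predictor):
    # The old API hashed predictor via its model_name, so different models
    # got their own cache entries.
    ...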

I do not see how I can use the cache in predict with the current implementation, because I cannot use hash_funcs. I cannot simply ignore the predictor argument, since different predictors/models should of course return different results. With hash_funcs I could easily distinguish between them by their model_name property.

What would be the currently recommended way of using the cache in the predict function above? Note that Predictor actually has a lot of input arguments, so I would rather not move the Predictor init inside the predict function.

I don't know if this is the official solution; it's more of a hack that came to mind:

@st.cache_data(show_spinner=False)
def predict(query: str, _predictor: Predictor, model_name: str):
    # _predictor: the leading underscore disables hashing for this argument
    # model_name: pass predictor.model_name explicitly to make the cache key unique per model
    ...
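
A call site would then look something like this (the names are placeholders):

predictor = Predictor("my-model-name")
result = predict("some query", predictor, predictor.model_name)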

I see what you mean, thanks. That would work, but it is of course not optimal. (And linters won't like the unused argument.) I was hoping there was a better solution, but thanks for the suggestion.

If others have a more optimal/official solution, I'd be glad to hear it.

Hey @BramVanroy, just wanted to share this thread with you where the team discussed a similar issue, in case it's useful.

Hey @BramVanroy, I did a writeup on this here:

It describes a few ways to solve this without hash_funcs; please feel free to add your use case there and upvote.
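
One hash_funcs-free pattern, for example, is to cache on the hashable model_name only and fetch the model inside the cached function. A sketch, assuming get_model is cached with st.cache_resource as in your snippet (the prediction call itself is a placeholder):

@st.cache_data(show_spinner=False)
def predict(query: str, model_name: str):
    # get_model is cached with st.cache_resource, so looking it up here is cheap,
    # and the cache key depends only on hashable arguments.
    model = get_model(model_name)
    return model(query)  # placeholder for the real prediction code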


Thank you @Caroline and @jcarroll. I really hope that hash_funcs will be reborn.

We just merged a fix to add hash_funcs to st.cache_data and st.cache_resource. It should be in the next release: hash_funcs for st.cache_data and st.cache_resource by kajarenc · Pull Request #6502 · streamlit/streamlit · GitHub
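
For reference, once that release is out, the original example should work roughly like this (the function body is still a placeholder):

@st.cache_data(show_spinner=False, hash_funcs={Predictor: lambda p: p.model_name})
def predict(query: str, predictor: Predictor):
    # predictor is hashed via its model_name, so each model gets its own cache entry.
    ...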



1.24.0 has been released with the fix; let us know how it goes!
