Hi @Jonathan_Rhone,
Thanks for getting back to me. Good to know that it’s a bug. I do have a use case which requires `max_entries=1` with custom hash functions that map `str` and `int` to `None`. Should I report the bug on GitHub, or have you already created a ticket to track it? (EDIT: I see you added it while I was commenting.)
Before that, though: is the function itself hashed not from its source text, but by applying the `hash_funcs` to each element within the function? If so, would a string inside the function be converted to `None` by my `hash_funcs`? (EDIT2: I ask because my two functions are very different, yet they seemed to share the cache.)
I’m sorry I couldn’t try out the `check_hash` function you provided. Where did you install `hashlib` from? The pip install gives me an error. Which library does `_CodeHasher` belong to?
The code is much too large to showcase all of it here, but this is the pseudocode for it:

```python
import streamlit as st

# Sample is a custom class which stores information for
# two related elements, DNA and protein.
from sample import Sample

MY_HASH = {
    Sample: function_to_identify_unique_samples,
    str: lambda _: None,
    int: lambda _: None,
    list: lambda _: None,
}

# The two functions are:

@st.cache(max_entries=1, hash_funcs=MY_HASH, allow_output_mutation=True)
def preprocess_dna(sample, bool_arg, *other_str_int_list_args):
    # A time-consuming function, hence the caching.
    # It should only be called for a new sample or when the bool changes from
    # False to True (this comes from a button in the interface).
    # It should only store the state of one sample at a time (it is
    # memory-intensive otherwise), hence max_entries=1.
    # It should not be called when the other arguments change, hence the custom hash.
    # The returned Dna object is mutated afterwards.
    # Since it returns an object which is used further along the pipeline, I have
    # to call this function in every run. Hence it cannot be called only when the
    # button is pressed, i.e. `if st.button(...): preprocess_dna(...)` does not work.
    return Dna


@st.cache(max_entries=1, hash_funcs=MY_HASH, allow_output_mutation=True)
def preprocess_protein(sample, bool_arg, *other_str_int_list_args):
    # Same as preprocess_dna, but for protein.
    return Protein
```
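To make the intent of `MY_HASH` concrete, here is a toy model of what I expect the argument hashing to do. It is only a sketch and not how Streamlit builds its cache key internally: `toy_cache_key` and `ToySample` are hypothetical names, and I assume the hash function is looked up by the exact type of each argument.

```python
from typing import Any, Callable, Dict, Tuple


def toy_cache_key(
    args: Tuple[Any, ...],
    hash_funcs: Dict[type, Callable[[Any], Any]],
) -> Tuple[Any, ...]:
    """Toy model: one key part per argument, via hash_funcs if the type is listed."""
    parts = []
    for arg in args:
        func = hash_funcs.get(type(arg))  # exact type match (an assumption)
        parts.append(func(arg) if func is not None else hash(arg))
    return tuple(parts)


class ToySample:
    """Hypothetical stand-in for my Sample class."""

    def __init__(self, name: str) -> None:
        self.name = name


toy_hash = {
    ToySample: lambda s: s.name,  # stand-in for function_to_identify_unique_samples
    str: lambda _: None,
    int: lambda _: None,
    list: lambda _: None,
}

# Same sample and bool, different str/int/list arguments: the keys match.
key_a = toy_cache_key((ToySample("s1"), True, "abc", 3, [1, 2]), toy_hash)
key_b = toy_cache_key((ToySample("s1"), True, "xyz", 7, [9]), toy_hash)
# A different sample or a different bool changes the key.
key_c = toy_cache_key((ToySample("s2"), True, "abc", 3, [1, 2]), toy_hash)
key_d = toy_cache_key((ToySample("s1"), False, "abc", 3, [1, 2]), toy_hash)
```

In this model only the sample and the bool contribute to the key, which is the behaviour I want from the cached functions.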
It turned out that, even though the two functions differ, the cache was not stored properly, as illustrated in the example I shared earlier.
I’m not sure how informative this is. Let me know if you want to see the entire app, though, and I’ll try to make it available.
For now, I solved the problem with session state: I store the `Dna` and `Protein` objects in the session state and manually check within the function for changes; if there are none, I return the object stored in the session state. It’s not the cleanest solution, though.
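Roughly, the workaround looks like the sketch below. It is a simplified illustration rather than my actual code: `get_or_recompute`, the `"dna"` slot name, and the sample id strings are hypothetical, and a plain dict stands in for the session state object (I assume it supports `.get` and item assignment like a dict).

```python
from typing import Any, Callable, MutableMapping


def get_or_recompute(
    state: MutableMapping[str, Any],
    slot: str,
    sample_id: Any,
    bool_arg: bool,
    compute: Callable[[], Any],
) -> Any:
    """Return the object stored under `slot`, recomputing only when the
    sample changes or bool_arg flips from False to True."""
    entry = state.get(slot)
    is_new_sample = entry is None or entry["sample_id"] != sample_id
    turned_on = entry is not None and bool_arg and not entry["bool_arg"]
    if is_new_sample or turned_on:
        # Overwrite the single stored entry, so only one sample is kept.
        entry = {"sample_id": sample_id, "value": compute()}
    entry["bool_arg"] = bool_arg  # remember the last seen button state
    state[slot] = entry
    return entry["value"]


# Usage with a plain dict standing in for the session state.
calls = []


def expensive_preprocess() -> object:
    calls.append(1)  # count how often the slow step really runs
    return object()


state: dict = {}
first = get_or_recompute(state, "dna", "s1", False, expensive_preprocess)
again = get_or_recompute(state, "dna", "s1", False, expensive_preprocess)
pressed = get_or_recompute(state, "dna", "s1", True, expensive_preprocess)
held = get_or_recompute(state, "dna", "s1", True, expensive_preprocess)
other = get_or_recompute(state, "dna", "s2", True, expensive_preprocess)
```

Changes to any other argument never reach the check, so they cannot trigger a recompute, and the returned object can be mutated freely because it is the stored object itself.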
Thanks,
Saurabh Parikh