Hash function error for uploaded text file

Hi @santosh_boina,

You are hitting a case where Streamlit doesn’t know how to compute the hash of an object of type StringIO, which is what pd.read_table receives for your uploaded file. Streamlit needs that hash to tell whether it has already computed and cached the result for the same uploaded file. We need to tell it how to hash a StringIO through the hash_funcs argument.

Since it’s a string buffer, I think a good first approach is to read the content of the uploaded file and have Streamlit hash that. If you upload the same file again, Streamlit checks the content, and if it matches the previously uploaded file it skips the computation and returns the cached pandas DataFrame.
You can get the whole file content with StringIO.getvalue, so the following should work:

import streamlit as st
from io import StringIO

# Hash any StringIO argument by its full text content
@st.cache(hash_funcs={StringIO: StringIO.getvalue})
def load_data(file_uploaded):
    ...
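To see why getvalue makes a reasonable cache key, here is a small standard-library sketch (the sample text and variable names are made up for illustration). Two separate buffers holding the same text produce the same value, even if one of them has already been read:

```python
from io import StringIO

# Two distinct buffer objects with identical (made-up) content
first = StringIO("col_a\tcol_b\n1\t2\n")
second = StringIO("col_a\tcol_b\n1\t2\n")

# Consume the first buffer; getvalue ignores the current read position
first.read()

key_first = StringIO.getvalue(first)
key_second = StringIO.getvalue(second)

# True when both buffers hold the same text
same_content = key_first == key_second
```

Since the key depends only on the text, re-uploading an identical file should map to the same cache entry.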

Unfortunately, I think this makes the function read the full file twice, once for cache detection and once for the actual computation, which may be slow if the file is big. A better way would be to read only the beginning of the buffer in the hash_funcs function instead of the whole file.
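A minimal sketch of that idea, using only the standard library. The function name hash_buffer_head and the 4096-character default are my own choices, not anything Streamlit provides. It also returns the total length, on the assumption that this reduces collisions between files that share the same beginning:

```python
import io
from io import StringIO


def hash_buffer_head(buf: StringIO, n_chars: int = 4096):
    """Return a cheap cache key: the first n_chars plus the total length."""
    # Remember where the caller left the read position
    original_pos = buf.tell()
    try:
        # Seeking to the end of a StringIO returns its length in characters
        total_length = buf.seek(0, io.SEEK_END)
        # Go back to the start and read only the head
        buf.seek(0)
        head = buf.read(n_chars)
    finally:
        # Restore the position so later reads are unaffected
        buf.seek(original_pos)
    return head, total_length
```

It would then be passed as hash_funcs={StringIO: hash_buffer_head} in place of StringIO.getvalue. Keep in mind that two different files with the same head and the same length would be treated as identical, so this trades some safety for speed.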

If you are new to Streamlit and want to learn more about it (especially how it checks whether it has already run on a given object), you may benefit from reading the Caching and Advanced caching documentation pages for other caching techniques.

Best of luck !
