Caching pandas dataframe

Hello everyone,

I am trying to read an uploaded file into a pandas DataFrame using the code below. However, it gives me the following error: **UnhashableType**: Cannot hash object of type _io.StringIO

I looked at the documentation but I still cannot figure out why it does not work for me. Any suggestions?

import streamlit as st
import pandas as pd

# Uploader widget
st.sidebar.title("Upload Your File")
filename = st.sidebar.file_uploader("Choose a file", type=['xlsx', 'csv'])
delimiter_choice = st.sidebar.selectbox("In case you uploaded a CSV file, "
                                        "how is your data delimited?", [';', ','])
st.sidebar.markdown("---")


# Function that tries to read file as a csv
# if selected file is not a csv file then it will load as an excel file
@st.cache
def try_read_df(f):
    try:
        return pd.read_csv(f, sep=delimiter_choice)
    except Exception:
        f.seek(0)  # rewind: the failed CSV attempt may have consumed part of the stream
        return pd.read_excel(f)


if filename:
    df = try_read_df(filename)
    # Only display the dataframe once a file has actually been uploaded
    st.write(df)

Hi @bjornvandijkman,

You are probably hitting this issue, which comes from this original discussion about caching a DataFrame that is created from an uploaded file. Streamlit does not yet know how to hash the file stream returned by its file uploader widget, which is why `@st.cache` raises the error.

Until the issue is solved natively by Streamlit, you can try hashing part of the uploaded file, or use the more expensive option of reading and hashing the entire file :slight_smile:
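To make the idea concrete, here is a minimal sketch of such a hash function. `stream_digest` and its `max_bytes` parameter are names I made up for illustration; they are not part of Streamlit or of the linked issue. It assumes the uploaded object behaves like `io.StringIO` or `io.BytesIO`.

```python
import hashlib
import io


def stream_digest(stream, max_bytes=None):
    """Return a SHA-256 hex digest of a file-like object's content.

    Hypothetical helper: hashes the whole stream, or only the first
    `max_bytes` (characters for text streams) as a cheaper option.
    The stream's current position is restored afterwards.
    """
    position = stream.tell()
    stream.seek(0)
    data = stream.read() if max_bytes is None else stream.read(max_bytes)
    stream.seek(position)
    if isinstance(data, str):  # StringIO yields str, BytesIO yields bytes
        data = data.encode("utf-8")
    return hashlib.sha256(data).hexdigest()


# Assumption: your Streamlit version's st.cache accepts `hash_funcs`.
# If so, the helper could be wired in roughly like this:
#
# @st.cache(hash_funcs={io.StringIO: stream_digest, io.BytesIO: stream_digest})
# def try_read_df(f):
#     ...
```

Hashing only a prefix is faster for large files, but two files that share the same first `max_bytes` would then be treated as identical, so the full-content hash is the safer default.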

Thanks! The last solution in the linked issue works for me.

Hey @bjornvandijkman & @andfanilo,

Here is the pull request that will natively solve this :heart:
