Read CSV files only one time

Hi team,

I have a Streamlit app that shows results and lets me filter them with multiselect and slider widgets.

I read a lot of CSV files to build those results, but whenever I change a slider or multiselect, Streamlit reloads all the files again, when I only need to apply the filters to my dataframe.

Is it possible to read the files only once per session, so the server isn't overloaded?

Thanks a lot.

I need details or minimal sample code to solve your issue.

Meanwhile, this Streamlit caching topic is worth a read.

Saving objects to session state is also possible.


Thanks @ferdy for your answer. I have over 80 files and 1 GB of data to read in my Streamlit app, and a simple CSV reader:

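import os
import pandas as pd

resultdf = pd.DataFrame()  # accumulates the rows of every file
for filename in os.listdir('data/'):
    df = pd.read_csv('data/' + filename)
    if resultdf.empty:
        resultdf = df.copy(deep=True)
    else:
        resultdf = pd.concat([resultdf, df])
    df = df.iloc[0:0]  # drop the per-file rows once they are merged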

This code loads the CSV data into a dataframe. I only need to do that once, and then filter the data by the Streamlit date_input, slider and multiselect settings.

resultdf['Date'] = pd.to_datetime(resultdf['Date'],format='%Y-%m-%d')
resultdf = resultdf[resultdf['Date'] >= start]
resultdf = resultdf[resultdf['Date'] <= end]
resultdf = resultdf[resultdf['Market'].isin(selected_markets)]
resultdf = resultdf[resultdf['Timeframe'].isin(selected_timeframes)]

Now, whenever I change any parameter, Streamlit unnecessarily reloads all the CSV files and filters all the results again. To optimize my app, it should load the CSV files only once per session.
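The filtering step on its own can be sketched as a small function that returns the matching rows without overwriting the loaded dataframe, so the full data stays available for the next rerun. The name `apply_filters` and the sample values are made up for illustration; the column names follow the snippet above.

```python
import pandas as pd


def apply_filters(df, start, end, markets, timeframes):
    """Return the rows of df that match the widget selections; df is not modified."""
    # Parse the dates into a separate Series instead of overwriting the column.
    dates = pd.to_datetime(df["Date"], format="%Y-%m-%d")
    mask = (
        (dates >= pd.Timestamp(start))
        & (dates <= pd.Timestamp(end))
        & df["Market"].isin(markets)
        & df["Timeframe"].isin(timeframes)
    )
    return df[mask]


# Tiny stand-in for the concatenated CSV data (made-up values).
sample = pd.DataFrame(
    {
        "Date": ["2023-01-01", "2023-01-15", "2023-02-01"],
        "Market": ["EURUSD", "GBPUSD", "EURUSD"],
        "Timeframe": ["H1", "H1", "D1"],
    }
)
filtered = apply_filters(
    sample, "2023-01-01", "2023-01-31", ["EURUSD", "GBPUSD"], ["H1"]
)
```

In the app, `start`, `end`, `markets` and `timeframes` would come from the date_input and multiselect widgets, and only this cheap filtering step would need to run on every rerun.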

This is how you cache the data, so that each file is effectively read only once.

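import streamlit as st

@st.cache_data  # store the result so reruns with the same path skip the read
def get_df_from_csv(fn):
    return pd.read_csv(fn)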

Use file1.csv.

csv_path = './data/file1.csv'  # data is a subfolder on your app folder
df1 = get_df_from_csv(csv_path)

Use file2.csv

df2 = get_df_from_csv('./data/file2.csv')

Test whether performance improves compared to running without the cache. Read the caching link to explore its capabilities.
