Database or CSV files

Hi team, I have a big doubt: which is faster to use, a database or CSV files?

I have an app that displays and filters about 80 CSV files, roughly 1 GB of data. I need to read every file and filter the data to show results. This process takes about 5 minutes, and it has to re-filter on every settings change.

How can I make my app more efficient with so much data?

Thanks a lot

Hi @optimvs

Have you looked into caching your data using st.cache_data (st.cache_data - Streamlit Docs)?

Also see this post, which suggests persisting the cache to disk with st.cache_data(persist="disk").
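
As a minimal sketch of that idea (the folder path and file layout are made up for illustration), caching the expensive load step with a disk-persisted cache could look like this:

```python
import glob

import pandas as pd
import streamlit as st


@st.cache_data(persist="disk")  # keep the result in memory and on disk
def load_all_csvs(folder: str) -> pd.DataFrame:
    # Read every CSV in the folder once and concatenate into a single frame.
    # This only re-runs when the argument (or the function body) changes,
    # so reruns triggered by widget changes skip the 5-minute load.
    frames = [pd.read_csv(path) for path in sorted(glob.glob(f"{folder}/*.csv"))]
    return pd.concat(frames, ignore_index=True)


df = load_all_csvs("data")  # hypothetical folder name
st.write(df.head())
```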

Another area to explore for improving performance is to determine whether the entire dataset is actually needed; here are some questions to consider:

  • Are all columns needed?
  • If only certain columns are used, you can pass the usecols parameter to pd.read_csv so the rest are never parsed (see the sketch after this list).
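
For example (the column names and dtypes below are hypothetical, adjust them to your data), reading only the columns you actually use can cut both load time and memory:

```python
import pandas as pd

# Hypothetical subset of columns the app actually displays or filters on.
needed_cols = ["timestamp", "category", "value"]

df = pd.read_csv(
    "data/part_01.csv",              # hypothetical file name
    usecols=needed_cols,             # skip parsing columns you never use
    dtype={"category": "category"},  # smaller dtypes also reduce memory
    parse_dates=["timestamp"],
)
```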

Another great read is the blog post by @randyzwitch on building performant apps (6 Tips for Improving Your App Performance | Streamlit), in particular sections 4 (*Remove unused data*) and 5 (*Optimize data storage formats*).
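
One common way to apply the storage-format tip, assuming pyarrow (or fastparquet) is installed, is to convert the CSVs to Parquet once and have the app read the Parquet files instead. A rough sketch of a one-off conversion script (paths are hypothetical):

```python
import glob
import os

import pandas as pd

# Run this once, outside the Streamlit app.
for csv_path in glob.glob("data/*.csv"):
    parquet_path = os.path.splitext(csv_path)[0] + ".parquet"
    pd.read_csv(csv_path).to_parquet(parquet_path, index=False)

# In the app, pd.read_parquet is typically much faster than pd.read_csv
# and preserves dtypes, so dates and categories don't need re-parsing.
```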

Hope this helps!


Make sure you only read (and parse) the files once, not on every settings change.


It depends on what you are doing. Ultimately you have to test which one is faster in your case.

Reading the files is usually the bottleneck. You can address that with caching. If filtering also takes too much time, try caching the filtered results as well. Read the caching docs, and test your implementation to check that there is an improvement.
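
A rough sketch of that two-level approach (file path, column names, and widget options below are illustrative, not from the original post): cache the load once, and cache each filter result keyed by the widget values, so switching back to a previous setting is instant.

```python
import pandas as pd
import streamlit as st


@st.cache_data(persist="disk")
def load_data() -> pd.DataFrame:
    # Expensive step: runs once, then is served from the cache.
    return pd.read_csv("data/combined.csv")  # hypothetical path


@st.cache_data
def filter_data(category: str, min_value: float) -> pd.DataFrame:
    # Cached per unique (category, min_value) combination.
    df = load_data()
    return df[(df["category"] == category) & (df["value"] >= min_value)]


category = st.selectbox("Category", ["a", "b", "c"])       # hypothetical options
min_value = st.slider("Minimum value", 0.0, 100.0, 10.0)
st.dataframe(filter_data(category, min_value))
```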