Create multiple dataframes from CSV files loaded via the multi-file uploader?

Hi guys!

Does anyone know how to create multiple dataframes from the CSV files I’m uploading via the new multi-file uploader?

I’m pasting the code here for reference:

multiple_files = st.file_uploader(
    "Multiple File Uploader",
    accept_multiple_files=True
)
for file in multiple_files:
    file_container = st.beta_expander(
        f"File name: {file.name} ({file.size}b)"
    )
    file_container.write(file.getvalue())

st.write("### Code")

I’ve tried various things yet no luck so far! :slight_smile:

Thanks,
Charly

Hi @Charly_Wargnier,

I think this does what you are looking for.

import streamlit as st
import pandas as pd
import io

multiple_files = st.file_uploader(
    "Multiple File Uploader",
    accept_multiple_files=True
)
for file in multiple_files:
    file_container = st.beta_expander(
        f"File name: {file.name} ({file.size})"
    )
    data = io.BytesIO(file.getbuffer())
    file_container.write(pd.read_csv(data))

st.write("### Code")

Hope it helps !

Thanks Ashish, I can see all my dataframes loaded in the app, which is great!

Looks like I was missing this line ->

data = io.BytesIO(file.getbuffer())

Now I’ve got another issue! :frowning: I’ve added a quick video below to show you:

If I amend anything in my app (e.g. tick a box, like in the video), the dataframes disappear and I’ve got the following error:

EmptyDataError: No columns to parse from file

I believe I need a way to cache these tables, yet I’ve not managed to cache them via st.cache. I’m cc’ing @Fabio as he may have encountered a similar issue with the new multi-uploader. :slight_smile:

I’ll keep digging! In the meantime, thanks so much for your help! :pray:

Charly

Hopefully I’m getting your use case right, but I think you should be able to do just this. The key here is the seek(0). As an optimization, the file uploader returns the same buffer on every rerun. Unfortunately, this means that once the buffer has been read, its position stays at the end, so you need to reset it afterwards. If you use .getvalue(), there’s no need to seek, because it returns the full contents regardless of the current position. Unfortunately, pandas.read_csv calls read() on the uploaded file instead.

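import pandas as pd
import streamlit as st

multiple_files = st.file_uploader("CSV", type="csv", accept_multiple_files=True)
for file in multiple_files:
    dataframe = pd.read_csv(file)
    # Rewind the buffer so the next rerun can read the file again.
    file.seek(0)
    st.write(dataframe)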
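
To see why the rewind matters without running Streamlit, here is a minimal sketch. It assumes the uploaded file behaves like a plain io.BytesIO; the CSV contents and variable names are made up for illustration and are not taken from Charly's app.

```python
import io

import pandas as pd

# Stand-in for an uploaded file (assumption): an in-memory buffer of CSV bytes.
buffer = io.BytesIO(b"a,b\n1,2\n3,4\n")

# The first read consumes the buffer and leaves its position at the end.
first = pd.read_csv(buffer)

# A second read without rewinding finds nothing left to parse.
try:
    pd.read_csv(buffer)
    second_read_failed = False
except pd.errors.EmptyDataError:
    second_read_failed = True

# Option 1: rewind the buffer before reading it again.
buffer.seek(0)
after_seek = pd.read_csv(buffer)

# Option 2: copy the bytes into a fresh buffer.
# getvalue() returns the full contents regardless of the current position.
from_copy = pd.read_csv(io.BytesIO(buffer.getvalue()))
```

The second option is roughly what wrapping the upload in a new io.BytesIO does; it costs an extra copy of the data, so seek(0) is probably the lighter choice for large files.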

file.seek(0) seems to be doing the trick. I’ll test more intensely tomorrow.

Thanks Karrie! :slight_smile: