CSV over 2 GB

Hello everyone!

I need to upload a CSV of over 2 GB to my Streamlit app, then filter it and download the result. Has anyone worked with such volumes? What settings do I need to add besides maxUploadSize? How does Streamlit behave?

Thanks!

Hi @Saveliy_Borkov,

You might be able to solve your problem using pandas by reading the CSV in chunks with pd.read_csv(…, chunksize=…): https://stackoverflow.com/questions/25962114/how-do-i-read-a-large-csv-file-with-pandas
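
As a rough sketch of that chunked approach (the file name, the 'price' column, and the threshold are placeholders, not anything from this thread), you can filter chunk by chunk so the whole file never sits in memory at once:

import pandas as pd

# Read the CSV in pieces and keep only the rows that pass the filter.
filtered_chunks = []
for chunk in pd.read_csv('big_file.csv', chunksize=1_000_000):
    filtered_chunks.append(chunk[chunk['price'] > 100])

result = pd.concat(filtered_chunks, ignore_index=True)
result.to_csv('filtered.csv', index=False)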

You might also want to check out Dask. I haven’t tried Dask with Streamlit yet, but I have a side project (on my ever-growing list of Streamlit apps I want to build) to do so: https://pythondata.com/dask-large-csv-python/
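
For reference, a minimal Dask sketch (file name and column are again placeholders) would look something like this:

import dask.dataframe as dd

# Dask reads the CSV lazily in partitions instead of loading it all into RAM.
ddf = dd.read_csv('big_file.csv')
filtered = ddf[ddf['price'] > 100]   # lazy filter, nothing computed yet
result = filtered.compute()          # materialize the filtered rows as a pandas DataFrame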

Thank you for your response, @Chad_Mitchell!

My question was more about Streamlit itself (how it copes with big data), but thanks a lot for the Dask advice; I’ll try it tomorrow.

Hi @Chad_Mitchell!

Dask wants a path to a file, but Streamlit can’t provide one.

Streamlit returns a StringIO object; maybe there is a solution that works with that (without read_csv)?
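
Maybe writing the buffer to a temporary file would work, so Dask gets a real path? Just a sketch, assuming the uploader hands back an in-memory StringIO/BytesIO-style buffer as described above:

import tempfile

import dask.dataframe as dd
import streamlit as st

uploaded = st.file_uploader('Upload a CSV', type='csv')
if uploaded is not None:
    data = uploaded.read()
    if isinstance(data, str):  # a StringIO buffer yields text, so encode it to bytes
        data = data.encode('utf-8')
    # Dump the upload to a temporary file so dd.read_csv has a path to work with.
    with tempfile.NamedTemporaryFile(delete=False, suffix='.csv') as tmp:
        tmp.write(data)
    ddf = dd.read_csv(tmp.name)

Note that this still pulls the whole upload into memory once before writing it out.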

Hi @Saveliy_Borkov,

Can you share your code? Below is a potential workaround.

import os
import streamlit as st
folder_path = '.'  # directory that holds the CSV files; adjust to your setup
filenames = os.listdir(folder_path)
selected_filename = st.selectbox('Select a file', filenames)

@Chad_Mitchell I need to let the user pick an arbitrary file, so my folder_path isn’t always the same.

There’s nothing special in my code: just st.file_uploader and read_csv, then some selectboxes with filters.

I think it could be instructive to take a step back: what is the desired user flow here?

Uploading large files through the browser is always going to have some inefficiency. Is the intent to allow a user other than the developer to upload large files to have them processed?

@randyzwitch Yes, I’d like to create a tool where people can upload a big data file, filter it, and get a brief overview (graphs, sums, etc.).
Expected usage is about 10 people per month.

OK, in that case, you just need to decide what “big” means for your use case. If you’re going to let people upload 100 GB CSV files, then you need a machine that can hold that much data in RAM.

When using file_uploader, we save the bytes of the file to RAM. You’ll need to change the configuration file (server.maxUploadSize) to allow uploads larger than the 200 MB default, but I think the 2 GB ceiling comes from our use of Protobuf to transfer messages. You can read the background on the 2 GB Protobuf limit in this StackOverflow post.
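
For reference, that setting lives in .streamlit/config.toml; a sketch (the 2048 MB value is just an illustration):

[server]
maxUploadSize = 2048  # maximum allowed upload size, in megabytes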

In general, if you are thinking about “big data” applications, I would suggest changing your interface to one where users provide a public URL to the file. That way you don’t transfer the data through the browser at all; instead, you’d use a library such as Requests to download the file straight to the Python backend, sidestepping any browser limitations.
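
A rough sketch of that URL-based flow (the text-input label and local file name are placeholders):

import requests
import streamlit as st

url = st.text_input('Public URL of the CSV file')
if url:
    local_path = 'downloaded.csv'
    # Stream the download so the whole file never has to sit in memory at once.
    with requests.get(url, stream=True, timeout=60) as resp:
        resp.raise_for_status()
        with open(local_path, 'wb') as f:
            for chunk in resp.iter_content(chunk_size=1024 * 1024):
                f.write(chunk)
    st.success(f'Saved to {local_path}; now filter it with pandas or Dask.')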
