How to load big files (around 1 GB) with Streamlit Sharing?

Hi,

For my project I need to load a 1 GB CSV file into a pandas dataframe to create charts.
The file can be stored in AWS cloud storage or anywhere else. Is it possible to load this data in Streamlit (Streamlit Sharing) to publish the charts?

Thanks :slight_smile:

Hi @butterstulle -

There are a few ways to handle this. You could compress the file, then use Git LFS, which Streamlit sharing now supports. Or, you could make the file publicly accessible on AWS as you mention (or any other cloud provider), then use something like the requests library to bring the data into your Python session.

If you choose to go the download route, be sure to use st.cache so that the file is only downloaded once. Otherwise, your users are going to have a bad experience :slight_smile:
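For the download route, a minimal sketch of what that could look like (the URL, filename, and bucket here are just placeholders, not a real endpoint):

import io

import pandas as pd
import requests
import streamlit as st

# Hypothetical public URL -- replace with wherever the CSV actually lives.
DATA_URL = "https://example-bucket.s3.amazonaws.com/big_file.csv"

@st.cache  # cache the result so the 1 GB download happens only once per session
def load_data(url: str) -> pd.DataFrame:
    response = requests.get(url)
    response.raise_for_status()
    return pd.read_csv(io.BytesIO(response.content))

df = load_data(DATA_URL)
st.write(df.head())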

Best,
Randy

Thank you Randy. By the way: Streamlit and especially Streamlit Sharing are awesome!!!


Hi,
I am running Streamlit on my local server. I have a 1 GB JSON file which I am reading directly with pandas like this:
data = pd.read_json('filename.json', lines=True, orient='columns', nrows=nrows)
Will it run if I deploy it just as it is, or will I have to use requests? I have also stored the file on Google Drive.
I am also using st.cache.

It’s the same answer, to be honest. Use Git LFS to put the file on GitHub so you don’t need requests, or access it from Google Drive via requests.
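For the Google Drive route, something along these lines could work; the URL pattern and file ID are placeholders, and for a file this large you may need a proper direct-download link or a helper library rather than a plain GET:

import io

import pandas as pd
import requests
import streamlit as st

# Hypothetical direct-download URL; replace YOUR_FILE_ID with the real ID.
JSON_URL = "https://drive.google.com/uc?export=download&id=YOUR_FILE_ID"

@st.cache  # download and parse only once
def load_json(url: str) -> pd.DataFrame:
    response = requests.get(url)
    response.raise_for_status()
    # lines=True assumes the file is JSON Lines (one object per line)
    return pd.read_json(io.BytesIO(response.content), lines=True)

data = load_json(JSON_URL)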

Alternatively, you can probably get significant space savings by converting the file to parquet or compressed CSV (since you’re going to make a pandas dataframe anyway).
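For example, a one-time conversion along these lines (run locally, then commit the smaller file; Parquet output assumes pyarrow or fastparquet is installed, and the filenames are placeholders):

import pandas as pd

# One-time conversion, run locally before deploying.
df = pd.read_json("filename.json", lines=True)

# Parquet is columnar and compressed, so it is usually much smaller than raw JSON.
df.to_parquet("filename.parquet")
# Or, if you prefer to stay with CSV:
df.to_csv("filename.csv.gz", index=False, compression="gzip")

# In the deployed app, load the compact file instead of the original:
df = pd.read_parquet("filename.parquet")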

Best,
Randy