Streamlit with datasets of up to 1 million rows

Trying to understand how I can load my dataset of up to 1 million rows.
The dataset size is relatively small … I have two samples for test purposes: 260k rows and 100k rows.
The 100k-row file loads fine… the 260k one won't: halfway through it still shows loading, but nothing happens and the page becomes a bit unresponsive.
I'm getting these errors in my terminal:
Traceback (most recent call last):
  File "/home/evo/koala/lib/python3.11/site-packages/tornado/websocket.py", line 1089, in wrapper
    raise WebSocketClosedError()
tornado.websocket.WebSocketClosedError

Two of my columns are mostly empty (99.9% empty), but I need them.
I'm really new to Python, Streamlit, and pandas.
Four columns are split into more columns by pandas, and in total I have 42 columns including the index column. (All rows are needed, there are no unneeded rows in my file, and I can't reduce the number of columns because of the way the data is analyzed, the way I need it; only more columns will be added … so the dataset will be up to 60 columns, though the actual CSV file has 27 columns.)
The 260k-row CSV file is only around 45 MB; the 100k-row one is under 18 MB.

Or is there a way to load only a few columns but keep all of them available when I need them?
My way of sorting the data is that once the file is uploaded, it splits specific columns into multiple ones so I can take a deeper look at repetitive patterns (roughly like the sketch below).
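
For reference, the kind of split I mean looks roughly like this; the column name and separator here are just placeholders, not my real ones:

import pandas as pd

# Hypothetical example: split one delimited column into several new columns
df = pd.DataFrame({"pattern": ["A-B-C", "D-E-F"]})
parts = df["pattern"].str.split("-", expand=True)
parts.columns = [f"pattern_{i}" for i in range(parts.shape[1])]
df = pd.concat([df, parts], axis=1)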

I'm using the latest Streamlit version and Python 3.11 with a venv on Arch Linux (if that matters or is helpful).
Thank you and have a great day.

You can consider reading the files in chunks using pandas. You could also use @st.cache_data with Streamlit to cache the results of the data loading. Here’s a simple code example to illustrate the point:

import streamlit as st
import pandas as pd

@st.cache_data
def load_data(filename):
    chunk_size = 50_000  # Adjust this value based on your system's capacity
    chunks = []
    for chunk in pd.read_csv(filename, chunksize=chunk_size):
        chunks.append(chunk)
    
    data = pd.concat(chunks, axis=0)
    return data

# Use the function to load your data
data = load_data('path_to_your_large_file.csv')

st.dataframe(data.head())  # Just displaying the first few rows as an example
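
If you only need a handful of columns up front, you could also pass usecols to pd.read_csv and load the full set only when you actually need it. A minimal sketch, assuming hypothetical column names:

@st.cache_data
def load_columns(filename, columns):
    # Only the listed columns are parsed; the rest of the file is skipped
    return pd.read_csv(filename, usecols=columns)

# 'col_a' and 'col_b' are placeholders for the columns you actually need
subset = load_columns('path_to_your_large_file.csv', ['col_a', 'col_b'])
st.dataframe(subset.head())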

Let me know if this helps.


Thank you!
I did try chunksize, but I wrote it differently and was getting an error; as I can see, your option uses chunk_size instead of my chunksize.
Appreciated!


Awesome. Glad it worked. Happy Streamlit-ing! :balloon:
