Memory usage when uploading new files

Hi,

I’m using:

  • Python: 3.11
  • Numpy: 2.0.1
  • Pandas: 2.2.2
  • Streamlit: 1.36.0
  • Docker: 24.0.6
  • macOS: Sonoma 14.5 (Apple M1 Pro chip)

I’m running this locally with Docker. This is the docker-compose.yaml:

version: '3.8'
services:
  webapp:
    build:
      context: .
      dockerfile: docker/webapp.dockerfile
    command: streamlit run --server.port=80 --server.address=0.0.0.0 --server.maxUploadSize=1000 app.py
    container_name: webapp_test
    ports:
    - "80:80"
    volumes:
    - .:/opt/project

And the Dockerfile:

FROM python:3.11-slim

RUN apt-get update \
    && apt-get install -y build-essential python3-dev libhdf5-dev pkg-config \
    && pip install --upgrade pip \
    && apt-get clean

WORKDIR /opt/project
COPY . /opt/project
RUN pip install --no-cache-dir -r requirements.txt
ENV PYTHONPATH="${PYTHONPATH}:/opt/project"

I have created an app that reads a dataframe from a Parquet file (via st.file_uploader) and performs some operations and plots with Plotly. I noticed that after the user uploads a new file three or four times, the app crashes with the error webapp_test exited with code 137. Exit code 137 means the container was killed with SIGKILL, typically by the out-of-memory killer, and indeed I can watch the container’s memory in Docker Desktop: it is higher after every upload. It is as if the app is storing something in memory and the usage keeps growing.

I made a simple app that does nothing but show the head of a dataframe, to reproduce the problem (save this file as app.py):

import pandas as pd
import streamlit as st

# Let the user pick a Parquet file and trigger the read explicitly.
file = st.file_uploader("Upload a file")
submitted = st.button("Submit!")

# Guard against clicking Submit before a file is uploaded,
# since pd.read_parquet(None) would raise.
if submitted and file is not None:
    df = pd.read_parquet(file)
    st.dataframe(df.head())
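
To make the growth easier to see without Docker Desktop, something like the following can go at the top of the app. It shows the server process’s resident memory on every rerun (psutil is an extra dependency, not in my requirements; it is only for illustration):

import os

import psutil  # extra dependency, used only to watch memory
import streamlit as st

# Report the Streamlit server's resident memory (RSS) on every rerun,
# so the growth after each upload is visible on the page itself.
rss_mb = psutil.Process(os.getpid()).memory_info().rss / 1024**2
st.caption(f"Server RSS: {rss_mb:.0f} MB")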

To reproduce the same error, you can create a parquet file with 100M rows and two columns (a snippet that generates such a file follows the list):

  • int_col: random integers from 0 to 100 with np.random.randint.
  • float_col: random floats from 0 to 1 with np.random.random.
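
For reference, a minimal sketch that generates such a file (assuming pyarrow or fastparquet is installed, which pandas needs for to_parquet):

import numpy as np
import pandas as pd

N = 100_000_000  # 100M rows, ~1.6 GB of raw column data in memory

df = pd.DataFrame({
    "int_col": np.random.randint(0, 100, size=N),  # random ints in [0, 100)
    "float_col": np.random.random(size=N),         # random floats in [0, 1)
})
df.to_parquet("test.parquet")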

If you upload the file several times, the memory usage keeps growing until the app crashes, even though the app does nothing but show the head of the dataframe. I’m not using @st.cache_data or st.session_state.

The docs say that the file is replaced with the new one when the user uploads another file, and the variable df is overwritten every time, so I cannot see the root cause of this problem.

So I think that maybe Streamlit is storing something else in memory. Is that possible? Can I do something to clear the memory? I have already tried the garbage collector and del file, roughly as shown below.
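
Concretely, this is roughly what I tried inside the submit branch (the exact placement is illustrative):

import gc

if submitted and file is not None:
    df = pd.read_parquet(file)
    st.dataframe(df.head())

    # Drop the references and force a garbage-collection pass
    # once the head has been rendered.
    del df
    del file
    gc.collect()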

Thanks in advance for your help.
