Hi,
I’m using:
- Python: 3.11
- Numpy: 2.0.1
- Pandas: 2.2.2
- Streamlit: 1.36.0
- Docker: 24.0.6
- macOS: Sonoma 14.5 (Apple M1 Pro chip)
I’m running this locally with Docker. This is the docker-compose.yaml:
version: '3.8'
services:
  webapp:
    build:
      context: .
      dockerfile: docker/webapp.dockerfile
    command: streamlit run --server.port=80 --server.address=0.0.0.0 --server.maxUploadSize=1000 app.py
    container_name: webapp_test
    ports:
      - "80:80"
    volumes:
      - .:/opt/project
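For reference, I build and start the container from the project root with something like:

docker compose up --build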
And this is the Dockerfile:
FROM python:3.11-slim
RUN apt-get update && apt-get install -y build-essential python3-dev libhdf5-dev pkg-config && pip install --upgrade pip && apt-get clean
RUN mkdir /opt/project
WORKDIR /opt/project
COPY . /opt/project
RUN pip install --no-cache-dir -r requirements.txt
ENV PYTHONPATH="${PYTHONPATH}:/opt/project"
I have created an app that reads a dataframe (with st.file_uploader) from a Parquet file, performs some operations, and draws some plots with plotly. I realized that if the user uploads a new file, after three or four uploads the app crashes with the error webapp_test exited with code 137. Apparently it runs out of memory (I can watch the container’s memory in Docker Desktop, and it is higher after every upload). It is as if the app is keeping something in memory and the memory usage keeps growing.
To reproduce the problem, I made a simple app that does nothing but show the head of a dataframe (name this file app.py):
import streamlit as st
import pandas as pd

file = st.file_uploader("Upload a file")  # returns an UploadedFile (or None)
submitted = st.button("Submit!")
if submitted:
    df = pd.read_parquet(file)
    st.dataframe(df.head())
To reproduce the same error, you can create a Parquet file with 100M rows and two columns (a generation sketch follows the list):
- int_col: random integers from 0 to 100 with np.random.randint.
- float_col: random floats from 0 to 1 with np.random.random.
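Something along these lines should produce such a file (the output file name is arbitrary, and pyarrow is assumed as the Parquet engine):

# Sketch to generate the test Parquet file described above.
# Assumes a Parquet engine such as pyarrow is installed; the output name is arbitrary.
import numpy as np
import pandas as pd

n_rows = 100_000_000  # 100M rows, roughly 1.6 GB of raw column data in memory

df = pd.DataFrame({
    "int_col": np.random.randint(0, 100, size=n_rows),  # random integers in [0, 100)
    "float_col": np.random.random(size=n_rows),         # random floats in [0, 1)
})
df.to_parquet("test_data.parquet")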
If you upload the file several times, the memory usage keeps growing until the app crashes, even though the app does nothing but show the head of the dataframe. I’m not using @st.cache_data or st.session_state.
The docs say that the file is replaced when the user uploads a new one, and the variable df is overwritten every time, so I cannot see the root cause of this problem.
So I think that maybe streamlit is keeping something else in memory. Is that possible? Can I do something to clear the memory? I have tried the garbage collector and also del file (sketched below).
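This is approximately what I tried in the small app above (the exact placement of the cleanup is illustrative):

import gc

import streamlit as st
import pandas as pd

file = st.file_uploader("Upload a file")
submitted = st.button("Submit!")
if submitted:
    df = pd.read_parquet(file)
    st.dataframe(df.head())
    # try to release the dataframe and the uploaded file explicitly
    del df
    del file
    gc.collect()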
Thanks in advance for your help.