Memory behavior

I was doing some memory profiling on my application, which uses caches, when I realised it wasn’t releasing memory.

So I dug a bit deeper and built a simple project to run some tests.

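import streamlit as st
import time
import pandas as pd
import numpy as np
from streamlit import caching  # provides clear_cache() in the Streamlit version used in this post
import gc

# Cache the result for 30 seconds; allow_output_mutation skips hashing the returned DataFrame
@st.cache(allow_output_mutation=True, ttl=30)
def expensive_computation(a, b):
    time.sleep(5)  # simulate a slow computation
    df = pd.DataFrame(np.random.randn(a, b))
    return df

def main():
    if st.button("Clear Cache"):
        caching.clear_cache()
    a = 10000
    b = 20
    res = expensive_computation(a, b)
    st.dataframe(res)
    del res  # drops only this local name; the cache still holds the DataFrame


if __name__ == "__main__":
    gc.enable()  # the collector is enabled by default; kept here to be explicit
    main()
    gc.collect()  # force a full collection at the end of each script run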

At first I was only using the cache decorator with the ttl option. But when I profiled memory, it didn’t go down after 30 seconds. Each time I hit Refresh, memory increased.
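As a side note on the ttl observation: many TTL caches expire entries lazily, meaning an expired entry is only dropped the next time the cache is touched, not by a background timer at the 30-second mark. Whether Streamlit's cache behaves this way is not confirmed anywhere in this thread. The sketch below is a hypothetical `LazyTTLCache` class (not Streamlit's implementation) that only illustrates the idea, with an injectable `clock` so the behavior can be checked without waiting.

```python
import time


class LazyTTLCache:
    """Hypothetical TTL cache that only evicts expired entries on access."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock          # injectable so tests can fake the time
        self._store = {}            # key -> (expiry_time, value)

    def set(self, key, value):
        self._store[key] = (self.clock() + self.ttl, value)

    def get(self, key, default=None):
        item = self._store.get(key)
        if item is None:
            return default
        expires_at, value = item
        if self.clock() >= expires_at:
            del self._store[key]    # eviction happens here, not on a timer
            return default
        return value

    def __len__(self):
        return len(self._store)
```

With a cache like this, an entry that nobody asks for again stays in memory past its ttl, so a memory graph would not necessarily drop at the expiry time.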

So I tried to use caching.clear_cache(), but memory wasn’t decreasing either.
Same with the garbage collector.

And even with the cache removed, memory kept increasing each time I hit refresh.

Is this normal behavior? I was expecting that when the user refreshes the page, the session would be killed and at least the variables would be removed by Python or Streamlit. But they seem to stay in memory. I was hoping it was only allocated memory that the program could reuse, but it seems not.
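One way to separate "Python still holds the objects" from "the process has not handed memory back to the OS" is to look at Python-level allocations with the standard-library `tracemalloc` module. The sketch below is only an illustration: `measure_alloc_and_release` and the list of floats standing in for the DataFrame are made up for this example. It reports traced bytes before the allocation, while the data is held, and after it is deleted.

```python
import gc
import tracemalloc


def measure_alloc_and_release(n_items=200_000):
    """Return (before, held, after) traced bytes around one large allocation."""
    tracemalloc.start()
    before, _ = tracemalloc.get_traced_memory()

    data = [float(i) for i in range(n_items)]  # stand-in for the DataFrame
    held, _ = tracemalloc.get_traced_memory()

    del data        # drop the only reference to the list
    gc.collect()    # only needed for reference cycles; harmless here
    after, _ = tracemalloc.get_traced_memory()

    tracemalloc.stop()
    return before, held, after


if __name__ == "__main__":
    before, held, after = measure_alloc_and_release()
    print(f"before={before} held={held} after={after}")
```

If the traced number drops after the `del` while the container's memory graph does not, the objects were freed at the Python level and the remaining usage is memory the allocator has kept for reuse rather than returned to the OS.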

Since I’m using docker to deploy my apps with very limited memory, it often crashes, even with a simple app like above, just by spamming F5.

Here are some profiling runs. The first one uses no caching at all, just the raw function and the display.

As you can see, each time I hit refresh (F5), memory goes up.

Here I used the code above, with caching, del on my data, and the garbage collector. I even pressed the button to clear the cache at approx. 80 seconds.

Is there something I’m doing wrong or missing?

Hi @Uranium, welcome to the Streamlit community!

I don’t think you’re doing anything wrong per se, but I think this is the difference between signalling that a value should be recomputed (i.e. clearing the cache) and actually freeing memory (garbage collection). Python manages memory as it sees fit, releasing objects from RAM when it deems they are no longer in use. If you want to force that behavior, calling gc.collect() after clearing the cache would be more appropriate.

Note that in earlier versions of Streamlit, this garbage collection call was not made automatically, but has since been built into Streamlit. So if you aren’t already, I would upgrade to the newest version of Streamlit and see if that fixes your issue.

Best,
Randy

Hello, thank you for the reply.

I checked the versions and they seemed okay to me: Streamlit was running at version 0.84.0 and Python at 3.6.8.

So I upgraded to 0.87.0, and it seems that some of the memory is now freed at each Refresh. But memory is still growing slowly. I ran the code without any cache, garbage collector calls, or del, like this:

import streamlit as st
import time
import pandas as pd
import numpy as np

def expensive_computation(a, b):
    time.sleep(5)
    df = pd.DataFrame(np.random.randn(a, b))
    return df

def main():
    a = 10000
    b = 20
    res = expensive_computation(a, b)
    st.dataframe(res)


if __name__ == "__main__":
    main()

Here is the result of the memory profiling:

Regards,

In this code snippet, you have removed the @st.cache decorator, so you lose the performance benefit of not recalculating, and you are potentially generating new objects on every run, because Python may make copies of objects behind the scenes. This is a shortcoming of Python and other dynamic languages, where you don’t explicitly allocate memory.

So when you’re talking about memory in the hundreds-of-MB range, I’m not surprised that Python fills up that little RAM just by running a program.

What use case are you solving for with so few resources?

Best,
Randy

Hello,

I’m using Streamlit for a simple dashboard with small computations where users upload data. I had doubts about the cache system since memory kept going up. Basically, we were caching the uploaded data with a time limit and an entry limit, to try to speed up some work. And it does :slight_smile:

Our container is small because of money… So when the container runs out of RAM, it restarts, which would stop any work other users are doing in the app.

So I mixed two subjects here: caching and what I thought were memory issues. I’m not questioning the cache performance. I’ve read the documentation on how it works and I have no issue with it at all.

I’m more curious about the memory increase at each refresh. Could it be some kind of logs stored in memory? Some object I’m not aware of that stays alive across all sessions?
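For anyone wanting to investigate that kind of question, one generic approach is to count live objects by type before and after a few refreshes and look at what grew. This is only a sketch using the standard `gc` module: `SessionState`, `_leaked`, and `simulate_refresh` are hypothetical and exist purely to simulate something being kept alive on each page load.

```python
import gc
from collections import Counter


def type_counts():
    """Count live, GC-tracked objects by type name."""
    gc.collect()
    return Counter(type(obj).__name__ for obj in gc.get_objects())


def growth(before, after, top=5):
    """Return the types whose instance count grew the most."""
    diff = after - before   # Counter subtraction keeps only positive counts
    return diff.most_common(top)


class SessionState:
    """Hypothetical object that is accidentally kept alive per page load."""

    def __init__(self):
        self.rows = []


_leaked = []  # hypothetical module-level list that survives every rerun


def simulate_refresh():
    _leaked.append(SessionState())


if __name__ == "__main__":
    before = type_counts()
    for _ in range(50):
        simulate_refresh()
    print(growth(before, type_counts()))
```

In a real app the two `type_counts()` calls would go around a series of page reloads, and any type whose count keeps climbing is a candidate for what is being retained.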

I’m trying to understand what could raise the memory. But it’s more about curiosity now than a real issue. The 0.87.0 version did help me a lot.

Regards,

Hello @Uranium,

Did you ever find a workaround for this?

@chchchchch @Uranium Did you find any solutions around this?

We have a memory leak fix that will land in the next release. That may help. :slight_smile:

@mathcatsand thank you. Is this fix currently available in the streamlit-nightly==1.31.1.dev20240206 ?

Yes. It merged five days ago, so it is included in yesterday’s nightly. :slight_smile:

When is the next release?

Although it can vary, Streamlit is released approximately every four weeks. The last release was February 1. You can check out our changelog to see what the pacing of the releases has been.

Version 1.32.0 is now out, including the memory leak fix.
