We have a Streamlit app (v1.34) in our organisation that scrapes text and images from websites and does question answering on top of them with the OpenAI API. The scraping and the OpenAI calls are both set up asynchronously, with aiohttp handling the scraping. For each run, the coroutine tasks are collected in a list and the results are obtained with `await asyncio.gather(*tasks)`. A global semaphore of 50 is shared across all aiohttp requests and OpenAI API calls, and the aiohttp session is created once at the start and closed at the end.
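Since the codebase is classified I can't share the real code, but the overall pattern looks roughly like the sketch below. All names, the model, and the use of the v1.x async OpenAI client are placeholders/assumptions, not our actual implementation:

```python
import asyncio

import aiohttp
from openai import AsyncOpenAI  # assuming the v1.x async client

SEMAPHORE = asyncio.Semaphore(50)  # global cap shared by scraping and OpenAI calls


async def fetch_page(session: aiohttp.ClientSession, url: str) -> str:
    # Scrape one page; the global semaphore limits total concurrency.
    async with SEMAPHORE:
        async with session.get(url) as resp:
            return await resp.text()


async def answer(client: AsyncOpenAI, question: str, context: str) -> str:
    # Ask OpenAI a question grounded in the scraped text.
    async with SEMAPHORE:
        resp = await client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name
            messages=[
                {"role": "system", "content": f"Answer using this context:\n{context}"},
                {"role": "user", "content": question},
            ],
        )
        return resp.choices[0].message.content


async def run(urls: list[str], question: str) -> list[str]:
    client = AsyncOpenAI()
    # The aiohttp session is created once at the start and closed at the end.
    async with aiohttp.ClientSession() as session:
        pages = await asyncio.gather(*[fetch_page(session, u) for u in urls])
        tasks = [answer(client, question, page) for page in pages]
        return await asyncio.gather(*tasks)


# In the Streamlit script the whole thing is kicked off with something like:
# results = asyncio.run(run(urls, question))
```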
We develop on Windows and deploy to AWS EC2 in a Linux Docker container running Python 3.12.3. The app has a memory leak on the Linux container (but not on Windows): memory builds up over time and is never released, even after a run has finished, and refreshing the page increases the usage further. Clearing the global cache with `st.cache_data.clear()` has no effect. The build-up eventually crashes the container.
We have looked for causes of the leak (following the steps in this post: 3 steps to fix app memory leaks), and the main allocation trace that persists across runs points to streamlit/elements/markdown.py. We're not really sure what to make of that.
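In case the methodology matters, this is roughly (and in simplified form) how we take those traces: a tracemalloc snapshot on each rerun compared against a baseline kept in session state.

```python
import tracemalloc

import streamlit as st

# Start tracing once and keep a baseline snapshot across reruns.
if "mem_baseline" not in st.session_state:
    tracemalloc.start(10)  # keep up to 10 frames per allocation
    st.session_state.mem_baseline = tracemalloc.take_snapshot()

# On each rerun, compare against the baseline and print the biggest growers.
snapshot = tracemalloc.take_snapshot()
for stat in snapshot.compare_to(st.session_state.mem_baseline, "lineno")[:10]:
    st.text(str(stat))  # streamlit/elements/markdown.py keeps topping this list
```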
Another hypothesis is that references to the async tasks are not being released after a run finishes, but we aren't sure how to confirm or fix this (a simplified illustration of what we mean is below).
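To illustrate the concern (again with made-up names, not our real code): if the task list or the gathered results end up in anything long-lived, such as `st.session_state` or a module-level variable, the scraped pages and API responses would stay referenced after the run ends, e.g.:

```python
import asyncio

import streamlit as st

# Hypothetical example of the suspected problem: a long-lived container
# holding on to tasks/results across reruns.
if "pending_tasks" not in st.session_state:
    st.session_state.pending_tasks = []


async def run_all(coros: list) -> list:
    tasks = [asyncio.ensure_future(c) for c in coros]
    # Storing the tasks somewhere persistent would keep every result
    # (scraped HTML, images, API responses) alive after the run finishes.
    st.session_state.pending_tasks.extend(tasks)
    return await asyncio.gather(*tasks)
```

If that is on the right track, is explicitly dropping those references once `asyncio.gather()` has returned the right fix, or is there a better pattern for this in Streamlit?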
It would be great if anyone who has faced this issue before could help us get to the bottom of it.
Please let me know in the comments if any other details are required. Since the codebase is classified, I can't post the exact code beyond the rough sketches above, but I will do my best to respond. Thank you so much!