Animation to show only the last n datapoints

Hi!

I am trying to visualize data over long periods of time and close to real-time if possible. Continuing from the example on how to animate plots (https://docs.streamlit.io/advanced_concepts.html#animate-elements), I built this snippet, which obviously fails quickly as it uses excessive amounts of RAM:

import streamlit as st
import numpy as np


progress_bar = st.progress(0)
status_text = st.empty()
data_points_per_record = 10000
chart = st.line_chart(np.random.randn(10, data_points_per_record))

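max_iterations = 10000000  # renamed to avoid shadowing the built-in max()
for i in range(max_iterations):
	# Update progress bar (a float argument must lie in [0.0, 1.0]).
	progress_bar.progress(i / max_iterations)

	new_rows = np.random.randn(10, data_points_per_record)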

	# Update status text.
	status_text.text(
		'The latest random number is: %s' % new_rows[-1, 1])

	# Append data to the chart.
	chart.add_rows(new_rows)

	# Pretend we're doing some computation that takes time.
	# time.sleep(0.1)

status_text.text('Done!')
st.balloons()

To improve on that, I am happy to show only the last n data points instead, which I assume would only use a constant amount of memory. So I extended my example:

import streamlit as st
import numpy as np


progress_bar = st.progress(0)
status_text = st.empty()
data_points_per_record = 10000
data = np.copy(np.expand_dims(np.random.randn(data_points_per_record), axis=0))
chart = st.line_chart()

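max_iterations = 10000000  # renamed to avoid shadowing the built-in max()
for i in range(max_iterations):
	# Update progress bar (a float argument must lie in [0.0, 1.0]).
	progress_bar.progress(i / max_iterations)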

	new_row = np.random.randn(data_points_per_record)

	# Collect data and only keep the last 5 records.
	data = np.append(data, np.expand_dims(new_row, axis=0), axis=0)
	data = data[-5:, :]

	# Update status text.
	status_text.text(
		'The latest random number is: %s' % new_row[-1])

	# Replot chart.
	chart.line_chart(data)

	# Pretend we're doing some computation that takes time.
	# time.sleep(0.1)

status_text.text('Done!')
st.balloons()

Oddly, this still exhausts the available RAM instead of converging to a constant amount. How can I improve my example? It also sometimes runs into issues with the stream being closed. Is that due to the high throughput? The problem persists even when I reduce the throughput; I only set it this high to shorten the wait until the problem appears.
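A side note on the "keep only the last n records" step: it can also be sketched with a bounded deque instead of np.append plus slicing. This is only an illustration of the windowing idea, not a fix for the memory growth; the names window_size and record are placeholders made up for this sketch.

```python
from collections import deque

window_size = 5  # number of most recent records to keep (placeholder value)

# A deque with maxlen silently drops its oldest item on each append,
# so the buffer never holds more than window_size records.
window = deque(maxlen=window_size)

for i in range(1000):
    record = [i, i + 1, i + 2]  # stand-in for one new record
    window.append(record)

latest = list(window)  # the last window_size records, oldest first
```

Before plotting, something like np.array(list(window)) would turn the buffer back into a 2-D array.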

Thanks for your help!

Hey @benelot, thanks for getting in touch! From your description (I haven’t run the code yet), this sounds like a bug. I’m taking a look today, and will let you know what I find!

Actually: not a bug, but some unintended consequences of a caching layer inside Streamlit.

When a Streamlit app is running, the Streamlit server and client both cache every data message sent from server -> client, so that the server doesn’t need to re-send that same data on subsequent runs. This cache is pruned between runs of the app, but in your example, the cache is growing out of control during a single run, before any pruning can happen.

This behavior makes sense for many apps, but is broken for your use case: a long-running app that generates tons of data, most of which will not be re-used in subsequent runs.
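To make the growth concrete, here is a toy model of such a cache. It is a simplified sketch, not Streamlit's actual implementation: the class ToyMessageCache, its methods, and the "clear everything between runs" pruning rule are all assumptions made up for illustration.

```python
import hashlib


class ToyMessageCache:
    """Toy stand-in for a message cache (NOT Streamlit's real class)."""

    def __init__(self):
        self._entries = {}

    def add(self, payload: bytes) -> str:
        # Identical payloads share one entry; every new payload adds one.
        key = hashlib.sha256(payload).hexdigest()
        self._entries[key] = payload
        return key

    def prune(self):
        # Stands in for the cleanup that only happens between runs.
        self._entries.clear()

    def __len__(self):
        return len(self._entries)


cache = ToyMessageCache()

# One long run in which every message is different: nothing is reused,
# and prune() is not reached until the run is over.
for i in range(1000):
    cache.add(f"chart update {i}".encode())
entries_during_run = len(cache)

cache.prune()
entries_after_run = len(cache)
```

In this model the cache holds one entry per unique message for the whole duration of the run, which is the pattern described above.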

To work around this, you can use a hidden config option. In your ~/.streamlit/config.toml file, add:

[global]
minCachedMessageSize = inf

This sets the minimum size of a cached message to infinity, which effectively disables the cache.

(The other issue with your example above is that the browser’s rendering code falls over while trying to keep up with graph updates, because the graph itself takes so long to re-render. This will eventually cause the stream to time out and close, unless you’re on a much beefier machine than I am. You may want to reduce the amount of data sent per iteration and/or sleep your app script in between iterations.)
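As a rough sketch of those two mitigations, the helpers below thin out a record before it is sent and enforce a minimum delay between updates. The names thin and Throttle are hypothetical helpers written for this post, not part of Streamlit, and the right values depend on your machine.

```python
import time


def thin(values, max_points):
    """Return at most max_points values by keeping every k-th sample."""
    if max_points <= 0:
        raise ValueError("max_points must be positive")
    step = -(-len(values) // max_points)  # ceiling division
    return values[::max(step, 1)]


class Throttle:
    """Makes wait() return at most once per min_interval seconds."""

    def __init__(self, min_interval):
        self.min_interval = min_interval
        self._last = None

    def wait(self):
        now = time.monotonic()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                time.sleep(remaining)
        self._last = time.monotonic()
```

In the loop from the question, one could call throttle.wait() once per iteration and plot thin(new_row, 500) instead of the full row; the 500 is an arbitrary starting point. thin slices along the first axis, so it works on lists and 1-D NumPy arrays alike.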

Hi! Thanks for your suggestion, it works! I only rarely experience a dead tab anymore. Is there a way to query the length of, or to clear, the queue that uploads the data? I think I could use that to control my upload rate.

Hey @benelot - no way to do that cleanly, but if you’re willing to get your hands dirty, you can look at the server’s ForwardMsgCache instance, which is where this caching happens:

Query the length: len(Server.get_current()._message_cache._entries)

Clear: Server.get_current()._message_cache.clear()

Hi again! I tried your suggestion of inspecting the message_cache, but as far as I could tell from printing its length to the console, it always stays empty. Even under heavy load, where my outputs were definitely not visualized in real time, the server-side cache stayed empty. Does that mean the client-side cache gets overloaded? Is there any way I can find out what is happening on the client side, so that I can mitigate the client-side cache overload?

This is the expected behavior if

[global]
minCachedMessageSize = inf

is set - that is, this config option effectively disables the Server.get_current()._message_cache, so its length should always be zero. (The client side cache should also be empty; with this config option, the server will no longer be asking the client to cache anything.)

I wonder if I misunderstood your initial question though - the RAM consumption you were referring to, is that by the Streamlit Python process, or is it happening in the browser? (I was assuming you were referring to the Python process, but if it’s happening in the browser, it’s possible there’s another memory leak - either in Streamlit, or in vega-lite, which is rendering the line chart.)

Good question, because that is indeed the case: the memory growth is happening on the browser side, not in the Streamlit Python process. Surprisingly, your suggestion still had a positive effect, in that the browser tab no longer crashes right away. The tab's memory consumption still climbs to 99%, then stays there until it finally crashes at some later point (that is not a problem for me for now, as it only happens after about 1 hour of continuous running). But of course I would love to get to the bottom of this with you.

Sounds like we have a browser-side memory leak - someone else is also reporting this here: https://github.com/streamlit/streamlit/issues/1148.

I’ll take a look!