Fast rendering of plots from pyaudio streaming

I would like to use Streamlit for a live sound analysis app: capture audio with pyaudio, process it, then render plots/graphs with Streamlit. The problem is that there is a lot of latency. If I take the code below and simply write the *.png file, a simple HTML poller performs pretty well in the browser. How can I speed this up so that Streamlit plots faster?

import streamlit as st
import numpy as np
import pyaudio
import matplotlib.pyplot as plt
import time

RATE = 44100
NMBR_UPDATES_SECOND = 5
CHUNK = int(RATE / NMBR_UPDATES_SECOND)  # samples per read, i.e. 5 plot updates per second

def plot_miced_sound(stream):
    t_start = time.time()
    # Read one chunk from the mic and interpret it as signed 16-bit samples.
    data = np.frombuffer(stream.read(CHUNK, exception_on_overflow=False), dtype=np.int16)
    fig = plt.figure()
    plt.plot(data)
    plt.title("title")
    plt.grid()
    plt.axis([0, len(data), -2**16 / 2, 2**16 / 2])  # full int16 range on the y-axis
    plt.savefig("sound.png", dpi=50)                  # written for the HTML-poller approach
    print(f"{(time.time() - t_start) * 1000} ms")
    image_place_holder.write(fig)                     # render the same figure in Streamlit
    plt.close('all')



p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16,
                channels=1,
                rate=RATE,
                input=True,
                frames_per_buffer=CHUNK)

image_place_holder = st.empty()            # placeholder so each new figure replaces the previous one
for i in range(int(20 * RATE / CHUNK)):    # run for roughly 20 seconds
    plot_miced_sound(stream)
stream.stop_stream()
stream.close()
p.terminate()

Hi @theholymath, welcome to the Streamlit community!

In general, matplotlib is slow, as it redraws the entire scene on every draw. Additionally, there seems to be some latency coming from Streamlit itself, though our engineering team hasn't been able to pin down where.

An immediate solution would be to use our add_rows functionality, which is built for this use case:

https://docs.streamlit.io/en/stable/api.html?highlight=add_data#streamlit.delta_generator.DeltaGenerator.add_rows

You could also use one of the web-native Python packages like Altair (vega-lite), Plotly or Bokeh and likely get considerably better performance without the wasteful step of writing to PNG each time.
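As a rough sketch, the add_rows pattern looks something like this (the random numbers and the "amplitude" column are just stand-ins for whatever you're streaming):

import time
import numpy as np
import pandas as pd
import streamlit as st

# Draw the chart once, then push only the new rows to the frontend on each update.
chart = st.line_chart(pd.DataFrame({"amplitude": np.zeros(1)}))

for _ in range(100):
    new_rows = pd.DataFrame({"amplitude": np.random.randn(10)})  # stand-in for your streamed samples
    chart.add_rows(new_rows)
    time.sleep(0.05)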

Best,
Randy

Hi,

I came here as a total Streamlit newbie looking to cleanly implement an equivalent scenario: plotting various time series that are continuously generated by my application as it processes video input indefinitely.

So as I understand it (or would wishfully like to understand), using DeltaGenerator.add_rows in a loop is the most efficient way of updating the display from streaming data, is that the case? I'm asking because, as a newcomer, I'm not quite sure from the documentation link provided above, which succinctly says:

Concatenate a dataframe to the bottom of the current one.

Does that alone really enable updating such that only the most recent N seconds of data are always shown? Would you say this solves my very similar use case idiomatically with Streamlit's API?

In my case I'm not writing the data to files; rather, it's available in memory as it is continuously derived from the raw video stream by my code (the data I want to visualize are feature values computed continuously from the video frames).

Thanks a lot,
Matan

Oh, also, I'm intrigued: were Plotly and Bokeh mentioned above as alternatives because the use case is currently better supported by them, or does using Streamlit in concert with them make for a useful design?

Correct. You can see this pattern in our streamlit hello demo:

https://github.com/streamlit/streamlit/blob/develop/lib/streamlit/hello/demos.py#L188-L211

Doing the most recent N seconds is likely just an indexing operation on your array/dataframe to pick the last N seconds' worth of elements to plot.
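As a rough sketch (RATE and n_seconds here are stand-in names for whatever your sampling rate and window length happen to be):

import numpy as np

RATE = 30            # e.g. feature values per second from a video stream (stand-in value)
n_seconds = 5        # how much recent history to show (stand-in value)

values = np.random.randn(1000)            # stand-in for the full history of your feature values
window = values[-int(n_seconds * RATE):]  # only the most recent N seconds get plotted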

Yes. In the case of matplotlib, the scene is re-drawn each time, which is wasteful in terms of performance. For the web-based libraries, we use the intelligent diffing features of React and the component libraries, so only the changes are re-drawn. Additionally, matplotlib has some multi-threading issues that aren't present in the web-based libraries.

So a real-time use case is much better served by the pattern in the code I linked above.

Thanks a lot for the answer and architecture comments, this gives me confidence to go into Streamlit!

This is great, but I do not want to concatenate or append; I want to redraw the plot as if it were a reel-to-reel tape, where I only ever see the current view. I have an open mic, and I would like to continuously plot the last n milliseconds of processed audio as it comes in.

Is there a st.remove_lines or some other way of keeping the same window view as data comes in?

You can overwrite the dataframe if you choose; it really comes down to your exact requirements. My point about the append was that it signals to the frontend that everything except the data is expected to stay the same, so the rest of the chart doesn't get re-drawn.

With some creative indexing of a dataframe you can also get the behavior you want: taking the last n_seconds * hz rows seems like what you're after?
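Something along these lines could be a starting point (a sketch only: audio_chunks() is a stand-in for however you read from the pyaudio stream, and the 250 ms window is arbitrary):

import numpy as np
import pandas as pd
import streamlit as st

RATE = 44100                  # hz, as in the original code
WINDOW = int(0.25 * RATE)     # show the last 250 ms (n_seconds * hz)

placeholder = st.empty()
buffer = np.zeros(WINDOW, dtype=np.int16)

def audio_chunks():
    """Stand-in generator for the chunks you read from the pyaudio stream."""
    for _ in range(100):
        yield (np.random.randn(RATE // 5) * 1000).astype(np.int16)

for chunk in audio_chunks():
    # Append the new chunk, then keep only the most recent WINDOW samples...
    buffer = np.concatenate([buffer, chunk])[-WINDOW:]
    # ...and overwrite the chart in place, which gives the reel-to-reel effect.
    placeholder.line_chart(pd.DataFrame({"amplitude": buffer}))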