New Component: streamlit-webrtc, a new way to deal with real-time media streams

I haven’t seen that error message before, so I can’t offer a solution.

As far as I can tell, there is no code that imports the pyee package, which the error message says is required, so I’m confused.
Are there any other lines of code that cause that error, or are you running your code in some special environment?
Providing such information (the full code and/or detailed info about the runtime environment) may help.

Thanks again for sharing.

That’s it - perhaps I will try this in a new environment. In the meantime, I worked around the issue by using the image capture option instead of video streaming.

Thanks again, you’ve been really helpful.


Hey Hafsah_MR, were you able to solve this issue?

I’m building a web application for acoustic measurements. I would like the user to be able to play a sinusoidal sweep and record it simultaneously. The playback and the recording must be in sync.

I’m using streamlit, streamlit-webrtc and pydub, as sounddevice is sadly unusable in an online setting as far as I understand.

Unfortunately I’m still a beginner and I’m having trouble understanding this code that I found in the streamlit-webrtc examples, especially the app_sendonly_audio function:

# import asyncio
import logging
import os  # used by the file-path helpers in app_room_measurements() below
import queue
import threading

# import urllib.request
from pathlib import Path
from typing import List, NamedTuple, Optional

# import av
# import cv2
import matplotlib.pyplot as plt
import numpy as np
import pydub
import sounddevice as sd  # used by play_sweep() below
import streamlit as st

# from aiortc.contrib.media import MediaPlayer

from numba import jit
from scipy import signal
from scipy.io import wavfile

# from maad import sound
# from maad import util

from streamlit_webrtc import (
    RTCConfiguration,
    WebRtcMode,
    # WebRtcStreamerContext,
    webrtc_streamer,
)

HERE = Path(__file__).parent

logger = logging.getLogger(__name__)

RTC_CONFIGURATION = RTCConfiguration(
    {"iceServers": [{"urls": ["stun:stun.l.google.com:19302"]}]}
)


def main():
    st.header("WebRTC demo")

    pages = {
        "WebRTC is sendonly and audio frames are visualized with matplotlib (sendonly)": app_sendonly_audio,
        # noqa: E501
        "Plot audio representation with scikit-maad": app_room_measurements,
    }
    page_titles = pages.keys()

    page_title = st.sidebar.selectbox(
        "Choose the app mode",
        page_titles,
    )
    st.subheader(page_title)

    page_func = pages[page_title]
    page_func()

    logger.debug("=== Alive threads ===")
    for thread in threading.enumerate():
        if thread.is_alive():
            logger.debug(f"  {thread.name} ({thread.ident})")


def app_sendonly_audio():
    """A sample to use WebRTC in sendonly mode to transfer audio frames
    from the browser to the server and visualize them with matplotlib
    and `st.pyplot`."""
    webrtc_ctx = webrtc_streamer(
        key="sendonly-audio",
        mode=WebRtcMode.SENDONLY,
        audio_receiver_size=256,
        rtc_configuration=RTC_CONFIGURATION,
        media_stream_constraints={"audio": True},
    )

    fig_place = st.empty()

    fig, [ax_time, ax_freq] = plt.subplots(
        2, 1, gridspec_kw={"top": 1.5, "bottom": 0.2}
    )

    sound_window_len = 5000  # 5s
    sound_window_buffer = None
    while True:
        if webrtc_ctx.audio_receiver:
            try:
                audio_frames = webrtc_ctx.audio_receiver.get_frames(timeout=1)
            except queue.Empty:
                logger.warning("Queue is empty. Abort.")
                break

            sound_chunk = pydub.AudioSegment.empty()
            for audio_frame in audio_frames:
                sound = pydub.AudioSegment(
                    data=audio_frame.to_ndarray().tobytes(),
                    sample_width=audio_frame.format.bytes,
                    frame_rate=audio_frame.sample_rate,
                    channels=len(audio_frame.layout.channels),
                )
                sound_chunk += sound

            if len(sound_chunk) > 0:
                if sound_window_buffer is None:
                    sound_window_buffer = pydub.AudioSegment.silent(
                        duration=sound_window_len
                    )

                sound_window_buffer += sound_chunk
                if len(sound_window_buffer) > sound_window_len:
                    sound_window_buffer = sound_window_buffer[-sound_window_len:]

            if sound_window_buffer:
                # Ref: https://own-search-and-study.xyz/2017/10/27/python%E3%82%92%E4%BD%BF%E3%81%A3%E3%81%A6%E9%9F%B3%E5%A3%B0%E3%83%87%E3%83%BC%E3%82%BF%E3%81%8B%E3%82%89%E3%82%B9%E3%83%9A%E3%82%AF%E3%83%88%E3%83%AD%E3%82%B0%E3%83%A9%E3%83%A0%E3%82%92%E4%BD%9C/  # noqa
                sound_window_buffer = sound_window_buffer.set_channels(
                    1
                )  # Stereo to mono
                sample = np.array(sound_window_buffer.get_array_of_samples())

                ax_time.cla()
                times = (np.arange(-len(sample), 0)) / sound_window_buffer.frame_rate
                ax_time.plot(times, sample)
                ax_time.set_xlabel("Time")
                ax_time.set_ylabel("Magnitude")

                spec = np.fft.fft(sample)
                freq = np.fft.fftfreq(sample.shape[0], 1.0 / sound_chunk.frame_rate)
                freq = freq[: int(freq.shape[0] / 2)]
                spec = spec[: int(spec.shape[0] / 2)]
                spec[0] = spec[0] / 2

                ax_freq.cla()
                ax_freq.plot(freq, np.abs(spec))
                ax_freq.set_xlabel("Frequency")
                ax_freq.set_yscale("log")
                ax_freq.set_ylabel("Magnitude")

                fig_place.pyplot(fig)
        else:
            logger.warning("AudioReciver is not set. Abort.")
            break

This function records audio from the user-selected input device and plots its waveform with matplotlib. I would like to be able to play a sinusoidal sweep and record it simultaneously.

This is my code for the generation of the sweep:

def app_room_measurements():
    audio_files_path = r"data/audio_files"
    sweep_string = ""
    inv_filter_string = ""
    ir_string = ""

    @jit(nopython=True)
    def fade(data, gain_start, gain_end):
        """
        Create a fade on an input object

        Parameters
        ----------
        :param data: The input array
        :param gain_start: The fade starting point
        :param gain_end: The fade ending point

        Returns
        -------
        data : object
            An input array with the fade applied
        """
        gain = gain_start
        delta = (gain_end - gain_start) / (len(data) - 1)
        for i in range(len(data)):
            data[i] = data[i] * gain
            gain = gain + delta

        return data

    @jit(nopython=True)
    def generate_exponential_sweep(
        sweep_duration, sr, starting_frequency, ending_frequency
    ):
        """
        Generate an exponential sweep using Farina's log sweep theory

        Parameters
        ----------
        :param sweep_duration: The duration of the excitement signal (in seconds)
        :param sr: The sampling frequency
        :param starting_frequency: The starting frequency of the excitement signal
        :param ending_frequency: The ending frequency of the excitement signal

        Returns
        -------
        exponential_sweep : array
            An array with the fade() function applied
        """
        time_in_samples = sweep_duration * sr
        exponential_sweep = np.zeros(time_in_samples, dtype=np.double)
        for n in range(time_in_samples):
            t = n / sr
            exponential_sweep[n] = np.sin(
                (2.0 * np.pi * starting_frequency * sweep_duration)
                / np.log(ending_frequency / starting_frequency)
                * (
                    np.exp(
                        (t / sweep_duration)
                        * np.log(ending_frequency / starting_frequency)
                    )
                    - 1.0
                )
            )

        number_of_samples = 50
        exponential_sweep[-number_of_samples:] = fade(
            exponential_sweep[-number_of_samples:], 1, 0
        )

        return exponential_sweep

    @jit(nopython=True)
    def generate_inverse_filter(
        sweep_duration, sr, exponential_sweep, starting_frequency, ending_frequency
    ):
        """
        Generate an inverse filter using Farina's log sweep theory

        Parameters
        ----------
        :param sweep_duration: The duration of the excitement signal (in seconds)
        :param sr: The sampling frequency
        :param exponential_sweep: The resulting array of the generate_exponential_sweep() function
        :param starting_frequency: The starting frequency of the excitement signal
        :param ending_frequency: The ending frequency of the excitement signal

        Returns
        -------
        inverse_filter : array
             The array resulting from applying an amplitude envelope to the exponential_sweep array
        """
        time_in_samples = sweep_duration * sr
        amplitude_envelope = np.zeros(time_in_samples, dtype=np.double)
        inverse_filter = np.zeros(time_in_samples, dtype=np.double)
        for n in range(time_in_samples):
            amplitude_envelope[n] = pow(
                10,
                (
                    (-6 * np.log2(ending_frequency / starting_frequency))
                    * (n / time_in_samples)
                )
                * 0.05,
            )
            inverse_filter[n] = exponential_sweep[-n] * amplitude_envelope[n]

        return inverse_filter

    def deconvolve(ir_sweep, ir_inverse):
        """
        A deconvolution of the exponential sweep and the relative inverse filter

        Parameters
        ----------
        :param ir_sweep: The resulting array of the generate_exponential_sweep() function
        :param ir_inverse: The resulting array of the generate_inverse_filter() function

        Returns
        -------
        normalized_ir : array
             An N-dimensional array containing a subset of the discrete linear deconvolution of ir_sweep with ir_inverse
        """
        impulse_response = signal.fftconvolve(
            ir_sweep, ir_inverse, mode="full"
        )  # Convolve two N-dimensional arrays using FFT

        normalized_ir = impulse_response * (1.0 / np.max(abs(impulse_response)))

        return normalized_ir

    sample_rate_option = st.selectbox("Select the desired sample rate", (44100, 48000))
    sweep_duration_option = st.selectbox("Select the duration of the sweep", (3, 7, 14))
    max_reverb_option = st.selectbox(
        "Select the expected maximum reverb decay time", (1, 2, 3, 5, 10)
    )

    st.caption(
        """
                Note that longer sweeps provide more accuracy,
                but even short sweeps can be used to measure long decays
                """
    )

    def write_wav_file(file_name, rate, data):
        save_file_path = os.path.join(audio_files_path, file_name)
        wavfile.write(save_file_path, rate, data)
        st.success(f"File successfully written to {audio_files_path} as {file_name}")

    def play_sweep(wavefile_name):
        read_file_path = os.path.join(audio_files_path, wavefile_name)
        # Extract data and sampling rate from file
        sample_rate, data = wavfile.read(read_file_path)

        stop_button = st.button("Stop")

        if "stop_button_state" not in st.session_state:
            st.session_state.stop_button_state = False

        sd.play(data, sample_rate)

        if stop_button or st.session_state.stop_button_state:
            st.session_state.stop_button_state = True

            sd.stop()

        else:
            sd.wait()  # Wait until file is done playing

    user_input = str(st.text_input("Name your file: "))

    if user_input:
        sweep_string = user_input + "_exponential_sweep_.wav"
        inv_filter_string = user_input + "_inverse_filter_.wav"
        ir_string = user_input + "_impulse_response_.wav"

        st.write(sweep_string)

        play_button = st.button("Play")

        if "play_button_state" not in st.session_state:
            st.session_state.play_button_state = False

        if play_button or st.session_state.play_button_state:
            st.session_state.play_button_state = True

            sweep = generate_exponential_sweep(
                sweep_duration_option, sample_rate_option, 20, 24000
            )
            inv_filter = generate_inverse_filter(
                sweep_duration_option, sample_rate_option, sweep, 20, 24000
            )

            write_wav_file(file_name=sweep_string, rate=sample_rate_option, data=sweep)
            write_wav_file(
                file_name=inv_filter_string, rate=sample_rate_option, data=inv_filter
            )

            play_sweep(sweep_string)

The recorded sweep is then deconvolved with the inverse filter to obtain the impulse response of the room.
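For reference, that final step could look like the sketch below (recorded_sweep here is a hypothetical array holding the microphone capture, which is exactly the part I’m missing):

room_ir = deconvolve(recorded_sweep, inv_filter)  # impulse response of the room
write_wav_file(file_name=ir_string, rate=sample_rate_option, data=room_ir)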

Is it possible to record simultaneously, or am I better off abandoning the idea of making a website for this and focusing on building an offline app instead?

I’m not an audio professional (all the audio processing in the samples is just copy-and-pasted from the web), so I’m not sure the following answer helps.


  • The audio filter sample may be a useful reference. It’s a real-time audio filter that amplifies the sound from the audio input (roughly sketched after this list).
  • out_recorder_factory may suit your purpose of recording. See the recorder sample for its usage; it deals with video data, but I think it works with audio too.

disclaimer: I have not tested them.
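In case it helps, the core of that audio filter sample is roughly like the untested sketch below (it assumes the audio_frame_callback argument of webrtc_streamer and rebuilds each av.AudioFrame with pydub; treat it as a starting point, not a verified implementation):

import av
import numpy as np
import pydub
from streamlit_webrtc import WebRtcMode, webrtc_streamer


def audio_frame_callback(frame: av.AudioFrame) -> av.AudioFrame:
    raw_samples = frame.to_ndarray()
    sound = pydub.AudioSegment(
        data=raw_samples.tobytes(),
        sample_width=frame.format.bytes,
        frame_rate=frame.sample_rate,
        channels=len(frame.layout.channels),
    )

    sound = sound.apply_gain(6)  # the "filter": a simple +6 dB gain

    # Rebuild an av.AudioFrame with the same shape and layout as the input.
    channel_samples = [s.get_array_of_samples() for s in sound.split_to_mono()]
    new_samples = (
        np.array(channel_samples).T.reshape(raw_samples.shape).astype(raw_samples.dtype)
    )
    new_frame = av.AudioFrame.from_ndarray(new_samples, layout=frame.layout.name)
    new_frame.sample_rate = frame.sample_rate
    return new_frame


webrtc_streamer(
    key="audio-filter",
    mode=WebRtcMode.SENDRECV,
    audio_frame_callback=audio_frame_callback,
    media_stream_constraints={"audio": True, "video": False},
)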

Hey @whitphx, is it possible to send and receive only video frames, and send audio only occasionally?

My scenario is: I want to take an input frame (via video_frame_callback) and, after some frame processing, return audio frames to sound an alarm if a condition is met, e.g. play an alarm sound if a specific object is detected.

Currently, I have a working application that takes in video and microphone audio and returns the processed frame and processed audio. To handle the audio part I used your audio_gain example: I return frames of my alarm MP3 when a condition is met during frame processing, and otherwise return “silence” frames.

What I have trouble with is that the user has to grant microphone access just so I can collect the audio frames to process. This seems wrong, as the application itself has nothing to do with input sound; I only want to return audio occasionally, and for that only the device speaker is needed, which requires no permission.

Is this possible?

@VerbisAnimae Hi, I think it is not possible.
I created an issue suggesting an extension that may solve this problem: Add frame callbacks with SENDONLY mode · Issue #1069 · whitphx/streamlit-webrtc · GitHub
Please wait for it.
Thank you for reporting such an interesting real-world use case!


The model works just fine, but then I get this error when I move away from the camera, and it’s intermittent.

2022-09-22 21:13:16.903 Error occurred in the WebRTC thread:
2022-09-22 21:13:16.904 Traceback (most recent call last):
2022-09-22 21:13:16.904 File "/home/user/.local/lib/python3.8/site-packages/streamlit_webrtc/process.py", line 108, in _run_worker_thread
2022-09-22 21:13:16.904 self._worker_thread()
2022-09-22 21:13:16.904 File "/home/user/.local/lib/python3.8/site-packages/streamlit_webrtc/process.py", line 196, in _worker_thread
2022-09-22 21:13:16.904 new_frames = finished.result()
2022-09-22 21:13:16.904 File "/home/user/.local/lib/python3.8/site-packages/streamlit_webrtc/models.py", line 115, in recv_queued
2022-09-22 21:13:16.904 return [self.recv(frames[-1])]
2022-09-22 21:13:16.904 File "/home/user/.local/lib/python3.8/site-packages/streamlit_webrtc/models.py", line 107, in recv
2022-09-22 21:13:16.904 return av.VideoFrame.from_ndarray(new_image, format="bgr24")
2022-09-22 21:13:16.904 File "av/video/frame.pyx", line 358, in av.video.frame.VideoFrame.from_ndarray
2022-09-22 21:13:16.904 File "av/utils.pyx", line 69, in av.utils.check_ndarray
2022-09-22 21:13:16.904 AttributeError: 'NoneType' object has no attribute 'dtype'

Please help me with this.

I did try changing VideoTransformer to VideoProcessor, but I guess I’m missing something in doing so; please help me with that as well.

FYI, I’m a beginner, so excuse me if I’m missing something obvious.

Thanks in advance.

Hey @whitphx, I wrote a blog post, Driver Drowsiness Detection Using Mediapipe In Python, where I used streamlit-webrtc for deployment to Streamlit Cloud.
Thank you for this amazing library.


Dear everyone!

I’m using streamlit-webrtc to connect to a microscope. It works great for 3-5 seconds, and then the delay starts to build up. What is the best way to prevent this? I tried inserting a time.sleep(0.5) to lower the frame rate, but it doesn’t seem to solve the issue.

Is there a way to change the underlying buffer size / frame rate? I can see all the cores are roasting too :smiling_face_with_tear:

@Zulfiquar15
It looks like new_image passed to av.VideoFrame.from_ndarray is None.
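Just as an illustration (not your exact code), a guard like the one below avoids passing None to PyAV when the processing step produces nothing; process() is a hypothetical stand-in for your model:

import av


def video_frame_callback(frame: av.VideoFrame) -> av.VideoFrame:
    img = frame.to_ndarray(format="bgr24")
    new_image = process(img)  # hypothetical processing step; may return None

    if new_image is None:
        # Fall back to the unmodified frame instead of crashing the worker thread.
        return frame

    return av.VideoFrame.from_ndarray(new_image, format="bgr24")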

@veb101 Hey what a great article it is, and I’m so glad streamlit-webrtc could be a part of it!

@ZKLO I can’t guess the cause from the provided info alone.
Can you provide a minimal reproducible code example (and any other info if necessary)?

FYI, streamlit-webrtc already has built-in frame dropping enabled by default to avoid throughput problems in the streaming part, unless you set async_processing=False on webrtc_streamer(). So if you encounter the performance problem even with it, the cause may reside somewhere like the frame callback function, or outside the code, e.g. the input/output network (I’m not sure though).

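As a generic illustration only (nothing specific to your setup), one common way to keep the callback cheap is to run the expensive step on every Nth frame and reuse the last result in between; heavy_processing below is a hypothetical stand-in:

import av

frame_count = 0
last_result = None


def video_frame_callback(frame: av.VideoFrame) -> av.VideoFrame:
    global frame_count, last_result
    img = frame.to_ndarray(format="bgr24")
    frame_count += 1

    # Run the expensive step only on every 5th frame; otherwise reuse the
    # previous output so the callback keeps up with the incoming stream.
    if last_result is None or frame_count % 5 == 0:
        last_result = heavy_processing(img)  # hypothetical expensive step

    return av.VideoFrame.from_ndarray(last_result, format="bgr24")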

Hey @whitphx, I am thinking of writing the processed video to a file using cv2.

My issue is that, while creating the cv2.VideoWriter, I need to know the FPS and the frame size of the client’s webcam.

So, I am thinking of creating something like this:

out_vid = cv2.VideoWriter(f'file_name.mp4', fourcc, fps, frame_size)

I can then use the video_frame_callback as in:

def video_frame_callback(frame: av.VideoFrame):
    frame = frame.to_ndarray(format="rgb24")
    # ... process the frame here ...
    out_vid.write(frame)
    return av.VideoFrame.from_ndarray(frame, format="rgb24")

My use case is to offer a video download option as soon as the STOP button is clicked.

Can you please give me some pointers on how to achieve that?

@KD_7 For that purpose, you should use aiortc.contrib.media.MediaRecorder instead of cv2.VideoWriter in combination with streamlit-webrtc as below.
See this example, where the video stream processed via the frame callback is recorded into output.flv.
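For reference (untested here), the relevant part of that recorder example looks roughly like this; out_recorder_factory records the stream that leaves the server, i.e. the processed frames:

from aiortc.contrib.media import MediaRecorder
from streamlit_webrtc import WebRtcMode, webrtc_streamer


def out_recorder_factory() -> MediaRecorder:
    return MediaRecorder("output.flv", format="flv")


webrtc_streamer(
    key="record",
    mode=WebRtcMode.SENDRECV,
    video_frame_callback=video_frame_callback,  # your processing callback from above
    media_stream_constraints={"video": True, "audio": False},
    out_recorder_factory=out_recorder_factory,
)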

First of all, great work @whitphx!
One quick question: I’m developing a QR code reader for a library and it works fine, but it would be easier to start the video without a click on a button and, especially, to freeze the video automatically when the QR code is decoded.

For now I have to pause it manually. What can I add or modify?

    def recv(self,frame):
        img = frame.to_ndarray(format="bgr24")
        qr= detector.detectAndDecode(img)
        if qr[0]:
            self.result = qr[0]
            for i in range(4):
                cv2.line(img, tuple(map(int, qr[1][0][i - 1])),
                         tuple(map(int, qr[1][0][i])), (50, 50, 255),
                         thickness=2, lineType=cv2.LINE_8)
            cv2.putText(img, text=qr[0], org=tuple(map(int, qr[1][0][0])),
                        fontFace=cv2.FONT_HERSHEY_SIMPLEX, fontScale=0.7,
                        color=(255, 50, 50), thickness=2, lineType=cv2.LINE_8)
            st.session_state["found_qr"] = True
            st.session_state["qr_code_image"] = img  
        return av.VideoFrame.from_ndarray(img, format="bgr24")
I want to pause the video, but keep Streamlit running so the user can complete a library form…


Hi @whitphx, thanks for the super quick response.

There seems to be something off with .mp4 files: the output file doesn’t play back properly after I close the application (i.e., by clicking the STOP button).

However, everything works fine if I revert to the .flv format.

Any particular reason for this behaviour?

@mabusdogma
Hi,
I can’t reproduce the problem with that incomplete code, so I can only guess that using st.session_state inside the callback is the cause, because that is not allowed (see GitHub - whitphx/streamlit-webrtc: Real-time video and audio streams over the network, with Streamlit).
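As a rough sketch of the pattern the README recommends instead (names here are only illustrative), pass results from the callback to the main script through a thread-safe queue rather than touching st.session_state inside the callback:

import queue

import av
import cv2
import streamlit as st

detector = cv2.QRCodeDetector()
result_queue: "queue.Queue[str]" = queue.Queue()


def video_frame_callback(frame: av.VideoFrame) -> av.VideoFrame:
    img = frame.to_ndarray(format="bgr24")
    data, points, _ = detector.detectAndDecode(img)
    if data:
        result_queue.put(data)  # hand the decoded text to the main thread
    return av.VideoFrame.from_ndarray(img, format="bgr24")


# Back in the main script (outside the callback) it is safe to use st.session_state:
if not result_queue.empty():
    st.session_state["found_qr"] = True
    st.write(result_queue.get())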

@KD_7
Although I haven’t investigated it thoroughly, I encountered the MP4 problem on my local macOS environment and found that the following error had been emitted in the shell.

Application provided invalid, non monotonically increasing dts to muxer in stream 1: 1536 >= 1536

It looks like Error using MediaRecorder creating HLS segments · Issue #331 · aiortc/aiortc · GitHub refers to the same thing; the error comes from FFmpeg and is related to the video encoder installed in the environment.
I don’t know of a clear solution, but recording in MP4 format may work on a different platform where the proper codecs are installed.

Hello @whitphx, I am using Streamlit to render my local video, but I am having trouble figuring out how to change the media source in the middle of a stream.

For example, say I have two videos, video 1 and video 2.

I want to render their frames, and as the first video ends I want to start rendering frames from the second one, i.e. I want to change the source mid-stream.