New Component: streamlit-webrtc, a new way to deal with real-time media streams

This is awesome! :raised_hands:

1 Like

Awesome work whitphx! :raised_hands:

I would like to ask one thing: is it possible to take photos with this component? I am building an app for mobile where I need the flexibility of selecting the device, but I only need to take photos. :slight_smile:

Thanks in advance!

1 Like

Thank you!

For your question, I think the answer is yes.
You can access the frame arrays in the transform() method of your VideoTransformer.
Please check the code below; I hope it meets your needs.

Note that these snapshots are taken on the server side, not on the frontend,
so if there is a network delay, the captured photo will lag behind what you see.

import threading
from typing import Union

import av
import numpy as np
import streamlit as st

from streamlit_webrtc import VideoTransformerBase, webrtc_streamer

def main():
    class VideoTransformer(VideoTransformerBase):
        frame_lock: threading.Lock  # `transform()` runs in another thread, so a lock is used for thread safety.
        in_image: Union[np.ndarray, None]
        out_image: Union[np.ndarray, None]

        def __init__(self) -> None:
            self.frame_lock = threading.Lock()
            self.in_image = None
            self.out_image = None

        def transform(self, frame: av.VideoFrame) -> np.ndarray:
            in_image = frame.to_ndarray(format="bgr24")

            out_image = in_image[:, ::-1, :]  # Simple flipping for example.

            with self.frame_lock:
                self.in_image = in_image
                self.out_image = out_image

            return out_image

    ctx = webrtc_streamer(key="snapshot", video_transformer_factory=VideoTransformer)

    if ctx.video_transformer:
        if st.button("Snapshot"):
            with ctx.video_transformer.frame_lock:
                in_image = ctx.video_transformer.in_image
                out_image = ctx.video_transformer.out_image

            if in_image is not None and out_image is not None:
                st.write("Input image:")
                st.image(in_image, channels="BGR")
                st.write("Output image:")
                st.image(out_image, channels="BGR")
            else:
                st.warning("No frames available yet.")

if __name__ == "__main__":
    main()

Thanks so much for the prompt response whitphx.

Really appreciate the code and the note. Again, awesome work on your end and a great contribution to streamlit community. :clap:


Hello Community. Thank you so much @whitphx for this wonderful contribution.
I’m making a face-recognition app and deploying it on Heroku, but opencv isn’t working on Heroku. My app runs fine locally. The app should be able to detect faces in the video stream when an existing user in the database starts the webcam.


The lines in the code below that are erroneous have a comment “#this line”. Basically, I need to draw rectangles on a face if a known user is found in a frame, and I need to access one frame of the webcam video stream (to run the while loop) to identify known faces.

Can @whitphx @soft-nougat or someone else please help me out!

Thanks a ton in advance!

This is the error that I’m getting but the same code works fine locally

This is error on heroku log
2021-05-06T09:31:16.531348+00:00 app[web.1]: [INFO] loading encodings + face detector…

2021-05-06T09:31:16.625860+00:00 app[web.1]: [INFO] starting video stream…

2021-05-06T09:31:16.626472+00:00 app[web.1]: [ WARN:1] global /tmp/pip-req-build-ddpkm6fn/opencv/modules/videoio/src/cap_v4l.cpp (893) open VIDEOIO(V4L2:/dev/video0): can’t open camera by index

Link to the github repo of the app:

import streamlit as st
import numpy as np
import pickle
import time
import cv2
import imutils
from imutils.video import VideoStream
from imutils.video import FPS
import face_recognition

# Face Recognition App

#Initialize 'currentname' to trigger only when a new person is identified.
currentname = "unknown"
#Path to the encodings.pickle file holding the known face encodings
encodingsP = "encodings.pickle"
#Haar cascade XML file used for face detection
cascade = "haarcascade_frontalface_default.xml"

# load the known faces and embeddings along with OpenCV's Haar
# cascade for face detection
print("[INFO] loading encodings + face detector...")
st.write("[INFO] loading encodings + face detector...")

data = pickle.loads(open(encodingsP, "rb").read())
detector = cv2.CascadeClassifier(cascade)

# initialize the video stream and allow the camera sensor to warm up
print("[INFO] starting video stream...")
st.write("[INFO] starting video stream...")

vs = VideoStream(src=0).start()                      #this line
#vs = VideoStream(usePiCamera=True).start()

# start the FPS counter
fps = FPS().start()

# loop over frames from the video file stream
names = []
while True:
    # grab the frame from the threaded video stream and resize it
    # to 500px (to speedup processing)
    frame = vs.read()                                     #this line
    frame = imutils.resize(frame, width=500)

    # convert the input frame from (1) BGR to grayscale (for face
    # detection) and (2) from BGR to RGB (for face recognition)
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

    # detect faces in the grayscale frame
    rects = detector.detectMultiScale(gray, scaleFactor=1.1,
        minNeighbors=5, minSize=(30, 30),
        flags=cv2.CASCADE_SCALE_IMAGE)

    # OpenCV returns bounding box coordinates in (x, y, w, h) order
    # but we need them in (top, right, bottom, left) order, so we
    # need to do a bit of reordering
    boxes = [(y, x + w, y + h, x) for (x, y, w, h) in rects]

    # compute the facial embeddings for each face bounding box
    encodings = face_recognition.face_encodings(rgb, boxes)
    #names = []

    # loop over the facial embeddings
    for encoding in encodings:
        # attempt to match each face in the input image to our known
        # encodings
        matches = face_recognition.compare_faces(data["encodings"],
            encoding)
        name = "Unknown" #if face is not recognized, then print Unknown

        # check to see if we have found a match
        if True in matches:
            # find the indexes of all matched faces then initialize a
            # dictionary to count the total number of times each face
            # was matched
            matchedIdxs = [i for (i, b) in enumerate(matches) if b]
            counts = {}

            # loop over the matched indexes and maintain a count for
            # each recognized face face
            for i in matchedIdxs:
                name = data["names"][i]
                counts[name] = counts.get(name, 0) + 1

            # determine the recognized face with the largest number
            # of votes (note: in the event of an unlikely tie Python
            # will select first entry in the dictionary)
            name = max(counts, key=counts.get)

            #If someone in your dataset is identified, print their name on the screen
            #if currentname != name:
                #currentname = name

        # update the list of names
        names.append(name)

    # loop over the recognized faces
    for ((top, right, bottom, left), name) in zip(boxes, names):
        # draw the predicted face name on the image - color is in BGR
        cv2.rectangle(frame, (left, top), (right, bottom),
            (0, 255, 0), 2)
        y = top - 15 if top - 15 > 15 else top + 15
        cv2.putText(frame, max(set(names), key = names.count), (left, y), cv2.FONT_HERSHEY_SIMPLEX,
            .8, (255, 0, 0), 2)

    # display the image to our screen
    cv2.imshow("Facial Recognition is Running", frame)
    key = cv2.waitKey(1) & 0xFF

    # quit when 'q' key is pressed
    if key == ord("q") or len(names)==30:
        break

    # update the FPS counter
    fps.update()

#print name of person identified and accuracy
#max([(name,names.count(name)) for name in set(names)],key=lambda x:x[1])
print("person identified : ", max(set(names), key = names.count))
print([(name,names.count(name)) for name in set(names)])


st.write("person identified : ", max(set(names), key = names.count))
st.write([(name,names.count(name)) for name in set(names)][0])

# stop the timer and display FPS information
fps.stop()
print("[INFO] elasped time: {:.2f}".format(fps.elapsed()))
print("[INFO] approx. FPS: {:.2f}".format(fps.fps()))

st.write("[INFO] elasped time: {:.2f}".format(fps.elapsed()))
st.write("[INFO] approx. FPS: {:.2f}".format(fps.fps()))

# do a bit of cleanup
cv2.destroyAllWindows()
vs.stop()

Hi @inkarar, welcome to the Streamlit community! :confetti_ball: :partying_face: :confetti_ball:

Could you also post any error messages that you see in your logs on Heroku? It might help folks on here understand the technical reasons as to why “opencv isn’t working”. :mag:

Happy Streamlit-ing! :balloon:

1 Like


I think you are subject to a common misunderstanding about how streamlit works.

With streamlit, all interaction must be done through the browser or browser APIs, so all interaction needs some streamlit component. Anything beyond that will fail, at the latest once streamlit is hosted on a server. Your application just happens to work on the local machine because there the “server” and “client” are identical.

As far as I know, you can’t use imutils’ VideoStream together with streamlit this way. The problem is that it tries to access the video camera through the hardware of the computer it runs on. This works locally, but not on the server, because the server has no connected video camera and not even a driver for it.


Thank you for your response. I understand what I am doing wrong here.

But as I understand streamlit-webrtc can access my webcam from heroku server

So i just need to be able to use streamlit-webrtc to capture frames from live stream and then find a match against a database.

I think this can be done via streamlit-webrtc, and I’m asking for any and all help with it because I’m new to streamlit and streamlit-webrtc. Thank you!!!

Follow the tutorial or read the code of the sample app, which I posted above in this thread. I think there is sufficient information to use this component.
In addition, there are many repos using streamlit-webrtc: Network Dependents · whitphx/streamlit-webrtc · GitHub You can also refer to the source code of those projects.

@Franky1 , thank you for your answer.
His answer is totally correct, and streamlit-webrtc has been developed to solve that problem. For the details about it, see the first post in this thread or the introduction part of this blog post


Hi community,

I released a new version of streamlit-webrtc, v0.20.0, with (experimental) audio support.
I added some samples to the example app which deal with audio streams:
However, I don’t have experience with audio/signal processing, and some samples are just copied from the web… I’d like someone who is familiar with this field to help create better examples or consider a more useful API :bowing_man:

I also created a real-time Speech-to-Text app: , which I think is a very impressive example utilizing the audio capability of this new version of streamlit-webrtc (the STT functionality is based on DeepSpeech).
(The generated text is not very precise, probably because of my non-native English and the sound environment, such as the microphone :sweat_smile:)
The source code is here.

For developers who have been using streamlit-webrtc, please note that some breaking changes have been introduced in this version. VideoTransformerBase class and its transform() method are now deprecated and will be removed. Please use VideoProcessorBase and its recv() instead. For further info, check


Ok now I reaaally need to delve into this :muscle: audio streams!

There’s a topic on this forum about Guitar Chord recognition I’ve been dying to solve forever: Selecting audio input or output device and I might finally be able to provide an answer :smiley:

Thanks for your hard work @whitphx

Fanilo :balloon:

1 Like