Real-Time Speech-to-Text Using Browser Microphone and Azure Web App in Streamlit

Hi everyone,

I’m working on a project to implement real-time speech transcription in Streamlit, hosted on an Azure Web App (Linux). While everything works smoothly on my local machine—where Python can access the system microphone directly—the setup breaks down once deployed on Azure Web App, since there’s no native microphone access in the cloud environment.

What I’m Trying to Achieve:

  • Real-Time Transcription: Capture audio from the user’s browser microphone, process it in real time, and display the transcription in Streamlit.
  • Streaming Approach: Ideally, I’d like a solution that streams the audio directly from the browser to the backend, maintaining as close to real-time transcription as possible.
  • Setup Details: Azure Web App is hosting both the Streamlit frontend and the API that leverages Azure Cognitive Services for speech-to-text.
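For reference, the backend pattern I’m planning on the Azure Cognitive Services side is the Speech SDK’s push-stream input: the server keeps writing audio bytes into a `PushAudioInputStream` while a `SpeechRecognizer` runs continuous recognition and fires callbacks with interim and final text. A rough sketch, assuming the `azure-cognitiveservices-speech` package; `start_recognizer` and the `on_text` callback are my own wiring, not SDK names:

```python
def start_recognizer(key, region, on_text):
    """Start continuous speech recognition on a push stream.
    Returns (recognizer, push_stream); feed raw 16 kHz 16-bit mono
    PCM bytes into push_stream.write() as they arrive from the browser.
    on_text(text, final=...) is a hypothetical callback of mine that
    forwards results to the UI."""
    import azure.cognitiveservices.speech as speechsdk  # pip install azure-cognitiveservices-speech

    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    push_stream = speechsdk.audio.PushAudioInputStream()
    audio_config = speechsdk.audio.AudioConfig(stream=push_stream)
    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config
    )

    # "recognizing" fires with interim hypotheses, "recognized" with final text
    recognizer.recognizing.connect(lambda evt: on_text(evt.result.text, final=False))
    recognizer.recognized.connect(lambda evt: on_text(evt.result.text, final=True))

    recognizer.start_continuous_recognition()
    return recognizer, push_stream
```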

Challenges:

I’ve found that capturing the microphone through browser JavaScript (via getUserMedia) seems to be the only option in a cloud deployment, but integrating that JavaScript with Python in Streamlit for continuous streaming is the hard part. Here’s what I’ve explored so far:

  • JS and Python Combo: Exploring ways to capture the audio via JavaScript, send it to the backend (potentially through WebSockets), and then display transcription results in Streamlit.
  • Previous Workarounds: I’ve looked into older libraries like streamlit-bokeh-events and streamlit-mic-recorder, but these either seem outdated or don’t fully support real-time streaming and continuous audio transmission, which are essential for this project.
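Concretely, the bridge I’m picturing for the JS-to-Python leg is a small WebSocket endpoint running alongside Streamlit: the browser sends raw audio chunks as binary frames, and the handler forwards them into the recognizer’s push stream. This is only a sketch under some assumptions: the `websockets` package is available, audio arrives as (or is converted to) 16 kHz 16-bit mono PCM, and `push_stream` has been created elsewhere; `serve_audio`, `audio_handler`, and `float32_to_pcm16` are my own names.

```python
import asyncio
import struct

def float32_to_pcm16(samples):
    """Convert Web Audio API Float32 samples (-1.0 .. 1.0) into the
    16-bit little-endian mono PCM bytes the Azure push stream expects."""
    clamped = (max(-1.0, min(1.0, s)) for s in samples)
    return b"".join(struct.pack("<h", int(s * 32767)) for s in clamped)

def serve_audio(push_stream, host="0.0.0.0", port=8765):
    """Run a WebSocket endpoint that pipes incoming binary audio
    chunks from the browser into the recognizer's push stream."""
    import websockets  # assumption: pip install websockets

    # path=None keeps the handler compatible with both the older
    # (websocket, path) and newer (websocket,) handler signatures
    async def audio_handler(websocket, path=None):
        async for message in websocket:  # one binary frame per audio chunk
            push_stream.write(message)

    async def main():
        async with websockets.serve(audio_handler, host, port):
            await asyncio.Future()  # serve forever

    asyncio.run(main())
```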

Question:

Has anyone successfully implemented browser-based microphone access in Streamlit for real-time applications, particularly on a cloud deployment like Azure? I’d appreciate any suggestions for handling the audio capture, streaming, and display without rerun issues or significant lag.
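For the display side, my current idea for avoiding full-app reruns is a Streamlit fragment that polls a queue filled by the recognizer callbacks, so only the transcript panel refreshes. Sketch only, assuming a recent Streamlit (the `st.fragment` decorator with `run_every`); `transcript_panel` is my own name:

```python
def transcript_panel(transcripts):
    """Render transcription results without rerunning the whole app.
    `transcripts` is a queue.Queue filled from the recognizer's
    background callback thread."""
    import streamlit as st  # assumption: a Streamlit version with st.fragment

    @st.fragment(run_every=0.5)  # rerun only this fragment twice a second
    def _poll():
        lines = st.session_state.setdefault("transcript_lines", [])
        while not transcripts.empty():
            lines.append(transcripts.get_nowait())
        st.write("\n".join(lines))

    _poll()
```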

Any help or guidance would be hugely appreciated!

Cheers,
Florian