Real-Time Speech-to-Text Using Browser Microphone and Azure Web App in Streamlit

Hi everyone,

I’m working on a project to implement real-time speech transcription in Streamlit, hosted on an Azure Web App (Linux). While everything works smoothly on my local machine—where Python can access the system microphone directly—the setup breaks down once deployed on Azure Web App, since there’s no native microphone access in the cloud environment.

What I’m Trying to Achieve:

  • Real-Time Transcription: Capture audio from the user’s browser microphone, process it in real time, and display the transcription in Streamlit.
  • Streaming Approach: Ideally, I’d like a solution that streams the audio directly from the browser to the backend, maintaining as close to real-time transcription as possible.
  • Setup Details: Azure Web App is hosting both the Streamlit frontend and the API that leverages Azure Cognitive Services for speech-to-text.
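For reference, the backend pattern I’m planning on the Azure Cognitive Services side is the Speech SDK’s push-stream input: the server keeps writing audio bytes into a `PushAudioInputStream` while a `SpeechRecognizer` runs continuous recognition and fires callbacks with interim and final text. A rough sketch, assuming the `azure-cognitiveservices-speech` package; `start_recognizer` and the `on_text` callback are my own wiring, not SDK names:

```python
def start_recognizer(key, region, on_text):
    """Start continuous speech recognition on a push stream.
    Returns (recognizer, push_stream); feed raw 16 kHz 16-bit mono
    PCM bytes into push_stream.write() as they arrive from the browser.
    on_text(text, final=...) is a hypothetical callback of mine that
    forwards results to the UI."""
    import azure.cognitiveservices.speech as speechsdk  # pip install azure-cognitiveservices-speech

    speech_config = speechsdk.SpeechConfig(subscription=key, region=region)
    push_stream = speechsdk.audio.PushAudioInputStream()
    audio_config = speechsdk.audio.AudioConfig(stream=push_stream)
    recognizer = speechsdk.SpeechRecognizer(
        speech_config=speech_config, audio_config=audio_config
    )

    # "recognizing" fires with interim hypotheses, "recognized" with final text
    recognizer.recognizing.connect(lambda evt: on_text(evt.result.text, final=False))
    recognizer.recognized.connect(lambda evt: on_text(evt.result.text, final=True))

    recognizer.start_continuous_recognition()
    return recognizer, push_stream
```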

Challenges:

I’ve found that capturing the microphone through browser JavaScript (via getUserMedia) seems to be the only option in a cloud deployment, but integrating that JavaScript with Python in Streamlit for continuous streaming is the hard part. Here’s what I’ve explored so far:

  • JS and Python Combo: Exploring ways to capture the audio via JavaScript, send it to the backend (potentially through WebSockets), and then display transcription results in Streamlit.
  • Previous Workarounds: I’ve looked into older libraries like streamlit-bokeh-events and streamlit-mic-recorder, but these either seem outdated or don’t fully support real-time streaming and continuous audio transmission, which are essential for this project.
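Concretely, the bridge I’m picturing for the JS-to-Python leg is a small WebSocket endpoint running alongside Streamlit: the browser sends raw audio chunks as binary frames, and the handler forwards them into the recognizer’s push stream. This is only a sketch under some assumptions: the `websockets` package is available, audio arrives as (or is converted to) 16 kHz 16-bit mono PCM, and `push_stream` has been created elsewhere; `serve_audio`, `audio_handler`, and `float32_to_pcm16` are my own names.

```python
import asyncio
import struct

def float32_to_pcm16(samples):
    """Convert Web Audio API Float32 samples (-1.0 .. 1.0) into the
    16-bit little-endian mono PCM bytes the Azure push stream expects."""
    clamped = (max(-1.0, min(1.0, s)) for s in samples)
    return b"".join(struct.pack("<h", int(s * 32767)) for s in clamped)

def serve_audio(push_stream, host="0.0.0.0", port=8765):
    """Run a WebSocket endpoint that pipes incoming binary audio
    chunks from the browser into the recognizer's push stream."""
    import websockets  # assumption: pip install websockets

    # path=None keeps the handler compatible with both the older
    # (websocket, path) and newer (websocket,) handler signatures
    async def audio_handler(websocket, path=None):
        async for message in websocket:  # one binary frame per audio chunk
            push_stream.write(message)

    async def main():
        async with websockets.serve(audio_handler, host, port):
            await asyncio.Future()  # serve forever

    asyncio.run(main())
```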

Question:

Has anyone successfully implemented browser-based microphone access in Streamlit for real-time applications, particularly on a cloud deployment like Azure? I’d appreciate any suggestions for handling the audio capture, streaming, and display without rerun issues or significant lag.
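For the display side, my current idea for avoiding full-app reruns is a Streamlit fragment that polls a queue filled by the recognizer callbacks, so only the transcript panel refreshes. Sketch only, assuming a recent Streamlit (the `st.fragment` decorator with `run_every`); `transcript_panel` is my own name:

```python
def transcript_panel(transcripts):
    """Render transcription results without rerunning the whole app.
    `transcripts` is a queue.Queue filled from the recognizer's
    background callback thread."""
    import streamlit as st  # assumption: a Streamlit version with st.fragment

    @st.fragment(run_every=0.5)  # rerun only this fragment twice a second
    def _poll():
        lines = st.session_state.setdefault("transcript_lines", [])
        while not transcripts.empty():
            lines.append(transcripts.get_nowait())
        st.write("\n".join(lines))

    _poll()
```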

Any help or guidance would be hugely appreciated!

Cheers,
Florian