Speech To Text On Client Side Using HTML5 and Streamlit Bokeh Events

ash2shukla · December 9, 2020, 7:38pm

Hi guys,

I just came across the question "Audio display" and thought to myself: what if we want to do the opposite? It's pretty straightforward if you are listening through a mic on the machine where the Streamlit server is hosted, but it gets a little tricky if you want to do it on the client side.
Worry not, JavaScript to the rescue.

Check out this (buggy) snippet that does just this!

import streamlit as st
from bokeh.models.widgets import Button
from bokeh.models import CustomJS
from streamlit_bokeh_events import streamlit_bokeh_events

stt_button = Button(label="Speak", width=100)

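stt_button.js_on_event("button_click", CustomJS(code="""
    // The webkit-prefixed constructor is the one Chrome exposes.
    var recognition = new webkitSpeechRecognition();
    recognition.continuous = true;      // keep returning results across pauses
    recognition.interimResults = true;  // onresult also fires for partial results

    recognition.onresult = function (e) {
        // Concatenate only the results the recognizer has marked as final.
        var value = "";
        for (var i = e.resultIndex; i < e.results.length; ++i) {
            if (e.results[i].isFinal) {
                value += e.results[i][0].transcript;
            }
        }
        if (value != "") {
            // Forward the text to Streamlit through a custom DOM event.
            document.dispatchEvent(new CustomEvent("GET_TEXT", {detail: value}));
        }
    }
    recognition.start();
    """))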

result = streamlit_bokeh_events(
    stt_button,
    events="GET_TEXT",
    key="listen",
    refresh_on_update=False,
    override_height=75,
    debounce_time=0)

if result:
    if "GET_TEXT" in result:
        st.write(result.get("GET_TEXT"))

[GIF: snippet]

Hope you guys will find it interesting!

PS. The GIF shows the STT of me speaking.

Marisa_Smith · December 9, 2020, 7:53pm

Hey @ash2shukla!

That's super cool! So you press this button and then speak into your mic? Do you need an external mic (like on a pair of headphones), or would it work with your computer's built-in mic?

(side note: I love how you almost always put gifs in your posts, it makes them so easy to read and understand!)

ash2shukla · December 9, 2020, 7:56pm

Hey @Marisa_Smith!

It works with any mic input.

(and yay! thanks! haha)

lazeni · December 9, 2020, 8:03pm

Thank you so much, that is very cool.
In the same way, could you convert text to voice without clicking a button?

ash2shukla · December 9, 2020, 8:07pm

Hey @lazeni,

You could do this for TTS instead of STT:

import json

import streamlit as st
from bokeh.models.widgets import Button
from bokeh.models import CustomJS

text = st.text_input("Say what ?")

tts_button = Button(label="Speak", width=100)

# json.dumps turns the input into a quoted, escaped JavaScript string literal,
# so quotes or backslashes in the text cannot break the generated script.
tts_button.js_on_event("button_click", CustomJS(code=f"""
    var u = new SpeechSynthesisUtterance();
    u.text = {json.dumps(text)};
    u.lang = 'en-US';

    speechSynthesis.speak(u);
    """))

st.bokeh_chart(tts_button)

But the same problem applies here: you still need to click the Speak button.

IMHO, it looks a little neater and it doesn’t need additional gTTS dependencies…

napoles3d · December 10, 2020, 3:32am

Amazing @ash2shukla, here is the output on my side:
[Screenshot 2020-12-09 at 20.30.09]

ash2shukla · December 10, 2020, 7:15am

Awesome!

napoles3d · December 17, 2020, 2:20am

Hey @ash2shukla, I was wondering if something similar could work for using the webcam. What do you think?

ash2shukla · December 17, 2020, 11:29am

Hey @napoles3d, I haven’t tried it recently, but I think webcam and mic won’t work due to sandboxing of components.
The two things I’ve been able to make work are the speech and geolocation APIs…
Let me know if the webcam works!

napoles3d · December 17, 2020, 3:12pm

Hmm, I see… someone else tried it, but it didn’t work (big correction: it actually works!! Sorry @Luke):

oscar13ud · March 15, 2021, 10:34pm

Hi, nice work! How can I save the audio file? I would appreciate your advice. Thanks so much!

asohn123 · March 22, 2021, 8:08pm

@ash2shukla How do I go about accessing all of the text that was uttered to process it (not just the intermediate outputs)? Thank you!
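
One possible way to keep everything that was recognized, sketched under assumptions: the original snippet already forwards only final results (the isFinal check), so each GET_TEXT event carries a finished phrase, and the Python side can append each phrase to a list. The helper name collect_transcript is hypothetical, and the simulated results below only mimic the shape that the original post reads with result.get("GET_TEXT").

```python
from typing import List, Optional


def collect_transcript(history: List[str], result: Optional[dict]) -> List[str]:
    """Append the phrase carried by a GET_TEXT result to history, if any."""
    if result and "GET_TEXT" in result:
        text = str(result["GET_TEXT"]).strip()
        if text:
            history.append(text)
    return history


# Simulated component results (an assumption about their shape).
history: List[str] = []
for fake_result in [{"GET_TEXT": "hello there "}, None, {"GET_TEXT": "how are you"}]:
    collect_transcript(history, fake_result)

# The whole utterance so far, ready for further processing.
full_text = " ".join(history)
print(full_text)
```

In an app, the list would probably live in st.session_state so that it survives reruns. The component may return the same value again on a rerun triggered by another widget, so a real app would likely need a guard against appending a phrase twice.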

Jishnu_Nair · June 1, 2021, 6:56am

@ash2shukla I want to implement this app using Streamlit in Google Colab. Could you please tell me how I can do that?

ChainYo · September 30, 2021, 9:35am

It seems that you can change these two parameters:

recognition.continuous = true;
recognition.interimResults = true;

By changing their state you will be able to avoid intermediate results and continuous recording.
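
As a rough sketch of that suggestion, the two flags could be filled in from Python before the JavaScript is handed to CustomJS. The helper name build_recognition_js and the placeholder tokens are hypothetical; the JavaScript body is the one from the original post, and this has not been tested in a browser.

```python
def build_recognition_js(continuous: bool = False, interim_results: bool = False) -> str:
    """Return the CustomJS source with the two recognition flags filled in."""
    # JavaScript booleans are lowercase, so convert explicitly.
    js_bool = {True: "true", False: "false"}
    template = """
    var recognition = new webkitSpeechRecognition();
    recognition.continuous = __CONTINUOUS__;
    recognition.interimResults = __INTERIM__;

    recognition.onresult = function (e) {
        var value = "";
        for (var i = e.resultIndex; i < e.results.length; ++i) {
            if (e.results[i].isFinal) {
                value += e.results[i][0].transcript;
            }
        }
        if (value != "") {
            document.dispatchEvent(new CustomEvent("GET_TEXT", {detail: value}));
        }
    }
    recognition.start();
    """
    return (
        template
        .replace("__CONTINUOUS__", js_bool[continuous])
        .replace("__INTERIM__", js_bool[interim_results])
    )


# A single utterance with final results only.
js_code = build_recognition_js(continuous=False, interim_results=False)
# It would be passed to the button as in the original post:
# stt_button.js_on_event("button_click", CustomJS(code=js_code))
```

Plain str.replace is used instead of an f-string so that the braces in the JavaScript do not need to be doubled.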

iliboy · February 8, 2022, 7:40pm

This is pretty nice. Is it possible to add a timer to this, for example to record for 10 seconds and save the text in a list? Sorry, I am new to CustomJS(). Or is it possible to also have a stop button, so that when the speech is done, one gets the text as output?
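
A possible approach to the timer question, as an untested sketch: the Web Speech API has a stop() method and an onend handler, so the JavaScript could collect the final phrases, stop itself with setTimeout, and dispatch one GET_TEXT event when recognition ends. The helper name build_timed_recognition_js and the __TIMEOUT_MS__ placeholder are hypothetical, and how streamlit_bokeh_events delivers the event is assumed to match the original post.

```python
def build_timed_recognition_js(seconds: float = 10.0) -> str:
    """Return CustomJS source that listens for `seconds` and then stops."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    timeout_ms = int(seconds * 1000)
    template = """
    var recognition = new webkitSpeechRecognition();
    recognition.continuous = true;
    recognition.interimResults = false;
    var chunks = [];

    // Collect every finalized phrase instead of dispatching it right away.
    recognition.onresult = function (e) {
        for (var i = e.resultIndex; i < e.results.length; ++i) {
            if (e.results[i].isFinal) {
                chunks.push(e.results[i][0].transcript);
            }
        }
    }
    // onend fires once recognition has stopped; send all text in one event.
    recognition.onend = function () {
        if (chunks.length > 0) {
            document.dispatchEvent(new CustomEvent("GET_TEXT", {detail: chunks.join(" ")}));
        }
    }
    recognition.start();
    // Stop listening after the requested time.
    setTimeout(function () { recognition.stop(); }, __TIMEOUT_MS__);
    """
    return template.replace("__TIMEOUT_MS__", str(timeout_ms))


js_code = build_timed_recognition_js(10)
# It would be passed to the button as in the original post:
# stt_button.js_on_event("button_click", CustomJS(code=js_code))
```

On the Python side, each returned text could then be appended to a list. A stop button would presumably call recognition.stop() in the same way, but it would need access to the same recognition object (for example by storing it on window), which is an assumption not covered here.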

BeyondMyself · February 10, 2022, 2:23pm

This function supports English very well.
If I want it to recognize other languages like Chinese, do you have any ideas?

guozhiwei.8210 · October 4, 2023, 3:00am

@BeyondMyself You can set recognition.lang = 'cmn-Hans-CN';
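
A minimal sketch of where that line could go, assuming the JavaScript is built as a Python string as in the original post: the language assignment is inserted right after the constructor line, before recognition.start() is called. The helper name with_language is hypothetical, and the short snippet below is only a stand-in for the full JavaScript.

```python
import json

# The constructor line as it appears in the original post.
CONSTRUCTOR = "var recognition = new webkitSpeechRecognition();"


def with_language(js_code: str, lang: str) -> str:
    """Insert a recognition.lang assignment right after the constructor line."""
    if CONSTRUCTOR not in js_code:
        raise ValueError("constructor line not found in the JS snippet")
    # json.dumps yields a quoted, escaped JavaScript string literal.
    lang_line = "recognition.lang = " + json.dumps(lang) + ";"
    return js_code.replace(CONSTRUCTOR, CONSTRUCTOR + "\n    " + lang_line, 1)


# Shortened stand-in for the JavaScript from the original post.
snippet = """
    var recognition = new webkitSpeechRecognition();
    recognition.continuous = true;
    recognition.start();
"""
chinese_js = with_language(snippet, "cmn-Hans-CN")
print(chinese_js)
```

Which language tags a given browser accepts is worth checking in that browser; 'cmn-Hans-CN' is the one suggested in this thread.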

guozhiwei.8210 · October 4, 2023, 3:07am

@ash2shukla Thanks for the idea and for sharing the code. I have two questions:

How can I add punctuation marks based on the recognized content?
How can I add a speech recognition end event to this button?

saisowhit_P_B · March 13, 2024, 12:26am

How can I add a mic option in the text area?

Stefano · January 10, 2025, 4:56pm

Hi,
These are the versions of the libs:

bokeh==3.6.2
streamlit==1.41.1
streamlit-audiorec==0.1.3
streamlit-bokeh-events==0.1.2
streamlit-js-eval==0.1.7
streamlit-webrtc==0.47.7
streamlit_mic_recorder==0.0.8

I am trying to use your code, but when I run it, I don’t see anything in Streamlit, and in the console I see this error:

createForOfIteratorHelper.js:3 Uncaught (in promise) TypeError: undefined is not iterable (cannot read property Symbol(Symbol.iterator))
    at r (createForOfIteratorHelper.js:3:40)
    at Function.value (document.js:283:75)
    at Function.value (document.js:525:37)
    at index.js:40:32
    at l (runtime.js:45:40)
    at Generator._invoke (runtime.js:274:22)
    at forEach.e.<computed> [as next] (runtime.js:97:21)
    at i (asyncToGenerator.js:3:20)
    at s (asyncToGenerator.js:25:9)
    at asyncToGenerator.js:32:7

Can you explain why?

Thanks

Stefano G.
