Speech To Text On Client Side Using HTML5 and Streamlit Bokeh Events

Hi Guys,

I just came across this question, Audio display, and thought to myself: what if we want to do the opposite? :laughing: It's pretty straightforward if you are listening through a mic on the machine where the Streamlit server is hosted, but it gets a little tricky if you want to do it on the client side.
Worry not. JavaScript to the rescue!

Check out this (buggy :sweat:) snippet that does just that!

import streamlit as st
from bokeh.models.widgets import Button
from bokeh.models import CustomJS
from streamlit_bokeh_events import streamlit_bokeh_events

stt_button = Button(label="Speak", width=100)

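stt_button.js_on_event("button_click", CustomJS(code="""
    // Chrome exposes the Web Speech API under the webkit prefix
    var recognition = new webkitSpeechRecognition();
    recognition.continuous = true;      // keep listening after each phrase
    recognition.interimResults = true;  // also deliver non-final guesses

    recognition.onresult = function (e) {
        // Collect only the final results that arrived in this event
        var value = "";
        for (var i = e.resultIndex; i < e.results.length; ++i) {
            if (e.results[i].isFinal) {
                value += e.results[i][0].transcript;
            }
        }
        if ( value != "") {
            // Hand the text to streamlit_bokeh_events, which listens for GET_TEXT
            document.dispatchEvent(new CustomEvent("GET_TEXT", {detail: value}));
        }
    }
    recognition.start();
    """))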

result = streamlit_bokeh_events(
    stt_button,
    events="GET_TEXT",
    key="listen",
    refresh_on_update=False,
    override_height=75,
    debounce_time=0)

if result:
    if "GET_TEXT" in result:
        st.write(result.get("GET_TEXT"))
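Note that result only carries the most recent final chunk, not everything said so far. A minimal sketch of one way to keep the whole transcript, assuming result is None before any event and otherwise a dict keyed by the event name (as in the result.get("GET_TEXT") call above). update_transcript is a hypothetical helper, not part of streamlit-bokeh-events.

```python
from typing import List, Optional


def update_transcript(chunks: List[str], result: Optional[dict],
                      event: str = "GET_TEXT") -> List[str]:
    """Return `chunks` extended with the latest recognised text, if any.

    Assumes `result` is what streamlit_bokeh_events returned on this run:
    None before any event fires, otherwise a dict keyed by event name.
    """
    if not result or event not in result:
        return chunks
    chunk = str(result[event]).strip()
    # Skip empty text and a repeat of the previous chunk: on an unrelated
    # rerun the component may hand back its last value again. Trade-off:
    # the same phrase spoken twice in a row is stored only once.
    if not chunk or (chunks and chunks[-1] == chunk):
        return chunks
    return chunks + [chunk]
```

Inside the app you could keep the list across reruns with st.session_state, e.g. st.session_state["chunks"] = update_transcript(st.session_state.get("chunks", []), result), and then show it with st.write(" ".join(st.session_state["chunks"])).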

Hope you guys find it interesting!

P.S. In the GIF, that's the STT of me speaking :laughing:

Hey @ash2shukla!

That's super cool! So you press this button and then speak into your mic? Do you need an external mic (like on a pair of headphones), or would it work with your computer's built-in mic?

(Side note: I love how you almost always put GIFs in your posts; it makes them so easy to read and understand!)

Hey @Marisa_Smith!

It works with any mic input :slight_smile:

(And yay! Thanks! Haha)

Thank you so much, that is very cool!
In the same way, could you convert text to voice without clicking a button?

Hey @lazeni,

You could do this to get TTS instead of STT:

import json

import streamlit as st
from bokeh.models.widgets import Button
from bokeh.models import CustomJS

text = st.text_input("Say what?")

tts_button = Button(label="Speak", width=100)

# json.dumps turns the Python string into a valid JS string literal,
# so quotes or backslashes in the input don't break the generated code
tts_button.js_on_event("button_click", CustomJS(code=f"""
    var u = new SpeechSynthesisUtterance();
    u.text = {json.dumps(text)};
    u.lang = 'en-US';

    speechSynthesis.speak(u);
    """))

st.bokeh_chart(tts_button)

But the same problem remains here: you still need to click the Speak button. :frowning:

IMHO, it looks a little neater, and it doesn't need the additional gTTS dependency…
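Interpolating a raw Python string into a quoted JS literal (e.g. u.text = "{text}";) breaks as soon as the input contains a double quote or a backslash. A minimal sketch of a hypothetical helper, build_tts_js, that lets json.dumps produce the JS literals and also exposes the SpeechSynthesisUtterance properties rate and pitch. The range checks assume the Web Speech API limits of 0.1 to 10 for rate and 0 to 2 for pitch.

```python
import json


def build_tts_js(text: str, lang: str = "en-US",
                 rate: float = 1.0, pitch: float = 1.0) -> str:
    """Build the JavaScript body for a bokeh CustomJS that speaks `text`.

    Hypothetical helper: json.dumps turns each Python string into a valid
    JS string literal, so user input cannot break out of the quotes.
    """
    # Assumed Web Speech API ranges for SpeechSynthesisUtterance
    if not 0.1 <= rate <= 10:
        raise ValueError("rate must be between 0.1 and 10")
    if not 0 <= pitch <= 2:
        raise ValueError("pitch must be between 0 and 2")
    return "\n".join([
        "var u = new SpeechSynthesisUtterance();",
        f"u.text = {json.dumps(text)};",
        f"u.lang = {json.dumps(lang)};",
        f"u.rate = {rate};",
        f"u.pitch = {pitch};",
        "speechSynthesis.speak(u);",
    ])
```

Usage would be tts_button.js_on_event("button_click", CustomJS(code=build_tts_js(text, rate=1.2))). This only makes the generated JS safer and more configurable; it does not remove the need to click the button.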

Amazing, @ash2shukla! Here is the output on my side :laughing:
Screenshot 2020-12-09 at 20.30.09

:laughing: Awesome!

Hey @ash2shukla, I was wondering if something similar could work for using the webcam. What do you think?

Hey @napoles3d, I haven't tried it recently, but I think the webcam and mic won't work due to the sandboxing of components.
The two things I've been able to make work are the speech and geolocation APIs…
Let me know if the webcam works!

Hmm, I see… someone else tried, but it didn't work (big correction: it actually works!! Sorry @Luke :sweat_smile:):

Hi, nice work! How can I save the audio file? I would appreciate your advice. Thanks so much! :slight_smile:

@ash2shukla How do I go about accessing all of the text that was uttered to process it (not just the intermediate outputs)? Thank you!

@ash2shukla I want to implement this app using Streamlit in Google Colab. Could you please tell me how I can do this?

It seems that you can change these two parameters:

recognition.continuous = true;
recognition.interimResults = true;

By setting both of them to false, you avoid interim results and continuous recording: recognition stops after the first final result.
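A minimal sketch of making those two flags configurable from Python. build_stt_js is a hypothetical helper, not part of streamlit-bokeh-events; it assumes the same GET_TEXT event as the original snippet, uses json.dumps to render Python True/False as JS true/false, and adds a fallback to the unprefixed window.SpeechRecognition plus a recognition.lang setting, neither of which was in the original.

```python
import json


def build_stt_js(event: str = "GET_TEXT", continuous: bool = False,
                 interim_results: bool = False, lang: str = "en-US") -> str:
    """Return JavaScript for a bokeh CustomJS that transcribes speech.

    With continuous=False, recognition ends after one final result, so each
    button click yields a single phrase sent as `event`.
    """
    return f"""
    var SR = window.SpeechRecognition || window.webkitSpeechRecognition;
    var recognition = new SR();
    recognition.continuous = {json.dumps(continuous)};
    recognition.interimResults = {json.dumps(interim_results)};
    recognition.lang = {json.dumps(lang)};

    recognition.onresult = function (e) {{
        var value = "";
        for (var i = e.resultIndex; i < e.results.length; ++i) {{
            if (e.results[i].isFinal) {{
                value += e.results[i][0].transcript;
            }}
        }}
        if (value !== "") {{
            document.dispatchEvent(new CustomEvent({json.dumps(event)}, {{detail: value}}));
        }}
    }};
    recognition.start();
    """
```

It would be wired up as stt_button.js_on_event("button_click", CustomJS(code=build_stt_js())), keeping events="GET_TEXT" in the streamlit_bokeh_events call.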