The chatbox utility for Streamlit could use some improvement. Can you tell me whether these options exist or not? I would prefer that the chatbox could be detached from individual pages and their session state while still knowing which page the user is on. In addition, I have problems with voice activation. I am using Claude Code, and here is what has been tried so far:
Voice Activation in Streamlit Chat - Technical Limitation Report
What We Attempted
We tried to implement voice input for the AI Copilot chatbox using the Web Speech API (browser-native speech recognition). The implementation consisted of:

- shared/voice_recognition.py - a custom Streamlit component using streamlit.components.v1.html() that embeds JavaScript for the Web Speech API
- shared/text_to_speech.py - text-to-speech output (this part works)
The Core Problem
Streamlit’s st.chat_input() cannot be programmatically populated or triggered from external sources.
When voice recognition captures the user’s speech, we need to:

1. Get the transcript from JavaScript (Web Speech API)
2. Send it to Python/Streamlit
3. Insert it into the chat input OR submit it as if the user typed it

Step 3 is impossible with the current st.chat_input() API.
Technical Details
Our JavaScript Component
// In shared/voice_recognition.py
recognition.onresult = function(event) {
  const transcript = event.results[0][0].transcript;
  // We can capture the speech successfully,
  // but we cannot inject it into st.chat_input().
  // This sends the value back to Streamlit:
  window.parent.postMessage({
    type: 'streamlit:setComponentValue',
    value: transcript
  }, '*');
};
The Workaround We Had to Use
Instead of integrating with st.chat_input(), we had to create a separate voice button that bypasses the chat input entirely:
# In copilot_widget.py - the workaround
col_voice, col_text = st.columns([1, 5])
with col_voice:
    voice_transcript = render_voice_input(key="voice")  # Separate button
with col_text:
    text_input = st.chat_input("Type here...")  # Normal chat input

# Process voice separately from text
if voice_transcript:
    # Manually add to chat log and process
    st.session_state.chat_log.append({"role": "user", "content": f"🎤 {voice_transcript}"})
    # ... process with AI
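Until an integration hook exists, the workaround can at least be made less awkward by funneling both paths through one helper so voice and typed messages share a chat-log format. A minimal sketch, assuming a hypothetical helper `append_user_message` (not part of our codebase):

```python
# Hedged sketch: normalize voice and typed input into one chat-log entry
# format so downstream AI processing is identical for both paths.

def append_user_message(chat_log: list, content: str, via_voice: bool = False) -> dict:
    entry = {
        "role": "user",
        # Keep the mic marker for display, but record the source explicitly
        "content": f"\N{MICROPHONE} {content}" if via_voice else content,
        "source": "voice" if via_voice else "text",
    }
    chat_log.append(entry)
    return entry

log: list = []
append_user_message(log, "hello")                              # typed path
append_user_message(log, "what's on my calendar", via_voice=True)  # voice path
```

In copilot_widget.py, both `if voice_transcript:` and `if text_input:` branches would call this helper instead of appending dicts inline, so the two inputs at least converge right after capture.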
Why This Is Suboptimal
- Two separate input methods - users see a voice button AND a text input, which is confusing UX
- No unified input experience - the voice transcript doesn’t appear in the chat input field
- Cannot pre-fill the chat input - users can’t review/edit the voice transcript before sending
- Layout constraints - the voice button must be placed separately (it can’t live inside st.chat_input)
What We Need from Streamlit
Option A: Programmatic st.chat_input() Control
# Desired API - allow setting chat input value programmatically
st.chat_input("Message...", value=st.session_state.voice_transcript)
# Or a callback that can inject text
st.chat_input("Message...", on_voice_input=handle_voice_transcript)
Option B: st.chat_input() with Voice Button Built-in
# Desired API - built-in voice support
st.chat_input(
    "Message...",
    enable_voice=True,  # Adds microphone icon
    voice_language="en-US"
)
Option C: Allow Custom Widgets Inside st.chat_input()
# Desired API - compose custom elements inside chat input
with st.chat_input("Message...") as chat:
    chat.add_button("🎤", on_click=start_voice_recognition)
    chat.add_button("📎", on_click=attach_file)
Browser Compatibility Note
Browser support for the Web Speech API (speech recognition):

- Chrome/Edge (Chromium-based): supported
- Safari: supported
- Firefox: not supported
This is a browser limitation, not Streamlit’s fault. However, Streamlit could still provide the integration hooks for browsers that support it.
Files for Reference
| File | Purpose |
|---|---|
| shared/voice_recognition.py | Web Speech API component (works, but can’t inject into chat) |
| shared/text_to_speech.py | TTS output (works) |
| VOICE_RECOGNITION_GUIDE.md | Full documentation of the attempt |
Summary
The voice recognition itself works. The limitation is that st.chat_input() is a closed component that cannot:
- Accept programmatic input values
- Be triggered/submitted from external JavaScript
- Contain custom widgets (like a microphone button)
This forces developers to create awkward workarounds with separate voice buttons, breaking the unified chat UX that users expect from modern AI assistants.