Speeach-to-Text query. Audio processing retriggers on every state change or user interaction with Streamlit app

Ebuka · November 16, 2024, 12:25pm

Environment Details:

Running locally
Python version: 3.10.12
Streamlit version: 1.39.0

Github repo: GitHub - Ebukachuqz/fabrizo-ai-rag-app: Developed a Retrieval-Augmented Generation (RAG) application using Langchain and Groq /OpenAI LLM models to build an AI-powered chatbot that delivers real-time football transfer insights based on Fabrizio Romano's scrapped tweets. LLM monitoring and evaluations are performed. Check README.md
Requirements.txt includes:

streamlit
audiorecorder
langchain-core

Issue Description: I’m building a chat application with audio recording capabilities using [audiorecorder](https://github.com/theevann/streamlit-audiorecorder). The issue is that my audio processing code runs on every state change or interaction, even when no new audio is recorded.

Code:

from audiorecorder import audiorecorder
import streamlit as st
from langchain_core.messages import AIMessage, HumanMessage
from src.speech2text import speech2text
import os

# Initialize chat history
if "chat_history" not in st.session_state:
    st.session_state.chat_history = [
        AIMessage(content="Initial message"),
    ]

# User input methods
user_query = st.chat_input("Ask a question...")
audio = audiorecorder(start_prompt="", stop_prompt="", pause_prompt="", show_visualizer=False)

# Process user input
if user_query or len(audio) > 0:
    if len(audio) > 0:  # This condition triggers on every Streamlit rerun
        audio.export("audio_query.wav", format="wav")
        with st.chat_message("Human"):
            transcribed_text = speech2text("audio_query.wav")
            st.session_state.chat_history.append(HumanMessage(content=transcribed_text))
            user_query = transcribed_text
            os.remove("audio_query.wav")
    
    elif user_query:
        st.session_state.chat_history.append(HumanMessage(content=user_query))
   else:
        st.stop()

    with st.chat_message("AI"):
        response_container = st.empty()
        response_text = ""

        try:
            with st.spinner("Generating response..."):
                full_response, chunks, urls = rag(user_query, llm_choice, api_key)

            with st.spinner("Generating audio..."):
                audio_file = text2speech(remove_emojis(full_response), filename="audio_response.mp3")
            if audio_file:
                autoplay_audio("audio_response.mp3")

Current Behavior:

Every time I interact with any element in my Streamlit app (clicking buttons, submitting feedback, etc.), the if len(audio) > 0 check runs again
This causes unnecessary audio processing and file operations
I can see this happening because:

The file operations occur repeatedly
The speech-to-text conversion is triggered multiple times
The chat history gets updated even without new audio input

Expected Behavior:

Audio processing should only trigger when new audio is actually recorded
Other interactions with the app (button clicks, state changes) shouldn’t trigger the audio processing logic

Questions:

How can I modify my code to only process audio when new audio is actually recorded?
What’s the recommended way to handle audio state persistence in Streamlit?
Is there a best practice for managing audio recording state to prevent unnecessary reruns?

What I’ve Tried:

Using session state to track audio status (but unsure of the correct approach)
The issue seems related to Streamlit’s reactive rerun behavior

No error messages are being generated - this is a logic/state management issue rather than an error.

Encrypt · November 18, 2024, 10:29am

To address your issue, I have implemented a session state variable st.session_state.last_audio to maintain track of the previously recorded audio. Subsequently, I compare this value with the audio in each render to determine if it is a new or previously recorded audio.

from audiorecorder import audiorecorder
import streamlit as st
from langchain_core.messages import AIMessage, HumanMessage
import os

# Mock speech2text function
def speech2text(audio_path):
    print(f"Converting audio to text: {audio_path}")
    return "This is a mock transcription"

# Initialize chat history
if "chat_history" not in st.session_state:
    st.session_state.chat_history = [
        AIMessage(content="Initial message"),
    ]

if "last_audio" not in st.session_state:
    st.session_state.last_audio = None

# User input methods
user_query = st.chat_input("Ask a question...")
audio = audiorecorder(start_prompt="", stop_prompt="", pause_prompt="", show_visualizer=False, key="audio")

# Render chat history
for message in st.session_state.chat_history:
    if isinstance(message, AIMessage):
        with st.chat_message("AI"):
            st.write(message.content)
    elif isinstance(message, HumanMessage):
        with st.chat_message("Human"):
            st.write(message.content)

# Check if new audio is available
def check_new_audio():
    global audio

    if len(audio) == 0:
        return False

    if audio == st.session_state.last_audio:
        return False
    else:
        st.session_state.last_audio = audio
        return True

# Process user input
is_new_audio = check_new_audio()
if user_query or is_new_audio:
    if is_new_audio:
        audio.export("audio_query.wav", format="wav")
        user_query = speech2text("audio_query.wav")
        os.remove("audio_query.wav")
    elif user_query:
        pass
    else:
        st.stop()

    with st.chat_message('human'):
        st.write(user_query)

    st.session_state.chat_history.append(HumanMessage(content=user_query))
    st.session_state.chat_history.append(AIMessage(content="Processing..."))

    with st.chat_message("AI"):
        st.markdown("Processing...")

Ebuka · November 18, 2024, 11:52am

Hi @Encrypt Thank you so much for responding. This is very brilliant, I really appreciate taking your time to review the code

system · November 20, 2024, 11:52am

This topic was automatically closed 2 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
App rerun on a component hit (even after involving session state) streamlit-webrtc session-state , rerun	0	358	February 13, 2024
How to clean variables for chatbot Using Streamlit	3	413	May 13, 2024
Streamlit rerun problem! Using Streamlit	4	1071	October 28, 2023
Streaming audio response from streamlit app Community Cloud debugging	1	1279	September 27, 2024
Chat input widget that supports both text and audio Community Cloud discussion	6	1327	March 24, 2025

Speeach-to-Text query. Audio processing retriggers on every state change or user interaction with Streamlit app

Related topics

Hello there 👋🏻

Cookie settings

Strictly necessary cookies

Performance cookies

Functional cookies

Targeting cookies