Speech-to-Text query: audio processing retriggers on every state change or user interaction with Streamlit app

Environment Details:

  • Running locally
  • Python version: 3.10.12
  • Streamlit version: 1.39.0

GitHub repo: GitHub - Ebukachuqz/fabrizo-ai-rag-app: a Retrieval-Augmented Generation (RAG) application built with LangChain and Groq/OpenAI LLM models, an AI-powered chatbot that delivers real-time football transfer insights based on Fabrizio Romano's scraped tweets. LLM monitoring and evaluations are performed; see README.md.
Requirements.txt includes:

streamlit
streamlit-audiorecorder
langchain-core

Issue Description: I’m building a chat application with audio recording capabilities using [audiorecorder](https://github.com/theevann/streamlit-audiorecorder). The issue is that my audio processing code runs on every state change or interaction, even when no new audio is recorded.

Code:

from audiorecorder import audiorecorder
import streamlit as st
from langchain_core.messages import AIMessage, HumanMessage
from src.speech2text import speech2text
import os

# Initialize chat history
if "chat_history" not in st.session_state:
    st.session_state.chat_history = [
        AIMessage(content="Initial message"),
    ]

# User input methods
user_query = st.chat_input("Ask a question...")
audio = audiorecorder(start_prompt="", stop_prompt="", pause_prompt="", show_visualizer=False)

# Process user input
if user_query or len(audio) > 0:
    if len(audio) > 0:  # This condition triggers on every Streamlit rerun
        audio.export("audio_query.wav", format="wav")
        with st.chat_message("Human"):
            transcribed_text = speech2text("audio_query.wav")
            st.session_state.chat_history.append(HumanMessage(content=transcribed_text))
            user_query = transcribed_text
            os.remove("audio_query.wav")
    
    elif user_query:
        st.session_state.chat_history.append(HumanMessage(content=user_query))
    else:
        st.stop()

    with st.chat_message("AI"):
        response_container = st.empty()
        response_text = ""

        try:
            with st.spinner("Generating response..."):
                # rag, text2speech, remove_emojis and autoplay_audio are helpers defined elsewhere in the app
                full_response, chunks, urls = rag(user_query, llm_choice, api_key)

            with st.spinner("Generating audio..."):
                audio_file = text2speech(remove_emojis(full_response), filename="audio_response.mp3")
            if audio_file:
                autoplay_audio("audio_response.mp3")
        except Exception as e:
            st.error(f"Error: {e}")

Current Behavior:

  1. Every time I interact with any element in my Streamlit app (clicking buttons, submitting feedback, etc.), the if len(audio) > 0 check runs again
  2. This causes unnecessary audio processing and file operations
  3. I can see this happening because:
  • The file operations occur repeatedly
  • The speech-to-text conversion is triggered multiple times
  • The chat history gets updated even without new audio input

Expected Behavior:

  • Audio processing should only trigger when new audio is actually recorded
  • Other interactions with the app (button clicks, state changes) shouldn’t trigger the audio processing logic

Questions:

  1. How can I modify my code to only process audio when new audio is actually recorded?
  2. What’s the recommended way to handle audio state persistence in Streamlit?
  3. Is there a best practice for managing audio recording state to prevent unnecessary reruns?

What I’ve Tried:

  • Using session state to track audio status (but unsure of the correct approach)
  • The issue seems related to Streamlit’s reactive rerun behavior

No error messages are being generated - this is a logic/state management issue rather than an error.

To address your issue, I have added a session state variable, st.session_state.last_audio, to keep track of the previously recorded audio. On each rerun, I compare the current audio against this value to determine whether it is a new recording or the previous one being returned again.

from audiorecorder import audiorecorder
import streamlit as st
from langchain_core.messages import AIMessage, HumanMessage
import os

# Mock speech2text function
def speech2text(audio_path):
    print(f"Converting audio to text: {audio_path}")
    return "This is a mock transcription"

# Initialize chat history
if "chat_history" not in st.session_state:
    st.session_state.chat_history = [
        AIMessage(content="Initial message"),
    ]

if "last_audio" not in st.session_state:
    st.session_state.last_audio = None

# User input methods
user_query = st.chat_input("Ask a question...")
audio = audiorecorder(start_prompt="", stop_prompt="", pause_prompt="", show_visualizer=False, key="audio")

# Render chat history
for message in st.session_state.chat_history:
    if isinstance(message, AIMessage):
        with st.chat_message("AI"):
            st.write(message.content)
    elif isinstance(message, HumanMessage):
        with st.chat_message("Human"):
            st.write(message.content)

# Check if new audio is available
def check_new_audio(audio):
    if len(audio) == 0:
        return False

    if audio == st.session_state.last_audio:
        return False

    st.session_state.last_audio = audio
    return True

# Process user input
is_new_audio = check_new_audio(audio)
if user_query or is_new_audio:
    if is_new_audio:
        audio.export("audio_query.wav", format="wav")
        user_query = speech2text("audio_query.wav")
        os.remove("audio_query.wav")
    elif user_query:
        pass
    else:
        st.stop()

    with st.chat_message("Human"):
        st.write(user_query)

    st.session_state.chat_history.append(HumanMessage(content=user_query))
    st.session_state.chat_history.append(AIMessage(content="Processing..."))

    with st.chat_message("AI"):
        st.markdown("Processing...")
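One caveat with the approach above: comparing AudioSegment objects keeps the entire recording in session state. A lighter variant (my own suggestion, not from the thread) stores only a hash of the raw audio bytes. This is a minimal sketch of the dedupe logic with a plain dict standing in for st.session_state; in the app you would pass st.session_state and call it with audio.raw_data, assuming audiorecorder returns a pydub AudioSegment.

```python
import hashlib

def is_new_audio(audio_bytes: bytes, state: dict) -> bool:
    """Return True only when audio_bytes differ from the last seen recording.

    `state` stands in for st.session_state here; in the Streamlit app,
    pass st.session_state and use audio.raw_data for audio_bytes.
    """
    if not audio_bytes:
        return False

    digest = hashlib.sha256(audio_bytes).hexdigest()
    if state.get("last_audio_hash") == digest:
        return False  # same recording, replayed by a rerun
    state["last_audio_hash"] = digest
    return True

state = {}
print(is_new_audio(b"take-1", state))  # first recording -> True
print(is_new_audio(b"take-1", state))  # rerun with same audio -> False
print(is_new_audio(b"take-2", state))  # new recording -> True
```

Hashing also sidesteps any cost of AudioSegment equality checks on long recordings, at the price of one SHA-256 pass per rerun.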

Hi @Encrypt, thank you so much for responding. This is brilliant; I really appreciate you taking the time to review the code.

This topic was automatically closed 2 days after the last reply. New replies are no longer allowed.