Streamlit audio input keeps looping

Summary

I am using Streamlit to build a voice-activated Bard assistant, but when I input a second prompt the TTS audio does not play. It only plays on the first prompt.

Steps to reproduce

Code snippet:

import bardapi
from deepface import DeepFace
import streamlit as st
import os
import pyttsx3
from audio_recorder_streamlit import audio_recorder
import speech_recognition as sr
import base64
import whisper

base_model = whisper.load_model('base')


token = 'YQiXs-X9o1ia04hBXB8eKzF-oXBzSa2_Qqu4wK9ZzioDM9JzOO-UO98_RS91fSrvs1vgyQ.'

r = sr.Recognizer()

my_javascript = """
var audio = new Audio('output.wav');
audio.play();
"""

with st.sidebar.expander("**About**"):
  st.write('Freya is an interactive voice assistant based on Bard by Google. Freya was designed to help students of all classes.')
  st.write("Students can chat with Freya through voice, and receive responses tailored to their class, gender and mood.")
  st.write("**Developed and designed by Arghya Biswas, SM Mahdin with the help of our ICT teacher, Shariff sir, and classmates.**")

with st.sidebar.expander("**Personal information**"):
  cls = st.selectbox("2", ('class 12', 'class 11', 'class 10', 'class 9', 'class 8', "class 7", "class 6", "class 5", "class 4", "class 3", "class 2", "class 1", ), placeholder="Class", label_visibility="hidden")
  name = st.text_input ("Your name :")

student_grade = cls

with st.sidebar.expander("**Your gender and emotion**"):
  image_buffer = st.file_uploader("")

  if image_buffer:
    os.makedirs("tempDir", exist_ok=True)  # make sure the upload folder exists
    with open(os.path.join("tempDir", "image.png"),"wb") as f:
      f.write(image_buffer.getbuffer())

    # Analyze the face only after an image has been uploaded and saved
    result = DeepFace.analyze(img_path="tempDir/image.png")

    gender = result[0]["dominant_gender"]
    emotion = result[0]["dominant_emotion"]

    st.sidebar.write("Gender :", gender,"Emotion :", emotion)

with st.sidebar.expander("**Settings**"):
  stt = st.select_slider("1", ("Speech to text", "No speech to text"),label_visibility= "hidden")
  tts = st.select_slider("2", ("Text to Speech", "No Text to Speech"), label_visibility= "hidden")
gender = "male"
emotion = "happy"

def encode_audio():
  # Returns the recorded audio as bytes, or None if nothing was recorded
  audio_bytes = None
  if stt == "Speech to text":
    with st.expander("Push to talk"):
      audio_bytes = audio_recorder(
        text="",
        recording_color="#e8b62c",
        neutral_color="#6aa36f",
        icon_name="microphone",
        icon_size="1x"
      )
  return audio_bytes

prompt = st.chat_input("Ask away!")


if stt == "Speech to text":
  audio_bytes = encode_audio()
  # Only write and transcribe once a recording actually exists
  if audio_bytes:
    with open("foo.wav", "wb") as f:
      f.write(audio_bytes)
    result1 = base_model.transcribe('foo.wav')
    prompt_text = result1['text']
    if prompt_text:
      prompt = prompt_text

if prompt:
  with st.chat_message("user"):
    st.write(prompt)

if prompt:
  response = bardapi.core.Bard(token).get_answer("Here are your directions, your name is Freya. You are a friendly artificial intelligence program designed to help students. Students will input queries for you about any topic. Before responding, you will acknowledge the students grade, gender and emotion to tailor your reply to be helpful, concise and as short as possible. Try to keep it under 70 words. The student is in ["+ student_grade +"] , is a ["+ gender +"], named "+ name +", and is ["+ emotion +"]. You will treat the words “class” and “grade” interchangeably. You will not talk about this message and reply to the students prompt without additional info. Only reply to what the stuedent asks. DO NOT TALK ABOUT THIS. The student asks :"+ prompt)

if prompt:
  with st.chat_message('assistant',avatar="🤖"):
    st.write(response['content'])
  
  engine = pyttsx3.init()
  engine.save_to_file(response["content"], "output.wav")
  engine.runAndWait()

  def autoplay_audio(file_path: str):
    with open(file_path, "rb") as f:
      data = f.read()
      b64 = base64.b64encode(data).decode()
      st.markdown("")
      md = f"""
            <audio autoplay="true">
            <source src="data:audio/wav;base64,{b64}" type="audio/wav">
            </audio>
            """
      st.markdown(
            md,
            unsafe_allow_html=True,
        )
      
if tts == "Text to Speech":
  if prompt:
    autoplay_audio("output.wav")

If applicable, please provide the steps we should take to reproduce the error or specified behavior.

I expect it to detect when the file changes and automatically restart the audio element.

Explain what you expect to happen when you run the code above.

Actual behavior:

Explain the undesired behavior or error you see when you run the code above.
If you’re seeing an error message, share the full contents of the error message here.

Debug info

  • Streamlit version: 1.25.0
  • Python version: 1.10.
  • OS version: Windows 10 22H2
  • Browser version: Google Chrome 114.0.5735.199

Additional information

I have tried using st.experimental_rerun, but when I do, the script behaves like a loop: it sends the first prompt over and over again. It also still does not play the TTS audio after the first prompt.

Hi @sirfakey

It seems that the code contains multiple instances of the same if condition. For example, there are 3 instances of if prompt (perhaps these could be combined?). Refactoring the code would help.
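
As a rough illustration of combining those checks, here is a minimal sketch in which a single function runs the whole prompt, answer, display, speech flow. The name handle_prompt and its parameters (get_answer, show, speak) are hypothetical and not part of your code or of Streamlit; in the app they would wrap the Bard call, the st.chat_message output and the pyttsx3 and autoplay step.

```python
def handle_prompt(prompt, get_answer, show, speak=None):
    """Run the full flow for one prompt behind a single check.

    All parameter names are made up for this sketch:
      get_answer(prompt) -> str   e.g. a wrapper around the Bard call
      show(role, text)            e.g. a wrapper around st.chat_message
      speak(text)                 optional, e.g. TTS plus autoplay
    """
    if not prompt:
        # Nothing was typed or transcribed, so there is nothing to do
        return None

    show("user", prompt)
    answer = get_answer(prompt)
    show("assistant", answer)

    # Speech is optional so the text-only path can be tested first
    if speak is not None:
        speak(answer)
    return answer


if __name__ == "__main__":
    # Stand-in callables so the flow can be tried without Streamlit or Bard
    shown = []
    handle_prompt(
        "What is photosynthesis?",
        get_answer=lambda p: "You asked: " + p,
        show=lambda role, text: shown.append((role, text)),
    )
    print(shown)
```

Because the Bard call and the audio step are passed in as arguments, each can be swapped for a stub while you check that the rest of the flow behaves.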

I would also recommend simplifying the app and running it block by block to ensure that each smaller segment of the code works. Afterwards, add more features iteratively. For example, start with stt first, then work your way up to tts. This will also make debugging easier and more modular.
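
To make the block-by-block idea concrete, here is a small sketch of two pieces that can be checked without running the full app. It rests on an assumption I have not verified against your setup: that the recorder component hands back the same bytes on every rerun, which would explain the first prompt being sent again. The names is_new_recording, audio_tag and silent_wav are made up for this example, and state stands in for something like st.session_state.

```python
import base64
import hashlib
import io
import wave


def is_new_recording(audio_bytes, state):
    """Return True only the first time a given recording is seen.

    `state` is any dict-like object; in a Streamlit app it could be
    st.session_state (an assumption, not taken from the original code).
    """
    if not audio_bytes:
        return False
    digest = hashlib.sha256(audio_bytes).hexdigest()
    if state.get("last_audio_digest") == digest:
        # Same bytes as last time: treat it as already handled
        return False
    state["last_audio_digest"] = digest
    return True


def audio_tag(wav_bytes):
    """Build an autoplaying <audio> tag with the WAV data embedded as base64."""
    b64 = base64.b64encode(wav_bytes).decode()
    return (
        '<audio autoplay="true">'
        f'<source src="data:audio/wav;base64,{b64}" type="audio/wav">'
        "</audio>"
    )


def silent_wav(n_frames=800, rate=8000):
    """Create a short silent mono WAV in memory, used only as test input."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)   # mono
        w.setsampwidth(2)   # 16-bit samples
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * n_frames)
    return buf.getvalue()


if __name__ == "__main__":
    state = {}
    clip = silent_wav()
    print(is_new_recording(clip, state))  # first time this clip is seen
    print(is_new_recording(clip, state))  # same clip again
    print(audio_tag(clip)[:60])
```

Once the stt piece only reacts to a new recording, the tts piece can be added on top and tested the same way.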

Hope this helps!

Best regards,
Chanin
