OpenAI Whisper doesn't recognize audio file recorded with audio_recorder_streamlit component

Hi, I am creating a Streamlit app that books my appointments through voice recording and saves them to my Google Calendar. I use OpenAI Whisper to transcribe the audio to text first, and then the OpenAI chat completions API to have the LLM extract the key appointment details to be sent to my Google Calendar.
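For context, the extraction step looks roughly like this (a minimal sketch, not my exact code; the system prompt and the gpt-4o-mini model choice are just illustrative):

```python
# Minimal sketch of the extraction step (illustrative, not the exact code)
from openai import OpenAI

client = OpenAI()  # assumes the API key is set in the environment

def extract_appointment_details(transcript: str) -> str:
    # Ask the LLM to pull the key fields for the calendar event out of the transcript
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[
            {"role": "system", "content": "Extract the appointment title, date, start time, and duration from the user's message."},
            {"role": "user", "content": transcript},
        ],
    )
    return response.choices[0].message.content
```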

I use the component "audio_recorder_streamlit" to record my voice…

When I launch the recording, it records correctly, but it doesn't transcribe into text because it gives me this error: `UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe0 in position 5: invalid continuation byte`. I am not sure if the issue is that the OpenAI Whisper API doesn't recognize the format of the audio file recorded with audio_recorder_streamlit. I have literally tried everything, but nothing is working.

Here is the code:

```python
# streamlit basic app front end

import streamlit as st
import openai_integration as oi  # module with the function to transcribe audio with OpenAI Whisper
from google_calendar import create_event
from audio_recorder_streamlit import audio_recorder

st.title("AI Personal Assistant")

# Record and transcribe voice input
st.success("Hello, click the voice recorder button below to start recording")

audio_file = audio_recorder()

# Play the audio and transcribe
if audio_file is not None:
    st.write("Play the audio")
    st.audio(audio_file, format="audio/wav")
    # Transcribe the audio to text using the openai_integration.py function
    st.write("Transcribing audio to text…")
    text = oi.voice_to_text(audio_file)  # function to transcribe the audio (code below)
```

And the openai_integration function to transcribe the audio:

```python
from dotenv import load_dotenv
from openai import OpenAI
import os

# Load the environment variables
load_dotenv()

# Initialize the OpenAI API client
client = OpenAI(api_key=os.getenv("OPEN_AI_API"))

def voice_to_text(audio_path):
    # Read the contents of the uploaded file
    with open(audio_path, "rb") as audio_file:
        # Send the audio content directly to OpenAI
        response = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
        )
    return response.text
```

Hope somebody can help. Thanks all!

Hi Marcello,

I'm sorry to hear you're running into this issue. I have some good news and some bad news to share. The good news is that we are shipping a native Streamlit component, st.audio_input, in version 1.39.0 that should allow exactly what you're looking for. (The bad news is that this is still a few weeks away.) Hope this helps!
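Once it ships, basic usage should look roughly like this (a minimal sketch based on the planned API):

```python
import streamlit as st

# st.audio_input returns the recording as an UploadedFile-like object, or None
audio = st.audio_input("Record a voice message")

if audio:
    st.audio(audio)  # play back the recording
```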


Thanks a lot, Nico! It is definitely good news; it is fine to wait :slight_smile: Do you think it is an issue between the audio_recorder_streamlit component and Whisper, then? In this tutorial here: Youtube tutorial personal assistant, at minute 8:56 it looks fine, which is very strange. Thanks again!

Hey @Marcello :wave:

Given the error you've shared, I suspect it's neither an issue with the custom component nor with Whisper. It seems like there's an issue with reading in the audio file, which might indicate a problem with how the audio bytes returned by the custom component were encoded and saved to a file.
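For what it's worth, audio_recorder() returns the raw recorded bytes, so if those bytes end up being passed to open() as if they were a file path, an error like this is plausible. A minimal sketch of a fix along those lines (assuming the voice_to_text function from above, which expects a path) would be to write the bytes to a temporary file first:

```python
import tempfile

import streamlit as st
from audio_recorder_streamlit import audio_recorder
import openai_integration as oi  # the voice_to_text function from the question

audio_bytes = audio_recorder()  # returns the raw recorded audio as bytes (or None)

if audio_bytes is not None:
    st.audio(audio_bytes, format="audio/wav")
    # Write the bytes to a real file so voice_to_text can open() it by path
    with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") as f:
        f.write(audio_bytes)
        tmp_path = f.name
    text = oi.voice_to_text(tmp_path)
    st.write(text)
```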

As a workaround (I wouldn't recommend using this in prod), you could give st.audio_input a try by installing the whl from the PR currently undergoing code review:

```
pip install https://core-previews.s3-us-west-2.amazonaws.com/pr-9404/streamlit-1.38.0-py2.py3-none-any.whl
```

Here's a sample script that takes the audio data returned by st.audio_input, saves it to a NamedTemporaryFile, and passes that file to the OpenAI translations API:

```python
import tempfile
from openai import OpenAI
import streamlit as st
from dotenv import load_dotenv
import os

load_dotenv()

client = OpenAI(api_key=os.getenv("OPEN_AI_API"))

audio = st.audio_input(label="Record some audio to translate to English using OpenAI")

if audio:
    with tempfile.NamedTemporaryFile(delete=True, suffix=".wav") as f:
        # The translations API requires a file,
        # so we write the audio data to a temporary file
        f.write(audio.getvalue())
        f.flush()  # make sure the bytes are on disk before reopening the file
        # Open and translate the file
        with open(f.name, "rb") as audio_file:
            translation = client.audio.translations.create(
                model="whisper-1",
                file=audio_file,
                prompt="Translate the following audio to English. First determine the language of the audio and then translate it to English.",
            )
        st.write(translation.text)
```

Thanks a lot, and sorry for my late answer. Marcello