OpenAI Whisper doesn't recognize audio file recorded with audio_recorder_streamlit component

Hi, I am creating a Streamlit app that books my appointments through voice recording and saves them to my Google Calendar. I use OpenAI Whisper to transcribe the audio to text first, and then the OpenAI chat completions API to have the LLM extract the key appointment details to be sent to my Google Calendar.
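For context, the extraction step looks roughly like this (a minimal sketch, not my exact code; the system prompt and the gpt-4o-mini model choice are just illustrative):

```python
# Minimal sketch of the extraction step (illustrative, not the exact code)
from openai import OpenAI

client = OpenAI()  # assumes the API key is set in the environment

def extract_appointment_details(transcript: str) -> str:
    # Ask the LLM to pull the key fields for the calendar event out of the transcript
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[
            {"role": "system", "content": "Extract the appointment title, date, start time, and duration from the user's message."},
            {"role": "user", "content": transcript},
        ],
    )
    return response.choices[0].message.content
```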

I use the component "audio_recorder_streamlit" to record my voice…

When I launch the recording, it records correctly, but it doesn't transcribe into text because it gives me this error: `UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe0 in position 5: invalid continuation byte`. I am not sure if the issue is that the OpenAI Whisper API doesn't recognize the format of the audio file recorded with audio_recorder_streamlit. I have literally tried everything, but nothing is working.

Here is the code:

```python
# streamlit basic app front end

import streamlit as st
import openai_integration as oi  # module with the function to transcribe audio with OpenAI Whisper
from google_calendar import create_event
from audio_recorder_streamlit import audio_recorder

st.title("AI Personal Assistant")

# Record and transcribe voice input
st.success("Hello, click the voice recorder button below to start recording")

audio_file = audio_recorder()

# Play the audio and transcribe
if audio_file is not None:
    st.write("Play the audio")
    st.audio(audio_file, format="audio/wav")
    # Transcribe the audio to text using the openai_integration.py function
    st.write("Transcribing audio to text…")
    text = oi.voice_to_text(audio_file)  # function to transcribe the audio (code below)
```

And the openai_integration function to transcribe the audio:

```python
from dotenv import load_dotenv
from openai import OpenAI
import os

# Load the environment variables
load_dotenv()

# Initialize the OpenAI API client
client = OpenAI(api_key=os.getenv("OPEN_AI_API"))

def voice_to_text(audio_path):
    # Read the contents of the uploaded file
    with open(audio_path, "rb") as audio_file:
        # Send the audio content directly to OpenAI
        response = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
        )
    return response.text
```

Hope somebody can help. Thanks all!

Hi Marcello,

I'm sorry to hear you're running into this issue. I have some good news and some bad news to share. The good news is that we are shipping a native Streamlit component, st.audio_input, in version 1.39.0 that should allow exactly what you're looking for. (The bad news is that this is still a few weeks away.) Hope this helps!
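Once it ships, basic usage should look roughly like this (a minimal sketch based on the planned API):

```python
import streamlit as st

# st.audio_input returns the recording as an UploadedFile-like object, or None
audio = st.audio_input("Record a voice message")

if audio:
    st.audio(audio)  # play back the recording
```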


Thanks a lot, Nico! It is definitely good news; it is fine to wait :slight_smile: Do you think it is an issue between the audio_recorder_streamlit component and Whisper, then? In this tutorial here: Youtube tutorial personal assistant, at minute 8:56 it looks fine, which is very strange. Thanks again!

Hey @Marcello :wave:

Given the error you've shared, I suspect it's neither an issue with the custom component nor with Whisper. It seems like there's an issue with reading in the audio file, which might indicate a problem with how the audio bytes returned by the custom component were encoded and saved to a file.
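For what it's worth, audio_recorder() returns the raw recorded bytes, so if those bytes end up being passed to open() as if they were a file path, an error like this is plausible. A minimal sketch of a fix along those lines (assuming the voice_to_text function from above, which expects a path) would be to write the bytes to a temporary file first:

```python
import tempfile

import streamlit as st
from audio_recorder_streamlit import audio_recorder
import openai_integration as oi  # the voice_to_text function from the question

audio_bytes = audio_recorder()  # returns the raw recorded audio as bytes (or None)

if audio_bytes is not None:
    st.audio(audio_bytes, format="audio/wav")
    # Write the bytes to a real file so voice_to_text can open() it by path
    with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") as f:
        f.write(audio_bytes)
        tmp_path = f.name
    text = oi.voice_to_text(tmp_path)
    st.write(text)
```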

As a workaround (I wouldn't recommend using this in prod), you could give st.audio_input a try by installing the whl from the PR currently undergoing code review:

```
pip install https://core-previews.s3-us-west-2.amazonaws.com/pr-9404/streamlit-1.38.0-py2.py3-none-any.whl
```

Here's a sample script that takes the audio data returned by st.audio_input, saves it to a NamedTemporaryFile, and passes that file to the OpenAI translations API:

```python
import tempfile
from openai import OpenAI
import streamlit as st
from dotenv import load_dotenv
import os

load_dotenv()

client = OpenAI(api_key=os.getenv("OPEN_AI_API"))

audio = st.audio_input(label="Record some audio to translate to English using OpenAI")

if audio:
    with tempfile.NamedTemporaryFile(delete=True, suffix=".wav") as f:
        # The translations API requires a file,
        # so we write the audio data to a temporary file
        f.write(audio.getvalue())
        f.flush()  # make sure the bytes are on disk before reopening the file
        # Open and translate the file
        with open(f.name, "rb") as audio_file:
            translation = client.audio.translations.create(
                model="whisper-1",
                file=audio_file,
                prompt="Translate the following audio to English. First determine the language of the audio and then translate it to English.",
            )
        st.write(translation.text)
```

Thanks a lot, and sorry for my late answer. Marcello