Hello community!
I’m facing an issue when using the streamlit_mic_recorder component together with librosa to transcribe audio recordings in a Streamlit-based project. The basic flow is as follows:
- I record audio from a microphone using streamlit_mic_recorder.
- After the recording is completed, I try to process the audio with librosa to convert it into an audio array, and then transcribe it with a speech recognition model (Whisper).
Here is the basic code for recording and processing:
import torch
from transformers import pipeline
import librosa
from io import BytesIO
import numpy as np
from pydub import AudioSegment
from SystemResources.GestorIADashBoard.ModuloBimochat.Recursos.utils import load_config
config = load_config()

def convert_bytes_to_array(audio_bytes):
    audio_bytes = BytesIO(audio_bytes)
    audio, sample_rate = librosa.load(audio_bytes)
    print(sample_rate)
    return audio

def transcribe_audio(audio_bytes):
    device = "cpu"
    # Build the Whisper ASR pipeline from the configured model
    pipe = pipeline(
        task="automatic-speech-recognition",
        model=config["whisper_model"],
        chunk_length_s=30,
        device=device,
    )
    audio_array = convert_bytes_to_array(audio_bytes)
    print(f"Audio array size: {audio_array.shape}")
    print(f"Model vocabulary size: {pipe.model.config.vocab_size}")
    print(f"Suppress tokens before sanitization: {pipe.model.config.suppress_tokens}")
    # Drop suppress-token IDs that fall outside the model vocabulary
    pipe.model.config.suppress_tokens = [
        token for token in pipe.model.config.suppress_tokens if token < pipe.model.config.vocab_size
    ]
    print(f"Suppress tokens after sanitization: {pipe.model.config.suppress_tokens}")
    prediction = pipe(audio_array, batch_size=1)["text"]
    print(prediction)
    return prediction
The error I get is as follows:
Error Analysis:
The error comes from the librosa library, specifically from the librosa.load function. The message says the audio file could not be opened because its format is not recognized. librosa.load is being handed a BytesIO object to read as an audio file, and the bytes inside it are apparently not in one of the formats that librosa supports (e.g., WAV, MP3, etc.), so the decoder fails before any audio is read.
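To confirm that the format really is the problem, it can help to look at the first few bytes of the recording before handing it to librosa. The following is only a minimal sketch: sniff_audio_format is a hypothetical helper name (not part of librosa or streamlit_mic_recorder), and it only recognizes a handful of common signatures. If it reports something like a WebM/Matroska container, that would be consistent with the browser delivering compressed audio rather than WAV.

```python
def sniff_audio_format(data: bytes) -> str:
    """Guess the audio container from its leading magic bytes.

    Hypothetical diagnostic helper; it checks only a few well-known signatures.
    """
    # RIFF container whose form type is WAVE
    if len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "wav"
    # EBML header, shared by WebM and Matroska
    if data[:4] == b"\x1a\x45\xdf\xa3":
        return "webm/matroska"
    if data[:4] == b"OggS":
        return "ogg"
    if data[:4] == b"fLaC":
        return "flac"
    # ID3 tag, or an MPEG audio frame sync (11 set bits)
    if data[:3] == b"ID3" or (
        len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0
    ):
        return "mp3"
    return "unknown"
```

Calling print(sniff_audio_format(audio_bytes)) at the top of convert_bytes_to_array would show what kind of data the recorder actually returned.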
Possible Solutions:
- Check the audio format:
It is possible that the audio format recorded by streamlit_mic_recorder is not compatible with librosa. You could try converting the recorded audio to a supported format such as WAV or PCM before passing it to librosa. Here is an example of how to do this using pydub:
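from io import BytesIO

from pydub import AudioSegment  # decoding non-WAV input requires ffmpeg

def convert_bytes_to_wav(audio_bytes):
    # Let pydub/ffmpeg detect the input container and decode it
    audio = AudioSegment.from_file(BytesIO(audio_bytes))
    wav_io = BytesIO()
    audio.export(wav_io, format="wav")
    wav_io.seek(0)  # rewind so the next reader starts at the WAV header
    return wav_io
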
Then, you can modify the convert_bytes_to_array function to use this conversion:

def convert_bytes_to_array(audio_bytes):
    audio_wav = convert_bytes_to_wav(audio_bytes)
    # Whisper expects 16 kHz audio; librosa.load resamples to 22050 Hz by default
    audio, sample_rate = librosa.load(audio_wav, sr=16000)
    print(sample_rate)
    return audio

This error occurs due to a format incompatibility when loading the audio file into librosa. By converting the audio to a valid format such as WAV or PCM, it should be possible to load and process the file correctly. If the issue persists, check the dependencies and the state of the audio data before passing it to librosa.
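One way to check the state of the audio data after conversion is to read the WAV header with Python's standard wave module. This is a rough sketch, not part of the original code: describe_wav and make_silent_wav are hypothetical helper names, and the wave module only understands uncompressed PCM WAV, so it will raise an error on anything else.

```python
import wave
from io import BytesIO


def describe_wav(wav_bytes: bytes) -> dict:
    """Return basic header information for PCM WAV data held in memory."""
    with wave.open(BytesIO(wav_bytes), "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate": rate,
            "frames": frames,
            "duration_s": frames / rate if rate else 0.0,
        }


def make_silent_wav(seconds: float = 1.0, sample_rate: int = 16000) -> bytes:
    """Build a mono 16-bit silent WAV in memory, for trying out describe_wav."""
    buf = BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    return buf.getvalue()


if __name__ == "__main__":
    print(describe_wav(make_silent_wav(0.5)))
```

With the pydub conversion above, describe_wav(convert_bytes_to_wav(audio_bytes).getvalue()) would show whether the recording is empty (zero frames) or has an unexpected sample rate or channel count.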
I hope this solution is helpful! Has anyone else encountered similar issues when using streamlit_mic_recorder and librosa?