I am creating a Streamlit app that my non-technical colleagues can use to upload a local audio file, which I will then transcribe using OpenAI’s Whisper.
I am having problems using st.file_uploader. After loading the Whisper model, I’ve tried using:
but I keep getting TypeError and UnicodeDecodeError errors.
I looked up the documentation on st.file_uploader but there is little information available. I can’t find much on the UploadedFile object either, so I don’t know if it has other methods. Or is it because Whisper doesn’t work if a file is uploaded?
Would appreciate some help if it is indeed possible to upload a file for Whisper to use.
Due to legal and ethics constraints, we cannot upload our audio files online. Can I confirm that st.file_uploader only holds the file in the RAM of the user’s computer?
The issue is that model.transcribe expects a file name string, a NumPy array, or a Tensor, and the UploadedFile is none of these (see whisper/transcribe.py at main · openai/whisper · GitHub for more details).
The easiest way to solve this is to save the uploaded file to a temporary file with a known path.
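    from tempfile import NamedTemporaryFile
    import streamlit as st
    import whisper

    audio = st.file_uploader("Upload an audio file", type=["mp3"])
    if audio is not None:
        # Copy the uploaded bytes into a temporary file so Whisper gets a real path
        with NamedTemporaryFile(suffix=".mp3") as temp:
            temp.write(audio.getvalue())
            temp.flush()  # make sure the bytes are on disk before Whisper reads them
            model = whisper.load_model("base")
            result = model.transcribe(temp.name)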
This seems to work well. I can’t speak to the legal constraints, but if you are running this app on your local machine, then it won’t go anywhere else when you upload it. If you are running your app on a remote server, this method will certainly (at least temporarily) put the file on the server’s disk.
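One caveat with the temp-file approach: per the Python tempfile documentation, a NamedTemporaryFile that is still open cannot be opened a second time by name on Windows, which can surface as a permission error when another program tries to read it. Below is a minimal sketch of one way around that, assuming you are happy to delete the file yourself. The helper name temp_audio_path is hypothetical (it is not part of Streamlit or Whisper); it writes the bytes, closes the file, hands back the path, and removes the file afterwards.

```python
import os
from contextlib import contextmanager
from tempfile import NamedTemporaryFile


@contextmanager
def temp_audio_path(data: bytes, suffix: str = ".mp3"):
    """Write data to a closed temporary file, yield its path, then delete it.

    Hypothetical helper for illustration only.
    """
    # delete=False keeps the file on disk after close(); we remove it ourselves.
    temp = NamedTemporaryFile(suffix=suffix, delete=False)
    try:
        temp.write(data)
        temp.close()  # flushes the bytes and releases the file handle
        yield temp.name
    finally:
        temp.close()  # harmless if the file is already closed
        os.remove(temp.name)


# Assumed usage inside the app, with `audio` and `model` as in the snippet above:
# with temp_audio_path(audio.getvalue()) as path:
#     result = model.transcribe(path)
```

The file still touches the disk of whichever machine runs the app, so this does not change the privacy picture described above; it only makes sure the file is closed before anything else opens it and is cleaned up afterwards.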
Thank you. I didn’t realise I could write to a temp file with Streamlit. I successfully wrote audio.getvalue() to a temp file and managed to display it using
I still got a “permission denied” error, but I’m fairly confident it is caused by ffmpeg rather than by the temp file or Whisper. I’ll mark your response as the solution, since my question is fundamentally about using Streamlit.