Streamlit and whisper - runtime error

I have tried to deploy a web app (just a proof of concept) that takes an audio file and transcribes it. It works fine on my localhost, but in the cloud it hits a runtime error during transcription. I also rebuilt the requirements.txt from scratch with pipreqs, but that didn't change anything. This is the error:
Error: Traceback (most recent call last):
  File "/home/adminuser/venv/bin/whisper", line 5, in <module>
    from whisper.transcribe import cli
  File "/home/adminuser/venv/lib/python3.10/site-packages/whisper/__init__.py", line 13, in <module>
    from .model import ModelDimensions, Whisper
  File "/home/adminuser/venv/lib/python3.10/site-packages/whisper/model.py", line 13, in <module>
    from .transcribe import transcribe as transcribe_function
  File "/home/adminuser/venv/lib/python3.10/site-packages/whisper/transcribe.py", line 20, in <module>
    from .timing import add_word_timestamps
  File "/home/adminuser/venv/lib/python3.10/site-packages/whisper/timing.py", line 7, in <module>
    import numba
  File "/home/adminuser/venv/lib/python3.10/site-packages/numba/__init__.py", line 55, in <module>
    _ensure_critical_deps()
  File "/home/adminuser/venv/lib/python3.10/site-packages/numba/__init__.py", line 42, in _ensure_critical_deps
    raise ImportError("Numba needs NumPy 1.24 or less")
ImportError: Numba needs NumPy 1.24 or less

I have tried explicitly pinning the NumPy and Numba versions above, but no luck.

Does anybody have a hint?
Thanks

By the way: the error does not show in the manage-app sidebar but directly in the st.write field where the transcript is supposed to be:

Hi @Awindbrake,

Thanks for posting! Can you share your requirements.txt?

You might resolve the error by specifying the correct versions:

numpy==1.24.0
numba==0.54.0

Let me know if this resolves the issue for you.
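If the pins still don't seem to take effect, it can help to confirm which versions the deployed environment actually resolved. A minimal diagnostic sketch (stdlib only, nothing Streamlit-specific) that you could print or `st.write` at app startup:

```python
from importlib import metadata

def installed_version(pkg):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(pkg)
    except metadata.PackageNotFoundError:
        return None

# Log what the deployed environment actually resolved
for pkg in ("numpy", "numba"):
    print(pkg, "->", installed_version(pkg))
```

If the printed versions differ from the pins in requirements.txt, the cloud build may be reusing a cached environment, in which case rebooting or redeploying the app can force a fresh install.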

Thanks so much. I tried it but it didn’t help. This is what my requirements.txt says:
beautifulsoup4==4.11.2
docx==0.2.4
langchain==0.0.271
nltk==3.8.1
openai==0.27.6
pandas==2.0.2
pdfplumber==0.9.0
python_docx==0.8.11
rake_nltk==1.0.6
Requests==2.31.0
spacy==3.5.2
streamlit==1.25.0
numpy==1.24.0
numba==0.54.0

Can you share your code as well so I can troubleshoot it for you?

Sure. It is not much until now… just starting with the basics:
Many thanks in advance

import os
import re
import subprocess

import streamlit as st

# Define case 5
def whisper():
    with st.form('audio form'):
        API = API2  # API2 is defined elsewhere in the app
        uploaded_file = st.file_uploader("Upload your audio file here...")

        # submit button
        submitted = st.form_submit_button("Transcribe audio")

    if submitted:
        # Save the uploaded file to disk
        audio_file_path = os.path.join(os.getcwd(), "temp_audio_file")
        with open(audio_file_path, 'wb') as f:
            f.write(uploaded_file.getvalue())

        # Start the whisper CLI as a subprocess
        process = subprocess.Popen(["whisper", audio_file_path],
                                   stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)

        # Get both stdout and stderr outputs at once
        stdout_data, stderr_data = process.communicate()

        # Process stdout_data to remove timestamps and display on Streamlit
        for line in stdout_data.splitlines():
            clean_line = re.sub(r'\[\d{2}:\d{2}\.\d{3} --> \d{2}:\d{2}\.\d{3}\] ', '', line)
            st.write(clean_line)

        # Check for errors
        if process.returncode != 0:
            st.write(f"Error: {stderr_data}")

        # Optionally, delete the temporary file
        os.remove(audio_file_path)
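The timestamp-stripping regex above can be exercised on its own against the `[MM:SS.mmm --> MM:SS.mmm]` prefix format the whisper CLI prints (the sample line here is made up):

```python
import re

# Matches a leading "[MM:SS.mmm --> MM:SS.mmm] " timestamp prefix
TIMESTAMP = re.compile(r'\[\d{2}:\d{2}\.\d{3} --> \d{2}:\d{2}\.\d{3}\] ')

line = "[00:00.000 --> 00:04.000] Hello from the transcript."
clean_line = TIMESTAMP.sub('', line)
print(clean_line)  # Hello from the transcript.
```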

This is solved in the meantime. This works:

def whisper():
    transcript = ""

    # Define a function to preprocess and truncate the text
    def preprocess_and_truncate(text, max_length=7000):
        processed_text = text[:max_length]  # Truncate to the specified max_length
        return processed_text

    if 'transcript' not in st.session_state:
        st.session_state.transcript = ""
    if 'summary' not in st.session_state:
        st.session_state.summary = ""

    with st.form('audio form'):
        openai.api_key = API2
        uploaded_file = st.file_uploader("Upload your audio file here (wav, mp3, mp4, m4a, mpeg, mpga): ")
        #system_message = "Act as business consultant specialized in Know-your-customer analysis and topics around German export control."

        # submit button
        submitted = st.form_submit_button("Transcribe audio")

    if submitted:
        if uploaded_file:
            # Start the transcription process using the uploaded file
            transcription = openai.Audio.transcribe("whisper-1", uploaded_file)
            transcript = transcription['text']
            # Preprocess and truncate the transcript
            processed_transcript = preprocess_and_truncate(transcript, max_length=7000)  # Adjust the max_length as needed
            formatted_transcript = transcript.replace("\n", "<br>").replace(".", ".<br>").replace("?", "?<br>").replace("!", "!<br>")
            st.markdown("### Transcript:")
            st.write(formatted_transcript, unsafe_allow_html=True)
            #st.text_area("Transcript:", transcript, height=200)
            prompt = f"summarize this in English language in a concise way in up to 10 full sentences using bullet points. Here is the context: {processed_transcript}"
            summary = generate_text(prompt, "you are a helpful assistant", GPT_model, 0.5, 700)
            st.markdown("### Summary:")
            st.write(summary)
            # Combine the transcript and summary for download
            combined_text = f"Transcript:\n{processed_transcript}\n\nSummary:\n{summary}"

            # Add a single download button for both transcript and summary
            st.download_button('Download Transcript and Summary', combined_text, file_name='transcript_summary.txt')
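The chained `replace` calls used for the formatted transcript can be tried in isolation. One caveat worth noting: because every "." becomes a line break, abbreviations and decimal numbers will also be split (the sample text here is made up):

```python
transcript = "Hello there. How are you? Great!"

# Insert an HTML line break after each newline and sentence-ending mark
formatted = (transcript.replace("\n", "<br>")
                       .replace(".", ".<br>")
                       .replace("?", "?<br>")
                       .replace("!", "!<br>"))
print(formatted)  # Hello there.<br> How are you?<br> Great!<br>
```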

This topic was automatically closed 180 days after the last reply. New replies are no longer allowed.