Streamlit app AudioSegment pydub deployment/debug issue

I am building an app that takes a user-uploaded audio file, batch-processes it into smaller chunks, and summarizes them with a machine learning pipeline. Everything works as expected when I run it on localhost, but once deployed to Streamlit Cloud I get an error that points at the pydub library, specifically the AudioSegment.from_mp3 call, so the app never runs fully functional on the cloud.
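The failing step looks roughly like this (a sketch, not my exact code; the function name and chunk length are placeholders):

```python
from pydub import AudioSegment

def chunk_audio(path, chunk_ms=60_000):
    # pydub shells out to ffmpeg to decode MP3, and this is the call
    # that fails on Streamlit Cloud even though it works locally.
    audio = AudioSegment.from_mp3(path)
    # AudioSegment indexes by milliseconds, so this yields one-minute chunks.
    return [audio[start:start + chunk_ms] for start in range(0, len(audio), chunk_ms)]
```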

If you’re creating a debugging post, please include the following info:

  1. Are you running your app locally or is it deployed? → deployed
  2. If your app is deployed:
    a. Is it deployed on Community Cloud or another hosting platform? → Community Cloud
    b. Share the link to the public deployed app. → https://speechsummarizer.streamlit.app/
  3. Share the link to your app's public GitHub repository (including a requirements file). → found here
  4. Share the full text of the error message (not a screenshot). → see below
  5. Share the Streamlit and Python versions. → venv created using Python 3.9

I got over this hurdle by adding a packages.txt file to the repo, in the same directory as the main script, so that Streamlit Cloud automatically installs the listed system packages into its Linux environment. Apparently this is necessary for dependencies that live outside of pip.
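For anyone hitting the same pydub error: pydub does not decode MP3 itself, it delegates to ffmpeg, which the Cloud container does not ship by default. Assuming that was the missing dependency here too, packages.txt needs just one line:

```
ffmpeg
```

Streamlit Community Cloud apt-installs each line of that file before starting the app.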

But now I am running into an issue when loading the Vosk pre-trained speech model.

Error:

LOG (VoskAPI:ReadDataFiles():model.cc:213) Decoding params beam=13 max-active=7000 lattice-beam=6
LOG (VoskAPI:ReadDataFiles():model.cc:216) Silence phones 1:2:3:4:5:11:12:13:14:15
LOG (VoskAPI:RemoveOrphanNodes():nnet-nnet.cc:948) Removed 0 orphan nodes.
LOG (VoskAPI:RemoveOrphanComponents():nnet-nnet.cc:847) Removing 0 orphan components.
LOG (VoskAPI:ReadDataFiles():model.cc:248) Loading i-vector extractor from /home/appuser/.cache/vosk/vosk-model-en-us-0.22-lgraph/ivector/final.ie
LOG (VoskAPI:ComputeDerivedVars():ivector-extractor.cc:183) Computing derived variables for iVector extractor
LOG (VoskAPI:ComputeDerivedVars():ivector-extractor.cc:204) Done.
LOG (VoskAPI:ReadDataFiles():model.cc:282) Loading HCL and G from /home/appuser/.cache/vosk/vosk-model-en-us-0.22-lgraph/graph/HCLr.fst /home/appuser/.cache/vosk/vosk-model-en-us-0.22-lgraph/graph/Gr.fst
LOG (VoskAPI:ReadDataFiles():model.cc:308) Loading winfo /home/appuser/.cache/vosk/vosk-model-en-us-0.22-lgraph/graph/phones/word_boundary.int
No model was supplied, defaulted to sshleifer/distilbart-cnn-12-6 and revision a4f8f3e (https://huggingface.co/sshleifer/distilbart-cnn-12-6).
Using a pipeline without specifying a model name and revision in production is not recommended.

[07:08:40] ❗️ The service has encountered an error while checking the health of the Streamlit app: Get "http://localhost:8501/script-health-check": dial tcp 10.12.161.131:8501: connect: connection refused
[07:10:11] ❗️ Streamlit server consistently failed status checks
[07:10:11] ❗️ Please fix the errors, push an update to the git repo, or reboot the app.

This may be your app using more than 1 GB of RAM.

Hello,

I could either call the model from the Vosk server, or download the trained model, unzip it, store it on the Streamlit server, and load it into memory from there. Both approaches give me this error at the model-loading step. Are there caching techniques or performance tweaks I could use to get this to work?

If memory is the issue, I may have to look at smaller (less accurate) models, or at storing the model somewhere else and calling it from there.
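Concretely, the caching I have in mind looks like this (a sketch, untested on Cloud; it assumes a Streamlit version that has st.cache_resource and a vosk build whose Model accepts a model_name argument):

```python
import streamlit as st
from transformers import pipeline
from vosk import Model

@st.cache_resource  # loaded once per server process, shared by all sessions
def load_vosk_model():
    # Downloads into ~/.cache/vosk on first use. The small model (~40 MB)
    # is much friendlier to a 1 GB container than vosk-model-en-us-0.22-lgraph.
    return Model(model_name="vosk-model-small-en-us-0.15")

@st.cache_resource
def load_summarizer():
    # Pinning the model also silences the "No model was supplied" warning.
    return pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

vosk_model = load_vosk_model()
summarizer = load_summarizer()
```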

That is pretty much it, I think. Maybe your model and your data are just too big for Streamlit Cloud.

Hi Goyo,

I was able to resolve it with some modifications.

I wonder, is there a way to check which sections of the code or model end up using the 1 GB of memory during a run, and how that works when multiple users are on the app at the same time?

Is each user sharing the same 1 GB at any given moment?

Thanks

You can profile your app locally. I am not aware of a way to do it on Streamlit Cloud.
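For example, with the standard library's tracemalloc (a sketch; tracemalloc only sees Python-level allocations, so Vosk's native Kaldi memory will not show up, and a whole-process RSS reading via psutil is closer to what the 1 GB limit measures):

```python
import tracemalloc
import psutil  # third-party: pip install psutil

tracemalloc.start()

# ... run the expensive part here, e.g. load the model and transcribe one file ...

current, peak = tracemalloc.get_traced_memory()
print(f"python allocations: current={current / 1e6:.0f} MB, peak={peak / 1e6:.0f} MB")

# Whole-process resident memory, including native allocations:
print(f"process RSS: {psutil.Process().memory_info().rss / 1e6:.0f} MB")

tracemalloc.stop()
```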

All users of an application are running the same application on the same machine, so they share RAM, CPU, disk storage, etc.

