NLTK dependencies

Hi guys,
I have been trying Streamlit teams on a couple of apps from my public Git repo. Honestly, it is mind-blowing.

app1: https://s4a.streamlit.io/opeyemibami/wine-quality-prediction-web-app/master/app.py/+/

app2 with nltk dependencies error: https://s4a.streamlit.io/opeyemibami/topic-modelling-open-source-tool/master/app.py/+/

The only issue I have encountered so far is with NLTK data downloads.
The app relies on NLTK datasets such as stopwords, wordnet, pros_cons, and reuters, which pip cannot download.

When deploying to Heroku, I solved this by listing the dependencies in an nltk.txt file,
but that does not seem to work with Streamlit teams.
Is there a special requirements file for NLTK dependencies on Streamlit teams?

Hi @Bamigbade_Opeyemi! Glad to hear that you found the deployment platform useful!

Re NLTK - a workaround is to use nltk.download(...) to download the specific dataset you are looking for (example 1, example 2). Let me know if it does not work for you for some reason.
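
A minimal sketch of that workaround, placed near the top of app.py, could look like the block below. The package names are the ones mentioned in the question; the helper name download_nltk_packages and its download parameter are hypothetical and exist only so the function can be exercised without network access. By default it falls back to the real nltk.download.

```python
# NLTK package ids taken from the original question.
NLTK_PACKAGES = ["stopwords", "wordnet", "pros_cons", "reuters"]


def download_nltk_packages(packages=NLTK_PACKAGES, download=None):
    """Fetch each NLTK package and return {package id: success flag}.

    `download` is a hypothetical hook for testing; when omitted,
    the real nltk.download is used.
    """
    if download is None:
        import nltk  # imported lazily so the hook works without NLTK installed
        download = nltk.download

    # quiet=True suppresses the progress output in the app logs.
    return {name: download(name, quiet=True) for name in packages}
```

Calling download_nltk_packages() once before the first use of the corpora should be enough for a prototype, assuming the instance has outbound network access.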

Also, supporting an nltk.txt file in the app repository alongside requirements.txt is a good suggestion. I’ll create a feature request internally to keep track of it.

Cheers,
Amey

Thanks for your response @amey-st

Calling nltk.download(...) in the main file might not be efficient for now, because the files will be downloaded for every instance, which will increase latency.

I will just wait for nltk.txt support to be added.

Cheers,
Yhemmy

Re latency - I think it’s inevitable that each app instance will have to download the dataset, even if we supported nltk.txt. The latency cost is in starting up an app instance. Once the dataset has been downloaded by an instance, it should be available locally for the lifetime of that instance.
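
To keep that cost to a single download per instance, one option is to check whether a dataset is already on disk before fetching it. The sketch below assumes the datasets live under the corpora/ paths shown, which is an assumption and not something confirmed in this thread. The helper name ensure_nltk_resources and its find and download parameters are hypothetical; by default they resolve to nltk.data.find and nltk.download.

```python
# Package id -> path passed to nltk.data.find(); the paths are assumptions.
NLTK_RESOURCES = {
    "stopwords": "corpora/stopwords",
    "wordnet": "corpora/wordnet",
    "pros_cons": "corpora/pros_cons",
    "reuters": "corpora/reuters",
}


def ensure_nltk_resources(resources=NLTK_RESOURCES, find=None, download=None):
    """Download only the packages that are missing; return the ids fetched."""
    if find is None or download is None:
        import nltk  # imported lazily so the hooks work without NLTK installed
        find = find or nltk.data.find
        download = download or nltk.download

    fetched = []
    for package, path in resources.items():
        try:
            find(path)  # raises LookupError when the resource is not on disk
        except LookupError:
            download(package, quiet=True)
            fetched.append(package)
    return fetched
```

On a fresh instance the first call fetches whatever is missing; later calls in the same instance should find the files locally and skip the download.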

Thanks @amey-st
I agree with you. This is good enough for prototyping.