[Teams] Issue with deploying PyTorch/NLTK app on Streamlit Teams

Hi guys,

I’ve been trying Streamlit Teams for a few days now and so far I’m really blown away by it!

I also ported part of the code from the article below to Streamlit (The FAQ generation, not the Schema bit):

It works well locally, yet I can’t get it to deploy on Streamlit Teams. The app remains frozen at the interstitial screen.

Looking at Teams’ beta limitations, and as the script relies on large libs (e.g. PyTorch), this may be expected.

I cannot check the logs so I was wondering whether you guys might be able to confirm?

Here’s the app URL: https://s4a.streamlit.io/charlywargnier/t5faqstreamlit/master/app.py/+/?logs=1

Thank you.
Charly

Hi @Charly_Wargnier! I think your app was affected by a known bug on our side, which sometimes causes the app screen to be stuck in the “oven” interstitial. Can you try accessing the app now?

Do let me know if you are still running into issues, thanks!

Thanks!
Amey

BTW from what I can see, the app is reporting a ModuleNotFoundError: No module named 'requests_html'. The logs in the terminal might provide a hint why this is happening.

Thanks for the prompt heads-up Amey!

There was a conflict in the requirements.txt file: two html_request entries.

I’ve tried each of them in turn, yet I always get the same error - I’ve pasted the log here.

In the meantime, I’ll try various versions - hopefully that fixes the issue!

Thanks,
Charly

Thanks for providing the logs!

Is this the culprit:

ERROR: Could not find a version that satisfies the requirement pywin32==228 (from -r requirements.txt (line 77)) (from versions: none)
ERROR: No matching distribution found for pywin32==228 (from -r requirements.txt (line 77))

On my Mac:

$ pip install pywin32
ERROR: Could not find a version that satisfies the requirement pywin32 (from versions: none)
ERROR: No matching distribution found for pywin32

Thanks Amey!

I’ve simply removed the pywin line, it works!
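
For anyone who needs the same requirements.txt to keep working on Windows, an alternative to deleting the line is a PEP 508 environment marker, which tells pip to skip the package on other platforms. Here is a minimal sketch that adds such a marker automatically; the helper name, the default package set, and the torch pin in the usage comment are illustrative assumptions, not something taken from the app's repo:

```python
import re

# Assumption: packages listed here are only installable on Windows.
WINDOWS_ONLY = {"pywin32"}


def add_windows_markers(lines, windows_only=WINDOWS_ONLY):
    """Append a `sys_platform == "win32"` marker to Windows-only requirements."""
    result = []
    for line in lines:
        stripped = line.strip()
        # The package name is whatever precedes the first version operator,
        # marker separator, extras bracket, or space.
        name = re.split(r"[=<>!~;\[ ]", stripped, maxsplit=1)[0].lower()
        if name in windows_only and ";" not in stripped:
            stripped = f'{stripped}; sys_platform == "win32"'
        result.append(stripped)
    return result


# Illustrative usage (the torch pin is made up for the example):
# add_windows_markers(["pywin32==228", "torch==1.6.0"])
```

With the marker in place, pip on Linux or macOS should skip pywin32 instead of failing with "No matching distribution found".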

Another issue is now with NLTK - see error log below.

I’ve downloaded these files manually on my local machine; I just need to find a way to add them via pip.

I’ve got some time this morning so I’ll dig into that - will report back here :slight_smile:

Thanks,
Charly

Hi Amey,

I managed to make it work, not via pip but by adding the following lines to app.py:

import nltk
nltk.download('punkt')

The app seems to run smoothly with various URLs as long as the scraped content is not too large. If there is too much content to analyze, the app crashes and restarts.

You can try it yourself:

Log following the crash: https://pastebin.com/fGLpwrAg

It’s not clear to me yet why it is crashing.

Thanks,
Charly

Ahah I love its take on Twitter! :stuck_out_tongue:

That’s a great app, need to dig into it!

Thanks @andfanilo! Although I didn’t do much here aside from adding code blocks together! :smiley:

Ah yeah looking into the Colab, it scrapes the HTML but if JS generates the page you’re out of luck.

The eternal problem of Web Scraping :stuck_out_tongue: maybe scraping through Selenium could help build the full page :).

Excellent idea! :raised_hands:

It looks like the app is running out of memory on the platform. Do you have a heuristic on how frequently this would happen for the typical use case of the app?

Thanks Amey.

Large content like this would be uploaded most of the time, so these crashes would make the app pretty much unusable.

What do you think could be done to mitigate this? Are there any ways to increase the memory maybe?

Lastly, do you reckon it may have something to do with Teams, or is it something that would need to be tweaked in one of the libraries (e.g. NLTK)?

Thanks,
Charly

Hi Charly! I increased the memory limits on your app to a higher value and was able to see the example you gave run successfully. LMK if you run into any other issues with this or any other app.

Cheers!
Amey

That’s great Amey, thank you!

I’ve tried with the URL that failed before and I’ve still got a crash. Here’s the URL:

I’ve also pasted the latest log FYI.

I’ll try with smaller pieces of content later on tonight. I guess we could always add a caveat to the app - stating that it can analyze up to N characters. :slight_smile:
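
A minimal sketch of what such a cap could look like, assuming the scraped page is available as a plain string before it goes to the model. The function name and the limit are hypothetical; the right value of N would have to be found by testing against the platform's memory limit:

```python
def truncate_text(text, max_chars):
    """Return (text limited to max_chars, flag saying whether it was shortened)."""
    if len(text) <= max_chars:
        return text, False
    cut = text[:max_chars]
    # Prefer to cut at the last space so a word is not split in half.
    last_space = cut.rfind(" ")
    if last_space > 0:
        cut = cut[:last_space]
    return cut, True


# Illustrative usage: the flag can drive a warning shown to the user.
# shortened, was_cut = truncate_text(scraped_text, max_chars=20000)
```

The boolean makes it easy to display the caveat only when content was actually dropped.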

Thanks,
Charly

Hi Charly, nice work on the app.
Your hack of getting NLTK to work by downloading the dependencies in the main file might increase latency.

The app will attempt to download these files for every instance, which is inefficient.
I also have this issue in my app, which has 4 NLTK dependencies. Including the downloads in my main file would definitely increase latency.

It would be great if the Streamlit team could provide a special requirements file for NLTK dependencies that cannot be installed with pip.
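
Until something like that exists, one way to soften the cost is to check whether the data is already on disk before calling the downloader. This is only a sketch: the helper names are made up, and it relies on nltk.data.find raising LookupError when a resource is missing. It will not help on the first run of a fresh instance, where the files still have to be fetched once.

```python
def ensure_resource(resource_path, package, find, download):
    """Download `package` only if `resource_path` is not found locally.

    Returns True if a download was triggered, False otherwise.
    """
    try:
        find(resource_path)
    except LookupError:
        download(package)
        return True
    return False


def ensure_punkt():
    # Imported lazily so the helper above has no hard dependency on nltk.
    import nltk

    return ensure_resource(
        "tokenizers/punkt", "punkt", nltk.data.find, nltk.download
    )
```

Calling ensure_punkt() at the top of app.py would then replace the bare nltk.download('punkt') call.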

Thanks for your kind words Bamigbade!

So would you suggest e.g. to place this code:

import nltk
nltk.download('punkt')

… in a separate file and call it as a function in the master file (app.py) to speed things up?

Thanks,
Charly

Not at all @Charly_Wargnier

pip uses requirements.txt to install dependencies.
A special dependency file, e.g. nltk.txt, if provided by the Streamlit team, could be used by nltk.download during deployment so that these files don’t get downloaded for every instance of the app.
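
To make the idea concrete, here is a rough sketch of how such a file could be consumed. Everything here is hypothetical: nltk.txt is the proposed file rather than something the platform reads today, and the one-package-per-line format with # comments is an assumption borrowed from requirements.txt.

```python
def parse_nltk_requirements(text):
    """Return the package names listed one per line, ignoring comments and blanks."""
    packages = []
    for line in text.splitlines():
        # Drop anything after a '#', then surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if line:
            packages.append(line)
    return packages


def download_listed_packages(path="nltk.txt"):
    # Would run once at deployment time, not on every app start.
    import nltk

    with open(path, encoding="utf-8") as handle:
        for package in parse_nltk_requirements(handle.read()):
            nltk.download(package)
```

A file containing just the line punkt would then cover the case discussed in this thread.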

Cheers!
Yhemmy

Makes sense - Thanks for clarifying! :slight_smile:
