spaCy dependency parsing app - runs locally but not when deployed

Dear Streamlit team and community,

I’ve been trying to deploy a very simple app that demonstrates dependency parsing with spaCy models. It works locally with torch==2.0.1.

Since my attempts to deploy with the same version of torch failed, and I gathered that the full PyTorch package is too large to deploy, I’ve included the CPU-only build of torch in requirements.txt.
The models and dependencies install without errors (even torch==2.0.1+cpu installs correctly), and the only issue I can see in the logs is:
❗️ Streamlit server consistently failed status checks
I don’t get any other hints as to what the problem could be, apart from a warning:

warning: missing-index-doctype
× The package index page being used does not have a proper HTML doctype declaration.
╰─> Problematic URL: https://download.pytorch.org/whl/torch_stable.html
note: This is an issue with the page at the URL mentioned above.
hint: You might need to reach out to the owner of that package index, to get this fixed. See https://github.com/pypa/pip/issues/10825 for context.

Is this warning preventing my app from running? Could you suggest what the issue might be?

This is the content of requirements.txt:

pip==23.1.2
spacy==3.4.0
pydantic==1.9.2
spacy-transformers==1.1.2
-f https://download.pytorch.org/whl/torch_stable.html
torch==2.0.1+cpu
urllib3==1.26.15
streamlit==1.23.1
https://github.com/explosion/spacy-models/releases/download/de_core_news_md-3.4.0/de_core_news_md-3.4.0.tar.gz
https://github.com/explosion/spacy-models/releases/download/de_core_news_lg-3.4.0/de_core_news_lg-3.4.0.tar.gz
https://github.com/explosion/spacy-models/releases/download/de_dep_news_trf-3.4.0/de_dep_news_trf-3.4.0.tar.gz
typing_extensions<4.6.0

And the app itself:

import time
import spacy
from spacy import displacy
import streamlit as st

# Cache each loaded model so it is loaded only once and shared across reruns and sessions
@st.cache_resource()
def load_model(model_name):
    return spacy.load(model_name)

# Loading the models
nlp_md = load_model('de_core_news_md')
nlp_lg = load_model('de_core_news_lg')
nlp_trf = load_model('de_dep_news_trf')

pipelines = {"de_core_news_md": nlp_md, "de_core_news_lg": nlp_lg, "de_dep_news_trf": nlp_trf}

# List of sentences to process
sentences = [
    "Ich heisse Pippi Langstrumpf.",
    "Ich zeichne gern, aber ich spiele nicht gern Computer.",
    "Ich mag Schokolade, aber Spaghetti und Banane mag ich nicht.",
    "Ist dieser Platz noch frei?",
    "Darf ich mal durch?",
    "Wie spät ist es?",
    "Ich habe mich verlaufen.",
    "Können Sie mir bitte sagen, wie ich zum Bahnhof komme?",
    "Wie viel kostet ein Ticket bis nach Hamburg?",
    "Können Sie mir bitte helfen?",
    "Ich habe mein Portemonnaie verloren.",
    "Das habe ich akustisch nicht verstanden.",
    "Wann hast du morgen Zeit?",
    "Können wir das auf morgen verschieben?",
    "Ich bin im Stress.",
    "Ich bin gestresst.",
    "Ich habe keine Zeit.",
    "Das wird schon klappen!",
    "Störe ich gerade?",
    "Bitte warten Sie einen Moment.",
    "Einen Moment bitte.",
    "Was hast du heute vor?",
    "Ich melde mich.",
    "Es ist ganz schön kalt hier.",
]

# Adding a title and some explanations
st.title('spaCy parser comparison (German)')
st.markdown("""
Streamlit dashboard to compare the parser of three spaCy pipelines for German:
```de_core_news_md```, ```de_core_news_lg``` and ```de_dep_news_trf```.
Select a sentence from the dropdown menu or input your own sentence in the sidebar.
Dependency tree and processing time will be displayed.
""")
st.markdown("---")

# Adding a selectbox for the sentences to the sidebar
selected_sentence = st.sidebar.selectbox('Select a pre-defined sentence', sentences)

# Adding a text input for the sentences to the sidebar
user_sentence = st.sidebar.text_input('Or type your own sentence')

# Choosing which sentence to analyze
sentence_to_analyze = user_sentence if user_sentence.strip() != "" else selected_sentence


for name, nlp in pipelines.items():
    start_time = time.time()
    doc = nlp(sentence_to_analyze)
    end_time = time.time()
    elapsed_time = end_time - start_time
    svg = displacy.render(doc, style='dep')

    st.markdown(f"**Pipeline**: {name}")
    st.markdown(f"**Elapsed time**: {elapsed_time:.2f} seconds")
    st.markdown(svg, unsafe_allow_html=True)
    st.markdown("---")  # Adds a separator for readability
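The loop above measures each pipeline with `time.time()`. A possible refinement, sketched here, is a small helper built on `time.perf_counter()`, which the Python docs describe as a clock with the highest available resolution for measuring short durations. `timed` is a hypothetical name, not part of Streamlit or spaCy.

```python
import time
from typing import Any, Callable, Tuple


def timed(fn: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()  # high-resolution clock for short intervals
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


if __name__ == "__main__":
    # Stand-in workload; in the app this would be timed(nlp, sentence_to_analyze).
    total, seconds = timed(sum, range(1_000_000))
    print(f"sum={total}, elapsed={seconds:.4f} s")
```

Inside the loop, `doc, elapsed_time = timed(nlp, sentence_to_analyze)` would replace the three timing lines.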

This is the app URL:
https://spacy-dependency-tree-german.streamlit.app/

The github repo is here:
https://github.com/emma-carballal/streamlit_parsing_app/tree/main

Thank you in advance for your help!

Try this:

requirements.txt

spacy==3.4.0
https://github.com/explosion/spacy-models/releases/download/de_core_news_md-3.4.0/de_core_news_md-3.4.0.tar.gz
https://github.com/explosion/spacy-models/releases/download/de_core_news_lg-3.4.0/de_core_news_lg-3.4.0.tar.gz
https://github.com/explosion/spacy-models/releases/download/de_dep_news_trf-3.4.0/de_dep_news_trf-3.4.0.tar.gz
streamlit==1.23.1

Thank you, @Franky1, for getting back to me on my question.
You’re probably right that “less is more” in requirements.txt.

I started by including only the packages you suggest.
When deploying, I got the same error I had been getting locally, caused by an incompatibility between pydantic and typing_extensions>=4.6.0. So typing_extensions<4.6.0 needs to be included in requirements.txt, as explained in https://zenodo.org/record/7970450.
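To compare what actually gets installed locally and in the deployed app, a small standard-library sketch can print the relevant versions (for example to the app logs) and check the typing_extensions<4.6.0 pin. The helper names are hypothetical. `release_tuple` only compares the numeric release part of a version; the `packaging` library's `SpecifierSet` implements full PEP 440 semantics if you need them.

```python
import re
from importlib import metadata
from typing import Dict, Iterable, Optional, Tuple


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    found: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def release_tuple(ver: str) -> Tuple[int, ...]:
    """Leading numeric release segment, e.g. '2.0.1+cpu' -> (2, 0, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", ver.strip())
    if match is None:
        raise ValueError(f"unparseable version: {ver!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def below(ver: str, bound: str) -> bool:
    """True if ver's release segment is strictly lower than bound's."""
    a, b = release_tuple(ver), release_tuple(bound)
    width = max(len(a), len(b))
    # Pad with zeros so that '4.6' and '4.6.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a < b


if __name__ == "__main__":
    pkgs = ["typing_extensions", "pydantic", "spacy", "spacy-transformers", "torch"]
    versions = installed_versions(pkgs)
    for name, ver in versions.items():
        print(f"{name}: {ver}")
    te = versions["typing_extensions"]
    if te is not None:
        print("typing_extensions < 4.6.0:", below(te, "4.6.0"))
```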

I deployed again, but I’m still getting ❗️ Streamlit server consistently failed status checks.

Could the problem be that it’s installing the GPU version of torch?

Collecting torch>=1.6.0
  Downloading torch-2.0.1-cp39-cp39-manylinux1_x86_64.whl (619.9 MB)
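In that log line, pip resolves the `torch>=1.6.0` requirement (presumably pulled in by spacy-transformers, which de_dep_news_trf depends on) to a wheel whose version is plain `2.0.1`, with no `+cpu` local version label. The default PyPI torch build for Linux is the CUDA-enabled one, which fits the 619.9 MB size. A minimal sketch for reading the version and local label from a wheel filename, following the PEP 427 layout `name-version[-build]-python-abi-platform.whl`; the function names and the `+cpu` filename are illustrative assumptions.

```python
from typing import Optional
from urllib.parse import unquote


def wheel_version(filename_or_url: str) -> str:
    """Return the version field of a wheel filename (PEP 427 layout)."""
    # Keep only the last path component and undo URL encoding ('%2B' -> '+').
    name = unquote(filename_or_url.rsplit("/", 1)[-1])
    if not name.endswith(".whl"):
        raise ValueError(f"not a wheel file: {name!r}")
    parts = name[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):  # 6 when the optional build tag is present
        raise ValueError(f"unexpected wheel filename: {name!r}")
    return parts[1]


def local_label(ver: str) -> Optional[str]:
    """Return the PEP 440 local label ('cpu' in '2.0.1+cpu'), or None."""
    _, plus, label = ver.partition("+")
    return label if plus else None


if __name__ == "__main__":
    logged = "torch-2.0.1-cp39-cp39-manylinux1_x86_64.whl"  # from the deploy log
    cpu_build = "torch-2.0.1+cpu-cp39-cp39-linux_x86_64.whl"  # illustrative name
    for wheel in (logged, cpu_build):
        ver = wheel_version(wheel)
        print(ver, "->", local_label(ver) or "no local label")
```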

I don’t see any problems in the package installations. Are there any clues in the logs that could point me to a solution?
https://spacy-dependency-tree-german.streamlit.app/

Thanks again!

Hi again 😄

Could anyone in the community who has deployed an app that uses PyTorch on Streamlit Community Cloud share their GitHub repo so I can take a look?
I would be sooo grateful 🙏 🙏

Thank you!