How to include en_core_web_sm-2.2.0 in deployment?

Hi everyone,

Firstly, I have been overjoyed using Streamlit, and I really like the service. Everything except for this one small thing has been a breeze.

I am building a small app to use for an introductory course to corpus linguistics. Some of the functions I want to showcase are dependent on the “en_core_web_sm-2.2.0” model for spaCy. The repository can be found here:

Whenever these functions are called, the app will crash and return this error:

OSError: [E050] Can’t find model ‘en_core_web_sm’. It doesn’t seem to be a Python package or a valid path to a data directory.

Including the pip install line in requirements.txt results in this error:

ERROR: Invalid requirement: ‘�\x08\x08���]\x02�dist/en_core_web_sm-2.2.0.tar\x00�|gXT˲(JRD\x04\x11AA\x01E\x01�\x19fV�d��1!q@\x04f�’ (from line 1 of https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.2.0/en_core_web_sm-2.2.0.tar.gz)

Where exactly am I supposed to instruct the Streamlit Cloud to install this model? I have found previous mentions of a setup.sh file concerning Heroku deployment. If that is what I am supposed to do, where can I find the Streamlit Cloud documentation for it?

Best,
Daniel

EDIT: I forgot to include the app URL: https://firstglance.streamlit.app/
Sorry!

It is explained on the GitHub site. It should be something like this:

python -m spacy download en_core_web_sm 

You can do this in the app, using subprocess.run() in a cached function so that it is only downloaded the first time.

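import subprocess
import sys

import streamlit as st

@st.cache_resource
def download_en_core_web_sm():
    # Cached, so this runs once per server process. sys.executable targets the app's own environment.
    subprocess.run([sys.executable, "-m", "spacy", "download", "en_core_web_sm"], check=True)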

Then call the function at the beginning of your app. I didn’t test this specifically, but I have solved problems like this in a similar way. Make sure you read and understand the documentation.
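
A slightly more defensive variant, as a minimal sketch: check whether the model package is already importable before shelling out, so a restart with the model already present skips the download. The helper names is_installed and ensure_spacy_model are made up for illustration, and I am assuming the freshly installed package becomes importable in the same process, which I have not verified on Streamlit Cloud.

```python
import importlib
import importlib.util
import subprocess
import sys


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package with this name can be found."""
    return importlib.util.find_spec(package_name) is not None


def ensure_spacy_model(model_name: str = "en_core_web_sm") -> bool:
    """Download the model only if it is missing.

    Returns True if a download was run, False if the model was already there.
    """
    if is_installed(model_name):
        return False
    # Use the interpreter running the app so the model is installed
    # into the same environment that spaCy will load it from.
    subprocess.run(
        [sys.executable, "-m", "spacy", "download", model_name],
        check=True,
    )
    # Assumption: refreshing the import caches is enough for the new
    # package to be picked up without restarting the process.
    importlib.invalidate_caches()
    return True
```

In the app you would call ensure_spacy_model() once near the top, before spacy.load("en_core_web_sm").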

This can also be done with the requirements.txt file, but the spaCy and model versions have to match:

spacy==3.5.0
https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.5.0/en_core_web_sm-3.5.0-py3-none-any.whl
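
As far as I know, spaCy model versions follow the pattern a.b.c, where a.b mirrors the spaCy major and minor version the model was built for. Here is a rough sketch of a check for that, using only the two lines above as input. The function names and the REQUIREMENTS constant are hypothetical, and the regexes only cover this exact pin style.

```python
import re

# The two requirements.txt lines from this post, used as sample input.
REQUIREMENTS = """\
spacy==3.5.0
https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.5.0/en_core_web_sm-3.5.0-py3-none-any.whl
"""


def major_minor(version: str) -> tuple:
    """Turn '3.5.0' into (3, 5)."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def versions_match(requirements_text: str) -> bool:
    """Check that the spaCy pin and the model URL share major.minor."""
    spacy_pin = re.search(
        r"^spacy==(\d+\.\d+\.\d+)", requirements_text, re.MULTILINE
    )
    model = re.search(r"en_core_web_sm-(\d+\.\d+\.\d+)", requirements_text)
    if spacy_pin is None or model is None:
        # Either line is missing or uses a pin style this sketch ignores.
        return False
    return major_minor(spacy_pin.group(1)) == major_minor(model.group(1))
```

Running versions_match on a local copy of requirements.txt before pushing is a cheap way to catch a mismatched pair.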

See my improvements in the fork of your project:

GitHub - Franky1/firstglancecorpustools: Streamlit app forked for debugging purposes
