How to include en_core_web_sm-2.2.0 in deployment?

Hi everyone,

Firstly, I have been overjoyed using Streamlit, and I really like the service. Everything except for this one small thing has been a breeze.

I am building a small app to use for an introductory course to corpus linguistics. Some of the functions I want to showcase are dependent on the “en_core_web_sm-2.2.0” model for spaCy. The repository can be found here:

Whenever these functions are called, the app will crash and return this error:

OSError: [E050] Can’t find model ‘en_core_web_sm’. It doesn’t seem to be a Python package or a valid path to a data directory.

Including the pip import line in the requirements.txt results in this error:

ERROR: Invalid requirement: ‘�\x08\x08���]\x02�dist/en_core_web_sm-2.2.0.tar\x00�|gXT˲(JRD\x04\x11AA\x01E\x01�\x19fV�d��1!q@\x04f�’ (from line 1 of

Where exactly am I supposed to instruct the Streamlit Cloud to install this model? I have found previous mentions of a file concerning Heroku deployment. If that is what I am supposed to do, where can I find the Streamlit Cloud documentation for it?


EDIT: I forgot to include the app url:

It is explained on the GitHub site. It should be something like this:

python -m spacy download en_core_web_sm 

You can do this in the app, using a cached function so that the model is only downloaded the first time.

import subprocess

def download_en_core_web_sm():
    # Run the spaCy CLI to download the model
    subprocess.run(["python", "-m", "spacy", "download", "en_core_web_sm"])

Then call the function at the beginning of your app. I didn’t test this specifically, but I have solved problems like this in a similar way. Make sure you read and understand the documentation.
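
To make the "only download the first time" idea concrete, here is a minimal sketch that checks whether the model package is already importable before calling the spaCy CLI. The function name ensure_model is my own and hypothetical, and I have not run this on Streamlit Cloud, so treat it as a starting point rather than a tested solution.

```python
import importlib.util
import subprocess
import sys


def ensure_model(package: str = "en_core_web_sm") -> bool:
    """Download a spaCy model package only if it is not already installed.

    Returns True if a download was run, False if the package was already found.
    (ensure_model is a hypothetical helper name, not part of spaCy or Streamlit.)
    """
    # find_spec returns None when no top-level package with this name is installed
    if importlib.util.find_spec(package) is not None:
        return False
    # sys.executable runs the download with the same interpreter as the app
    subprocess.run([sys.executable, "-m", "spacy", "download", package], check=True)
    return True
```

Calling ensure_model() once at the top of the app, before spacy.load("en_core_web_sm"), should then skip the download on later reruns. In a Streamlit app it could additionally be wrapped in one of Streamlit's caching decorators (for example st.cache_resource in recent versions), so the check itself only runs once per server process.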


Can be done with the requirements.txt file, but the versions have to match:
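
As an illustration of what such a requirements.txt entry could look like (this is a sketch, not the exact content of the fork): spaCy model packages are published as release archives in the explosion/spacy-models repository on GitHub, and pip accepts a direct URL to such an archive as a requirement line. The helper below, model_requirement, is a hypothetical name; it only builds the URL for a given model name and version.

```python
def model_requirement(name: str, version: str) -> str:
    """Build a requirements.txt line pointing at a spaCy model release archive.

    Assumes the release layout used by the explosion/spacy-models repository:
    .../releases/download/<name>-<version>/<name>-<version>.tar.gz
    """
    base = "https://github.com/explosion/spacy-models/releases/download"
    return f"{base}/{name}-{version}/{name}-{version}.tar.gz"


# The resulting line goes into requirements.txt as plain text, not as a file download
print(model_requirement("en_core_web_sm", "2.2.0"))
# → https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-2.2.0/en_core_web_sm-2.2.0.tar.gz
```

The spacy line in the same requirements.txt then has to be pinned to a release that is compatible with the model version, which for a 2.2.0 model presumably means a spaCy 2.2.x release. The binary garbage in the error message above suggests that the archive contents ended up in the file instead of the URL as text.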


See my improvements in the fork of your project:

GitHub - Franky1/firstglancecorpustools: A training app for linguistics students
