How to install language model for spaCy in requirements.yml

I am learning how to use streamlit and spaCy. How do I write the equivalent of

python -m spacy download en_core_web_lg

in requirements.yml?

The documentation said to include a requirements.yml or requirements.txt so I tried creating those files by exporting the environment. I ended up taking hours trying out different variations of pip and conda commands before I realised I should write a simplified version manually.

While I eventually figured that out, I cannot work out how to install a language model from a yaml file. I could not find any documentation that describes how to do this. I would appreciate some help on which commands to use and where exactly to put them.

Hi @darylkdps, welcome to the community! :wave:

Perhaps this past thread is informative:

Your suggestion works. Thanks.

Turns out it was a problem with environment.yml. I had no problem when I eventually tried using requirements.txt.

I kept getting “Error during processing dependencies!” when I used the yml file. Moreover, the whole “baking” process takes extremely long, so the trial and error took a few hours. I was hoping for at least some message telling me what went wrong, but the only meaningful output I got was “bash: line 3: 15 Killed /home/appuser/.conda/bin/conda env update -n base --file environment.yml” and “installer returned a non-zero exit code”. I formatted my yml according to https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#creating-an-environment-file-manually so I couldn’t figure out how or why it was causing errors.

Thanks for the feedback about the lack of informative error messages when installing dependencies via environment.yml. I’ll pass it along internally. :balloon:

The error with your environment.yml had to do with including the spacy model url as a dependency. The spacy docs indicate the url must be included in requirements.txt instead. There’s no mention of a requirements.yml in their docs.
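
In case it helps anyone landing here, below is a minimal Python sketch of what that requirements.txt entry usually looks like. It assumes the model wheel follows the naming pattern used on the spacy-models GitHub releases page, and the 3.4.0 model version is only an illustrative guess at a match for spacy 3.4.1, so check the spaCy docs for the version that fits your install. `model_requirement` is a hypothetical helper, not part of spaCy.

```python
# Sketch: build the requirements.txt entry for a spaCy model.
# Assumption: the model wheel is published on the spacy-models GitHub
# releases page under the usual <name>-<version> naming pattern.

RELEASES = "https://github.com/explosion/spacy-models/releases/download"


def model_requirement(name: str, version: str) -> str:
    """Return a pip direct-reference line for a spaCy model wheel."""
    tag = f"{name}-{version}"
    return f"{name} @ {RELEASES}/{tag}/{tag}-py3-none-any.whl"


# Hypothetical pairing: a 3.4.x model to go with spacy 3.4.1.
lines = ["spacy==3.4.1", model_requirement("en_core_web_lg", "3.4.0")]
print("\n".join(lines))
```

Pasting the two printed lines into requirements.txt should be roughly the deploy-time equivalent of running python -m spacy download en_core_web_lg locally.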

Thanks for clarifying. I was looking at Streamlit’s documentation and didn’t know that spaCy’s documentation takes precedence. I am stating this here explicitly to help anyone else who might run into the same problem.

Also, my initial requirements.yml actually had the following without the spacy model url. It failed even with this:

name: xxx
channels:
  - plotly
  - conda-forge
  - defaults
dependencies:
  - matplotlib=3.6.0
  - numpy=1.23.3
  - pandas=1.4.4
  - plotly=5.10.0
  - python-docx=0.8.11
  - scikit-learn=1.1.2
  - scipy=1.9.1
  - seaborn=0.12.0
  - spacy=3.4.1
  - streamlit=1.12.2
  - tqdm=4.64.1
  - wasabi=0.9.1
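
For anyone switching a short pinned list like this over to requirements.txt, here is a rough Python sketch that rewrites conda-style pins (name=version) as pip-style pins (name==version). It assumes each package has the same name on PyPI as on conda, which is not guaranteed. `conda_pin_to_pip` is a hypothetical helper for illustration only.

```python
# Sketch: turn conda-style pins ("name=1.2.3") into pip-style pins ("name==1.2.3").
# Assumption: package names are identical on conda and PyPI.


def conda_pin_to_pip(pin: str) -> str:
    """Convert 'name=version' to 'name==version'; return other specs unchanged."""
    name, sep, rest = pin.partition("=")
    # Leave anything that is not a simple "name=version" pin untouched,
    # e.g. a bare name, "name==1.0", or "name>=1.0".
    if not sep or not rest or rest.startswith("="):
        return pin
    if name[-1:] in ("<", ">", "!", "~", ""):
        return pin
    version = rest.split("=", 1)[0]  # drop a trailing conda build string, if any
    return f"{name}=={version}"


# A few of the pins from the environment file above.
conda_pins = ["spacy=3.4.1", "streamlit=1.12.2", "python-docx=0.8.11"]
print("\n".join(conda_pin_to_pip(p) for p in conda_pins))
```

The printed lines can go straight into requirements.txt, one package per line, with no name, channels, or dependencies keys.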

spaCy’s docs take precedence when it comes to model installation instructions. Your initial file failed because it was called requirements.txt but the contents were what’s supposed to be in environment.yml.

After you renamed the file and included only the above, without the spacy model url, it still failed. I tried deploying an app with the exact contents you’ve shared. It took ages for the “baking” process to complete. Once it was done, I got the identical, non-descriptive error message you did :confused:

Definitely not an ideal developer experience. I will share your feedback with the team. Thanks for your patience and for taking the time to explain the issue to us :balloon: