Hey! First, thanks for the great work on this. I love the Python-first approach and the thoughtful API.
I've been trying to integrate spaCy, especially the built-in visualizers (see /usage/visualizers; I'm only allowed to post 2 links here). Here's my progress so far: I've managed to build an interactive app that loads a pre-trained model, processes a given text and generates different types of visualizations.
The only thing I'm still not really sure about is how to efficiently cache the loaded model (nlp) and the processed doc. At the moment, I'm setting ignore_hash=True, but I'm worried that this has unintended side effects. I still occasionally see a "muted arguments" warning.
In spaCy, the nlp object holds the loaded model weights, word vectors, vocabulary and so on, but it's also mutable. Ideally you only want to create it once and then pass it around. (If I'm writing a REST API, I'd typically load all models once and store them in a global dict.) Same with the doc object: for each text the user enters, I'd ideally want to create the object only once.
What's the best way to solve this? Maybe there's also something obvious I'm missing here. I really only just got started.