Hey! First, thanks for the great work on this – I love the Python-first approach and the thoughtful API.
I’ve been trying to integrate spaCy, especially the built-in visualizers (see /usage/visualizers – I’m only allowed to post 2 links here). Here’s my progress so far: I’ve managed to build an interactive app that loads a pre-trained model, processes a given text and generates different types of visualizations.
The only thing I’m still not really sure about is how to efficiently cache the loaded model (nlp) and the processed doc. At the moment, I’m setting ignore_hash=True, but I’m worried that this has unintended side-effects? I still occasionally see a “muted arguments” warning.
In spaCy, the nlp object holds the loaded model weights, word vectors, vocabulary and so on – but it’s also mutable. Ideally, you only want to create it once and then pass it around. (If I’m writing a REST API, I’d typically load all models once and store them in a global dict.) Same with the doc object: for each text the user enters, I’d ideally want to create the object only once.
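To make the "load once, store in a global dict" idea concrete, here's a minimal sketch of the pattern I have in mind, independent of any app framework. The names _MODELS, _DOCS, get_model and get_doc are hypothetical helpers I made up for illustration; the only spaCy call used is spacy.load, and the loader is passed in as a parameter so the caching logic can be tried without a model installed.

```python
from typing import Any, Callable, Dict, Tuple

# Hypothetical module-level caches, filled lazily on first use.
_MODELS: Dict[str, Any] = {}
_DOCS: Dict[Tuple[str, str], Any] = {}


def _spacy_loader(name: str) -> Any:
    # Imported lazily so the caching logic itself does not require spaCy.
    import spacy

    return spacy.load(name)


def get_model(name: str, loader: Callable[[str], Any] = _spacy_loader) -> Any:
    """Load the model called `name` once, then return the same object."""
    if name not in _MODELS:
        _MODELS[name] = loader(name)
    return _MODELS[name]


def get_doc(
    model_name: str, text: str, loader: Callable[[str], Any] = _spacy_loader
) -> Any:
    """Process `text` once per (model, text) pair and reuse the result."""
    key = (model_name, text)
    if key not in _DOCS:
        nlp = get_model(model_name, loader)
        _DOCS[key] = nlp(text)
    return _DOCS[key]
```

With this, repeated calls to get_doc with the same model name and text return the very same object instead of re-running the pipeline. Note that the doc cache in this sketch is unbounded, so a real app would probably want to cap its size.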
What’s the best way to solve this? Maybe there’s also something obvious I’m missing here – I really only just got started