I have a few apps that I wish to run locally; call them app1.py, app2.py, and app3.py. Each app lives in its own local directory, and if I run them one by one they each listen on a distinct port.
Each app uses a different NLP model. For instance, in app1.py I have:
```python
from transformers import BertForSequenceClassification, BertTokenizer

def load_model():
    model = BertForSequenceClassification.from_pretrained('ProsusAI/finbert')
    return model

def load_tokenizer():
    tokenizer = BertTokenizer.from_pretrained('ProsusAI/finbert')
    return tokenizer

model = load_model()
tokenizer = load_tokenizer()
```
and in app2.py I have:
```python
from transformers import AutoModelWithLMHead, AutoTokenizer

def load_model():
    model = AutoModelWithLMHead.from_pretrained('t5-base', return_dict=True)
    return model

def load_tokenizer():
    tokenizer = AutoTokenizer.from_pretrained('t5-base')
    return tokenizer

model = load_model()
tokenizer = load_tokenizer()
```
These models can be quite large, and the issue is that once I start more than two of these servers at the same time, memory runs out and the apps crash. Is there a better way to run multiple apps? For example, instead of loading a model directly in each app, could I have some sort of backend that serves the models, with the apps calling them through an API?
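To make the idea concrete, here is a minimal sketch of what I have in mind: one backend process that loads each model once (lazily, on first request) and exposes it over HTTP, so the individual apps hold no model in memory themselves. This uses only the standard library, and the loader lambdas are placeholders standing in for the real `from_pretrained()` calls above; the `/predict/<name>` route shape is just an assumption for illustration.

```python
import json
import threading
from http.server import HTTPServer, BaseHTTPRequestHandler
from urllib.request import urlopen

# Placeholder loaders standing in for the real from_pretrained() calls,
# e.g. BertForSequenceClassification.from_pretrained('ProsusAI/finbert').
LOADERS = {
    'finbert': lambda: 'finbert-model',
    't5-base': lambda: 't5-model',
}
CACHE = {}  # name -> the single shared model instance

def get_model(name):
    # Load each model at most once; every request shares the cached copy.
    if name not in CACHE:
        CACHE[name] = LOADERS[name]()
    return CACHE[name]

class ModelHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # e.g. GET /predict/finbert -> run (here: just report) the cached model
        _, _, name = self.path.partition('/predict/')
        if name not in LOADERS:
            self.send_error(404)
            return
        model = get_model(name)
        body = json.dumps({'model': str(model)}).encode()
        self.send_response(200)
        self.send_header('Content-Type', 'application/json')
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # silence per-request logging
        pass

# Port 0 asks the OS for any free port; a real backend would pin one.
server = HTTPServer(('127.0.0.1', 0), ModelHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# An "app" now calls the backend over HTTP instead of loading the model:
url = f'http://127.0.0.1:{server.server_port}/predict/finbert'
reply = json.loads(urlopen(url).read())
print(reply)  # {'model': 'finbert-model'}
server.shutdown()
```

In a real setup the handler would tokenize the request payload and run inference, but the shape is the same: one process owns the weights, and the per-app servers become thin HTTP clients.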