Running multiple apps crashes memory due to large NLP models

I have a few apps that I wish to run locally. Suppose they are called app1.py, app2.py and app3.py. Each app has its own local directory, and if I run them one by one they run on distinct ports, namely http://localhost:8501/, http://localhost:8502/ and http://localhost:8503/.

Each app uses a different NLP model; for instance, in app1.py I have:

from transformers import BertForSequenceClassification, BertTokenizer

def load_model():
    # FinBERT: a BERT model fine-tuned for financial sentiment classification
    model = BertForSequenceClassification.from_pretrained('ProsusAI/finbert')
    return model

def load_tokenizer():
    tokenizer = BertTokenizer.from_pretrained('ProsusAI/finbert')
    return tokenizer

model = load_model()
tokenizer = load_tokenizer()
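(Assuming these are Streamlit apps, given the default 850x ports: within a single app I could at least cache the loaders so the model is created once per server process rather than on every rerun, though that doesn't solve the cross-app memory problem. A sketch:)

import streamlit as st
from transformers import BertForSequenceClassification, BertTokenizer

@st.cache(allow_output_mutation=True)  # keep one model instance per server process
def load_model():
    return BertForSequenceClassification.from_pretrained('ProsusAI/finbert')

@st.cache(allow_output_mutation=True)
def load_tokenizer():
    return BertTokenizer.from_pretrained('ProsusAI/finbert')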

while in app2.py I have:

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

def load_model():
    # AutoModelWithLMHead is deprecated; AutoModelForSeq2SeqLM is the
    # current auto class for seq2seq models such as T5
    model = AutoModelForSeq2SeqLM.from_pretrained('t5-base', return_dict=True)
    return model

def load_tokenizer():
    tokenizer = AutoTokenizer.from_pretrained('t5-base')
    return tokenizer

model = load_model()
tokenizer = load_tokenizer()

These models can be quite large. The issue is that as soon as I start a third app alongside the first two, the machine runs out of memory. Is there a better way to run multiple apps? In particular, instead of loading a model directly in each app, could I have some sort of backend that serves the models, with the apps calling it via an API?
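What I have in mind is a small model server, e.g. with FastAPI (the route name and port below are just placeholders I made up), that loads each model once and exposes it over HTTP, so the apps never hold the weights themselves:

# model_server.py -- single process that owns the heavy models
# run with: uvicorn model_server:app
from fastapi import FastAPI
from pydantic import BaseModel
import torch
from transformers import BertForSequenceClassification, BertTokenizer

app = FastAPI()

# Loaded once at startup; all apps share this single in-memory copy
tokenizer = BertTokenizer.from_pretrained('ProsusAI/finbert')
model = BertForSequenceClassification.from_pretrained('ProsusAI/finbert')
model.eval()

class TextIn(BaseModel):
    text: str

@app.post('/finbert/predict')
def predict(req: TextIn):
    inputs = tokenizer(req.text, return_tensors='pt', truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    # Return plain floats so the response is JSON-serializable
    return {'probabilities': torch.softmax(logits, dim=-1).squeeze().tolist()}

Then app1.py would just make an HTTP call instead of loading the model (8000 is uvicorn's default port):

import requests

def get_sentiment(text):
    resp = requests.post('http://localhost:8000/finbert/predict', json={'text': text})
    return resp.json()['probabilities']

That way only the backend process pays the memory cost, and only once per model. Is this a sensible approach, or is there a more standard way to do it?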
