Loading pretrained weights works fine locally but keeps throwing an error when deployed

I deployed an app, and it works perfectly the first time.

Log when I used the function for the first time:

loading weights file pretrained_mask_ins0209/pytorch_model.bin
All model checkpoint weights were used when initializing MaskClassifier.

However, when I use it a second time, it fails to load the pretrained weights.

Log when I used the function for the second time:

loading weights file pretrained_mask_ins0209/pytorch_model.bin
[manager] Error checking Streamlit healthz: Get "http://localhost:8501/healthz": dial tcp 127.0.0.1:8501: connect: connection refused

But the same file works fine when I run it locally. Can you give me any advice?

Are you deploying this on Streamlit sharing or a different platform?

on Streamlit sharing

Because you are using Torch and, I assume, a fairly large model, you are most likely running out of RAM. That crashes the container the app is running in, so when the browser tries to connect to the healthz endpoint, it gets no response.
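One way to check the out-of-memory theory is to log the process's peak memory around the model load, locally or in the deployed app. Below is a minimal sketch using only the Python standard library. The helper names `peak_rss_mib` and `measure_peak_growth` are made up for illustration, the `resource` module is only available on Unix-like systems, and the `bytearray` allocation is just a stand-in for your real model-loading call.

```python
import resource
import sys


def peak_rss_mib() -> float:
    """Return this process's peak resident set size in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def measure_peak_growth(load_fn):
    """Call load_fn() and return (result, growth of peak RSS in MiB)."""
    before = peak_rss_mib()
    result = load_fn()
    after = peak_rss_mib()
    return result, after - before


if __name__ == "__main__":
    # Stand-in for the real model load (hypothetical): replace the lambda
    # with whatever call loads your pretrained weights.
    model, growth = measure_peak_growth(
        lambda: bytearray(b"x") * (64 * 1024 * 1024)
    )
    print(f"peak RSS is now {peak_rss_mib():.0f} MiB (grew by {growth:.0f} MiB)")
```

If the number printed after the first load is already close to the container's memory limit, a second load on top of it would plausibly push the app over. Peak RSS only ever goes up, so this shows the high-water mark rather than current usage.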

There isn’t a great solution right now, as we are still trying to understand users’ usage patterns on Streamlit sharing during this beta period. Your app likely needs 2-4x the resources we allocate, and maybe even more. That’s hard for us to sustain as a free service, but we’re trying to figure out ways to make this use case work in a cost-effective manner for us.

Best,
Randy


Thanks
