Best practices for fetching artifacts required for the app?

Hi,

Thank you for the beta testing invitation. I would like to deploy an app that uses a relatively large PyTorch model (around 500 MB, which should still fit within the constraints). I would like to store the model artifact on a publicly available hosting service.

Is there any way to fetch the artifacts before launching the app, such as a setup hook? If not, what is the recommended way?

Thank you,
Denis

Answering my own question: I ended up fetching the model binary and caching it on the local disk, using just a function called from within the Streamlit app. If the model file is not found, the function re-downloads it from Dropbox.
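For anyone looking for the same thing, here is a minimal sketch of that download-if-missing idea. It is not the exact code from my app: the function name `ensure_model` and its arguments are made up for illustration, and it assumes the file is reachable through a direct download URL.

```python
import os
import shutil
import tempfile
import urllib.request


def ensure_model(url: str, dest_path: str) -> str:
    """Return dest_path, downloading the file from url first if it is missing.

    Hypothetical helper: the name and signature are illustrative only.
    """
    if os.path.exists(dest_path):
        return dest_path  # cached copy found, skip the download

    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    os.makedirs(dest_dir, exist_ok=True)

    # Download into a temporary file first, so an interrupted download
    # never leaves a truncated file at dest_path.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".part")
    try:
        with os.fdopen(fd, "wb") as tmp_file, urllib.request.urlopen(url) as response:
            shutil.copyfileobj(response, tmp_file)
        os.replace(tmp_path, dest_path)  # move the finished file into place
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return dest_path
```

The app would call something like `ensure_model(MODEL_URL, "model.pt")` before loading the weights, where `MODEL_URL` is a placeholder for the direct download link.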

Please let me know if there is a better way.

Hi @snexus, welcome to the community!

I don’t think there is a better way for now than fetching the binary on app startup.

I’m also not sure whether the downloaded binary stored in the shared environment is then available to other users, so that it’s not re-downloaded. I’m quite interested in the answer to this :slight_smile:.

@amey-st, is there something planned for using shared large media in Streamlit Sharing?

Best,
Fanilo

Hi Fanilo, thanks for your answer.

I’m also not sure if the downloaded binary stored in the shared environment is then available to other users so it’s not redownloaded and I’m kind of interested by the answer to this :slight_smile:

I can confirm the binary is cached and isn’t re-downloaded. I tried from different devices, both with and without a VPN.

Regards,
Denis

Hi @andfanilo, thanks for looping me in! A related feature on the roadmap is support for Git LFS. Once available, it could be used to store datasets or model file artifacts seamlessly on GitHub servers, so that app developers would not have to worry about fetching the data from public S3 buckets or Dropbox.

Cheers,
Amey


@amey-st, is this feature available now? In my app, I am using a PyTorch model file of 196 MB, so I uploaded it to GitHub using Git LFS. However, whenever I try to run the app, I get the following error

I am guessing that is because the app could not get the weights file from the repository, as it would also have to perform a git lfs pull. When will this feature be available?
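One way to check this guess, as a rough sketch: when LFS content has not been pulled, the checked-out file is a small text pointer whose first line is `version https://git-lfs.github.com/spec/v1`, not the real weights. The helper name `is_lfs_pointer` below is made up for illustration.

```python
# First line of a Git LFS pointer file, per the Git LFS pointer format.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: str) -> bool:
    """Return True if the file at path looks like a Git LFS pointer.

    Hypothetical helper: it only inspects the first bytes of the file.
    """
    with open(path, "rb") as f:
        head = f.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX
```

If this returns True for the model file inside the deployed app, the weights were never fetched, which would be consistent with the missing git lfs pull.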