Should I cache variables outside the main program?

Okay, I clearly understand the overall concept of the Streamlit framework: any time something must be updated on the screen, Streamlit reruns your entire Python script from top to bottom.

I also understand the idea behind the @st.cache decorator: it lets developers skip certain costly computations when their apps rerun.

Taken together, these two statements mean that if, say, I have a piece of code like this:

import streamlit as st
from tensorflow import some_heavy_neural_network_model

@st.cache
def load_model():
    some_parameters = [1, 2, 3]
    model = some_heavy_neural_network_model(some_parameters=some_parameters)
   return model

model = load_model()

image_to_process = st.file_uploader("Upload an image", type=['jpg'])
image_result = model.predict(image_to_process)

st.image(image_result)

Here I have a web page where I can upload an image and then see the result of my heavy neural network processing. Each time I upload a new image, the program runs it through my heavy NN model and displays the result. But it won't reload the heavy NN model each time, because load_model is decorated with @st.cache, so everything is perfectly logical and clear.

But what if I separate the web page itself and the model inference functionality into two different modules? Let's say I rewrite the above code as two separate pieces:

main.py

import streamlit as st
import my_custom_module as ncm

image_to_process = st.file_uploader("Upload an image", type=['jpg'])
image_result = ncm.process_image(image_to_process=image_to_process)

st.image(image_result)

my_custom_module.py

from tensorflow import some_heavy_neural_network_model

some_parameters = [1, 2, 3]
model = some_heavy_neural_network_model(some_parameters=some_parameters)

def process_image(image_to_process, model=model):
    image_result = model.predict(image_to_process)
    return image_result

Here we can see code that does pretty much the same thing as the code from my first example; the only difference is that the functionality is now split across two files. When we run the app, main.py imports my_custom_module.py, which means the heavy model is loaded into memory once when the code starts. After that, every rerun calls the process_image function from my_custom_module.py, but does that mean the heavy model is reloaded as well?
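The mechanism at play here is Python's own import caching, independent of Streamlit: once a module has been imported, it is stored in sys.modules, and any later import of the same name returns the cached module object instead of executing the file again. A minimal, Streamlit-free sketch (using the stdlib json module just as a stand-in for any module):

```python
import sys

# Importing a module executes its top-level code once and stores the
# resulting module object in sys.modules.
import json

first = sys.modules["json"]

# A second import of the same name is a cheap dictionary lookup; the
# module body is NOT executed again.
import json as json_again

assert json_again is first  # same object, loaded exactly once
print("json imported once per process")
```

So as long as the server process stays alive, a module's top-level code (including a heavy model load) runs only on the first import.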

In a nutshell: should I cache every variable I declare outside the main module? Or are they saved to memory only once anyway, so that anything outside the function I call won't be reloaded?

If variables declared outside the main module are indeed kept in memory, why isn't this mentioned anywhere in the official documentation? It seems to me like a pretty good alternative to @st.cache, doesn't it? Or are there disadvantages compared to @st.cache?

The Streamlit documentation (overall very detailed and informative, thank you for that!) has surprisingly little information on this topic. It looks like everyone just builds their Streamlit apps inside one single .py file. I couldn't find any information about importing custom modules and how they fit into the full Streamlit pipeline, though I tried hard! It would be great to have an official comment here. Thanks!

I hope you already figured out the answer by yourself. In case you didn't, here is a minimal demonstration: a module is imported (and its top-level code executed) only once per server process, so the five-second get_data call below runs only on the first load, not on every rerun.

# datastore.py
import time


def get_data():
    time.sleep(5)  # simulate an expensive load
    return "DATA"

data = get_data()  # runs once, when the module is first imported

# app.py
import streamlit as st

import datastore


data = datastore.data  # already in memory; no 5-second wait on reruns
st.metric("Data", data)
st.button("Rerun")  # clicking any widget triggers a script rerun

With @st.cache-decorated functions you can pass parameters, have hits and misses managed automatically by the framework, specify a time-to-live and a maximum number of entries, and get warnings when you do potentially dangerous things (such as mutating a cached value). Whether that matters depends on your use case.
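To make those features concrete, here is a tiny hand-rolled analogue of what such a caching decorator manages for you. This is an illustrative sketch, not Streamlit's implementation; the memoize, ttl, and max_entries names are just chosen to mirror the concepts:

```python
import time
from collections import OrderedDict

def memoize(ttl=None, max_entries=None):
    """Illustrative caching decorator: per-argument hits and misses,
    a time-to-live for entries, and a cap on the number of entries."""
    def deco(func):
        cache = OrderedDict()  # args -> (timestamp, value)

        def wrapper(*args):
            now = time.time()
            if args in cache:
                ts, value = cache[args]
                if ttl is None or now - ts < ttl:
                    return value      # cache hit: skip the computation
                del cache[args]       # entry expired, fall through
            value = func(*args)       # cache miss: actually compute
            cache[args] = (now, value)
            if max_entries is not None and len(cache) > max_entries:
                cache.popitem(last=False)  # evict the oldest entry
            return value

        return wrapper
    return deco

calls = []

@memoize(ttl=60, max_entries=2)
def square(x):
    calls.append(x)  # record every real (non-cached) invocation
    return x * x

square(2); square(2); square(3)
print(calls)  # -> [2, 3]: the second square(2) was a cache hit
```

A module-level global gives you none of this: it is computed once with fixed values and cannot vary by input, expire, or be bounded in size.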