Caching a function only for specific inputs

Hi, I have a function that returns a Plotly chart. My interface has a number of sliders that influence the chart. I want to cache the fig objects that are returned by this function, but only for the default input values that everyone sees. The charts generated by those default inputs are used every time a new user loads the page, and caching those charts should be a huge help.

If I just add the @cache_data() decorator to the function, it tries to cache every version of the chart that anyone makes:

@cache_data(ttl=180)
def get_plot(nl_config, df):
    """Generate the cost chart."""
    fig = go.Figure()
    ...

It doesn’t really make sense to cache every version of the plot that’s generated. Is there a straightforward way to cache only for specific input values? I think I could set up a few levels of inspection, i.e. make get_plot() inspect the input values, then call out to get_plot_cached() for the most common input values, and get_plot_uncached() for other values.

Is that the best approach, or is there a more standard way to do this?

GitHub repo

Community Cloud deployed app

Thanks.

Why is that? Do you experience a slowdown?

The cache is designed to handle that.

My app is deployed to Community Cloud, so I’m not aware of any metrics that really give a sense of how close the app is to maxing out resources. It’s either working or it’s not. I shared it with one group and it got a moderate amount of attention, and handled that fine. I shared it with a larger group and it started returning 429s.

It doesn’t really make sense to cache every version of the plot that’s generated.

There are enough inputs that once people start moving the sliders, the chance of the plot they end up with matching someone else’s exactly is pretty small, so caching that plot seems unlikely to have any benefit.

The cache is designed to handle that.

Will the caching system recognize which versions of a plot are used most commonly and keep those around, even if those common plots are momentarily being pushed out of the cache by less commonly used plots?

A cached result is reused only when the function is called with the same parameter values it has seen before. Do you want to avoid caching altogether for user inputs to the function, or just make sure a previously cached result is never reused for user inputs? If it’s the former, I’d use the approach you mentioned. If it’s the latter, I’d pass an extra value to the cached function along with its other inputs: a static value the first time the page is loaded, and a random value when a user triggers the function. Probably the best way would be to use session state to hold this value, which is set to the static value on page load, then assigned a random value when a user triggers the function themselves.
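A minimal sketch of that extra-value idea, written with standard-library stand-ins so it runs outside an app. Here `functools.lru_cache` plays the role of the caching decorator and a plain dict plays the role of session state; `STATIC_KEY`, `on_page_load`, `on_user_change`, and `render` are hypothetical names, and the returned dict is a placeholder for the real figure.

```python
import functools
import uuid

STATIC_KEY = "page-load"  # hypothetical constant shared by every first page load

# Stand-in for per-user session state (an assumption for this sketch).
session_state = {}

build_log = []  # records every real build so the effect is visible


@functools.lru_cache(maxsize=None)  # stand-in for the caching decorator
def get_plot(subscribers, paid_ratio, avg_revenue, cache_key):
    """Build a placeholder chart; cache_key only influences the cache lookup."""
    build_log.append(cache_key)
    return {
        "subscribers": subscribers,
        "paid_ratio": paid_ratio,
        "avg_revenue": avg_revenue,
    }


def on_page_load():
    """Every new visitor starts with the same static key."""
    session_state["cache_key"] = STATIC_KEY


def on_user_change():
    """A slider change assigns a fresh random key, so no old entry matches."""
    session_state["cache_key"] = uuid.uuid4().hex


def render(subscribers, paid_ratio, avg_revenue):
    """Call the cached function with whatever key the session currently holds."""
    return get_plot(subscribers, paid_ratio, avg_revenue, session_state["cache_key"])
```

Note that results built under a random key are still stored, so this prevents reuse of earlier entries rather than preventing storage.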

The app lets users set three variables: number of subscribers, paid ratio, and average revenue.

I have read the advice to minimize the number of inputs, and I’ll do a little more work on that end, but the matrix of possible inputs will still be fairly large. I think that’s balanced by having a fairly small dataset; there are a few thousand data points on the entire page.

Everyone who visits the site sees the charts in their initial state. I definitely want those charts cached. A significant number of people will max the first slider, so I want to cache the charts where that slider is maxed, and the others are at their default settings.

I’m not sure there’s value in caching all the other versions of the charts. Imagine ten people visit the site. They all see the cached initial page. They all play around with the sliders a bit. If each of the charts they see is cached, that’s almost certainly wasted caching work because no one is likely to use exactly the same inputs. And, if those caches push out the earlier caches of the charts that everyone sees, I lose the benefit of those caches.
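For illustration, here is a rough sketch of caching only a short list of input combinations, such as the defaults and the maxed first slider. It assumes the three slider values are passed as plain hashable numbers (unlike the nl_config and df arguments of the real function), uses `functools.lru_cache` as a stand-in for the caching decorator, and all the numeric values and the names `CACHEABLE_INPUTS` and `build_plot` are made up for the example.

```python
import functools

# Hypothetical slider values; the real defaults and maximum live in the app.
DEFAULT_INPUTS = (1000, 0.05, 5.0)        # (subscribers, paid_ratio, avg_revenue)
MAXED_SUBSCRIBERS = (100_000, 0.05, 5.0)  # first slider maxed, others at default
CACHEABLE_INPUTS = frozenset({DEFAULT_INPUTS, MAXED_SUBSCRIBERS})

build_log = []  # records every real build so cache hits are visible


def build_plot(subscribers, paid_ratio, avg_revenue):
    """Placeholder for the real chart-building logic."""
    build_log.append((subscribers, paid_ratio, avg_revenue))
    return {"subscribers": subscribers, "paid_ratio": paid_ratio, "avg_revenue": avg_revenue}


@functools.lru_cache(maxsize=None)  # stand-in for the caching decorator
def get_plot_cached(subscribers, paid_ratio, avg_revenue):
    """Cached path, only ever called for inputs in CACHEABLE_INPUTS."""
    return build_plot(subscribers, paid_ratio, avg_revenue)


def get_plot(subscribers, paid_ratio, avg_revenue):
    """Send common inputs to the cached path and everything else to a fresh build."""
    inputs = (subscribers, paid_ratio, avg_revenue)
    if inputs in CACHEABLE_INPUTS:
        return get_plot_cached(*inputs)
    return build_plot(*inputs)
```

Because one-off inputs never reach the cached function, they cannot push the common charts out of the cache.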

I think I should probably set up a VPS staging server to deploy to, and watch the impact of doing all this.

Does the caching system weight each entry in the cache? For example does it track how many times a cached resource is used, and keep the ones around that are used often? Or is it just a FIFO queue?

That has nothing to do with the cache. The server simply received so many requests that its rate limit was exceeded.

cache_data has the ttl parameter (which you already used), which controls how long entries are retained: entries older than ttl are discarded. Another parameter, max_entries, caps the number of entries and evicts old entries in favor of new ones. So users who are currently exploring the app benefit the most from the speed that caching provides.
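To make the "old entries evicted in favor of new ones" behavior concrete, here is a toy model of a size-capped cache. It is only an illustration of that policy, not Streamlit's actual implementation, and the class name `OldestFirstCache` is made up for the example.

```python
from collections import OrderedDict


class OldestFirstCache:
    """Toy bounded cache: when full, drop the entry that was stored first."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self._entries = OrderedDict()

    def get_or_build(self, key, build):
        """Return the stored value for key, building and storing it on a miss."""
        if key in self._entries:
            return self._entries[key]
        if len(self._entries) >= self.max_entries:
            self._entries.popitem(last=False)  # evict the oldest stored entry
        self._entries[key] = build(key)
        return self._entries[key]

    def __contains__(self, key):
        return key in self._entries


cache = OldestFirstCache(max_entries=3)
cache.get_or_build("default", str.upper)   # the chart everyone sees
for one_off in ("a", "b", "c"):            # three one-off slider combinations
    cache.get_or_build(one_off, str.upper)
```

In this toy model the "default" entry is gone after three one-off entries arrive, and it would be rebuilt on the next request for it.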

Caching is fun, see the reference for more details.

Are you saying there’s a hard rate limit, regardless of how efficiently an app can serve each request?

I believe so. The reason is to prevent abuse of resources. An app is initially deployed in a limited container, etc. If you need more, ask the host. The host also needs to protect its resources. If there are more users, it is probably better to monetize the app and raise the limits by paying the host or changing the plan. Then users are happy, the host is fine, and developers are happy.

Hello @ehmatthes,

  1. First, create a function specifically for generating plots with the default inputs
import streamlit as st
import plotly.graph_objects as go

@st.cache_data  # st.cache is deprecated in favor of st.cache_data
def get_plot_cached(nl_config_default, df_default):
    """Generate the cost chart for default inputs, and cache this version."""
    fig = go.Figure()
    # Your plot generation logic here
    return fig
  2. Then, in your main function that generates plots, implement the logic to decide when to call the cached version vs. generating a new plot.
def get_plot(nl_config, df):
    nl_config_default = {...}
    df_default = {...}

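    # Reuse the cached figure only when both inputs match the defaults.
    if nl_config == nl_config_default and df.equals(df_default):
        return get_plot_cached(nl_config, df)
    else:
        # Other inputs are rebuilt on every call and never cached.
        fig = go.Figure()
        # Your plot generation logic here
        return fig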

Hope this helps!

Kind Regards,
Sahir Maharaj
Data Scientist | AI Engineer

