Caching update/reset at a particular time interval

Hi,

I am currently utilizing Streamlit to connect to serverless compute for data access. I have implemented caching to store data for a day, which functions as expected. However, the backend data is updated at 3:00 AM daily, while the Streamlit app reflects these updates much later, around 9:00 AM, due to the daily caching mechanism retaining older data.

Is there a method to maintain the daily caching while ensuring the cached data refreshes every day at 3:30 AM? I prefer not to use a manual button or option for clearing cache/updating the data.

This is my code:

Hi @blackary, Can you please help me with this?

Hi @HIMANSHU21, I’m sorry, I haven’t been able to come up with a good way to get this to work. Are the queries slow enough that only caching them for an hour would lead to a poor user experience?

If so, my general recommendation would be to set up a pipeline that caches the data into a dedicated pre-processed table, so that querying for the data is faster. Also, at least with Snowpark, the fewer rows you return from the query, the better. So, if you’re doing something like:

  • Fetch 1M rows
  • Group them into 1k rows with Pandas

Then I would highly recommend merging the grouping into the query itself, so that the query only has to create 1k objects in Python instead of 1M.
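For example, with made-up table/column names and conn being any DB-API connection, pushing the group-by into SQL looks like this:

    import pandas as pd

    # Before: fetch ~1M raw rows, then group them in pandas
    # df = pd.read_sql("SELECT region, amount FROM sales", conn)
    # summary = df.groupby("region", as_index=False)["amount"].sum()

    # After: let the warehouse do the grouping, so only ~1k rows come back
    summary = pd.read_sql(
        """
        SELECT region, SUM(amount) AS total_amount
        FROM sales
        GROUP BY region
        """,
        conn,
    )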

My general advice for cases where queries are the bottleneck is here: How to improve Streamlit app loading speed
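(For reference, the one-hour-caching idea above only requires changing the ttl argument; the function name here is just a placeholder:)

    @st.cache_data(ttl=3600)  # keep query results for at most one hour
    def run_query(sql_text):
        ...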

Hi @blackary ,

Thank you for your response. The query execution time is approximately 20-25 seconds. I am utilizing Databricks to manage data and tables, connecting via serverless compute, which requires 3-4 seconds to initialize. Our application relies on 5-6 different tables to generate multiple pages, each dependent on distinct datasets. Moreover, we need all the data every time because the dashboard is based on historical data. This is what I am doing:

######################################################################

# Import the required libraries
import streamlit as st
import os
from databricks import sql

# Create a connection to the Databricks SQL warehouse, cached for 1 day
@st.cache_resource(ttl=86400)
def get_connection():

    # Read the connection settings from environment variables
    server_hostname = os.getenv('server_hostname')
    http_path = os.getenv('http_path')
    access_token = os.getenv('access_token')

    if not server_hostname or not http_path or not access_token:
        st.error("server_hostname, http_path, or access_token is not set.")
        return None

    try:
        connection = sql.connect(
            server_hostname=server_hostname,
            http_path=http_path,
            access_token=access_token
        )
        return connection
    except Exception as e:
        st.error(f"An unexpected error occurred: {e}")
        return None
			
# Fetch a table from the Databricks SQL warehouse, with results cached for 1 day
@st.cache_data(ttl=86400)
def query_data(table_name):
    connection = get_connection()
    if connection is None:
        return None

    catalog_name = os.getenv('catalog_name')
    gold_table = os.getenv('gold_table')

    # Build the query from the passed table_name argument
    query = f"SELECT * FROM {catalog_name}.{gold_table}.{table_name}"

    # Execute the query and convert the result to a pandas DataFrame,
    # closing the cursor when done
    with connection.cursor() as cursor:
        cursor.execute(query)
        df = cursor.fetchall_arrow().to_pandas()

    return df

# Run a query once per session and store the result in session state
def fetch_data(key, query_name, error_message="Failed to fetch data"):
    try:
        if key not in st.session_state:
            st.session_state[key] = query_data(query_name)
    except Exception as e:
        st.error(f"{error_message}: {e}")
        st.session_state[key] = None

# Load each gold table into session state
for i in range(1, 6):
    fetch_data(
        key=f'data_table{i}',
        query_name=f'gold_table{i}'
    )

########################################################

Please help me: how can I optimize this?

Also, to resolve the original issue, I implemented something like this:
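In essence (a simplified sketch, with illustrative names): compute a token that rolls over once per day at 3:30 AM and pass it to the cached function, so its cache key changes the first time the app runs after 3:30 AM:

    import datetime

    def daily_refresh_token(hour=3, minute=30):
        # Value that changes once per day at the boundary time (03:30 here);
        # before the boundary we are still in the previous day's window.
        now = datetime.datetime.now()
        boundary = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
        window = now.date() if now >= boundary else now.date() - datetime.timedelta(days=1)
        return window.isoformat()

    @st.cache_data(ttl=86400)
    def query_data(table_name, refresh_token):
        # refresh_token is part of the cache key: when it rolls over at
        # 3:30 AM, Streamlit re-runs the query instead of serving the
        # cached result.
        ...  # same body as the query_data function above

    df = query_data('gold_table1', refresh_token=daily_refresh_token())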

Let me know if you have some comments/suggestions.

Sorry, it’s difficult to optimize without knowing more about how you’re using the data. My only general advice is this: unless you’re literally displaying the entire table of data to the user (and I mean the entire table, every column, every row), there’s likely an opportunity to optimize by limiting the amount of data returned (group-bys, selecting only certain columns, etc.).
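For instance, reusing the variables from your query_data function with hypothetical column names:

    # Pull only the columns the page actually renders, and let the
    # warehouse aggregate instead of doing SELECT *
    query = f"""
        SELECT order_date, region, SUM(amount) AS total_amount
        FROM {catalog_name}.{gold_table}.{table_name}
        GROUP BY order_date, region
    """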

You certainly know the data and your requirements better than I do, and I don’t have any Streamlit-specific advice beyond the blog post I shared above.