experimental_memo: logging when the cache was last updated

Summary

Hello. I am currently passing a datetime string into a memoized function so that the st.experimental_memo cache is refreshed every time the string changes as time passes. The functions are as follows:

```python
from datetime import datetime as dt

import pandas as pd
import pytz
import requests
import streamlit as st


def get_day_and_hour() -> str:
    # Changes once per hour, so it acts as an hourly cache key
    return dt.now(tz=pytz.utc).strftime("%Y-%m-%d-%H-%Z")


@st.experimental_memo
def extract_data_from_request(
    url: str, date_refresh: str, timeout: int = 30
) -> pd.DataFrame:
    """
    Return the pandas DataFrame from the request and refactor the dataset.
    For brevity, the URL used here is the historical cleared volume dataset.
    date_refresh is not used in the body; it only forms part of the cache key.
    """
    # get data from the ESO API
    with requests.get(url, timeout=timeout) as response:
        dictr = response.json()
        # TODO: validate each JSON payload against the schema in the JSON file
        recs = dictr["result"]["records"]
        # flatten the records into a DataFrame
        df = pd.json_normalize(recs)
        # TODO: maybe put this in a generic process func that also does date parsing and indexing
        if "Service" in df.columns:
            # normalise service codes that don't match the format we want
            df["Service"] = df["Service"].replace(
                {
                    "DCH": "DC-H",
                    "DCL": "DC-L",
                    "DMH": "DM-H",
                    "DML": "DM-L",
                    "DRH": "DR-H",
                    "DRL": "DR-L",
                }
            )
        return df
```

The extract_data_from_request function takes get_day_and_hour() as its date_refresh argument, and as far as I can see, the cached result is reused as long as the arguments stay the same, so a new request is only made once the hour string changes. My question is: whenever the cache is refreshed, is there a way to get the date and time at which that happened? Then I could add ‘This visualization was last updated at …’ to the Streamlit dashboard.
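To see why the cache refreshes hourly, here is a minimal standard-library sketch of the key that get_day_and_hour() builds. It assumes a hypothetical helper, hour_key(), which takes the time as a parameter (so it can be tested) and uses datetime.timezone.utc in place of pytz.utc; the strftime format is unchanged. Two times within the same UTC hour produce the same string, so st.experimental_memo sees identical arguments and reuses the cached DataFrame.

```python
from datetime import datetime, timezone


def hour_key(now: datetime) -> str:
    """Hypothetical, testable version of get_day_and_hour().

    Uses the same strftime format as the original, but the time is
    passed in instead of being read from the clock.
    """
    return now.astimezone(timezone.utc).strftime("%Y-%m-%d-%H-%Z")


if __name__ == "__main__":
    a = hour_key(datetime(2023, 1, 5, 14, 5, tzinfo=timezone.utc))
    b = hour_key(datetime(2023, 1, 5, 14, 55, tzinfo=timezone.utc))
    c = hour_key(datetime(2023, 1, 5, 15, 0, tzinfo=timezone.utc))
    print(a, b, c)  # a and b are equal (same hour); c differs
```

Note that this key only changes on the hour: the memo cache cannot tell that the upstream API data changed partway through an hour.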

I hope this question makes sense.

Thanks

Hi @sang_young_noh :wave:

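Thanks for all the info in your question! I’m wondering if something as simple as returning dt.now(tz=pytz.utc) along with the df from extract_data_from_request would be sufficient. The body of a memoized function only runs when the cache has to be (re)filled, so that timestamp records when the cached result was created. Something along the lines of: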

```python
from datetime import datetime as dt

import pandas as pd
import pytz
import streamlit as st


def get_day_and_hour():
    return dt.now(tz=pytz.utc).strftime("%Y-%m-%d-%H-%Z")


@st.experimental_memo
def extract_data_from_request(date_refresh, random_number):
    # random_number is unused in the body; it only forms part of the cache key
    return (
        pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}),
        date_refresh,
        dt.now(tz=pytz.utc),  # Only evaluated on a cache miss, so it changes when the cache updates
    )


# This is here just to update the cache. Feel free to ignore
random_number = st.slider("Random number", 0, 100, 1)

df, date_refresh, last_updated = extract_data_from_request(
    get_day_and_hour(), random_number
)
st.write(f"This visualization was last updated at: {last_updated}")
```

In my example above, whenever random_number (or the hour string) changes to a combination that isn't cached yet, the function body runs again and the fresh result is cached. That fresh result carries a new value of last_updated:

[Image: datetime-cache]

Snehan :balloon: