I want to test out replacing a BI tool with Streamlit. I'm currently querying a Postgres database, capturing the results in a pandas DataFrame, transforming the data, and visualising time series and histograms with Altair. There are multiple opportunities for caching. I haven't yet found an example covering this use case; can anyone recommend one? Thanks in advance.
This is very similar to a dashboard application we built for internal use, with BigQuery as the backend database. The main parameterized query supports several different report pages, and it is the DataFrame returned for a particular parameter set that gets cached.
There are other opportunities for caching within the report analyses, but we generally don't use them. I tried it once, and layering cached subsets of cached data on top of the other state management is a great way to tie yourself in logical knots. It's easier, and usually almost as fast, to keep just a few pools of data. BigQuery also caches query results for 24 hours on its side, so it's often not worth getting too sophisticated.
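To make the "one parameterized query, cached by its parameter set" idea concrete, here is a minimal sketch. The table name, columns, and parameters are invented, and sqlite3 stands in for BigQuery so the sketch is self-contained; in the real app you would point `pd.read_sql_query` at your actual warehouse connection.

```python
import sqlite3
import pandas as pd

try:
    import streamlit as st
    cache = st.cache_data(ttl=3600)   # shared across pages, expires hourly
except ImportError:                   # lets the sketch run outside Streamlit
    from functools import lru_cache
    cache = lru_cache(maxsize=32)

def _connect() -> sqlite3.Connection:
    # Stand-in for the real warehouse connection, with a tiny demo table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (day TEXT, metric REAL)")
    conn.executemany("INSERT INTO events VALUES (?, ?)",
                     [("2024-01-01", 1.0), ("2024-01-02", 2.5)])
    return conn

@cache
def load_report_data(start: str, end: str) -> pd.DataFrame:
    # One pool of data keyed by the parameter set; each report page
    # slices/aggregates this frame rather than caching its own subsets.
    conn = _connect()
    df = pd.read_sql_query(
        "SELECT day, metric FROM events WHERE day BETWEEN ? AND ?",
        conn, params=(start, end))
    conn.close()
    return df
```

Every page that calls `load_report_data("2024-01-01", "2024-01-31")` with the same parameters then shares one cached frame instead of re-running the query.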
The rule of thumb I use goes roughly like this:
- Information that is only used to rerun a report: use out-of-the-box Streamlit.
- Information that needs to be carried between reports (for example, filter settings or drill-downs): use SessionState or one of its variants.
- Data that is accessed across a collection of pages: use caching, sometimes mutable, sometimes not, depending on the specific use case.
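The middle case, carrying filter settings between pages, can be sketched like this. Newer Streamlit versions build this in as `st.session_state`; a plain dict stands in here so the sketch runs outside a Streamlit app, and the setting names are invented.

```python
# In the real app this would be: state = st.session_state
state = {}

def get_filter(name: str, default):
    # Read a cross-page setting, seeding it with a default on first use.
    if name not in state:
        state[name] = default
    return state[name]

def set_filter(name: str, value):
    # Update a setting so the next page's rerun sees the new value.
    state[name] = value
```

A drill-down page can then call `get_filter("region", "EMEA")` and pick up whatever the overview page last selected, instead of resetting on every rerun.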
Recently we started incorporating another dataset from an on-prem SQL Server instance. It is only updated daily, and it turned out to be easier to manage a persistent file cache of the DataFrame explicitly (we use pandas.to_pickle() with gzip compression). That cache is fast to load with minimal memory overhead.
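A sketch of that daily file cache, assuming a 24-hour staleness rule; the file path and the placeholder query function are invented for illustration. pandas infers gzip compression from the `.gz` suffix on both write and read.

```python
import os
import time
import pandas as pd

CACHE_PATH = "daily_extract.pkl.gz"   # .gz suffix -> pandas gzips the pickle
MAX_AGE_SECONDS = 24 * 3600           # the source only updates daily

def fetch_from_sql_server() -> pd.DataFrame:
    # Placeholder for the real (slow) SQL Server query.
    return pd.DataFrame({"day": ["2024-01-01"], "metric": [1.0]})

def load_daily_frame() -> pd.DataFrame:
    fresh = (os.path.exists(CACHE_PATH)
             and time.time() - os.path.getmtime(CACHE_PATH) < MAX_AGE_SECONDS)
    if fresh:
        return pd.read_pickle(CACHE_PATH)   # fast load, low memory overhead
    df = fetch_from_sql_server()
    df.to_pickle(CACHE_PATH)                # compression inferred from name
    return df
```

On the first call of the day this hits the database and writes the file; every later call (and every Streamlit rerun) just unpickles it.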