Using Streamlit cache with Polars

Fabio · February 20, 2023, 12:49pm

Polars is a new Python library that often executes much faster (around 10x) than Pandas. You can convert Pandas dataframes to Polars dataframes and vice versa with the Polars from_pandas function and the to_pandas method.

I am converting my Pandas functions to Polars, function by function.

I have noticed that the Streamlit cache does not seem to support Polars dataframes. I get an error when I pass a Polars dataframe to a “cached” function.

My workaround is to convert every Polars dataframe back to Pandas so that each function returns a Pandas dataframe.

I was wondering whether there are plans to support Polars dataframes in st.cache.

Thanks

Fabio

Caroline · February 21, 2023, 6:29pm

Hey @Fabio,

Check out this related GitHub Issue and please upvote the Issue if you’d like our team to prioritize it. Thanks!

snehankekre · February 22, 2023, 11:07am

Hi @Fabio

@st.cache was deprecated in Streamlit 1.18.0, so st.cache will never support caching Polars dataframes. We recommend using the new caching decorator @st.cache_data as a replacement for caching data.

Here’s an example demonstrating caching of a Polars dataframe:

import polars as pl
import streamlit as st

@st.cache_data
def load_data():
    return pl.DataFrame(
        {
            "A": [1, 2, 3, 4, 5],
            "B": [5, 4, 3, 2, 1],
            "fruits": ["apple", "banana", "pear", "apple", "banana"],
        }
    )


df = load_data()

st.write(df)

Fabio · February 22, 2023, 12:01pm

Hi,

I am actually using the new cache function, and that (a cached CSV read) worked for me too.

What does not work is passing my Polars dataframe into a cached function.

I can try to reproduce the error if you wish.

Thanks!

Fabio

snehankekre · February 22, 2023, 12:06pm

Yes, please share a minimal reproducible example

snehankekre · February 22, 2023, 12:13pm

@Fabio I’m guessing you’re running into the UnhashableParamError when passing a Polars dataframe as an argument to a cache-decorated function. To tell Streamlit to stop hashing the argument, add a leading underscore to the argument’s name in the function signature:

import polars as pl
import streamlit as st


@st.cache_data
def load_data():
    return pl.DataFrame(
        {
            "A": [1, 2, 3, 4, 5],
            "B": [5, 4, 3, 2, 1],
            "fruits": ["apple", "banana", "pear", "apple", "banana"],
        }
    )


df = load_data()

st.write(df)


@st.cache_data
def show_columns(_polars_df):
    return _polars_df.columns

columns = show_columns(df)
st.write(columns)

Although the excluded parameter won’t be hashed, Streamlit still caches the output.

snehankekre · February 22, 2023, 12:52pm

On taking another look, I realize you’re correct: when the excluded parameter is the only parameter of the function, the function will not rerun even if the dataframe passed in changes, because nothing in the cache key changes.

What you can do in this case is pass another input parameter to the cached function that changes whenever the Polars dataframe changes. One option is to use polars.DataFrame.hash_rows together with polars.Series.view. The first method hashes each row of the Polars DataFrame, combining the values across columns. Because the result is a polars.Series object, which Streamlit cannot hash, we convert it to a NumPy array containing the UInt64 hashes:

import polars as pl
import streamlit as st

@st.cache_data
def load_data():
    print("loading data")
    return pl.DataFrame(
        {
            "A": [1, 2, 3, 4, 5],
            "B": [5, 4, 3, 2, 1],
            "fruits": ["apple", "banana", "pear", "apple", "banana"],
        }
    )


st.button("Rerun")

df = load_data()

st.write(df)

@st.cache_data
def show_columns(_polars_df, hash):
    print("showing columns")
    return _polars_df.columns


if st.checkbox("Edit data"):
    df = df.drop("fruits")
    st.write(df)

columns = show_columns(df, df.hash_rows(seed=42).view())
st.write(columns)

This method ensures that whenever the underlying unhashable Polars dataframe changes, the function is re-run because the array of hashes changes.

Fabio · February 22, 2023, 2:01pm

Yes, this works! And it probably takes fewer computer resources than converting back and forth. Thanks a million!

Fabio · February 23, 2023, 6:02am

@snehankekre,

Apparently, with the upcoming Pandas 2.0, converting from Pandas to Polars and vice versa will become a “free” operation (both can use Arrow as the underlying memory format), which I think solves the issue.

matth · April 17, 2023, 9:34pm

@st.cache_resource appears to handle Polars dataframes better. I think the serialisation/pickle step of @st.cache_data inflates the Polars dataframe and may also cause inconsistencies when a hash is calculated?

OSuwaidi · August 30, 2023, 10:46pm

I am new to Streamlit, and I agree with @matth. I was using the @st.cache_data decorator on the function that loads my Polars dataframe via .read_csv() (~3.5 GB), and it slowed down loading and visualizations considerably compared to reading the data directly (i.e., without the decorator or a function).
