AttributeError when @st.cache is added to a function - How to fix the issue?

Charly_Wargnier · December 6, 2020, 6:16pm

Hi guys,

I would like to cache a classification function I created. However, each time I add the @st.cache decorator, it throws the following error:

AttributeError: 'ratioClass' is not a valid function for 'Series' object

FYI, here’s the function (which works perfectly well when st.cache is not added):

@st.cache(allow_output_mutation=True)
def TokenClassScores():
    dfToken['ratioClass'] = np.where(dfToken['Box #01'] == dfToken['Box #02'],
                                'Duplicate',
                                'Low score')

    return dfToken

I then use Apply to iterate over the dataframe:

dfToken.apply(TokenClassScores(), axis = 1)
dfToken

I tried a few variations yet cannot seem to be able to solve this issue.

Any help super appreciated.

Thanks,
Charly

Charly_Wargnier · December 6, 2020, 10:14pm

Heads-up! Tried with lambda functions and faced the same issue:

@st.cache(suppress_st_warning=True, show_spinner=False,allow_output_mutation=True)
def TokenSortClassScores():

      dfToken['ratioClass']=dfToken['ratio'].apply(lambda x: 'HighScore' if x>=80 else 'MediumScore')

      return dfToken

dfToken.apply(TokenSortClassScores(), axis = 1)

Not quite sure how to overcome this…

okld · December 7, 2020, 2:28am

Hey @Charly_Wargnier,

The issue doesn’t seem to be related to @st.cache.

DataFrame apply() is not called as intended.

dfToken.apply() takes a function as first argument (a function which takes a column or a row as parameter), but here you’re performing a function call with TokenClassScores() which returns a modified dfToken.

So what you’re doing is roughly equivalent to something like this:

dfToken['ratioClass'] = np.where(dfToken['Box #01'] == dfToken['Box #02'], 'Duplicate', 'Low score')
dfToken.apply(dfToken, axis=1)  # dfToken is a DataFrame, it should've been a function/lambda instead.
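
For contrast, here is a minimal sketch of how apply is normally called with axis=1: it receives a function and calls it once per row. The helper name classify_row and the tiny DataFrame are made up for illustration; they are not taken from the app.

```python
import pandas as pd

df = pd.DataFrame({
    "Box #01": ["A", "A", "B"],
    "Box #02": ["A", "B", "B"],
})

def classify_row(row):
    # Hypothetical helper: with axis=1, `row` is a Series holding one row.
    return "Duplicate" if row["Box #01"] == row["Box #02"] else "Low score"

# Pass the function object itself (no parentheses), so pandas can call it per row.
df["ratioClass"] = df.apply(classify_row, axis=1)
print(df)
```

Writing df.apply(classify_row(), axis=1) instead would call the function first and hand its return value to apply, which is the mistake described above.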

The issue is not @st.cache.

I was able to reproduce your issue without @st.cache. Note that in my following code samples, I’ll use dfToken.apply(...) as you did, even though it shouldn’t be used like this.


import numpy as np
import pandas as pd
import streamlit as st

dfToken = pd.DataFrame({
    "ratioClass": [None, None, None, None, None],
    "Box #01": ["A", "A", "B", "A", "B"],
    "Box #02": ["A", "B", "A", "A", "B"]
})

"Initial DataFrame"
dfToken

"DataFrame apply"
dfToken.apply(dfToken, axis=1)
dfToken

The weird thing is, if I add the dfToken['ratioClass'] line right before dfToken.apply(...), no error seems to be raised. I'd need to dig into pandas internals to explain that specific behavior. Here's the result:


"DataFrame apply"
dfToken['ratioClass'] = np.where(dfToken['Box #01'] == dfToken['Box #02'], 'Duplicate', 'Low score')
dfToken.apply(dfToken, axis=1)
dfToken

However, if I remove that dfToken.apply(...), it still works as expected, without weird behavior occurring:


"DataFrame apply"
dfToken['ratioClass'] = np.where(dfToken['Box #01'] == dfToken['Box #02'], 'Duplicate', 'Low score')
dfToken

What should you do?

It depends on the workflow of your application.

As shown above, you don’t need dfToken.apply(...) to change a whole column. Only one dfToken['ratioClass'] = np.where(...) does the trick. But I don’t have the whole picture of that TokenClassScores(), so I can’t really tell what you could cache.
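
As a rough sketch of what a cacheable version could look like: the column assignment can live in a function that takes the DataFrame as an argument and returns a new one, instead of modifying a global. The name add_ratio_class is hypothetical, and whether this is the right thing to cache depends on the rest of the app.

```python
import numpy as np
import pandas as pd

def add_ratio_class(df):
    """Return a copy of df with a 'ratioClass' column (hypothetical helper)."""
    out = df.copy()  # leave the caller's DataFrame untouched
    # Vectorised comparison over the whole column; no apply() needed.
    out["ratioClass"] = np.where(
        out["Box #01"] == out["Box #02"], "Duplicate", "Low score"
    )
    return out

dfToken = pd.DataFrame({
    "Box #01": ["A", "A", "B", "A", "B"],
    "Box #02": ["A", "B", "A", "A", "B"],
})

classified = add_ratio_class(dfToken)
print(classified)
```

Because the function depends only on its argument, a caching decorator such as @st.cache could in principle be placed on it and keyed on the input DataFrame.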

I hope it helped!

Charly_Wargnier · December 7, 2020, 11:23am

Thanks for the prompt feedback, Synode - that’s very useful. Please let me explain what I’m after - hopefully we can optimise the app accordingly!

I’ve uploaded my app here so you can try it for yourself.

The app.py file (code’s not optimized yet!) can be found here:

So this app allows for fuzzy string matching across 2 sets of keywords/URLs. Paste some keywords/URLs in text areas #01 and #02 and compare them with various fuzzy algorithms, all taken from the FuzzyWuzzy library.

The app works well for a few URLs, however it takes far too long to update the table when, e.g., moving a slider with thousands of URLs:

You can try it for yourself with a larger dataset:

URLs to be uploaded in box 1

URLs to be uploaded in box 2

I’ve not done any benchmarking yet, but I’ve got a sneaking suspicion that the score categorisation into various buckets (high score, medium score, etc.) is what’s hindering the app’s performance, e.g. on line 207:

github.com

CharlyWargnier/fuzzy-matching-app/blob/a3b27afa59f8099f6261a3cc12f7ec997080314a/app.py#L207

#3rd ratio
dfRatio['Token Set Ratio'] = dfRatio.apply(token_set_ratio, axis = 1)
dfRatio['Token Set Ratio'] = dfRatio['Token Set Ratio'].astype(np.float64)


PartialRatio = dfRatio.copy()
PartialRatio = PartialRatio.drop(['Token Set Ratio','token_sort_ratio'], axis=1)
PartialRatio.rename(columns={'Partial Ratio':'ratio'}, inplace=True)

PartialRatio['ratioClass'] = np.where(
    PartialRatio['Box #01'] == PartialRatio['Box #02'],
    'Duplicate',
    np.where(
        PartialRatio['Box #02'] == '/',
        'Redirects to Home P.',
        np.where(
            PartialRatio.ratio >= 80,
            'HighScore',
            np.where(PartialRatio.ratio >= 40, 'MediumScore', 'Low score'))))

PartialRatio['algoType'] = 'Partial Ratio'

In the current code, I cannot add the cache decorator as these np.where lines are not wrapped into a function. So I tried to do that, then iterated through the data frames via df.apply - this obviously didn’t work!

Hopefully that makes sense and there’s a way to cache these np.where/categorisation formulas to improve the app’s speed.
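
To illustrate what wrapping the categorisation into a function might look like, here is a minimal sketch. It uses np.select as an alternative to the nested np.where chain (conditions are checked in order and the first match wins). The function name classify_scores and the sample rows are assumptions for illustration, not code from the app.

```python
import numpy as np
import pandas as pd

def classify_scores(df):
    """Return a copy of df with a 'ratioClass' bucket per row (hypothetical helper)."""
    out = df.copy()
    # Same order of precedence as the nested np.where version.
    conditions = [
        out["Box #01"] == out["Box #02"],
        out["Box #02"] == "/",
        out["ratio"] >= 80,
        out["ratio"] >= 40,
    ]
    choices = ["Duplicate", "Redirects to Home P.", "HighScore", "MediumScore"]
    out["ratioClass"] = np.select(conditions, choices, default="Low score")
    return out

# Made-up sample data covering each bucket.
sample = pd.DataFrame({
    "Box #01": ["/a", "/b", "/c", "/d", "/e"],
    "Box #02": ["/a", "/", "/x", "/y", "/z"],
    "ratio": [100, 10, 85, 50, 20],
})

result = classify_scores(sample)
print(result)
```

Since the whole column is computed in one call, the function is called once on the DataFrame rather than passed through df.apply.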

Thanks in advance.

Charly

okld · December 8, 2020, 2:48am

I’ve made a pull request with the changes here: Optimize dataframe and operation caching by okld · Pull Request #1 · CharlyWargnier/fuzzy-matching-app (github.com)

My strategy was to make a function which computes and caches every dataframe operation based on your text area values. This way, modifying your sliders is just a matter of filtering an already existing dataframe, and displaying it in the end.
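
The general shape of that strategy can be sketched as follows. This is not the code from the pull request: functools.lru_cache stands in for Streamlit's caching decorator, difflib.SequenceMatcher stands in for the FuzzyWuzzy scorers, and the names build_scores, filter_by_score and CALLS are made up for the example.

```python
from difflib import SequenceMatcher
from functools import lru_cache

import pandas as pd

CALLS = {"build_scores": 0}  # counts how often the expensive step really runs

@lru_cache(maxsize=8)
def build_scores(box_1, box_2):
    """Expensive step: score every pair once. Inputs are tuples so they are hashable."""
    CALLS["build_scores"] += 1
    rows = []
    for a, b in zip(box_1, box_2):
        ratio = round(100 * SequenceMatcher(None, a, b).ratio())
        rows.append({"Box #01": a, "Box #02": b, "ratio": ratio})
    return pd.DataFrame(rows)

def filter_by_score(df, min_score):
    """Cheap step: runs on every slider change and returns a new, filtered DataFrame."""
    return df[df["ratio"] >= min_score]

box_1 = ("/shoes", "/about-us", "/contact")
box_2 = ("/shoes", "/about", "/blog")

scores = build_scores(box_1, box_2)
# Simulate moving a slider: the cached DataFrame is reused each time.
for min_score in (0, 50, 100):
    visible = filter_by_score(build_scores(box_1, box_2), min_score)
    print(min_score, len(visible))

print("build_scores ran", CALLS["build_scores"], "time(s)")
```

The point of the split is that the scoring only reruns when the text inputs change, while slider changes only trigger the cheap filter.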

You can test the new version here: Fuzzy matching app (I’ll remove it once your version has the changes applied or is officially out)

Venky · December 8, 2020, 3:08am

Hello guys,
I’m new to Streamlit. I’m developing a small data app and I cached back-end REST API responses using @st.cache. The problem is that I’m using caching.clear_cache() to clear the cache when a page is refreshed, but it also clears the @st.cache responses whenever I do something on the page.
Without the caching.clear_cache() call, it works absolutely fine.
Is there a way to resolve this issue?
Please let me know if you need more info.

okld · December 8, 2020, 3:17am

Hello @Venky, welcome to the forum!

As your question isn’t strictly related to Charly’s issue, could you create a new topic with your question in the category “Using Streamlit”? That will keep things organized and bring more visibility to your post.

Thanks in advance!

Venky · December 8, 2020, 3:23am

Sorry @okld, I just created a new topic. Thank you.
