AttributeError when @st.cache is added to a function - How to fix the issue?

Hi guys,

I would like to cache a classification function I created; however, each time I add the @st.cache decorator it throws the following error:

AttributeError: 'ratioClass' is not a valid function for 'Series' object

FYI, here’s the function (which works perfectly well when st.cache is not added):

@st.cache(allow_output_mutation=True)
def TokenClassScores():
    dfToken['ratioClass'] = np.where(dfToken['Box #01'] == dfToken['Box #02'],
                                'Duplicate',
                                'Low score')

    return dfToken

I then use Apply to iterate over the dataframe:

dfToken.apply(TokenClassScores(), axis = 1)
dfToken

I tried a few variations yet cannot seem to be able to solve this issue.

Any help super appreciated. :slight_smile:

Thanks,
Charly

Heads-up! Tried with lambda functions and faced the same issue:

@st.cache(suppress_st_warning=True, show_spinner=False,allow_output_mutation=True)
def TokenSortClassScores():

      dfToken['ratioClass']=dfToken['ratio'].apply(lambda x: 'HighScore' if x>=80 else 'MediumScore')

      return dfToken

dfToken.apply(TokenSortClassScores(), axis = 1)

Not quite sure how to overcome this... :thinking:

Hey @Charly_Wargnier,

The issue doesn’t seem to be related to @st.cache.

DataFrame apply() is not called as intended.

dfToken.apply() takes a function as its first argument (a function which takes a column or a row as parameter), but here you’re performing a function call with TokenClassScores(), which returns a modified dfToken.

So what you’re doing is roughly equivalent to something like this:

dfToken['ratioClass'] = np.where(dfToken['Box #01'] == dfToken['Box #02'], 'Duplicate', 'Low score')
dfToken.apply(dfToken, axis=1)  # dfToken is a DataFrame, it should've been a function/lambda instead.
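
For contrast, here is a minimal sketch of how apply() is meant to be called with axis=1. The sample data and the classify_row name are made up for illustration; only the column names come from this thread.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the thread.
dfToken = pd.DataFrame({
    "Box #01": ["A", "A", "B"],
    "Box #02": ["A", "B", "B"],
})


def classify_row(row):
    # With axis=1, apply() hands each row to this function as a Series.
    return "Duplicate" if row["Box #01"] == row["Box #02"] else "Low score"


# Pass the function object itself (no parentheses), not the result of calling it.
dfToken["ratioClass"] = dfToken.apply(classify_row, axis=1)
```

Note that apply() returns a new Series here, so the result has to be assigned back to a column. On large frames a row-wise apply is usually much slower than a vectorised np.where.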

The issue is not @st.cache.

I was able to reproduce your issue without @st.cache. Note that in my following code samples, I’ll use dfToken.apply(...) as you did, even though it shouldn’t be used like this.

import numpy as np
import pandas as pd
import streamlit as st

dfToken = pd.DataFrame({
    "ratioClass": [None, None, None, None, None],
    "Box #01": ["A", "A", "B", "A", "B"],
    "Box #02": ["A", "B", "A", "A", "B"]
})

"Initial DataFrame"
dfToken

"DataFrame apply"
dfToken.apply(dfToken, axis=1)
dfToken

The weird thing is, if I add the dfToken['ratioClass'] line right before dfToken.apply(...), this doesn’t seem to trigger any error. I’d need to dig into pandas’s internals to explain that specific behavior. Here’s the result:

"DataFrame apply"
dfToken['ratioClass'] = np.where(dfToken['Box #01'] == dfToken['Box #02'], 'Duplicate', 'Low score')
dfToken.apply(dfToken, axis=1)
dfToken

image

However, if I remove that dfToken.apply(...), it still works as expected, without any weird behavior occurring:

"DataFrame apply"
dfToken['ratioClass'] = np.where(dfToken['Box #01'] == dfToken['Box #02'], 'Duplicate', 'Low score')
dfToken

image

What should you do?

It depends on the workflow of your application.

As shown above, you don’t need dfToken.apply(...) to change a whole column. A single dfToken['ratioClass'] = np.where(...) does the trick. But I don’t have the whole picture of that TokenClassScores(), so I can’t really tell what you could cache.
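
If the goal is to end up with something cacheable, one possible shape is a function that takes the DataFrame as a parameter and returns a new one, rather than reading and mutating a global. This is only a sketch: the add_ratio_class name and the sample data are invented, and the decorator is left as a comment so the snippet also runs outside Streamlit.

```python
import numpy as np
import pandas as pd


# @st.cache  # assumption: add the decorator from this thread when running inside Streamlit
def add_ratio_class(df):
    # Work on a copy so the caller's DataFrame is left untouched.
    out = df.copy()
    # One vectorised assignment replaces the row-wise apply().
    out["ratioClass"] = np.where(out["Box #01"] == out["Box #02"],
                                 "Duplicate",
                                 "Low score")
    return out


# Hypothetical sample data; only the column names come from the thread.
dfToken = pd.DataFrame({
    "Box #01": ["A", "A", "B"],
    "Box #02": ["A", "B", "B"],
})

result = add_ratio_class(dfToken)
```

Passing the DataFrame in as an argument gives a cache something to key on, and returning a copy avoids changing a value that may later be served from the cache.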

I hope it helped!


Thanks for the prompt feedback, Synode - that’s very useful. Please let me explain what I’m after - hopefully we can optimise the app accordingly! :slight_smile:

I’ve uploaded my app here so you can try it for yourself.

The app.py file (code’s not optimized yet! :stuck_out_tongue: ) can be found here:

So this app allows for fuzzy string matching across 2 sets of keywords/URLs. Paste some keywords/URLs in text area #01 and #02 and compare them with various fuzzy algorithms, all taken from the FuzzyWuzzy library.

The app works well for a few URLs; however, it takes far too long to update the table when e.g. moving a slider with 1000’s of URLs:

You can try it for yourself with a larger dataset:

URLs to be uploaded in box 1

URLs to be uploaded in box 2

I’ve not done any benchmarking yet, but I’ve got a sneaking suspicion that the score categorisation into various buckets (high score, medium score, etc.) must be what’s hindering the app’s performance, e.g. on line 207:

In the current code, I cannot add the cache decorator as these np.where lines are not wrapped in a function. So I tried to do that, then iterated through the data frames via df.apply - this obviously didn’t work! :slight_smile:

Hopefully that makes sense and there’s a way to cache these np.where/categorisation formulas to improve the app’s speed.

Thanks in advance. :pray:

Charly

I’ve made a pull request with the changes here: Optimize dataframe and operation caching by okld · Pull Request #1 · CharlyWargnier/fuzzy-matching-app (github.com)

My strategy was to make a function which computes and caches every dataframe operation based on your text area values. This way, modifying your sliders is just a matter of filtering an already existing dataframe, and displaying it in the end.
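
To illustrate the idea (this is not the code from the pull request), here is a rough sketch of the split between a cached "compute" step and a cheap "filter" step. It makes several assumptions so that it runs outside Streamlit: functools.lru_cache stands in for @st.cache, difflib.SequenceMatcher stands in for the FuzzyWuzzy scorers, the two boxes are compared line by line, and all function names are invented.

```python
from difflib import SequenceMatcher
from functools import lru_cache

import pandas as pd


@lru_cache(maxsize=8)  # stand-in for @st.cache, keyed on the two text area values
def compute_scores(text_1, text_2):
    # Expensive step: runs once per distinct pair of text area values.
    rows = []
    for a, b in zip(text_1.splitlines(), text_2.splitlines()):
        # SequenceMatcher is a stand-in for a FuzzyWuzzy scorer (0 to 100 scale).
        ratio = round(100 * SequenceMatcher(None, a, b).ratio())
        rows.append({"Box #01": a, "Box #02": b, "ratio": ratio})
    return pd.DataFrame(rows, columns=["Box #01", "Box #02", "ratio"])


def filter_scores(scores, min_ratio):
    # Cheap step: reruns on every slider move and returns a new DataFrame,
    # so the cached one is never modified.
    return scores[scores["ratio"] >= min_ratio]


# Hypothetical text area contents.
box_1 = "apple\nbanana"
box_2 = "apple\ncherry"

high = filter_scores(compute_scores(box_1, box_2), 80)     # computes the scores
all_rows = filter_scores(compute_scores(box_1, box_2), 0)  # reuses the cached scores
```

With this split, changing the slider value only changes min_ratio, so the scoring loop is not repeated until the text in one of the boxes changes.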

You can test the new version here: Fuzzy matching app (I’ll remove it once your version has the changes applied and is officially out)

Hello guys,
I’m new to Streamlit. I’m developing a small data app and I cached back-end REST API responses using @st.cache. The problem is that I’m using caching.clear_cache() to clear the cache when a page is refreshed, but it also clears the @st.cache responses when I’m doing something on the page.
Without the caching.clear_cache() call, it works absolutely fine.
Is there a way to resolve this issue?
Please let me know if you need more info.

Hello @Venky, welcome to the forum!

As your question isn’t strictly related to Charly’s issue, to keep things organized and bring more visibility to your post, could you create a new topic with your question in the category “Using Streamlit”?

Thanks in advance!


Sorry @okld, I just created a new topic. Thank you.