Dataframe reloads despite using cache when changing value in select box

Summary

I made a very simple web scraping function and I am trying to build a user interface for it with Streamlit. The user uploads a file with the names of the items to search for on the web. A select box in the sidebar lists all the items found. The first time the sidebar option is changed, the program runs the whole search again. This only happens once, meaning that the web is scraped twice for every file that is uploaded. The scraping takes several minutes, so having it happen twice makes a huge difference. The cache only seems to take effect after the data has loaded a second time.

Steps to reproduce

Code snippet:

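import streamlit as st

@st.cache
def find_products(file):
    # Read the product names from the uploaded file
    productsToSearch = ...  # Get names from file
    return productsToSearch

@st.cache(allow_output_mutation=True)
def search_prices(productsToSearch):
    # Scrape the web for each product (slow: takes several minutes)
    df_products = ...  # Returns pd.DataFrame with products and prices
    return df_products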

# Main page
st.title('Price finder')
file = st.file_uploader('Please choose the file with the products you want to find', type=['xlsx'])
my_progress_bar = st.empty()
productsToSearch = find_products(file)

# Side bar
# Filter prices shown based on selection
options = ['All']
options.extend(productsToSearch)
producto_to_show = st.sidebar.selectbox('Products shown', options)
df_productos = search_prices(productsToSearch)




Expected behavior:

I would like the data to be searched only once for each file uploaded by the user.

Debug info

  • Streamlit version: 1.16.0
  • Python version: 3.8.15
  • Using Conda
  • OS version: 12.3.1
  • Browser version: Chrome Version 108.0.5359.124

Have you tried with st.experimental_memo instead of st.cache? I recall from another thread that someone had issues with st.cache in a web-scraping application, but that the newer, soon-to-be-promoted st.experimental_memo worked.
