Cache miss with no change in input

Lutz · March 18, 2020, 8:37am

Hello everyone!

I would like to use the great Hugging Face transformers library in a Streamlit app. Unfortunately, caching of the loaded models does not work as expected: I get cache misses even though I had previously loaded the model for that input/key. Sometimes the function that loads the current model is even re-executed when I only change the text in the text field. Note that I do use the allow_output_mutation=True argument. I have added a sample script that works after installing torch and transformers. The models are community models that anyone can access.

And then there is this other annoying thing. When I rerun the script after some code change, I get this error:

ValueError: Custom>TFBertMainLayer has already been registered to <class 'transformers.modeling_tf_bert.TFBertMainLayer'>

Thanks for the help!
Lutz

import streamlit as st
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

@st.cache(allow_output_mutation=True)
def load_bert(classif):
    # Cached per value of classif; allow_output_mutation=True tells Streamlit
    # not to check whether the returned objects were mutated.
    if classif == "hatespeech":
        name = "deepset/bert-base-german-cased-hatespeech-GermEval18Coarse"
    elif classif == "sentiment":
        name = "nlptown/bert-base-multilingual-uncased-sentiment"
    else:
        return None, None
    t = AutoTokenizer.from_pretrained(name)
    m = AutoModelForSequenceClassification.from_pretrained(name)
    m.eval()  # switch to inference mode (disables dropout)
    return m, t

classifier = st.sidebar.selectbox('Classifier', ["hatespeech", "sentiment"])

model, tokenizer = load_bert(classifier)

text = st.text_area("Please enter text to classify", "Ich bin ein Berliner.")

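if text:
    input_ids = torch.tensor(tokenizer.encode(text, add_special_tokens=True)).unsqueeze(0)  # Batch size 1
    with torch.no_grad():  # inference only, so skip gradient tracking
        output = model(input_ids)
    st.text(output[0])  # output[0] holds the classification logits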

tim · March 18, 2020, 7:16pm

Hey @Lutz!

A given function’s cache can be invalidated if the contents of that function change. That is, the key that Streamlit generates for each entry in a function’s cache is something like

hash(load_bert.__code__) + hash(classif)

(This is not literally how the key is generated - we’re not concatenating hash strings together - but it’s conceptually along those lines.)

This means that a function’s cache will become invalidated if you’re editing the function itself during testing. (Whitespace changes inside the function shouldn’t contribute to the hash value here, but anything that changes the function’s compiled bytecode will.) Is it possible that’s what’s happening here?
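To make the idea concrete, here is a minimal sketch of a key built from the function's compiled code plus its arguments. The helper conceptual_cache_key and the two example loaders are hypothetical names made up for this illustration; this is not Streamlit's actual hashing code, only the same concept in a few lines.

```python
import hashlib
import pickle


def conceptual_cache_key(func, *args):
    """Hypothetical illustration only, NOT Streamlit's real key derivation.

    Combines the function's compiled bytecode and constants with its
    arguments, so the key changes when either the code or the input changes.
    """
    h = hashlib.sha256()
    h.update(func.__code__.co_code)                    # compiled bytecode
    h.update(repr(func.__code__.co_consts).encode())   # literals used in the body
    h.update(pickle.dumps(args))                       # the call arguments
    return h.hexdigest()


# Two hypothetical loaders whose bodies differ only in a literal.
def load_v1(name):
    return name + "-v1"


def load_v2(name):
    return name + "-v2"


same_call = conceptual_cache_key(load_v1, "sentiment") == conceptual_cache_key(load_v1, "sentiment")
other_input = conceptual_cache_key(load_v1, "sentiment") == conceptual_cache_key(load_v1, "hatespeech")
other_code = conceptual_cache_key(load_v1, "sentiment") == conceptual_cache_key(load_v2, "sentiment")
print(same_call, other_input, other_code)
```

The same function with the same input gives the same key (a cache hit), while a different input or an edited function body gives a different key (a cache miss).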

Lutz · March 19, 2020, 11:20am

Hey @tim!

Thanks for the fast reply and the explanation! I did not change the load function during the run of streamlit. But I think I found the reason, and it is related.

I found out that the above error is actually not reproducible on a vanilla install. The reason for the reloads was that I have a local version of the transformers library which I slightly altered. I installed it via pip install -e ./transformers, where the command-line option -e means editable mode:
-e, --editable <path/url> Install a project in editable mode (i.e. setuptools "develop mode") from a local project path or a VCS url.
I did not change the transformers code during the Streamlit run, but the editable install still seems to be the reason. If I install the local library without editable mode, the caching works fine.

TLDR: For proper caching in streamlit, do not install libraries in editable mode.

Thanks,
Lutz

tim · March 23, 2020, 4:13pm

Ah, interesting! I’m not positive this is what is happening, but the issue you’re seeing may be because your editable copy of the transformers library is in your streamlit app’s source path. When streamlit re-loads your app after it’s been edited, it reloads all modules contained within the root directory in your app’s path.

If I’m correct and this is the cause of the unexpected behavior you’re seeing, you might still be able to keep your transformers lib installed in editable mode - you just need to change your app structure a little bit. For example:

my_app/
- transformers/
- src/
-- my_app.py

And then start your app via streamlit run src/my_app.py (or whatever). In this configuration, streamlit won’t reload transformers when my_app.py changes.
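If you want to check which situation you are in, a small sketch like the one below can tell you whether a module's file sits inside a given directory. The helper name is_inside and the example paths are hypothetical, and this only compares paths; it does not reproduce Streamlit's own reload logic.

```python
from pathlib import Path


def is_inside(module_file, app_dir):
    """Return True if module_file is located somewhere under app_dir.

    Hypothetical helper for illustration; it only compares resolved paths.
    """
    module_path = Path(module_file).resolve()
    root = Path(app_dir).resolve()
    return root in module_path.parents


# Example paths mirroring the layout above (they do not need to exist).
lib_file = "/tmp/my_app/transformers/__init__.py"
print(is_inside(lib_file, "/tmp/my_app"))      # script started from my_app/
print(is_inside(lib_file, "/tmp/my_app/src"))  # script started from my_app/src/
```

In a real app you would pass transformers.__file__ as the first argument and the directory containing your script as the second.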

Lutz · March 23, 2020, 4:26pm

Hi Tim,
you are exactly right. In this situation, the library is a subdirectory.
Thanks for pointing out the alternative to allow for the editable mode!
Cheers
Lutz
