Cache miss with no change in input

Hello everyone!

I would like to use the great :hugs: Huggingface transformers in a streamlit app. Unfortunately, the caching of the loaded models does not work as expected: I get cache misses even though I previously loaded the model for that input/key. Sometimes the load function for the current model is even re-executed when I only change the text in the text field. Note that I do use the allow_output_mutation=True argument. I added a sample script below that works after installing torch and transformers. The models are community models that anyone can access.

And then there is this other annoying thing. When I rerun the script after some code change, I get this error:

ValueError: Custom>TFBertMainLayer has already been registered to <class 'transformers.modeling_tf_bert.TFBertMainLayer'>

Thanks for the help!
Lutz

import streamlit as st
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

@st.cache(allow_output_mutation=True)
def load_bert(classif):
    if classif=="hatespeech":
        name = "deepset/bert-base-german-cased-hatespeech-GermEval18Coarse"
    elif classif=="sentiment":
        name = "nlptown/bert-base-multilingual-uncased-sentiment"
    else:
        return None, None
    t = AutoTokenizer.from_pretrained(name)
    m = AutoModelForSequenceClassification.from_pretrained(name)
    m.eval()
    return m, t

classifier = st.sidebar.selectbox('Classifier', ["hatespeech", "sentiment"])

model, tokenizer = load_bert(classifier)

text = st.text_area("Please enter text to classify", "Ich bin ein Berliner.")

if text:
    input_ids = torch.tensor(tokenizer.encode(text, add_special_tokens=True)).unsqueeze(0)  # Batch size 1
    output = model(input_ids)
    st.text(output[0])

Hey @Lutz!

A given function’s cache can be invalidated if the contents of that function change. That is, the key that Streamlit generates for each entry in the function’s cache is something like

hash(load_bert.__code__) + hash(classif)

(This is not literally how the key is generated - we’re not concatenating hash strings together - but it’s conceptually along those lines.)

This means that a function’s cache will be invalidated if you edit the function itself during testing. (Whitespace changes inside the function shouldn’t contribute to the hash value here, but anything that changes the function’s compiled bytecode will.) Is it possible that’s what’s happening here?
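The idea above can be sketched in a few lines. This is not Streamlit's actual implementation, just a minimal illustration (the helper name cache_key and the use of md5 are my own assumptions) of why recompiling the function body changes the key while reruns with the same argument do not:

```python
import hashlib

def cache_key(func, *args):
    # Conceptual sketch only -- NOT how Streamlit actually builds its keys.
    # Combine the function's compiled bytecode with its arguments, so that
    # editing the function body invalidates all previous cache entries.
    h = hashlib.md5()
    h.update(func.__code__.co_code)       # changes when the bytecode changes
    for arg in args:
        h.update(repr(arg).encode())      # changes when the inputs change
    return h.hexdigest()

def load_bert(classif):
    return classif  # stand-in for the real model-loading function

key = cache_key(load_bert, "sentiment")
# Same function, same argument -> same key, so the cache would hit.
assert cache_key(load_bert, "sentiment") == key
# A different argument -> different key, so the cache would miss.
assert cache_key(load_bert, "hatespeech") != key
```

Whitespace-only edits don't change co_code, which matches the behavior described above.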

Hey @tim!

Thanks for the fast reply and the explanation! I did not change the load function during the run of streamlit. But I think I found the reason, and it is related.

It turns out the above error is actually not reproducible on a vanilla install :sweat_smile: . The reason for the reloads was that I have a local, slightly altered copy of the transformers library. I installed it via pip install -e ./transformers, where the command line option -e means editable mode:

    -e, --editable <path/url>  Install a project in editable mode (i.e. setuptools "develop mode") from a local project path or a VCS url.

I did not change the transformers code during the streamlit run, but that still seems to be the cause. If I install the local library without editable mode, the caching works fine. :+1:

TLDR: For proper caching in streamlit, do not install libraries in editable mode.

Thanks,
Lutz

Ah, interesting! I’m not positive this is the cause, but the issue you’re seeing may be that your editable copy of the transformers library sits inside your streamlit app’s source path. When streamlit reloads your app after it’s been edited, it reloads all modules contained within the root directory of your app’s path.

If I’m correct and this is the cause of the unexpected behavior you’re seeing, you might still be able to keep your transformers lib installed in editable mode - you just need to change your app structure a little bit. For example:

my_app/
- transformers/
- src/
-- my_app.py

And then start your app via streamlit run src/my_app.py (or whatever). In this configuration, streamlit won’t reload transformers when my_app.py changes.
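The layout advice above boils down to a path check: a module gets reloaded only if its file lives under the app's root directory. Here is a hypothetical sketch of that check (the function is_watched is my own invention, not Streamlit's watcher code), using the directory structure from the example:

```python
import os

def is_watched(module_file, app_root):
    # Hypothetical sketch: a module is re-imported on rerun only if its
    # source file lives somewhere under the app's root directory.
    module_file = os.path.abspath(module_file)
    app_root = os.path.abspath(app_root)
    return os.path.commonpath([module_file, app_root]) == app_root

# With the app started from my_app/src/, the editable transformers copy
# in my_app/transformers/ falls outside the watched root:
assert is_watched("/my_app/src/my_app.py", "/my_app/src")
assert not is_watched("/my_app/transformers/modeling_tf_bert.py", "/my_app/src")
```

This is why moving the app into src/ keeps the editable install usable: the library is still importable, but its files are no longer under the directory streamlit watches.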


Hi Tim,
you are exactly right: the library is indeed a subdirectory of the app in my setup.
Thanks for pointing out the alternative to allow for the editable mode! :+1:
Cheers
Lutz
