I would like to use the great Hugging Face transformers library in a Streamlit app. Unfortunately, the caching of the loaded models does not work as expected: I get cache misses even though I previously loaded the model for that input/key. Sometimes the load function that loads the current model is even re-executed when I only change the text in the text field. Note that I do use the allow_output_mutation=True argument. I added a sample script below that works after installing torch and transformers. The models are community models that can be accessed by anyone.
And then there is this other annoying thing. When I rerun the script after some code change, I get this error:
ValueError: Custom>TFBertMainLayer has already been registered to <class 'transformers.modeling_tf_bert.TFBertMainLayer'>
Thanks for the help!
Lutz
import streamlit as st
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

@st.cache(allow_output_mutation=True)
def load_bert(classif):
    if classif == "hatespeech":
        name = "deepset/bert-base-german-cased-hatespeech-GermEval18Coarse"
    elif classif == "sentiment":
        name = "nlptown/bert-base-multilingual-uncased-sentiment"
    else:
        return None, None
    t = AutoTokenizer.from_pretrained(name)
    m = AutoModelForSequenceClassification.from_pretrained(name)
    m.eval()
    return m, t

classifier = st.sidebar.selectbox('Classifier', ["hatespeech", "sentiment"])
model, tokenizer = load_bert(classifier)

text = st.text_area("Please enter text to classify", "Ich bin ein Berliner.")
if text:
    input_ids = torch.tensor(tokenizer.encode(text, add_special_tokens=True)).unsqueeze(0)  # Batch size 1
    output = model(input_ids)
    st.text(output[0])
A given function’s cache can be invalidated if the contents of that function change - that is, the key that Streamlit generates for each entry in a function’s cache is something like
hash(load_bert.__code__) + hash(classif)
(This is not literally how the key is generated - we’re not concatenating hash strings together - but it’s conceptually along those lines.)
This means that a function’s cache will become invalidated if you’re editing the function itself during testing. (Whitespace changes inside the function shouldn’t contribute to the hash value here, but anything that changes the function’s compiled bytecode will.) Is it possible that’s what’s happening here?
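The cache-miss-on-edit behavior can be sketched as follows. This is a toy illustration, not Streamlit’s actual key function - the names make_key and f are made up for the example:

```python
import hashlib

def make_key(func, arg):
    # Hash the function's compiled bytecode (plus its constants) and
    # pair it with the argument - a rough stand-in for Streamlit's key.
    code = func.__code__
    code_hash = hashlib.md5(code.co_code + repr(code.co_consts).encode()).hexdigest()
    return (code_hash, arg)

def f(x):
    return x + 1

key_before = make_key(f, "sentiment")

def f(x):
    # Same name and signature, but a different body compiles to
    # different bytecode, so the key changes and the cache misses.
    return x * 2

key_after = make_key(f, "sentiment")
```

Here key_before and key_after differ even though the argument is identical, which is exactly a cache miss caused by editing the function body.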
Thanks for the fast reply and the explanation! I did not change the load function while Streamlit was running. But I think I found the reason, and it is related.
The error above is actually not reproducible on a vanilla install, I found out. The reason for the reloads was that I have a local version of the transformers library which I slightly altered. I installed it via pip install -e ./transformers, where the command-line option -e puts it in editable mode:
-e, --editable <path/url>  Install a project in editable mode (i.e. setuptools "develop mode") from a local project path or a VCS url.
I did not change the transformers code during the Streamlit run, but the editable install seems to be the reason. If I install the local library without editable mode, the caching works fine.
TL;DR: For proper caching in Streamlit, do not install libraries in editable mode.
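A hypothetical helper for diagnosing this situation - the function name looks_editable is made up, and the check is only a heuristic: an editable install’s source file lives in your project tree rather than in the environment’s site-packages directory.

```python
import os
import sysconfig

def looks_editable(module):
    """Heuristic: treat a module as editable-installed if its source
    file lives outside the environment's site-packages directory."""
    site = os.path.realpath(sysconfig.get_paths()["purelib"])
    mod_file = os.path.realpath(module.__file__)
    return not mod_file.startswith(site + os.sep)
```

For example, calling looks_editable(transformers) after a pip install -e ./transformers would return True, while a regular pip install ./transformers copies the package into site-packages and the check returns False.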
Ah, interesting! I’m not positive this is what is happening, but the issue you’re seeing may be because your editable copy of the transformers library is in your Streamlit app’s source path. When Streamlit reloads your app after it’s been edited, it reloads all modules contained within the root directory of your app’s path.
If I’m correct and this is the cause of the unexpected behavior you’re seeing, you might still be able to keep your transformers lib installed in editable mode - you just need to change your app structure a little bit. For example:
my_app/
- transformers/
- src/
-- my_app.py
And then start your app via streamlit run src/my_app.py (or whatever). In this configuration, Streamlit won’t reload transformers when my_app.py changes.
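The reload rule described above can be sketched as a simple path check. This is an assumption about the heuristic, not Streamlit’s actual implementation, and is_in_app_path is a made-up name:

```python
import os

def is_in_app_path(module_file, app_root):
    # True if module_file lives somewhere under app_root - the condition
    # under which (per the explanation above) a module gets reloaded.
    module_file = os.path.realpath(module_file)
    app_root = os.path.realpath(app_root)
    return os.path.commonpath([module_file, app_root]) == app_root

# With the my_app/ layout above and `streamlit run src/my_app.py`,
# the app root is src/, so transformers/ is no longer inside it:
print(is_in_app_path("my_app/transformers/__init__.py", "my_app/src"))  # False
print(is_in_app_path("my_app/src/my_app.py", "my_app/src"))             # True
```

Running the app from the original layout (app root my_app/, transformers/ inside it) would make the first check True as well, which is why the library was being reloaded before.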
Hi Tim,
you are exactly right - in my setup, the library is a subdirectory of the app directory.
Thanks for pointing out the workaround that lets me keep the editable mode!
Cheers
Lutz