I recently deployed a dockerized Streamlit app on GCP’s Cloud Run service, and I’ve been getting bombarded with requests from OpenAI’s GPTBot at a rate of about 1 req/s for the last several hours. I’d like to stop them from hitting my app, so naturally I’d like to serve a robots.txt at the root level, but I don’t know how to do this with Streamlit and I’m not finding anything in the documentation about robots.txt.
I looked into the static file serving part of the documentation, but it only serves files at /app/static/robots.txt and not at the root level.
I think this further suggests that the robots.txt feature is also in streamlit cloud’s best interests and not just for us interested in self hosting. I imagine the streamlit cloud apps are also potentially getting hit hard by GPTBot.
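For anyone landing here who does have a way to serve a file at the root, here is a minimal sketch of what the robots.txt itself would contain. The helper name `build_robots_txt` is made up for illustration, and this assumes the crawler identifies itself with the `GPTBot` token and actually honors robots.txt:

```python
def build_robots_txt(blocked_agents):
    """Return robots.txt text that disallows the whole site for each agent."""
    # One "User-agent / Disallow" group per crawler, separated by a blank line.
    groups = [f"User-agent: {agent}\nDisallow: /" for agent in blocked_agents]
    return "\n\n".join(groups) + "\n"


# Hypothetical usage: block only GPTBot and leave other crawlers alone.
ROBOTS_TXT = build_robots_txt(["GPTBot"])
print(ROBOTS_TXT, end="")
```

This only produces the file contents. Getting it served at /robots.txt rather than under /app/static/ is still the open problem in this thread.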
I don’t think there is native support for robots.txt. The closest thing I can think of is manually checking the user agent with st.context and stopping the app if GPTBot is the one making the request. Something like:
import streamlit as st

# Read the User-Agent header for this session (empty string if it is missing).
user_agent = st.context.headers.get("User-Agent", "")
# Stop early when there is no user agent or it contains the crawler's token.
if not user_agent or "GPTBot" in user_agent:
    st.error(f"This app does not work with \n\n>{user_agent}.")
    st.stop()

st.title("Welcome!")
st.write("To my expensive app running here...")
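If you want to block more than one crawler, the check can be pulled out into a small helper. This is just a sketch: `is_blocked_agent` and `BLOCKED_AGENT_TOKENS` are names I made up, and the token list is an assumption you would adjust to whatever shows up in your logs.

```python
# Hypothetical list of substrings to look for in the User-Agent header.
BLOCKED_AGENT_TOKENS = ("GPTBot",)


def is_blocked_agent(user_agent, tokens=BLOCKED_AGENT_TOKENS):
    """Return True if the User-Agent string contains any blocked token."""
    if not user_agent:
        # A missing header is not treated as a bot here; change this if you
        # would rather reject requests without a User-Agent.
        return False
    lowered = user_agent.lower()
    # Case-insensitive substring match against every blocked token.
    return any(token.lower() in lowered for token in tokens)
```

In the app you would then call something like is_blocked_agent(st.context.headers.get("User-Agent")) before the expensive part and call st.stop() when it returns True. Note that this relies on the bot sending an honest User-Agent header.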
I had the same requirement: to serve robots.txt. The Clace app deployment platform I have been building has support for this. Create a static_root folder in the root directory of your Streamlit app code. All files under that folder, such as robots.txt and favicon.ico, are served at the base of your app install path.
Thanks for sharing this. Unfortunately I think I’ll be switching over to streamlit cloud to get these bots off my back (I got charged $3 yesterday!). If I ever want to deploy a streamlit app internally for work in the future though, I’ll make sure to check out your project!
… but tl;dr, hosting my little streamlit app cost me over $25 for just two weeks!
Also, please thumbs up the original message in this GitHub issue if you agree that Streamlit apps should be able to protect themselves with robots.txt in a future version. I love building Streamlit apps, but the fact that I can’t self-host them and protect myself from scrapers makes it really hard to recommend Streamlit to others.