How to serve robots.txt

Joshua5 · October 16, 2024, 1:36am

I recently deployed a dockerized streamlit app on GCP’s Cloud Run service and I’ve been getting bombarded with requests from OpenAI’s GPTBot at a rate of about 1 req/s for the last several hours. I’d like to stop them from hitting my app, so naturally I’d like to serve a robots.txt at the root level, but I don’t know how to do this with streamlit and I’m not finding anything in the documentation about robots.txt.

I looked into the static file serving part of the documentation, but it only serves files at /app/static/robots.txt and not at the root level.

Any help here would be greatly appreciated
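For reference, here is a minimal sketch of the kind of robots.txt in question, checked with Python's standard-library `urllib.robotparser`. The file contents and the `is_allowed` helper are assumptions for illustration, not something Streamlit provides, and the sketch only verifies the rules themselves; it does not solve the problem of serving the file at the root.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body: disallow GPTBot everywhere, allow everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, path: str = "/") -> bool:
    """Return True if the given robots.txt text lets user_agent fetch path."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    print(is_allowed(ROBOTS_TXT, "GPTBot"))       # the crawler's product token
    print(is_allowed(ROBOTS_TXT, "Mozilla/5.0"))  # an ordinary browser-style agent
```

Keep in mind that robots.txt is advisory: it only helps with crawlers that choose to honor it.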

Joshua5 · October 16, 2024, 4:35am

…just noticed that not even streamlit cloud apps seem to serve a robots.txt.

For reference, you can try https://yourmist.streamlit.app/robots.txt and you’ll get a Page not found window, but no robots.txt.

I think this further suggests that the robots.txt feature is also in streamlit cloud’s best interests and not just for us interested in self hosting. I imagine the streamlit cloud apps are also potentially getting hit hard by GPTBot.

edsaac · October 16, 2024, 12:48pm

I don’t think there is native support for robots.txt. The closest thing I can think of is manually checking the user agent with st.context and stopping the app if GPTBot is the one making the request. Something like:

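import streamlit as st

# Read the User-Agent header of the current request (empty string if absent).
user_agent = st.context.headers.get("User-Agent", "")

# Stop early for requests with no User-Agent or one that identifies as GPTBot.
if not user_agent or "GPTBot" in user_agent:
    st.error(f"This app does not work with \n\n>{user_agent}.")
    st.stop()

st.title("Welcome!")
st.write("To my expensive app running here...")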
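If more than one crawler needs to be turned away, the user-agent check above could be factored into a small helper. This is only a sketch: `BLOCKED_AGENT_TOKENS` and `is_blocked` are hypothetical names, and the token list is an assumption you would adjust to whatever bots show up in your logs.

```python
from typing import Iterable, Optional

# Hypothetical list of substrings that identify unwanted crawlers.
BLOCKED_AGENT_TOKENS = ("GPTBot",)


def is_blocked(
    user_agent: Optional[str],
    blocked_tokens: Iterable[str] = BLOCKED_AGENT_TOKENS,
) -> bool:
    """Return True for a missing User-Agent or one containing a blocked token."""
    if not user_agent:
        # Treat requests without a User-Agent as unwanted, like the snippet above.
        return True
    lowered = user_agent.lower()
    # Case-insensitive substring match against each blocked token.
    return any(token.lower() in lowered for token in blocked_tokens)
```

In the app this would be called as `is_blocked(st.context.headers.get("User-Agent"))` followed by `st.stop()` when it returns True. Since the check runs inside the app script, the request has already reached the server by then, so this cuts the work done per bot request rather than preventing the request itself.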

Joshua5 · October 16, 2024, 5:52pm

I think adding support for robots.txt would be a great addition to streamlit.

I opened an issue on github here as a feature request: Allow streamlit apps to easily serve robots.txt at the root level · Issue #9673 · streamlit/streamlit · GitHub

Please give that issue a thumbs up if you agree with it!

AjayKidave · October 17, 2024, 5:50pm

I had the same requirement, to serve robots.txt. The Clace app deployment platform I have been building has support for this. Create a static_root folder in the root directory of your Streamlit app code. All files present under that folder, like robots.txt and favicon.ico, are served at the base of your app install path.

Joshua5 · October 17, 2024, 7:08pm

Thanks for sharing this. Unfortunately I think I’ll be switching over to streamlit cloud to get these bots off my back (I got charged $3 yesterday!). If I ever want to deploy a streamlit app internally for work in the future though, I’ll make sure to check out your project!

Joshua5 · October 28, 2024, 7:24pm

Update: I ended up moving my app off of GCP and onto streamlit cloud to prevent incurring further costs.

If you want to read more about it, you can read the github issue: Allow streamlit apps to easily serve robots.txt at the root level · Issue #9673 · streamlit/streamlit · GitHub

… but tl;dr, hosting my little streamlit app cost me over $25 for just two weeks!

Also, please thumbs up the original message in this github issue if you agree that streamlit apps should be able to protect themselves with robots.txt in a future version. I love building streamlit apps, but the fact that I can’t self host them and protect myself from scrapers makes it really hard to recommend to others.

system · April 26, 2025, 7:25pm

This topic was automatically closed 180 days after the last reply. New replies are no longer allowed.
