Preventing malicious bots from crashing a Streamlit app

Hello folks,

I am really new to Streamlit development. I have deployed an app on an EC2 instance using Docker. My setup uses a multi-container approach: web server → multipage Streamlit app → RESTful API (with an in-memory cache). One day ago my app went down. After inspecting the logs in all containers, I found that some malicious requests were the last events before the container stopped:
2023-07-15 20:06:23.349 “browser.browser” is not a valid config option. If you previously had this config option set, it may have been removed.

2023-07-16 04:00:01.653 MediaFileHandler: Missing file .env
2023-07-19 00:03:02.637 MediaFileHandler: Missing file wp-includes/wlwmanifest.xml
Stopping


It seems that the last event was a request for WordPress files, obviously a malicious request, but I don't understand why the app crashed. When I request the same URL with a browser (domain-name/wp-includes/wlwmanifest.xml), or any other non-existent resource, it returns nothing, and nothing shows up in the logs either.

Does anyone have an idea? Is there a bug in Streamlit, was it just not configured properly, or is the root of the problem somewhere other than Streamlit?

Thanks for any ideas in advance!

Hi there @aab,

Can you share your Dockerfile and config.toml file? First let’s check what is causing this:

2023-07-15 20:06:23.349 “browser.browser” is not a valid config option. If you previously had this config option set, it may have been removed.

On the other hand, did you check resource usage? Maybe there is a memory leak in your app, and each bot “visit” was building it up until it crashed.
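
One rough way to check this from inside the app is to log the process's peak memory on each run. This is only a sketch under assumptions: the logger name `memory_probe` and both function names are made up for illustration, and the KiB unit of `ru_maxrss` applies to Linux (which is what the container runs), not to macOS.

```python
import logging
import resource  # Unix-only standard-library module

# Hypothetical logger name, chosen only for this example
logger = logging.getLogger("memory_probe")


def peak_rss_mb() -> float:
    """Return the peak resident set size of this process in MiB.

    On Linux, ru_maxrss is reported in KiB, hence the division by 1024.
    """
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024


def log_memory(tag: str) -> float:
    """Log the current peak RSS together with a short tag and return it."""
    usage = peak_rss_mb()
    logger.info("%s: peak RSS %.1f MiB", tag, usage)
    return usage
```

Calling `log_memory("page rerun")` at the end of the main script would show whether the value keeps climbing over hours or days, which would point towards a leak. From outside the container, `docker stats` gives a live view of per-container memory as well.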

Hi @marduk

Thanks for your response. Sure:

Dockerfile:

FROM python:3.11-slim
WORKDIR /app
COPY ./requirements.txt requirements.txt
RUN pip install --upgrade pip
RUN pip3 install --no-cache-dir --upgrade -r requirements.txt
COPY ./src/ ./
ENTRYPOINT ["streamlit", "run", "main.py", "--server.port=8501", "--server.address=0.0.0.0"]

requirements.txt:

streamlit==1.24.0
requests==2.31.0
python-dotenv==1.0.0

docker-compose.yml:

  app:
    build: ./app
    container_name: stream-app
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://app:8501/_stcore/health"]
      interval: 1m30s
      timeout: 10s
      retries: 3
    depends_on:
      - fastapi
      - redis
    environment:
      PRODUCTION: 'true'
    env_file:
      - .env
    expose:
      - 8051
    networks:
      - prod_network
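
One thing that may be worth double-checking in this setup: as far as I know, the python:3.11-slim image does not ship curl, so the healthcheck above would fail regardless of the app's state unless curl is installed in the image. Below is a minimal sketch of a curl-free probe that uses only the Python standard library. The function name `is_healthy`, the `localhost` URL, and the 5-second timeout are assumptions for illustration, not part of the setup above.

```python
import urllib.request


def is_healthy(url: str = "http://localhost:8501/_stcore/health",
               timeout: float = 5.0) -> bool:
    """Return True if the health endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError (4xx/5xx), timeouts and refused connections
        # are all subclasses of OSError.
        return False
```

A small script (say, a hypothetical `healthcheck.py`) could end with `sys.exit(0 if is_healthy() else 1)` and be referenced from the compose file as `test: ["CMD", "python", "healthcheck.py"]`.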

config.toml:

[browser]
browser.gatherUsageStats = false

I haven’t checked the resource usage yet. I have seen it mentioned online, but I was not sure how to check it.


Two quick things:

  1. Could you repost your Dockerfile, docker-compose, etc. files as code blocks? You can do this by adding triple backticks ``` around each snippet. This will format them correctly so that I and others can test them out.
  2. In my experience, it’s not uncommon for bots to try to access WordPress endpoints on random domains, simply because WordPress is a popular platform and is often deployed in ways that leave some vulnerabilities. This shouldn’t affect your running app in any meaningful way.

I would focus on two main things for debugging this:

  1. Try to run your app independently of docker, and see if you experience any crashes
  2. Try to run a super simple app using your docker setup, and see if you experience any crashes

That should help narrow down whether it’s a docker issue or an issue with your app itself.

@blackary thanks for the hint with ```! :slight_smile:
And thank you for your reply!

The app was up for a week or two; it’s not crashing after an hour or two. My basic security concept is to isolate the app rather than run it on the host machine: port 8051 is exposed only inside a Docker network and is accessed by a web server. The host machine exposes only ports 80, 443, and 22. Since the app is exposed to the internet, I have implemented a basic login function:

import os

import streamlit as st


def login():
    # Get the login password from environment variables
    actual_password = os.environ['LOGIN_PASSWORD']
    # Initialize session state for login status
    st.session_state.login_successful = st.session_state.get('login_successful', False)

    # If the user is not logged in yet
    if not st.session_state.login_successful:
        placeholder = st.empty()

        # Show login form
        with placeholder.form("login"):
            st.markdown("#### Enter your password")
            password = st.text_input("Password", type="password")
            submit = st.form_submit_button("Login")

        # After user submits the form, check if the password is correct
        if submit:
            if password == actual_password:
                # Correct password, clear the form and mark login as successful
                placeholder.empty()
                st.success("Login successful")
                st.session_state.login_successful = True
            else:
                # Incorrect password, show an error
                st.error("Login failed")

My next steps:

  1. I am going to monitor resource usage, especially with regard to possible memory leaks, as suggested by @marduk: https://blog.streamlit.io/common-app-problems-resource-limits/
  2. I haven’t implemented caching yet, but I am going to do it anyway: https://docs.streamlit.io/library/advanced-features/caching?ref=blog.streamlit.io
  3. If the previous steps don’t help, I am going to try your step 2.
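
For step 1, something along these lines might be a starting point for finding where memory grows. It is a sketch using the standard library's `tracemalloc`. The helper name `top_growth` and the list standing in for a leak are assumptions for illustration; in the real app the two snapshots would be taken some time apart.

```python
import tracemalloc


def top_growth(before, after, limit: int = 5) -> list[str]:
    """Describe the source lines whose allocations changed the most
    between two tracemalloc snapshots."""
    stats = after.compare_to(before, "lineno")
    return [str(stat) for stat in stats[:limit]]


tracemalloc.start()
before = tracemalloc.take_snapshot()

# Stand-in for a leak: roughly 1 MB of data that is never released
leaky = [bytes(1000) for _ in range(1000)]

after = tracemalloc.take_snapshot()
for line in top_growth(before, after):
    print(line)
```

Each printed line names a file and line number together with the size difference, so a line that keeps growing between snapshots is a good candidate for the leak.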