Streamlit Security

We’re trying out Streamlit a bit, but our corporate security group has some questions about it that they’d like answers to. Here are their questions:

  • Is any user-entered data logged automatically by Streamlit? If so, is the data sanitized to avoid injecting XSS into the log files?
  • Are all input fields validated to check for format and length on the server side?
  • Were standard libraries also used to validate input fields on the server side?
    • i.e. you didn’t write input validation using custom code, but used a well-vetted community library

In addition to the above specific questions, can you provide some detail on what you do in Streamlit to protect against common application vulnerabilities, such as those relevant ones on the OWASP Top 10 list?

Thanks!

Hi @dsw88! Welcome to the forums :wave:

I’d love to get some more detail on the kinds of assurances you’re looking for (see my responses below).

As a preface, though, I think it’s important to realize that since Streamlit is a framework for easily building data apps in Python, by its very nature it comes with the full power of Python. We work hard to design our APIs to make it trivial for developers to Do The Right Thing™ by default, but since you can use any Python you want it’s possible for developers to do Wrong Things as well.

For example, there’s nothing in Streamlit that will stop a developer from doing something like this:

import streamlit as st

code = st.text_area("Python code")
exec(code)  # <-- Bad idea: this executes arbitrary user input!

So an important distinction here is the exact kind of vulnerabilities you’re worried about: those that originate from Streamlit’s own codebase, or those from apps written by Streamlit users?

  • Is any user-entered data logged automatically by Streamlit? If so, is the data sanitized to avoid injecting XSS into the log files?

No, Streamlit doesn’t log anything automatically.
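
If your app adds its own logging of user input, though, the classic mitigation against log injection is to escape control characters before writing. A minimal sketch (the sanitize helper and log file name are hypothetical, not part of Streamlit):

import logging
import streamlit as st

logging.basicConfig(filename="app.log", level=logging.INFO)

# Hypothetical helper: escape control characters so user input
# can't forge extra log lines (log injection)
def sanitize(value: str) -> str:
    return value.encode("unicode_escape").decode("ascii")

name = st.text_input("Your name")
if name:
    logging.info("User entered name: %s", sanitize(name))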

  • Are all input fields validated to check for format and length on the server side?
  • Were standard libraries also used to validate input fields on the server side?
    • i.e. you didn’t write input validation using custom code, but used a well-vetted community library

What kinds of checks and validations are you interested in? And what classes of vulnerabilities?

If I understand your question correctly, checks of this sort are very much application-dependent. So it makes sense to leave it up to the user to implement them.
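
For instance, here is a minimal sketch of server-side format and length checks in an app, using the standard library’s re module (the field, regex, and limit are illustrative):

import re
import streamlit as st

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
MAX_LEN = 254  # a common upper bound for email addresses

email = st.text_input("Email address")
if email:
    # Widget values arrive at the Python process, so these
    # checks run server-side, not in the browser.
    if len(email) > MAX_LEN:
        st.error("Email is too long.")
    elif not EMAIL_RE.match(email):
        st.error("That doesn't look like a valid email address.")
    else:
        st.success("Email accepted.")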

In addition to the above specific questions, can you provide some detail on what you do in Streamlit to protect against common application vulnerabilities, such as those relevant ones on the OWASP Top 10 list?

When I worked at Google, if we wanted to use certain external libraries in production we had to submit them for a security review. This is something I would love for companies to do with Streamlit! Let me know if you’re interested. We’d be happy to meet any time and help guide you through our codebase to move this process forward. The more eyes we have on the code, the better :smiley:

In the meantime, I’m happy to give you the following outline of an answer:

  • Injection - As demonstrated above, this is the Streamlit app developer’s responsibility.

  • Broken Authentication - Streamlit doesn’t have built-in authentication.

  • Sensitive data exposure - Similarly, this is the Streamlit app developer’s responsibility.

  • XML External Entities (XXE) - Streamlit doesn’t use XML.

  • Broken Access control - Streamlit doesn’t have built-in access control.

  • Security misconfigurations - Streamlit isn’t a cloud service, so there are no Streamlit-side security configurations to misconfigure.

  • Cross Site Scripting (XSS) - To combat XSS, Streamlit prevents the injection of unsanitized HTML and JavaScript. The exception is if you pass the unsafe_allow_html parameter; use it with caution! (See the sketch after this list.)

  • Insecure Deserialization - Streamlit does not deserialize objects from untrusted sources.

  • Using Components with known vulnerabilities - This is the Streamlit app developer’s responsibility.

  • Insufficient logging and monitoring - Similar to above, this is the Streamlit app developer’s responsibility.
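
To illustrate the XSS point above: Streamlit escapes HTML in text elements by default, and raw HTML renders only when you explicitly opt in. A minimal sketch (the markup is illustrative):

import streamlit as st

user_input = "<img src=x onerror=alert('xss')>"

# Default: the HTML is escaped and displayed as literal text.
st.markdown(user_input)

# Opt-in: the HTML is rendered as-is. Never do this with
# untrusted input; this is exactly how XSS happens.
st.markdown(user_input, unsafe_allow_html=True)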

What about Streamlit’s “magic” caching ability? If I decorate a function that returns sensitive data with @st.cache, how does Streamlit store that data? Does it persist in memory or on the disk in a way that a malicious actor could theoretically gain access to that data? If so, for how long, and is there a way to specify (programmatically) when this (part of the) cache should be cleared?

To give you a more concrete example of what I mean: assume someone has managed to gain access to a terminal on my server, but they don’t have read access to a file containing sensitive data. Suppose I then load the file into Streamlit and cache the result so that I don’t have to reread it from disk constantly. Could somebody else now access the cached copy of that file from memory or disk and read its contents? (Assume that I have protected access to the Streamlit app I’m running to my own satisfaction.)

Hi @Yasha,

If I decorate a function that returns sensitive data with @st.cache, how does Streamlit store that data?

Streamlit has a memory cache and a file cache. The memory cache is the default and we’ll always attempt to read from and write to it. If you add the persist flag like @st.cache(persist=True) then we’ll write to disk after memory, as well as read from disk in case the item is not found in memory.
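
In code, the two modes look like this (the function names are illustrative; @st.cache was the caching API at the time of this thread):

import pandas as pd
import streamlit as st

# Memory cache only (the default): results live in the Streamlit
# server process and disappear when it exits.
@st.cache
def load_small():
    return pd.DataFrame({"a": [1, 2, 3]})

# Memory cache plus file cache: results are also pickled under
# ~/.streamlit/cache and survive server restarts.
@st.cache(persist=True)
def load_large(path):
    return pd.read_csv(path)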

Does it persist in memory or on the disk in a way that a malicious actor could theoretically gain access to that data?

Memory cache: Memory protection prevents one process from accessing the memory of another. This is a common feature of multiuser operating systems such as Linux. Theoretically, a user with admin access could use tools such as ptrace to get around these protections.

File cache: We pickle the return value of your @st.cache decorated function and write to the file cache at ~/.streamlit/cache. We have no specific security measures in place so you’d be relying on the user/file permissions of your OS to restrict access to the cache at this time.
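
If you rely on those OS permissions, one mitigation is to lock down the cache directory yourself. A minimal sketch, assuming the default cache location above and a POSIX system (this is an app-level workaround, not a Streamlit feature):

import os
import stat
import pathlib

cache_dir = pathlib.Path.home() / ".streamlit" / "cache"
if cache_dir.exists():
    # Owner may read/write/traverse; group and others get nothing.
    os.chmod(cache_dir, stat.S_IRWXU)
    # Assumes the cache contains only regular files.
    for f in cache_dir.iterdir():
        os.chmod(f, stat.S_IRUSR | stat.S_IWUSR)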

If so, for how long, and is there a way to specify (programmatically) when this (part of the) cache should be cleared?

The cache is maintained until it is manually cleared, either through the hamburger menu in the client, the Streamlit CLI, or programmatically via

from streamlit import caching
caching.clear_cache()

It is not possible to clear only a part of the cache at the moment.

FYI, the programmatic approach posted above is not part of the official API; clearing via the CLI as shown below is the preferred solution.

streamlit cache clear

@Jonathan_Rhone Thanks for the information!

You might want to consider adding an option to your API that allows developers to specify who has read access to the cached files.

I just wrote a quick Streamlit app and verified that it behaves as I expected from your description (i.e., it is not secure by default). See below.

import streamlit as st

st.title("Streamlit Security Leak Demo")

st.write("First we'll import some libraries and define some helper functions.")
with st.echo():
    import streamlit as st
    import pandas as pd
    import os, stat, glob, pathlib, pickle

    #Reads the permissions of the specified file
    def getFilePermissions(filename):
        return oct(os.stat(filename)[stat.ST_MODE])[-3:]

    #Returns the most recently modified file in the directory
    def getLastModified(directory):
        fileList = glob.glob(directory + '/*')
        return max(fileList, key=os.path.getmtime)

    #Read a pickled file and return the contents
    def load(filename):
        with open(filename,'rb') as f_in:
            return pickle.load(f_in)


    #The current user's home directory
    home = str(pathlib.Path.home())


st.write("Next, we'll create some fictional sensitive data, save it to a file and set the permissions so that only the owner can read, write and execute the file.")
with st.echo():
    #Create a dataframe with sensitive information
    df = pd.DataFrame({'Username':['Alwyn','Grace','Zenab'],
                       'Password':['12345','@abcd','Horse']})

    #Write it to a file
    df.to_csv('users.csv',index=False)

    #Update the permissions so that only the owner can read the file
    os.chmod('users.csv',stat.S_IRWXU)

    #Verify the file permissions
    permissions = getFilePermissions('users.csv')
    st.write("File permissions of `users.csv`: " + str(permissions))

st.write("Now we'll create a helper function to read in a csv file and return the ith row of the file. We'll cache the result so that we don't need to re-read the entire CSV file every time we want to access the same record. We'll also set persist=True so that the information is written to disk as well as memory so that it persists between streamlit runs.")
with st.echo():
    #Create a function to read in files and cache the contents
    @st.cache(persist=True)
    def read_record(filename,i):
        return dict(pd.read_csv(filename).iloc[i])


st.write("Next we'll use the helper function to read the second row of the CSV file.")
with st.echo():
    #Now read in the file again
    st.write(read_record('users.csv',1))

st.write("Now let's see if we can go and find the cached file with the sensitive information.")
with st.echo():
    #Get the last modified streamlit cache file
    lastModified = getLastModified(home + '/.streamlit/cache')

    st.write(load(lastModified))

st.write("We found the user's login information! Now, what are the permissions to this file?")
with st.echo():
    permissions = getFilePermissions(lastModified)
    st.write("File permissions of the cached file: " + str(permissions))

st.write("While we've obviously done many things in this app that should never be done from a security standpoint, you can see that a developer using streamlit might accidentally leak sensitive information.")