What you are experiencing is the difference between web-based programming and “data programming” (however that’s defined)… In this case, Streamlit isn’t saving your file anywhere. The file uploader takes the stream of bytes coming from the widget and keeps it in RAM (like any other piece of data). By not writing to disk, you’re removing a step that takes time, so you’re improving performance.
The data lives on until the Streamlit app re-runs from top to bottom, which happens on every widget interaction. If you need the uploaded data to survive between runs, you can cache it so that Streamlit persists it across re-runs:
@felixdasilva yeah - this has been a lot of trial and error for me too… converting streams into files and such, because that’s easier with the documentation I have for the GIS processing libraries… I wonder if there might be a way to make GitHub file storage a simpler thing? So a stream comes in from some API, caches to GitHub, and becomes persistent for “a while”. idk, I’m sure there are a lot of problems with a solution like that… the nice thing about being in memory is that it is ephemeral… but it can make finding documentation for specific projects harder - especially if you’re a novice like myself
The important point here is that Streamlit doesn’t do it…but that doesn’t mean that Python can’t. If you have a stream in a BytesIO buffer, writing to a file is done by:
with open("out.txt", "wb") as outfile:
    # Copy the BytesIO stream to the output file
    outfile.write(myio.getbuffer())
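Since myio comes from elsewhere in your app, here is the same idea as a self-contained round trip (the buffer contents and file name are made up), including a read-back to confirm the copy:

```python
import io
import os
import tempfile

# Hypothetical stand-in for the stream you got from the uploader.
myio = io.BytesIO(b"col1,col2\n1,2\n")

# Write the in-memory buffer to disk...
path = os.path.join(tempfile.mkdtemp(), "out.txt")
with open(path, "wb") as outfile:
    outfile.write(myio.getbuffer())

# ...and read it back to confirm the bytes survived intact.
with open(path, "rb") as infile:
    recovered = infile.read()
```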
When you say saving it in RAM, I’m assuming the RAM on Streamlit’s server? I just want to make sure that any data passed is not kept forever - or, ideally, that no data is ‘uploaded’ at all, if that’s even possible when processing ‘local’ data.
RAM is only capable of temporary storage. The data is only available inside the container your app is running in, with each app running in separate containers.
Thanks again for this amazing product. I have a related question.
I have an app running with nginx. I automatically clear the cache after 30 minutes. For added security I would like to suggest to the user to clear the cache when done.
Does your answer “The data is only available inside the container your app is running in, with each app running in separate containers” mean that if a user clears the streamlit cache using the hamburger menu option she is clearing “her” cache or is she also clearing the cache of all other users?
This is in reference to the Streamlit sharing service. Streamlit the open-source library does not itself run as a container process.
The cache is global, relative to the arguments passed into the function call. If you are doing something with sensitive information per user account, you should consider alternative ways of authenticating your application.
Mine is a free service without user authentication. I would like to keep it that way, at least for now. However, the information the users upload might indeed be sensitive.
I thought that by using nginx/apache I would get separate sessions and therefore separate caches. Is this not the case? If not, any suggestions on possible approaches?
I’m not an nginx expert to be honest, but my expectation is that it would be the tornado server that’s managing the overall memory, not nginx. I think nginx would just be a reverse proxy?
Just jumping in real quick to clarify how files are stored:
Streamlit manages your uploaded files for you. Files are stored in memory (i.e. RAM, not disk), and they get deleted immediately as soon as they’re not needed anymore.
This means we remove a file from memory when:
The user uploads another file, replacing the original one
The user clears the file uploader
The user closes the browser tab where they uploaded the file
Very helpful. My question is what happens to the cache when there is more than one user at the same time? If one user clears ‘his’ cache and exits, does another user running a separate session (say, under nginx) also have her cache cleared, suffering a dip in performance?
Or say the first user exits without clearing the cache and the second user is a dangerous hacker. Will she be somehow able (we are assuming an app with no user credentials) to recover the sensitive info of her victim?
I know I am dramatizing a bit, but just to give it color.
To clarify: no user has any access to the files uploaded by any other user.
This is true whether the two users are using the app concurrently or not. This is because our file manager data structure is a per-user-session structure.
(Of course, you can always program the ability to share files between users yourself if you want to - for example, by saving the file to disk and showing it to the other user. But I’m assuming you’re not doing that.)
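A hedged sketch of that deliberate-sharing pattern (directory and function names are made up for illustration): one session writes the uploaded bytes to disk, and any other session can read them back.

```python
import os
import tempfile

# Illustrative shared location; a real app would use a fixed, managed path.
SHARED_DIR = tempfile.mkdtemp()

def publish_upload(name: str, data: bytes) -> str:
    """Session A: persist uploaded bytes where other sessions can find them."""
    path = os.path.join(SHARED_DIR, name)
    with open(path, "wb") as f:
        f.write(data)
    return path

def fetch_upload(name: str) -> bytes:
    """Session B: read what session A published."""
    with open(os.path.join(SHARED_DIR, name), "rb") as f:
        return f.read()

publish_upload("report.csv", b"a,b\n1,2\n")
shared = fetch_upload("report.csv")  # a different session sees the same bytes
```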
One thing that isn’t clear from your question, though, is what you mean by “cache”. If by “cache” you mean “uploaded files”, then what I said above is correct.
But if by “cache” you mean “st.cache”, then the behavior there is different.
st.cache is a global cache keyed by the input parameters to the cached function (among other things. But for the sake of this discussion, let’s simplify!). This means that if you call the same function with the same parameters for two different users, the returned value will be the same for both users. So this is one way where you could inadvertently share information between users.
Another way you could share information between users is by storing information in other shared resources, like disk, databases, global module-level variables, and so on.
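For instance, a module-level variable is one object shared by every session in the server process - a deliberately unsafe sketch (illustrative names) of how data can leak between users:

```python
# DANGER: module-level state is shared by every session in the process.
last_upload = None  # one variable for all users

def handle_upload(data: bytes) -> bytes:
    """Stores the new upload and returns whatever was there before."""
    global last_upload
    previous = last_upload  # a later user can observe an earlier user's data
    last_upload = data
    return previous

handle_upload(b"alice-secret")       # user A uploads
leaked = handle_upload(b"bob-data")  # user B now sees user A's bytes
```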
My assumptions: Streamlit app, running under nginx. No authentication mechanism (no userId, no login,…). Users upload a sensitive CSV dataset they do not want anybody else to see. The app uses many instances of st.cache to improve performance of different functions. The sensitive user data is always one of the input parameters of the cached functions.
So:
If a user A uploads a dataset, this dataset cannot be accessed by user B, even if user B is a hacker. Great.
The global st.cache content that “belongs” to user A is not accessible to user B (since user B cannot access user A’s dataset, and therefore cannot call the function with the same parameters). Great.
Last doubt.
User A and user B are working, at the same time, with different datasets, on the streamlit app. All is good and they cannot see the other user’s file, whether they are hackers or not.
At some point, user A, for whatever reason, clears the cache - deleting, if I understand correctly, also the entries in the global st.cache tied to user B.
Will user B experience a loss in performance until the cache rebuilds itself?
If this is true (I hope not), then once you have enough users there will statistically always be someone clearing the cache, and the cache will more or less always be empty. Any ideas on how to avoid this, assuming it is true?
Are there additional cases for Streamlit Cloud/sharing apps that would cause a file to be removed from RAM?
I have a private app that 12 or so users log in to (via Google login) and am experiencing the following:
after file upload (and analysis on the dataset): if the user leaves the tab open but does not interact with the webpage for some time, the file clears out at some point, as if the app re-ran from the beginning - even though they did not click any radio buttons, widgets, or anything else that would cause a re-run of the code from top to bottom
seemingly at random, users will upload a file, analysis will complete, and then the app reverts back to the start where they have to re-upload the file.
Is the RAM a shared resource between all users of the Streamlit Cloud app? They all log in separately with their Google credentials to use the app… I’m wondering if other users uploading files is causing their uploaded file to get removed from RAM, therefore forcing them to start from the beginning and re-upload the file? The uploaded file is a .csv, only 3-5 MB max.
also worth noting:
I have not implemented @st.cache
my concern being that multiple users are using the app at the same time
would this cache be specific to their cloud instance/file upload (since they’re logging in with a Google account before uploading their files)? I do not want user A’s file to be cached and seen when user B logs in with their Google account to use the app
the other solution I haven’t explored would be implementing session_state. same concerns as st.cache above.
are either of these (st.cache or st.session_state) the correct next steps for me to resolve the behavior my users are experiencing?
I have this issue on my app https://lambard-ml-team-madgui.streamlit.app/
When I upload a file there as User 1, it unlocks the other pages of the app (not accessible without uploading a file). But when I open a new session as User 2, I have access to the other pages and I can even see the graphs and data of User 1.
Also, I use a lot of session_state, but I think the values are the same for User 1 and User 2. It is a big problem; it means I can’t have 2 users at the same time.