Where does the data go when using file_uploader? When does it get deleted?

The code is on GitHub

The problem is that st.session_state is apparently global and therefore shared among all users who are connected at the same time. This is a big problem for me and I can’t figure out how to change it.

Try upgrading streamlit to 1.23.1.

I downloaded Exemple_data.csv from your repository, uploaded it to your application and got an error in section 2 - Selection of your features and targets for the project:

streamlit.errors.StreamlitAPIException: This app has encountered an error. The original error message is redacted to prevent data leaks. Full error details have been recorded in the logs (if you're on Streamlit Cloud, click on 'Manage app' in the lower right of your app).
Traceback:
File "/home/appuser/venv/lib/python3.9/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 565, in _run_script
    exec(code, module.__dict__)
File "/app/madgui/MADGUI/MADGUI.py", line 175, in <module>
    feature = st.multiselect("Features - Unselected the one you don't need :", st.session_state['data_file'].columns, default = st.session_state['feature_selected'])
File "/home/appuser/venv/lib/python3.9/site-packages/streamlit/runtime/metrics_util.py", line 332, in wrapped_func
    result = non_optional_func(*args, **kwargs)
File "/home/appuser/venv/lib/python3.9/site-packages/streamlit/elements/multiselect.py", line 251, in multiselect
    return self._multiselect(
File "/home/appuser/venv/lib/python3.9/site-packages/streamlit/elements/multiselect.py", line 291, in _multiselect
    indices = _check_and_convert_to_indices(opt, default)
File "/home/appuser/venv/lib/python3.9/site-packages/streamlit/elements/multiselect.py", line 98, in _check_and_convert_to_indices
    raise StreamlitAPIException(

And after that I could reproduce the issue, but obviously something was wrong before that.

Then I deployed your app with Streamlit 1.23.1 and did the same. I found no errors and no leaking of data across sessions.

Hello,

The error message that you received comes from the problem that I am talking about. We get this error because there is already different data in the cache (in st.session_state[‘data_file’]) that has not been cleared.

I tried your suggestion with Streamlit 1.23.1 but it doesn’t change the problem. You can see the problem clearly if you open one instance of the app, load the data, select the parameters, submit, and then go to the “Prediction” page. Then open a second instance of the app and click the “Reset all” button in its sidebar. You will see that this also deletes the data on the “Prediction” page of the first instance.

Here is an example of the problem I am talking about, a problem that occurs only on Streamlit Cloud:

import streamlit as st
if 'proof' not in st.session_state:
	st.session_state["proof"]="This text is the initialization of st.session_state['proof']"
	st.write(st.session_state["proof"])
	st.stop()
else:
	st.session_state["proof"]="If you see that on a new page there is a problem"
	st.write(st.session_state["proof"])
reset_button=st.button("Reset session_state")
if reset_button:
	st.session_state={}
	st.experimental_rerun()

If you deploy the code above, each new page of the app is supposed to show only the message “This text is the initialization of st.session_state[‘proof’]”. But when you open a second new page, you will see that st.session_state[“proof”] already exists, so you get the message “If you see that on a new page there is a problem” when you are not supposed to.

This is my deployment of your MADGUI, where I cannot replicate the issue. Let me know if you can.

I’ll try your new snippet when I find the time.

Thank you for your time.

I still see the same problem with your deployment of MADGUI. After I tested your deployment, you just have to open it again without uploading any file: you will see data on the Prediction page that you are not supposed to see, since you haven’t uploaded any data yet.
Another way to reproduce it:

  • launch one page of MADGUI (let’s say as USER 1)
  • use the data on GitHub
  • complete the selection of features and targets
  • this unlocks the other pages (Prediction and Bayesian)
  • then launch a new MADGUI page (as if you were another user, USER 2)
  • click “Reset all” in the sidebar
  • go back to the first MADGUI page (USER 1) and you will see that your selection of features and targets and the access to the other pages have been reset too, when they should not be.

From my perspective, it seems that on Streamlit Cloud there is one st.session_state per deployed application and not one per user who launches the app, so I can’t have multiple users using st.session_state at the same time.
I don’t have this problem locally when I open multiple pages at the same time, because each one has its own address (localhost:8501, localhost:8502, …) and there is no interaction between them.

Oh, of course it is. It is just the items that are session-specific. But st.session_state itself is global.

So don’t do st.session_state = {}, use st.session_state.clear() instead. Try it in my deployment (for limited time).
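To illustrate why the two resets behave differently, here is a minimal sketch in plain Python. It is a simplified model, not Streamlit’s actual implementation: SessionProxy, make_fake_st and run_script are hypothetical names, and the only assumption is the one stated above, namely that the session_state object is global while its items are per session.

```python
import types


class SessionProxy:
    """Simplified model: one global object whose items are stored per session."""

    def __init__(self):
        self._stores = {}    # session id -> that session's private dict
        self.current = None  # id of the session whose script is running

    def _store(self):
        return self._stores.setdefault(self.current, {})

    def __contains__(self, key):
        return key in self._store()

    def __setitem__(self, key, value):
        self._store()[key] = value

    def clear(self):
        # Empties only the current session's items.
        self._store().clear()


def make_fake_st():
    """Hypothetical stand-in for the module object shared by all sessions."""
    proxy = SessionProxy()
    return types.SimpleNamespace(session_state=proxy, proxy=proxy)


def run_script(fake_st, session_id, reset=None):
    """One script run; returns True if 'proof' was not yet set for this session."""
    fake_st.proxy.current = session_id
    if reset == "rebind":
        fake_st.session_state = {}      # replaces the global object for everyone
    elif reset == "clear":
        fake_st.session_state.clear()   # empties only this session's items
    first_visit = "proof" not in fake_st.session_state
    fake_st.session_state["proof"] = "initialized"
    return first_visit


broken = make_fake_st()
print(run_script(broken, "user1"))            # True
print(run_script(broken, "user1", "rebind"))  # True
print(run_script(broken, "user2"))            # False: new session sees user1's data

fixed = make_fake_st()
print(run_script(fixed, "user1"))             # True
print(run_script(fixed, "user1", "clear"))    # True
print(run_script(fixed, "user2"))             # True: new session starts clean
```

In this model, rebinding swaps the shared object for a plain dict that every later session reads and writes, whereas clear() only touches the items of the session that called it.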

I was unable to replicate the issue with your other example, it works as expected as far as I can tell.

It is very strange, because I have the issue even with your deployment. At the beginning, indeed, every page opened with the correct message, but after some time and a few refreshes that is no longer the case. When I open a new page of your deployment now, I get the second message.
To avoid straying too far from the topic of this thread, I created a separate topic for this issue.
Thank you for your help so far.

I identified the issue and you gave me the solution to it.
The problem came from the reset button:
st.session_state={}
When you click it, it may destroy the original structure of session_state, and then you get the issue that I was talking about.

In your deployment there is no problem until we click the reset button, but once we click it the issue appears.
As you suggested, after replacing it with st.session_state.clear() I don’t have this issue anymore.

Thank you for your help.
I will also post the answer in the new topic that I opened, because I don’t think I will be the only one to try st.session_state={}

Damn, for some reason I didn’t even realize there was a reset button.

Hi Thiago,

I’m building a PDF merger using Streamlit on my local machine. I’m using st.file_uploader() to upload all the PDF files.
Once they’re uploaded, I want to extract one string from inside each PDF.

According to your reply, does it mean we cannot open the PDF with code like the following:

doc = fitz.open(pdf_full_path?)

Because I get ‘fitz.fitz.FileNotFoundError: no such file: ‘1FA02252303353KLSLCERT-028.pdf’’
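
One possible workaround, sketched under the assumption that the object returned by st.file_uploader behaves like an in-memory BytesIO with a .name attribute and has no path on disk (which would explain the FileNotFoundError): write the bytes to a temporary file and pass that path to the library. The helper name save_upload_to_temp is hypothetical, and a plain BytesIO stands in for the upload so the sketch runs without Streamlit.

```python
import io
import os
import tempfile


def save_upload_to_temp(uploaded):
    """Write an in-memory upload to a temporary file and return its path.

    `uploaded` is assumed to be BytesIO-like with a `.name` attribute,
    as the object returned by st.file_uploader is assumed to be.
    """
    suffix = os.path.splitext(uploaded.name)[1]  # keep the ".pdf" extension
    with tempfile.NamedTemporaryFile(delete=False, suffix=suffix) as tmp:
        tmp.write(uploaded.getvalue())
    return tmp.name  # the caller is responsible for deleting the file


# Stand-in for an uploaded file (placeholder bytes, not a valid PDF).
fake_upload = io.BytesIO(b"%PDF-1.4 placeholder bytes")
fake_upload.name = "example.pdf"

path = save_upload_to_temp(fake_upload)
print(os.path.exists(path))  # the file now exists at a real path
os.remove(path)              # clean up when done
```

The returned path could then be handed to fitz.open(path). PyMuPDF can also open a document directly from bytes with fitz.open(stream=..., filetype="pdf"), which avoids the temporary file altogether.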