How to retain data in a Streamlit app built for multiple users, and avoid cross talk, when users can upload files, process them, and upload again?

  1. I am building a Streamlit app that will be used by multiple users, and I can't allow cross talk between users, so I am not using caching anywhere.
  2. The app starts by uploading a PDF file and parsing it into a dataframe, which gets saved into a dictionary so that the dataframe names can vary.

The issue is that once the first file has been uploaded, processed into a dataframe, and saved in the dictionary, uploading another file reruns the whole script and overwrites everything, so I lose the previous data.

I am not able to retain the first processed dataframe or the dictionary. I can't cache the data because that would allow cross talk. So how should I overcome this issue?

Below is the sample structure that I am following:

# upload pdf file (uploading 1 file at a time currently)
st.session_state.uploaded_file = st.file_uploader('Choose your **.pdf** file to upload', type="pdf")

if st.session_state.uploaded_file:
    
    # starting counter to keep track of file counter
    st.session_state.counter = 1
    
    # Some function to Parse uploaded pdf file that I am importing from another file
    parsing_function()
    st.success('Parsing is Complete')

    # saving dataframe after processing
    st.session_state.df = some_data_processing()

    # Allowing user to edit dataframe and save it
    st.session_state.user_df = st.data_editor(st.session_state.df, num_rows="dynamic", key = st.session_state.counter)
    
    # creating blank dictionary
    st.session_state.dict_of_df = {}

    # can also use filename instead of counter to keep distinct names
    st.session_state.key_name = 'user_df_' + str(st.session_state.counter)

    # saving this edited user_df into dictionary
    st.session_state.dict_of_df[st.session_state.key_name] = st.session_state.user_df.copy(deep=True)


    # creating list of dataframes that starts with user_df* 
    # st.session_state.users_df_list = [key for key in st.session_state.keys() if key.startswith('user_df')]
    st.session_state.users_df_list = list(st.session_state.dict_of_df.keys())

    st.session_state.selected_df_list = st.multiselect("User Created Dataframes",
                                                    st.session_state.users_df_list,
                                                    default= st.session_state.users_df_list)

    st.session_state.counter +=1

Now, after all of this, if I upload a new PDF file everything gets overwritten. What I want is to retain user_df and keep adding to st.session_state.users_df_list, and I can't work out how to do that.

I'd appreciate any help on how to retain the user_df dataframes created above when uploading subsequent files.

I have also posted this question on Stack Overflow.

Hi @johnsnow09,

The key is to make sure you don’t re-initialize your session state variables every time the app runs.

So, instead of just doing st.session_state.counter = 1 and st.session_state.dict_of_df = {} (which will reset these entries every time they run), you can put these lines early in your app:

# starting counter to keep track of file counter
if "counter" not in st.session_state:
    st.session_state.counter = 1

# creating blank dictionary
if "dict_of_df" not in st.session_state:
    st.session_state.dict_of_df = {}

You also probably want to add a submit button before actually adding a new dataset to your dict_of_df entries, because if you don’t do that, then every time you interact with any widget, it will add a new entry.

Something like this works well:

if st.button("Submit"):
    # saving this edited user_df into dictionary
    st.session_state.dict_of_df[st.session_state.key_name] = (
        st.session_state.user_df.copy(deep=True)
    )

    # creating list of dataframes that starts with user_df*
    # st.session_state.users_df_list = [key for key in st.session_state.keys() if key.startswith('user_df')]
    st.session_state.users_df_list = list(st.session_state.dict_of_df.keys())

    st.session_state.counter += 1
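
Putting the two pieces together, here is a minimal sketch of the pattern. It is written against a plain dict so it can be run outside Streamlit; the assumption is that `st.session_state` is passed in its place inside the app, since it supports the same `in` checks and item access used here. The helper names `init_state` and `save_frame` are made up for illustration and are not part of Streamlit.

```python
from typing import Any, MutableMapping


def init_state(state: MutableMapping[str, Any]) -> None:
    """Create the counter and the dictionary only if they do not exist yet."""
    if "counter" not in state:
        state["counter"] = 1
    if "dict_of_df" not in state:
        state["dict_of_df"] = {}


def save_frame(state: MutableMapping[str, Any], frame: Any) -> str:
    """Store `frame` under a new key, advance the counter, return the key."""
    key_name = "user_df_" + str(state["counter"])
    state["dict_of_df"][key_name] = frame
    state["counter"] += 1
    return key_name


# A plain dict stands in for st.session_state (assumption for this demo).
state: dict = {}

init_state(state)            # first script run: creates the entries
save_frame(state, "first")   # stand-in for the first edited dataframe

init_state(state)            # rerun after a second upload: nothing is reset
save_frame(state, "second")  # stand-in for the second edited dataframe

print(list(state["dict_of_df"]))  # ['user_df_1', 'user_df_2']
```

In the app, `init_state(st.session_state)` would go near the top of the script and `save_frame(st.session_state, st.session_state.user_df.copy(deep=True))` inside the `if st.button("Submit"):` block.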

  1. This worked really well, exactly the way I wanted. Thanks a lot for helping me out.

  2. There is another thing: after this I am trying to combine (concat) all the dataframes in the dictionary. I can do that with the code below, but clicking the button reruns the whole script, which takes a lot of time. Is there any way to avoid rerunning the whole script for the code below?

    if st.button('Combine All Dataframes'):
        st.session_state.df_final = pd.concat(
            [st.session_state.dict_of_df[name] for name in st.session_state.selected_df_list],
            ignore_index=True,
        )

        st.write(st.session_state.df_final)

Or is the only alternative to combine them without using a button, or to put them on another page?

The best way to isolate one part of the app so that it can run without rerunning the rest of the app is to use st.fragment (see the Streamlit docs). You can put this code into a function and decorate it with @st.fragment, and clicking the button will then rerun only that function instead of the whole script.
