Session state does NOT clear on browser refresh and leaks across sessions

Hey, everyone!

I am getting a very strange bug with session state in my chatbot app. I store all messages in st.session_state.messages, and when I refresh the page, the chat history persists. Even more strangely, it becomes visible to other users, so sometimes I see other people's unfinished conversations.

This is very unlike my previous experience with Streamlit and everything written on the forum, where people usually complain that it does not keep session state on browser refresh (as it should not).

There are no cached functions, and the code is pretty simple and straightforward, so I am deeply confused.

Clearing browser cookies has no effect, and neither does switching to Streamlit 1.31, which I used for much longer without ever experiencing anything similar. I debug this by refreshing the browser and by opening a new browser session in an incognito window, where I see the previous chat history right upon opening. I have checked different browsers as well.

The worst thing is that this leads to chat history leaking across users. So sometimes I can see other people's dialogues, which is unacceptable. However, no external database is used, so I do not even understand where this information is stored.

I also have a feeling that this behaviour appeared just recently, but I do not have any proof of that.

code used to init session state:

if "messages" not in st.session_state:
    st.session_state.messages = initial_messages

code to show messages:

with st.container():    
    for message in st.session_state.messages:
        if message["role"] != "system":
            avatar = avatars.get(message["role"], "👤")
            #time.sleep(1)
            with st.chat_message(avatar):
                st.markdown(message["content"])

and functions to append new messages:

  def add_assistant_message(self, message: str):
      if message and message != "":
          st.session_state.messages.append({"role": "assistant", "content": message})
  def add_user_message(self, message: str):
      if message and message != "":
          st.session_state.messages.append({"role": "user", "content": message})

Streamlit version = 1.33
Python = 3.11
App url in streamlit cloud: https://betterway.streamlit.app/

Any help is deeply appreciated. Please share if you have run into anything similar and what helped you solve it.

This can happen if you assign a value to st.session_state (I have seen st.session_state = {}) or you assign a shared object to an item (like the return value of a cached function or a global variable defined in a module other than the main script).

Thanks for replying! I do assign the value of a global variable imported from an external module to st.session_state.messages upon initialization:

from prompts import initial_messages

if "messages" not in st.session_state or st.session_state.messages == []:
    st.session_state.messages = initial_messages

Could this cause such behaviour? What are the rules to follow here? I did not find anything relevant in the session state documentation.

Additionally, I suspected the helper objects I keep in session state: one is a client connected to an LLM provider, and the other wraps dialogue flow functions such as add_user_message or add_assistant_message. Neither of them is cached and neither stores data outside st.session_state.messages.

if 'llm' not in st.session_state:
    st.session_state.llm = LLM(prompt)

if 'dialog' not in st.session_state:
    st.session_state.dialog = Dialog(st.session_state.llm, initial_messages)

However, removing them from the code did not solve the issue.

Just follow the good old rules of Python. Modules are imported only once, thus initial_messages is always the same object, and st.session_state.messages in every session is always that same object.
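To make that concrete, here is a minimal sketch in plain Python, no Streamlit involved. The prompts module is faked with types.ModuleType, and the run_script function and the two dicts standing in for per-session state are assumptions made only for illustration:

```python
import sys
import types

# Hypothetical stand-in for the `prompts` module from the thread.
prompts = types.ModuleType("prompts")
prompts.initial_messages = [{"role": "assistant", "content": "What's up?"}]
sys.modules["prompts"] = prompts


def run_script(session_state):
    """Simulate one run of the app script for a given session."""
    # The import is served from the sys.modules cache, so every run
    # gets the very same list object.
    from prompts import initial_messages

    if "messages" not in session_state:
        session_state["messages"] = initial_messages
    return session_state


session_a = run_script({})  # first user
session_b = run_script({})  # second user, separate session state

# User A sends a message...
session_a["messages"].append({"role": "user", "content": "hello from A"})

# ...and it shows up in user B's history, because both names point
# at the same list.
same_object = session_a["messages"] is session_b["messages"]
leaked = session_b["messages"][-1]["content"]
print(same_object, leaked)
```

Each session has its own state dict, but the value stored under "messages" is one shared list, so an append from any session is visible to all of them.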

You want to make sure st.session_state.messages is a different object for each session. Maybe by assigning a copy of initial_messages:

# assumes initial_messages is a list
st.session_state.messages = initial_messages.copy()

or by making initial_messages a function that creates and returns a new object each time it is called.

# in the prompts module
def initial_messages():
    return [{"role": "assistant", "content": "What's up?"}]

# in your script
st.session_state.messages = initial_messages()
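One caveat on the .copy() option: list.copy() is a shallow copy, so the new list still holds the same dict objects. Appending is safe, but editing a message in place would still touch the shared default. A small sketch, assuming initial_messages is a list of dicts as in the thread (the variable names are just for illustration):

```python
import copy

# Hypothetical module-level default, standing in for prompts.initial_messages.
initial_messages = [{"role": "assistant", "content": "What's up?"}]

shallow = initial_messages.copy()        # new list, same dict objects inside
deep = copy.deepcopy(initial_messages)   # new list and new dicts

# Appending to the shallow copy does not touch the original list.
shallow.append({"role": "user", "content": "hi"})

# Editing a dict through the shallow copy does change the original,
# because both lists reference the same dict.
shallow[0]["content"] = "edited"

original_length = len(initial_messages)
original_content = initial_messages[0]["content"]
deep_content = deep[0]["content"]
print(original_length, original_content, deep_content)
```

If the app only ever appends messages, .copy() is enough. If it might edit messages in place, copy.deepcopy or the function approach is the safer choice.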

Thanks a lot, Goyo.

Previously I imported initial_messages as a constant list and never attempted to modify it directly, so I assumed it remained constant, which turned out not to be the case. Replacing the imported initial_messages list with a call to the initial_messages() function has solved the problem.

I am not sure I understand why not doing so leads to cross-session data leaks, but I will be more careful with that from now on. I am wondering if there is an established way to intentionally transfer data across sessions, to preserve some variables across browser refresh events?

There are no constants in Python. There are mutable and immutable objects. Lists are mutable (their state can change).
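If you want something closer to a constant, one option is to keep the defaults in a tuple and build a fresh list from it for each session. This is only a sketch: INITIAL_MESSAGES and new_history are made-up names, and the dicts inside the tuple are still mutable, which is why they are copied too:

```python
# Hypothetical "constant": a tuple cannot be appended to, unlike a list.
INITIAL_MESSAGES = ({"role": "assistant", "content": "What's up?"},)


def new_history():
    """Build a fresh list with fresh dicts for one session."""
    return [dict(message) for message in INITIAL_MESSAGES]


history_a = new_history()
history_b = new_history()
history_a.append({"role": "user", "content": "hello"})

tuple_has_append = hasattr(INITIAL_MESSAGES, "append")
lengths = (len(INITIAL_MESSAGES), len(history_a), len(history_b))
print(tuple_has_append, lengths)
```

Assigning the tuple straight into session state would make a later .append() fail with an AttributeError instead of silently sharing data.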

Because every session is appending its data to the same object (the one referenced by initial_messages).

I don't understand that question. The idiomatic way to share data across sessions in Streamlit is probably caching, but there are other ways, as you accidentally discovered. But again, that is just how Python works, not something Streamlit-specific.
