Is there any recommended solution for using streamlit session_states while running a part of your code with multiprocessing.Pool?
Of course, the problem is that by using multiprocessing we run separate processes (instead of the main streamlit run …), and the new processes don’t inherit the session_state, resulting in a KeyError. I’m curious whether there’s a workaround, or whether it’s a limitation we have to live with (i.e. use either session_state or multiprocessing, but not both).
Sharing data between processes can be tricky. There are several ways of doing it documented in the multiprocessing module. The recommended solution would depend on the specifics of your use case.
Thanks for your reply @Goyo , but that’s not what I was looking for.
Indeed, sharing state between processes is tricky. But the question is whether Streamlit supports using multiprocessing and session_state together.
You should be able to use both multiprocessing and session_state in a streamlit application. The workaround for “new processes don’t inherit the session_states” is sharing the data in some other way. I can’t be more specific without knowing more about your use case.
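To illustrate “sharing the data in some other way”: have the workers return their results and touch session_state only in the main process. A minimal sketch with made-up names (runnable without Streamlit):

```python
from concurrent.futures import ProcessPoolExecutor

def square(v):
    # Runs in a worker process: st.session_state is NOT visible here.
    return v * v

def run_jobs(jobs):
    # Results travel back to the main process via the executor; only
    # here would you store them, e.g. st.session_state.results = ...
    with ProcessPoolExecutor(max_workers=2) as executor:
        return list(executor.map(square, jobs))

if __name__ == '__main__':
    print(run_jobs([1, 2, 3]))  # the main process collects [1, 4, 9]
```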
Create a session state variable to save your work. Run other processes as normal but save the result to session state when done.
import concurrent.futures
from concurrent.futures import ProcessPoolExecutor
import time

import streamlit as st

if 'save' not in st.session_state:
    st.session_state.save = []

def task(v):
    """Session state does not work here (this runs in a worker process)."""
    time.sleep(1)
    return v * v

if __name__ == '__main__':
    num_workers = 2
    jobs = [1, 2, 3, 4, 5, 6, 7, 8, 9]
    processed_jobs = []

    start = st.button('start work')
    if start:
        with ProcessPoolExecutor(max_workers=num_workers) as executor:
            for j in jobs:
                pj = executor.submit(task, j)
                processed_jobs.append(pj)

            for future in concurrent.futures.as_completed(processed_jobs):
                try:
                    res = future.result()
                    st.write(f'res: {res}')
                    # Incrementally save each completed task so far.
                    st.session_state.save.append(res)
                except concurrent.futures.process.BrokenProcessPool as ex:
                    raise Exception(ex)

    if len(st.session_state.save):
        st.write('#### Completed Jobs')
        st.write(f'{st.session_state.save}')
Thank you both @Goyo and @ferdy for the examples. They’re indeed helpful. I think these can be marked as Solution in a general case.
However, after using them in my code, I realized my issue is actually with running multiprocessing in a multipage Streamlit app. Multiprocessing needs to be started from within an if __name__ == '__main__': block; otherwise it recursively spawns new processes, hence the error.
My knowledge on multipage apps is limited. What I would like to do is to streamlit run homepage.py, and from homepage go to other pages, which trigger the call to modules containing multiprocessing code.
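For readers hitting the same recursion, the guard mentioned above looks like this (a generic sketch, not the thread’s actual code). With the “spawn” start method, the default on Windows and macOS, each worker re-imports the script, so creating a pool at module top level would recurse:

```python
from concurrent.futures import ProcessPoolExecutor

def work(x):
    # Defined at module top level so spawned workers can import it.
    return x + 1

if __name__ == '__main__':
    # This block is skipped when a worker re-imports the script,
    # so no new pool is created inside the workers.
    with ProcessPoolExecutor(max_workers=2) as executor:
        print(list(executor.map(work, range(3))))
```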
I don’t see why what you describe would be an issue at all. If that is the case, just do it.
Going to a page triggers the execution of that page’s code. That code can import modules, call functions… the kind of stuff code usually does. Multiprocessing or not, it doesn’t matter.
Again, I don’t think we know enough about your issues to give meaningful advice.
This is a simplified version of my code structure (Please note, I made up the names and functions for sharing purposes.):
# homepage.py
import streamlit as st

def some_function():
    # ...
    pass

def calculate_encodings():
    st.session_state.dataset_path = st.text_input('The path to your dataset:', '')
    if st.session_state.dataset_path != '':
        st.session_state.encodings = some_function()

if __name__ == '__main__':
    calculate_encodings()
Everything is fine here. After calculate_encodings() completes, I click on another page from the menu (clustering.py). This page triggers a call to a function that uses multiprocessing:
# clustering.py
import streamlit as st
from src.utils import copy_data

if 'encodings' in st.session_state and st.session_state.encodings is not None:
    copy_data()
    # rest of the code ...
And finally in my src/utils.py:
# src/utils.py
from concurrent.futures import ProcessPoolExecutor

def download_file(key):
    # downloading the data from the remote server
    pass

def copy_data():
    keys_to_download = ['file1.txt', 'file2.txt', 'file3.txt']
    with ProcessPoolExecutor() as executor:
        for key in keys_to_download:
            executor.submit(download_file, key)
Note that the second page (clustering.py) is not run from an if __name__ == '__main__': block, hence the error.
As a workaround, I called copy_data() directly from homepage.py within calculate_encodings(). However, that is not the proper place for that call, and I was wondering whether copy_data() could be called directly from the clustering.py page or not.
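One way to keep copy_data() callable from any page (a sketch with placeholder logic, not the thread’s actual boto3 code): leave the worker function in the importable src/utils.py module and collect each future’s result, so any worker exception is re-raised in the page instead of silently disappearing:

```python
# src/utils.py (sketch); download_file stands in for the real download.
from concurrent.futures import ProcessPoolExecutor, as_completed

def download_file(key):
    # Placeholder for the real remote download.
    return f'downloaded {key}'

def copy_data(keys_to_download=('file1.txt', 'file2.txt', 'file3.txt')):
    results = []
    with ProcessPoolExecutor() as executor:
        futures = [executor.submit(download_file, k) for k in keys_to_download]
        for future in as_completed(futures):
            # .result() re-raises any exception from the worker, so
            # failures surface in the page instead of vanishing.
            results.append(future.result())
    return results
```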
I can’t say I totally understand the purpose of your code. I made minimal changes to make it compile and run, and I was unable to cause any errors. I got a warning, though:
WARNING streamlit.runtime.state.session_state_proxy: Session state does not function when running a script without `streamlit run`
It seems to happen at some point after calling copy_data() and submitting all tasks to the executor, but before the tasks are completed. I don’t know why it happens, but it doesn’t seem to cause any problems -- the code involved doesn’t depend on session_state, or anything Streamlit-related, actually.
Thanks for the quick reply. In my case I got the error “A process in the process pool was terminated abruptly while the future was running or pending” after those warnings. I’m not sure whether the error is a Streamlit issue or not; it could be related to the way I called boto3.
I will mark the answer as solution.