Multiprocessing leads to frequent app crashes

I am a Streamlit newbie. I am using multiprocessing in my app. Without it the app is very robust, but when multiprocessing is used it often reports errors. I started specifying the number of processes to allocate (e.g. Pool(processes=4)) – this helps a bit, but the app is still unstable – sometimes it finishes execution and sometimes it returns errors.

Are there any clear guidelines as to how many processes can be used, and how to properly allocate resources to avoid app crashes when multiprocessing is used?

Thank you!

Do you have some sample code that we can play around with? Something that reproduces your issue. Just minimal code.

I am running this function twice in a row. It launches functions which invoke responses from OpenAI. Sometimes everything is fine, but mostly the second run returns an empty list. I am well under the OpenAI token limit.

And my question is: what is the maximum number of workers that is guaranteed to work and not breach some kind of resource limit on Streamlit?

import concurrent.futures

def execute_with_futures(function, inputs):
    all_responses = {}  # Collect responses keyed by input index

    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:

        # Map each submitted future back to the index of its input
        futures = {
            executor.submit(function, list(entity)): i
            for i, entity in enumerate(inputs)
        }

        for future in concurrent.futures.as_completed(futures):
            index = futures[future]
            try:
                all_responses[index] = future.result()
            except Exception as e:
                all_responses[index] = f"Error in OpenAI API call: {e}"

    # Convert the dictionary to a list, ensuring the input order is preserved
    ordered_responses = [all_responses[i] for i in range(len(inputs))]

    return ordered_responses
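
For context, I call it roughly like this (call_openai here is a made-up stand-in for my actual OpenAI wrapper, and the prompts are dummies):

def call_openai(chunks):
    # placeholder for the real wrapper around the OpenAI API
    return " ".join(chunks)

prompts = [["summarize", "doc A"], ["summarize", "doc B"]]
first_run = execute_with_futures(call_openai, prompts)
second_run = execute_with_futures(call_openai, prompts)  # this one usually comes back empty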

You mentioned crashes. Is there any error message?

@ein_io Without seeing a minimal example of your issue, all I can say is that to get multiprocessing to work I had to, for one, ensure my script has a __name__ == '__main__' block. Try following the template here to see if that fixes it.
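
Roughly, the shape is something like this minimal sketch (fetch_data is just a placeholder worker):

import multiprocessing

import streamlit as st

def fetch_data(x):
    # placeholder worker; swap in your real per-task function
    return x * x

if __name__ == "__main__":
    # Everything that spawns processes lives under this guard, so child
    # processes can import this file without re-launching the pool.
    with multiprocessing.Pool(processes=2) as pool:
        results = pool.map(fetch_data, range(4))
    st.write(results)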


Andrew – thank you so, so much – it seems to have helped indeed (although by design the __name__ == '__main__' block is meant to allow importing functions from the script without executing the would-be main code)!!!

No problem, glad it seems to be helping! Agreed on the general point of __name__ == '__main__', but FYI this seems to be a multiprocessing (or related) library restriction per this documentation (see “Safe importing of main module”).
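
For anyone else landing here, the restriction that section describes boils down to something like this (square is just an illustrative worker):

import multiprocessing

def square(x):
    return x * x

if __name__ == "__main__":
    # With the "spawn" start method, every child process re-imports this
    # module. The guard keeps that import side-effect free; without it,
    # the child would try to start its own Pool and raise a RuntimeError.
    with multiprocessing.Pool(processes=2) as pool:
        print(pool.map(square, range(5)))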

Greetings,
I am having an issue a little similar to this one. The community help on other multiprocessing topics helped me understand a lot about sharing data across processes. Now I have another problem.
I am trying to use session_state to update a st.bar_chart,
but I keep getting this error: AttributeError: st.session_state has no attribute "data". Did you forget to initialize it? And the app will spin indefinitely until I shut it down.
I can attest that if I comment out the st.bar_chart line, everything under the __main__ block works just fine in seconds.
My ultimate intent is to use the multiprocessing step to frequently pull the data and pass it to the chart.
Not sure what I am missing here.

Any help is greatly appreciated.


import multiprocessing

import pandas as pd
import streamlit as st

# data_path (the path to my CSV) is defined earlier in my script

def open_file(file_path, out_name):
    # Runs in the child process: load the CSV into the shared namespace
    out_name.df = pd.read_csv(file_path)


if 'data' not in st.session_state:
    st.session_state['data'] = pd.DataFrame()  # empty data frame

st.write('Bar Chart')
st.bar_chart(st.session_state.data, x='x_label', y='y_label')


if __name__ == "__main__":

    manager = multiprocessing.Manager()
    out_data = manager.Namespace()

    p1 = multiprocessing.Process(target=open_file, args=(data_path, out_data))
    p1.start()
    p1.join()
    st.session_state.data = out_data.df
    st.write('Multiprocessing New Data:', out_data.df)
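
For clarity, the overall flow I am aiming for is roughly this (just a sketch of the intent, not working code; refresh_data is a made-up placeholder for the multiprocessing step):

import time

import pandas as pd
import streamlit as st

def refresh_data() -> pd.DataFrame:
    # placeholder: in reality this would be the multiprocessing step
    # that pulls fresh data (e.g. reading the CSV in a child process)
    return pd.DataFrame({'x_label': ['a', 'b'], 'y_label': [1, 2]})

if 'data' not in st.session_state:
    st.session_state['data'] = pd.DataFrame()

st.session_state.data = refresh_data()
st.bar_chart(st.session_state.data, x='x_label', y='y_label')

time.sleep(5)  # wait, then rerun the script to pull data again
st.rerun()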

  
    
