Streamlit is not working with joblib parallel and delayed

sarath · February 9, 2022, 4:45pm

Hi,

Thank you for this wonderfully simple tool. I started using it recently and have run into an issue where Streamlit throws an error when used with joblib's Parallel and delayed. I am looking for a solution. Thank you in advance.

randyzwitch · February 10, 2022, 5:32pm

Hi @sarath, welcome to the Streamlit community!

Can you provide more detail about what code you’re actually running (a link to the repo would be great), whether this is a local issue or an issue on Streamlit Cloud (or some other deployment), and any other relevant information?

Best,
Randy

nflaig · February 10, 2022, 5:52pm

Hi @sarath,

instead of joblib I would recommend using dask. Check out this blog post to see some usage examples.

Best regards,
Nico

sarath · February 11, 2022, 8:46am

Hi @randyzwitch

As far as I know, it is a local issue. Below is a minimal version of the code I use:

import streamlit as st
from joblib import Parallel, delayed, cpu_count
import time
from stqdm import stqdm

@st.cache
def double(x):
    time.sleep(0.5)
    return 2 * x

def inc(x):
    time.sleep(0.5)
    return double(x)

arguments = range(1, 5)

st.header("Sequential")
last_time = time.time()
results = [inc(x) for x in stqdm(arguments)]
st.write(results)
current_time = time.time()
st.write(current_time - last_time, "seconds")
last_time = current_time

st.header("Parallel")
last_time = time.time()
njobs = cpu_count()-1
results = Parallel(n_jobs=njobs)(delayed(inc)(x) for x in stqdm(arguments))
st.write(results)
current_time = time.time()
st.write(current_time - last_time, "seconds")
last_time = current_time

Thanks,
Sarath

sarath · February 11, 2022, 8:50am

Thank you @nflaig, this is a great package, but dask.compute() takes the same time as the sequential method. Can you point me to some other sources, or am I doing something wrong?

nflaig · February 11, 2022, 4:45pm

@sarath the example here is as simple as it can be. So far I have only used Dask for long-running asynchronous operations, e.g. downloading files, and there you get a huge performance increase because it does all the downloads in parallel instead of one after another. You call the initial function with dask.delayed, which returns an object. You store those objects in a list, and at the end, for example after the for-loop, you run dask.compute on the list that contains the delayed objects.
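
The pattern described above (wrap each call, collect the pending objects in a list, resolve them all at the end) can be sketched with Python's standard-library concurrent.futures. This is a swapped-in analogue rather than Dask's API, and `fake_download` is a hypothetical stand-in for a slow, I/O-bound call such as a file download.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(item):
    """Hypothetical stand-in for a slow, I/O-bound call (e.g. a download)."""
    time.sleep(0.2)  # simulate waiting on the network
    return item * 2


def run_all(items, max_workers=4):
    """Schedule one task per item, then collect every result at the end."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Step 1: schedule each call; submit() returns a Future immediately.
        pending = [pool.submit(fake_download, item) for item in items]
        # Step 2: resolve all pending objects together, in submission order.
        return [future.result() for future in pending]


start = time.perf_counter()
results = run_all(range(1, 5))
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.2f} seconds")
```

Because the simulated work only sleeps, the calls can overlap instead of running one after another. Pure-Python CPU-bound work generally does not speed up this way in threads.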

sarath · February 14, 2022, 3:22am

@nflaig, dask.delayed is working as expected, but dask.compute takes a long time, so it may not be useful in my case. Many thanks for pointing me to Dask.

sarath · February 14, 2022, 3:27am

If there is no direct solution, any workaround would be much appreciated.

Thanks,
Sarath

snehankekre · February 14, 2022, 7:35am

Hi @sarath

When using @st.cache, you receive the following error:

PicklingError: Could not pickle the task to send it to the workers.
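
joblib's process-based backend has to serialize each task before it can hand it to a worker process, which is the step this error refers to. Below is a minimal sketch of a pre-flight check using the standard pickle module. `is_picklable` is a hypothetical helper, and the thread lock is only an illustrative unpicklable object, not a confirmed description of what @st.cache holds internally. joblib's default backend can serialize with cloudpickle, which accepts more objects than plain pickle, so this check is conservative.

```python
import pickle
import threading


def is_picklable(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, TypeError, AttributeError):
        return False


print(is_picklable([1, 2, 3]))         # plain data
print(is_picklable(threading.Lock()))  # thread locks cannot be pickled
```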

Solution

The solution is to decorate double(x) with @st.experimental_memo instead:

import streamlit as st
from joblib import Parallel, delayed, cpu_count
import time
from stqdm import stqdm

@st.experimental_memo
def double(x):
    time.sleep(0.5)
    return 2 * x

def inc(x):
    time.sleep(0.5)
    return double(x)

arguments = range(1, 5)

st.header("Sequential")
last_time = time.time()
results = [inc(x) for x in stqdm(arguments)]
st.write(results)
current_time = time.time()
st.write(current_time - last_time, "seconds")
last_time = current_time

st.header("Parallel")
last_time = time.time()
njobs = cpu_count()-1
results = Parallel(n_jobs=njobs)(delayed(inc)(x) for x in stqdm(arguments))
st.write(results)
current_time = time.time()
st.write(current_time - last_time, "seconds")
last_time = current_time
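
One detail worth guarding in the snippet above: cpu_count()-1 evaluates to 0 on a single-core machine, and joblib rejects n_jobs=0. Here is a small sketch of a guard, where `safe_n_jobs` is a hypothetical helper and os.cpu_count() stands in for joblib's cpu_count:

```python
import os


def safe_n_jobs(reserve=1):
    """Leave `reserve` cores free but never return fewer than one worker."""
    available = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, available - reserve)


print(safe_n_jobs())
```

The result can then be passed as n_jobs in place of cpu_count()-1.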

Output

[Image: joblib]

Happy Streamlit-ing!
Snehan

sarath · February 14, 2022, 8:26am

Hi, thanks for trying to solve the error, but I ran into the following error while using your solution.

Error:
2022-02-14 13:53:07.301 Thread 'Thread-8': missing ScriptRunContext
2022-02-14 13:53:07.302 Thread 'Thread-8': missing ScriptRunContext
2022-02-14 13:53:07.303 exception calling callback for <Future at 0x19cbbc6f400 state=finished returned list>
Traceback (most recent call last):
  File "d:\anaconda3\lib\site-packages\joblib\parallel.py", line 820, in dispatch_one_batch
    tasks = self._ready_batches.get(block=False)
  File "d:\anaconda3\lib\queue.py", line 167, in get
    raise Empty
_queue.Empty

snehankekre · February 14, 2022, 8:37am

What version of Streamlit are you using? And could you share your code, the exact error, and the traceback? I take it you’re not running into the issue when using only the code from above.

It would help to have a minimal, reproducible example.

system · February 14, 2023, 8:37am

This topic was automatically closed 365 days after the last reply. New replies are no longer allowed.

