Streamlit is not working with joblib parallel and delayed

Hi,

Thank you for this wonderfully simple tool. I started using it recently and have run into an issue where Streamlit throws an error when working with joblib's Parallel and delayed. I am looking for a solution. Thank you in advance.

Hi @sarath, welcome to the Streamlit community!

Can you provide more detail about what code you’re actually running (a link to the repo would be great), whether this is a local issue or an issue on Streamlit Cloud (or some other deployment), and any other relevant information?

Best,
Randy

Hi @sarath,

instead of joblib I would recommend using dask. Check out this blog post to see some usage examples.

Best regards,
Nico

Hi @randyzwitch

As far as I know, it is a local issue. Below is a minimal version of the code I use:

import streamlit as st
from joblib import Parallel, delayed, cpu_count
import time
from stqdm import stqdm

@st.cache
def double(x):
    time.sleep(0.5)
    return 2 * x

def inc(x):
    time.sleep(0.5)
    return double(x)

arguments = range(1, 5)

st.header("Sequential")
last_time = time.time()
results = [inc(x) for x in stqdm(arguments)]
st.write(results)
current_time = time.time()
st.write(current_time - last_time, "seconds")
last_time = current_time

st.header("Parallel")
last_time = time.time()
njobs = cpu_count()-1
results = Parallel(n_jobs=njobs)(delayed(inc)(x) for x in stqdm(arguments))
st.write(results)
current_time = time.time()
st.write(current_time - last_time, "seconds")
last_time = current_time

Thanks,
Sarath

Thank you @nflaig, this is a great package, but dask.compute() takes the same time as the sequential method. Can you point me to some other source(s)? Or am I doing something wrong?

@sarath the example here is as simple as it can be, but so far I have only used dask for long-running asynchronous operations such as downloading files. There you see a huge performance increase because it does all the downloads in parallel instead of one after another. You wrap the initial function call with dask.delayed, which returns a delayed object. You store those objects in a list and at the end, for example after the for loop, you run dask.compute on the list that contains the delayed objects.
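The pattern described above could be sketched roughly as follows. This is only an illustration, not the code from the blog post: the `download` function is a hypothetical stand-in that sleeps instead of fetching a file, and the timing you see will depend on your machine and on dask's scheduler.

```python
import time

import dask


def download(x):
    # Hypothetical stand-in for a slow I/O-bound task such as a file download.
    time.sleep(0.2)
    return 2 * x


# Wrapping the call with dask.delayed only builds a lazy task; nothing runs yet.
tasks = [dask.delayed(download)(x) for x in range(1, 5)]

start = time.time()
# One dask.compute call over all delayed objects lets dask schedule them together.
results = dask.compute(*tasks)
elapsed = time.time() - start

print(list(results), f"{elapsed:.2f} seconds")
```

Note that dask.compute is called once on the whole collection. Calling it separately on each delayed object inside the loop would run the tasks one after another.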

@nflaig, dask.delayed is working as expected, but dask.compute is taking a long time, so it may not be useful in my case. Many thanks for pointing me to Dask.

If there is no direct fix, any workaround would be much appreciated.

Thanks,
Sarath

Hi @sarath :wave:

When using @st.cache, you receive the following error:

PicklingError: Could not pickle the task to send it to the workers.

Solution

The solution is to decorate double(x) with @st.experimental_memo instead:

import streamlit as st
from joblib import Parallel, delayed, cpu_count
import time
from stqdm import stqdm

@st.experimental_memo
def double(x):
    time.sleep(0.5)
    return 2 * x

def inc(x):
    time.sleep(0.5)
    return double(x)

arguments = range(1, 5)

st.header("Sequential")
last_time = time.time()
results = [inc(x) for x in stqdm(arguments)]
st.write(results)
current_time = time.time()
st.write(current_time - last_time, "seconds")
last_time = current_time

st.header("Parallel")
last_time = time.time()
njobs = cpu_count()-1
results = Parallel(n_jobs=njobs)(delayed(inc)(x) for x in stqdm(arguments))
st.write(results)
current_time = time.time()
st.write(current_time - last_time, "seconds")
last_time = current_time

Output

[Image: joblib]

Happy Streamlit-ing! :balloon:
Snehan

Hi, thanks for trying to solve the error, but I ran into the following error while using your solution.

Error:
2022-02-14 13:53:07.301 Thread 'Thread-8': missing ScriptRunContext
2022-02-14 13:53:07.302 Thread 'Thread-8': missing ScriptRunContext
2022-02-14 13:53:07.303 exception calling callback for <Future at 0x19cbbc6f400 state=finished returned list>
Traceback (most recent call last):
  File "d:\anaconda3\lib\site-packages\joblib\parallel.py", line 820, in dispatch_one_batch
    tasks = self._ready_batches.get(block=False)
  File "d:\anaconda3\lib\queue.py", line 167, in get
    raise Empty
_queue.Empty

What version of Streamlit are you using? And could you share your code, the exact error, and the traceback? I take it you’re not running into the issue when using only the code from above.

It would help to have a minimal, reproducible example.
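As a starting point for such a minimal example, here is a sketch of one possible workaround. It assumes (this is not confirmed anywhere in the thread) that the errors come from Streamlit-decorated code running inside the joblib workers, so it keeps the function passed to delayed free of any Streamlit decorator. The helper name `run_parallel` and the fixed `n_jobs=2` are hypothetical choices for illustration.

```python
import time

from joblib import Parallel, delayed


def double(x):
    # Plain function with no Streamlit decorator, so nothing
    # Streamlit-specific has to be pickled or run in a worker.
    time.sleep(0.1)
    return 2 * x


def run_parallel(arguments, n_jobs=2):
    # Hypothetical helper: only undecorated code runs inside the joblib workers.
    return Parallel(n_jobs=n_jobs)(delayed(double)(x) for x in arguments)


results = run_parallel(range(1, 5))
print(results)
```

In a Streamlit script, the returned list could then be displayed with st.write from the main script, so that all Streamlit calls stay outside the workers.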
