Hi,
Thank you for this wonderful, simple tool. I started using it recently and have bumped into an issue where Streamlit throws an error when working with joblib's Parallel and delayed. I'm looking for a solution. Thank you in advance.
Hi @sarath, welcome to the Streamlit community!
Can you provide more detail about what code you’re actually running (a link to the repo would be great), whether this is a local issue or an issue on Streamlit Cloud (or some other deployment), and any other relevant information?
Best,
Randy
Hi @sarath,
Instead of joblib, I would recommend using Dask. Check out this blog post to see some usage examples.
Best regards,
Nico
Hi @randyzwitch
As far as I know, it is a local issue, and below is a minimal version of the code I use:
import streamlit as st
from joblib import Parallel, delayed, cpu_count
import time
from stqdm import stqdm

@st.cache
def double(x):
    time.sleep(0.5)
    return 2 * x

def inc(x):
    time.sleep(0.5)
    return double(x)

arguments = range(1, 5)

st.header("Sequential")
last_time = time.time()
results = [inc(x) for x in stqdm(arguments)]
st.write(results)
current_time = time.time()
st.write(current_time - last_time, "seconds")
last_time = current_time

st.header("Parallel")
last_time = time.time()
njobs = cpu_count() - 1
results = Parallel(n_jobs=njobs)(delayed(inc)(x) for x in stqdm(arguments))
st.write(results)
current_time = time.time()
st.write(current_time - last_time, "seconds")
last_time = current_time
Thanks,
Sarath
Thank you @nflaig, this is a great package, but dask.compute() is taking the same time as the sequential method. Can you point me to some other source(s)? Or am I doing something wrong?
@sarath the example here is as simple as it can be, but so far I have only used Dask for asynchronous operations that take a long time, e.g. downloading files, and there you see a huge performance increase because it does all the downloads in parallel instead of one after another. You just call the initial function with dask.delayed, which returns a delayed object. You store those objects in an array, and at the end, for example after the for-loop, you run dask.compute on the array containing the delayed objects.
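To illustrate, here is a minimal sketch of that pattern; slow_task is just a hypothetical stand-in for the real long-running work:

import time

import dask

def slow_task(x):
    # Stand-in for slow, I/O-bound work, e.g. downloading a file
    time.sleep(0.5)
    return 2 * x

# dask.delayed wraps each call and returns a lazy object immediately
delayed_results = [dask.delayed(slow_task)(x) for x in range(1, 5)]

# dask.compute runs all the delayed tasks, by default on a thread pool
results = dask.compute(*delayed_results)
print(results)  # (2, 4, 6, 8)

Note that the default scheduler runs the tasks in a thread pool, which helps most with I/O-bound work; purely CPU-bound functions may not speed up this way because of the GIL.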
@nflaig, dask.delayed is working as expected, but dask.compute is taking a long time, so it may not be useful in my case. Many thanks for pointing me to Dask.
If a direct solution is not possible, any workaround would be much appreciated.
Thanks,
Sarath
Hi @sarath,
When using @st.cache, you receive the following error:
PicklingError: Could not pickle the task to send it to the workers.
The solution is to decorate double(x) with @st.experimental_memo instead:
import streamlit as st
from joblib import Parallel, delayed, cpu_count
import time
from stqdm import stqdm

@st.experimental_memo
def double(x):
    time.sleep(0.5)
    return 2 * x

def inc(x):
    time.sleep(0.5)
    return double(x)

arguments = range(1, 5)

st.header("Sequential")
last_time = time.time()
results = [inc(x) for x in stqdm(arguments)]
st.write(results)
current_time = time.time()
st.write(current_time - last_time, "seconds")
last_time = current_time

st.header("Parallel")
last_time = time.time()
njobs = cpu_count() - 1
results = Parallel(n_jobs=njobs)(delayed(inc)(x) for x in stqdm(arguments))
st.write(results)
current_time = time.time()
st.write(current_time - last_time, "seconds")
last_time = current_time
Happy Streamlit-ing!
Snehan
Hi, thanks for trying to solve the error, but I ran into the following error while using your solution:
2022-02-14 13:53:07.301 Thread 'Thread-8': missing ScriptRunContext
2022-02-14 13:53:07.302 Thread 'Thread-8': missing ScriptRunContext
2022-02-14 13:53:07.303 exception calling callback for <Future at 0x19cbbc6f400 state=finished returned list>
Traceback (most recent call last):
  File "d:\anaconda3\lib\site-packages\joblib\parallel.py", line 820, in dispatch_one_batch
    tasks = self._ready_batches.get(block=False)
  File "d:\anaconda3\lib\queue.py", line 167, in get
    raise Empty
_queue.Empty
What version of Streamlit are you using? And could you share your code, the exact error, and the traceback? I take it you’re not running into the issue when using only the code from above.
It would help to have a minimal, reproducible example.