Here is a working example; to reproduce the crash, click the checkbox while the progress bar is still filling up:
import streamlit as st
import plyvel

db = plyvel.DB('/tmp/testdb/', create_if_missing=True)
db.put(b'key', b'value')

st.checkbox('make it crash')

num = 100000
p = st.progress(0)
for x in range(num):
    a = db.get(b'key')
    p.progress(int(x/num * 100))
I investigated the problem a bit, and I fear the cause is that LevelDB doesn’t support access from multiple processes, while Streamlit probably uses processes to manage multiple users and interactive widgets.
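That hunch is easy to check outside Streamlit. Here is a minimal sketch (assuming plyvel’s documented lock behaviour; the path is the one from the example above): opening a second handle on the same database directory fails because LevelDB holds an exclusive LOCK file.

import plyvel

db1 = plyvel.DB('/tmp/testdb/', create_if_missing=True)
try:
    # A second handle on the same directory, like a second process would open.
    db2 = plyvel.DB('/tmp/testdb/')
except plyvel.IOError as exc:
    # LevelDB refuses: the LOCK file in /tmp/testdb/ is already held.
    print(exc)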
Meanwhile I’m building a wrapper around plyvel that can access the db either directly or through a REST API exposed by a local backend server, to avoid the lock problems.
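In case it’s useful, here is a rough sketch of the idea (the LevelDB class, the /get and /put routes, and the use of the requests package are my own choices, not an existing API):

import plyvel
import requests

class LevelDB:
    """Access the db directly, or via HTTP calls to a local backend
    server that owns the single plyvel handle."""

    def __init__(self, path, url=None):
        self.url = url
        self.db = plyvel.DB(path, create_if_missing=True) if url is None else None

    def get(self, key):
        if self.db is not None:
            return self.db.get(key)
        r = requests.get(self.url + '/get', params={'key': key.decode()})
        return r.content or None

    def put(self, key, value):
        if self.db is not None:
            self.db.put(key, value)
        else:
            requests.post(self.url + '/put',
                          data={'key': key.decode(), 'value': value.decode()})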
Let me know if there are more efficient solutions!
It seems like the current process isn’t properly killed when the widget interaction starts a new one.
I’ve resolved the issue using st.cache and hash_funcs:
import streamlit as st
import plyvel
import time

@st.cache(hash_funcs={plyvel._plyvel.DB: id})
def get_db():
    return plyvel.DB('/tmp/testdb/', create_if_missing=True)

db = get_db()
db.put(b'key', b'value')

st.checkbox('make it not crash :)')

num = 20
p = st.progress(0)
for x in range(num+1):
    time.sleep(.1)
    a = db.get(b'key')
    p.progress(int(x/num * 100))
If you encounter any further issues please reach out!
Hi @Jonathan_Rhone thank you very much!
I confirm that the code works as expected!
Although I’m not sure I fully understood how the cache mechanism works. I thought that since the function does not have any parameters it would be called only once, the first time.
Is it doing some internal check to see whether the returned object has mutated, and in that case returning a new object?
The same code works even with my wrapper, if I set hash_funcs={LevelDB: id} or allow_output_mutation=True
I thought that since the function does not have any parameters it would be called only once, the first time
Sorry, I’m not sure I understand what you mean here. Are you referring to the get_db function? It will be called on the first run of the report, after which we’ll return the plyvel.DB() connection from the cache whenever it’s called.
Is it doing some internal check to see whether the returned object has mutated, and in that case returning a new object?
Previous versions of Streamlit did this. As of v0.53.0 we still perform the internal check, but we display a warning and return the cached version of the object instead of re-running the function and returning a new object.
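To illustrate with a contrived sketch (get_items is made up):

import streamlit as st

@st.cache
def get_items():
    # Runs once; the returned list and its hash are stored in the cache.
    return [1, 2, 3]

items = get_items()
items.append(4)  # mutates the object that lives in the cache

# On the next rerun the hash of the cached object no longer matches the
# stored hash, so Streamlit shows a "Cached object mutated" warning and
# returns the cached value instead of re-running get_items.
st.write(items)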
The same code works even with my wrapper, if I set hash_funcs={LevelDB: id} or allow_output_mutation=True
I believe we disable hashing of the output if you set allow_output_mutation to True, which in this case removes the need to use hash_funcs to hash the plyvel.DB instance. However, if you wanted to pass this db instance to another cached function as an input parameter, or to use the instance in the body of a cached function (not as the return value), you would need hash_funcs, as allow_output_mutation would not help in those scenarios. I would stick with hash_funcs either way, since allow_output_mutation would be used as a hack here rather than for its primary use case.
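As a sketch of that input-parameter scenario (read_value is a made-up helper):

import streamlit as st
import plyvel

@st.cache(hash_funcs={plyvel._plyvel.DB: id})
def get_db():
    return plyvel.DB('/tmp/testdb/', create_if_missing=True)

# Here the db handle is an *input* parameter, so Streamlit has to hash it
# on every call; hash_funcs makes it hash by identity instead of by value.
# allow_output_mutation on get_db would not help with this.
@st.cache(hash_funcs={plyvel._plyvel.DB: id})
def read_value(db, key):
    return db.get(key)

db = get_db()
value = read_value(db, b'key')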
I think I have a similar problem: I am trying to use multiprocessing with Streamlit. I’ve had no trouble in the past using multiprocessing and Streamlit together, but I can’t get it to work when I have a hashed database connection in my app.
Running my app serially works fine, and running multiprocessing without the database connection works fine.
Also, there are no database connections inside the multiprocessing code; all of it is outside of the pool.
Database Connection:
import sqlite3
from sqlite3 import Connection

import streamlit as st

@st.cache(hash_funcs={Connection: id})
def get_connection():
    """
    Put the connection in cache to reuse if path does not change between Streamlit reruns.
    NB : https://stackoverflow.com/questions/48218065/programmingerror-sqlite-objects-created-in-a-thread-can-only-be-used-in-that-sa
    """
    return sqlite3.connect("./database/solar_projects.db", check_same_thread=False)
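For reference, here is a minimal sketch of the layout I described (the square worker and the pool size are made up; it assumes a fork-based start method, i.e. Linux):

import multiprocessing

import streamlit as st

def square(x):
    # Worker: pure computation, no database access, safe to pickle.
    return x * x

conn = get_connection()  # cached connection from the function above
rows = conn.execute("SELECT 1").fetchall()  # all DB work stays in the main process

with multiprocessing.Pool(4) as pool:
    results = pool.map(square, range(10))

st.write(rows, results)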