Streamlit crashes when using LevelDB

Hello everyone,

I need to use LevelDB with Streamlit, via the plyvel wrapper.

LevelDB supports multithreaded access, but it does not support multiprocessing.
It is extremely fast and quite commonly used to store large datasets.

Is there a way to make it work inside streamlit?

I tried to use the cache mechanism, but it doesn’t change the result:

@st.cache
def get_db(dataset_root):
    # LevelDB is my own wrapper class around plyvel
    db = LevelDB.get_instance(dataset_root)
    return db

Hi @luca,

I’m trying to recreate your issue but I’m unable to.

Could you provide a full code example that I can run?

The following works fine for me:

import streamlit as st
import plyvel

db = plyvel.DB('/tmp/testdb/', create_if_missing=True)

db.put(b'key', b'value')

st.write(db.get(b'key'))

Hi @Jonathan_Rhone,

Thank you for the reply!

The problem appears when either of the following happens:

  • There are interactive widgets
  • There are multiple users

Here is a minimal example that reproduces it: just click the checkbox while the progress bar is still filling up:

import streamlit as st
import plyvel

db = plyvel.DB('/tmp/testdb/', create_if_missing=True)

db.put(b'key', b'value')

st.checkbox('make it crash')

num = 100000
p = st.progress(0)
for x in range(num):
    # the rerun triggered by clicking the checkbox crashes the app here
    a = db.get(b'key')
    p.progress(int(x/num * 100))

I investigated the problem a bit; I fear the cause is that LevelDB doesn’t support multiprocessing, and Streamlit probably uses separate processes to manage multiple users and interactive widgets.

Meanwhile, I’m building a wrapper around plyvel that can access the DB either directly or through a REST API on a local backend server, to avoid the locking problems.
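For reference, here is a rough sketch of the idea (the class shape, the use_rest flag, and the /get endpoint are placeholders, not the final API):

import plyvel
import requests

class LevelDB:
    """Sketch: access the DB directly, or through a local REST backend
    that owns the single plyvel handle (endpoint names are placeholders)."""

    def __init__(self, dataset_root, use_rest=False,
                 server_url='http://localhost:8000'):
        self.use_rest = use_rest
        if use_rest:
            # the backend process is the only one holding the LevelDB lock
            self.server_url = server_url
        else:
            self.db = plyvel.DB(dataset_root, create_if_missing=True)

    def get(self, key: bytes) -> bytes:
        if self.use_rest:
            r = requests.get(self.server_url + '/get',
                             params={'key': key.hex()})
            r.raise_for_status()
            return bytes.fromhex(r.json()['value'])
        return self.db.get(key)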

Let me know if there are more efficient solutions!


It seems like the current process isn’t properly killed when the widget interaction starts a new one.


Hey @luca,

Thanks for the snippet!

I’ve resolved the issue using st.cache with hash_funcs:

import streamlit as st
import plyvel
import time

# hash the DB handle by its id so st.cache can store and reuse it
@st.cache(hash_funcs={plyvel._plyvel.DB: id})
def get_db():
    return plyvel.DB('/tmp/testdb/', create_if_missing=True)

db = get_db()

db.put(b'key', b'value')

st.checkbox('make it not crash :)')

num = 20

p = st.progress(0)

for x in range(num+1):
    time.sleep(.1)
    a = db.get(b'key')
    p.progress(int(x/num * 100))

If you encounter any further issues please reach out!

Hi @Jonathan_Rhone, thank you very much!
I confirm that the code works as expected!


Although I’m not sure I fully understand how the cache mechanism works. I thought that since the function doesn’t have any parameters it would be called only once, the first time.

Is it doing some internal check to see if the returned object is mutated, and in that case returning a new object?

The same code even works with my wrapper, if I set hash_funcs={LevelDB: id} or allow_output_mutation=True :slight_smile:

Hi @luca,

I thought that since the function doesn’t have any parameters it would be called only once, the first time

Sorry, I’m not sure I understand what you mean here. Are you referring to the get_db function? It will be called on the first run of the report, after which we’ll return the plyvel.DB() connection from the cache whenever it’s called again.
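A quick way to see this for yourself (just a sketch): the function body only executes on the first run, so the value below stays constant across reruns and widget interactions:

import time
import streamlit as st

@st.cache
def created_at():
    # runs only once; later calls return the cached value
    return time.time()

st.write(created_at())  # same timestamp on every rerun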

Is it doing some internal check to see if the returned object is mutated, and in case return a new object?

Previous versions of Streamlit did this. As of v0.53.0 we still run this internal check, but we display a warning and return the cached version of the object instead of re-running the function and returning a new object.

https://github.com/streamlit/streamlit/blob/0.53.0/lib/streamlit/caching.py#L286
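To make the mutation check concrete, here is a minimal sketch of what triggers the warning:

import streamlit as st

@st.cache
def get_items():
    return [1, 2, 3]

items = get_items()
items.append(4)  # mutates the cached object in place

# On the next run the output hash no longer matches what was stored, so
# Streamlit warns that the return value was mutated and hands back the
# cached object instead of re-running get_items().
st.write(items)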

The same code even works with my wrapper, if I set hash_funcs={LevelDB: id} or allow_output_mutation=True :slight_smile:

I believe we disable hashing of the output when you set allow_output_mutation to True, which in this case removes the need to use hash_funcs for hashing the plyvel.DB instance. However, if you wanted to pass this db instance to another cached function as an input parameter, or to use the instance in the body of a cached function (rather than as its return value), you would need hash_funcs, since allow_output_mutation would not help in those scenarios; see the sketch below. I would stick with hash_funcs either way, as allow_output_mutation would be a hack here rather than its primary use case :slight_smile:

https://github.com/streamlit/streamlit/blob/0.53.0/lib/streamlit/caching.py#L373
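For example, passing the connection into a second cached function as an input parameter needs hash_funcs on that function too (a sketch; lookup is just an illustrative name):

import streamlit as st
import plyvel

@st.cache(hash_funcs={plyvel._plyvel.DB: id})
def get_db():
    return plyvel.DB('/tmp/testdb/', create_if_missing=True)

# The db parameter must be hashable as well, so this cached function also
# needs a hash_funcs entry; allow_output_mutation on get_db wouldn't help.
@st.cache(hash_funcs={plyvel._plyvel.DB: id})
def lookup(db, key):
    return db.get(key)

db = get_db()
db.put(b'key', b'value')
st.write(lookup(db, b'key'))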


Hi,

Thank you very much for your kind reply! :blush:


Hey guys,

I think I have a similar problem: I am trying to use multiprocessing with Streamlit. I’ve had no trouble using multiprocessing and Streamlit together in the past, but I can’t get it to work when I have a hashed database connection in my app.

Running my app serially works fine. Running multiprocessing without the database connection also works fine.

Also, there is no database connection inside the multiprocessing code; all of it is outside of the pool.

Database Connection:

import sqlite3
from sqlite3 import Connection

import streamlit as st

@st.cache(hash_funcs={Connection: id})
def get_connection():
    """
    Put the connection in cache to reuse if the path does not change between Streamlit reruns.
    NB: https://stackoverflow.com/questions/48218065/programmingerror-sqlite-objects-created-in-a-thread-can-only-be-used-in-that-sa
    """
    return sqlite3.connect("./database/solar_projects.db", check_same_thread=False)

Multiprocessing:

import multiprocessing as mp

st.write('multiprocessing')

# run_autolayout and scenarios are defined elsewhere in the app
p = mp.Pool(processes=2, maxtasksperchild=1)
results = p.map(run_autolayout, scenarios)
p.close()
p.join()