Streamlit: how to implement a stateful ML app that doesn't rerun after every widget interaction

bhargavbn · November 18, 2021, 1:46pm

I would like to implement a stateful ML app that doesn't rerun on every interaction with a widget.

Flow of the app:
Step 1: Enter a file path and load the data into a dataframe on click of a button.
Step 2: Show sample data.
Step 3: Show descriptive stats.
Step 4: Plot a histogram for a feature chosen dynamically from a selectbox.

When I select a variable other than the default in the step 4 selectbox, the entire script reruns. How do I save state so that when I am at step 4, nothing above it is rerun, i.e. steps 1, 2 and 3 shouldn't be called again, only what I am interacting with in step 4? Kindly help me out.

import numpy as np
import pandas as pd
import sklearn as sk
import matplotlib.pyplot as plt
import streamlit as st
import pyspark
from pyspark import *
from PIL import Image
from io import StringIO
import st_state_patch

def load_data(ss,uploaded_file):
    df = ss.read.format('csv').option('header','true').load(uploaded_file)
    return df

def sample_data(df,widget):
    df_sample = pd.DataFrame(df.head(5))
    df_sample.columns = df.columns
    widget.dataframe(df_sample)
    
def descriptive_stats(df,widget):
    df_desc = df.summary().toPandas()
    widget.dataframe(df_desc)
    
def hist_plot(df,col,widget):
    df_plot = df.select(col).toPandas().iloc[:,0]
    fig, ax = plt.subplots()
    ax.hist(df_plot,density = False, bins = 50)
    widget.pyplot(fig)

def main():
    sparkapp = pyspark.sql.SparkSession.builder.master('local[4]').appName('No-code Spark Pipeline').getOrCreate()
    df = pd.DataFrame()
    st.title("No-Code ML Spark Pipeline")
    st.subheader('1. Upload file (csv)')
    uploaded_file = st.text_input("Provide local file path")
    upload_button1 = st.button('Upload')
    st.caption('Sample data')
    upload_cont1 = st.empty()
    white_background = Image.open('C:/Users/hp/Desktop/white_600_240.png')
    upload_cont1.image(white_background)
    # Call load data
    if upload_button1:
        df = load_data(sparkapp,uploaded_file)
        sample_data(df, upload_cont1)
    # Call sample data
    
    st.subheader('2. Exploratory Data Analytics')
    st.caption('Descriptive statistics')
    eda_cont1 = st.empty()
    eda_cont1.image(white_background,use_column_width=True)
    # Call descriptive stats
    if upload_button1:
        descriptive_stats(df, eda_cont1)         
    st.caption('Histogram / Frequency plot')
    eda_sel_feat = st.selectbox('Select feature to be displayed', options = df.columns)
    eda_cont2 = st.empty()
    eda_cont2.image(white_background,use_column_width=True)
    # Call hist plot
    if upload_button1:
        hist_plot(df,eda_sel_feat,eda_cont2)


if __name__ == '__main__':
    main()

randyzwitch · November 18, 2021, 2:03pm

Hi @bhargavbn, welcome to the Streamlit community!

This re-running is fundamental to how Streamlit is designed. By re-running from top-down each time, you never get into an uncertain state of the app.

However, for cases like the one you describe, where the steps above can take considerable time, we have a handful of caching functions. The caching section of the Streamlit documentation describes how to use them.

By using st.cache, st.experimental_memo or st.experimental_singleton, you effectively skip re-running those steps because their results are saved in memory. This makes it feel as if the steps aren’t re-run.

Best,
Randy
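
The effect described above can be seen without Streamlit at all. The sketch below is a stand-in, not Streamlit's own API: it uses `functools.lru_cache` from the standard library in place of a Streamlit caching decorator, and `load_rows`, `run_script` and `CALLS` are hypothetical names invented for the illustration. Each call to `run_script` plays the role of one top-to-bottom rerun; the expensive loader body executes only on the first one.

```python
from functools import lru_cache

# Counts how many times the expensive step really executes (illustration only).
CALLS = {"load_rows": 0}


@lru_cache(maxsize=None)  # stand-in for a Streamlit caching decorator
def load_rows(path: str) -> tuple:
    """Pretend to be a slow load; the result is cached per distinct path."""
    CALLS["load_rows"] += 1
    return tuple(f"{path}:row{i}" for i in range(3))


def run_script(path: str, selected: int) -> str:
    """One simulated top-to-bottom rerun of the script."""
    rows = load_rows(path)   # cache hit on every rerun after the first
    return rows[selected]    # the cheap, widget-dependent part


# Three "reruns" with different widget selections but the same input path.
for choice in (0, 1, 2):
    run_script("data.csv", choice)

print(CALLS["load_rows"])  # the loader body ran once
```

In a Streamlit script the same shape should apply with `@st.cache_data` on the loader in current releases (`@st.experimental_memo` at the time of this thread), and with `@st.cache_resource` (then `@st.experimental_singleton`) for objects such as a Spark session that should be created once and shared rather than copied.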

bhargavbn · November 18, 2021, 2:15pm

Hello Randy.

Thanks for your response.
I have tried using cache and state info with no success.
Could you please share a code snippet for one use case of my problem?

I have 3 functions executed before the one called here.

eda_sel_feat = st.selectbox('Select feature to be displayed', options = df.columns)
eda_cont2 = st.empty()
eda_cont2.image(white_background,use_column_width=True)
# Call hist plot
if upload_button1:
    hist_plot(df,eda_sel_feat,eda_cont2)

Do I apply st.cache / st.experimental_memo to the previous three functions, i.e.

@st.cache()
def load_data(ss,uploaded_file):
    ...
@st.cache()
def sample_data(df,widget):
    ...
@st.cache()
def descriptive_stats(df,widget):
    ...

Or do I do something around the place where the selectbox input changes?

eda_sel_feat = st.selectbox('Select feature to be displayed', options = df.columns)

I have tried caching with no success.
Could you please share a snippet for just this one case? It would be really helpful.

Thanks.

bhargavbn · November 20, 2021, 12:29pm

Any inputs?

asehmi · November 20, 2021, 2:29pm

I suggest you slightly re-structure your app to use st.form. Something like:

import streamlit as st

# Widgets inside a form do not trigger a rerun until the submit button is pressed.
with st.form('Collect model info'):
    uploaded_file = st.text_input("Provide local file path")
    # ...etc...
    show_charts = st.checkbox('Show charts', True)
    st.form_submit_button('Apply')

if show_charts:
    # Show charts here with data collected in the form
    pass

HTH,
Arvindra
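
One likely reason the original script loses its output is that `st.button` returns True only during the rerun triggered by the click, so `if upload_button1:` is False again as soon as the selectbox changes. A common way around this is to keep the loaded data in `st.session_state`. The sketch below is one possible arrangement, not the poster's code: `remember_on_click` and `app` are hypothetical names, and `pd.read_csv` is swapped in for the Spark loader to keep it short. The helper accepts any mutable mapping, so the two simulated reruns at the bottom use a plain dict in place of `st.session_state`.

```python
from typing import Any, Callable, MutableMapping, Optional


def remember_on_click(
    state: MutableMapping[str, Any],
    key: str,
    clicked: bool,
    compute: Callable[[], Any],
) -> Optional[Any]:
    """Store compute() under key when clicked; return whatever is stored, or None."""
    if clicked:
        state[key] = compute()
    return state.get(key)


def app() -> None:
    """Streamlit wiring for the helper (assumes streamlit and pandas are installed)."""
    import pandas as pd
    import streamlit as st  # imported here so the helper above has no Streamlit dependency

    path = st.text_input("Provide local file path")
    clicked = st.button("Upload")  # True only on the rerun caused by the click
    df = remember_on_click(st.session_state, "df", clicked, lambda: pd.read_csv(path))
    if df is None:
        st.info("Load a file to continue.")
        return
    st.dataframe(df.head())
    col = st.selectbox("Select feature to be displayed", options=list(df.columns))
    st.bar_chart(df[col].value_counts())  # frequency plot of the chosen column


# Simulate two reruns with a plain dict standing in for st.session_state.
state: dict = {}
first = remember_on_click(state, "df", True, lambda: [1, 2, 3])    # button clicked
second = remember_on_click(state, "df", False, lambda: [9, 9, 9])  # selectbox changed
print(first, second)  # the stored data survives the second rerun
```

To try the Streamlit part, add a call to `app()` as the last line of the file and launch it with `streamlit run`. The script still reruns on every interaction, but the load happens only on a click, and the sample, stats and plot are drawn from the stored data.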

system · November 20, 2022, 2:29pm

This topic was automatically closed 365 days after the last reply. New replies are no longer allowed.
