Interdependent Forms

@Jessica_Smith @andfanilo @thiago @okld @vdonato @kmcgrady @Charly_Wargnier

from pandas_profiling import ProfileReport
import streamlit as st
from streamlit_pandas_profiling import st_profile_report
import pandas as pd

def describe_col(df, col):
    print(df[col].unique())

def get_summary_stats(df):
    with st.form(key="describe-df"):
        col = st.selectbox("Select column for summary stats", df.columns)
        #on clicking the stats_btn the WHOLE APP RUNS again! FIX THIS PLEASE
        stats_btn = st.form_submit_button(label="Get Summary Statistics", on_click = describe_col, args=(df, col))

def file_handler():

   def profiler():
        file = st.session_state.upload
        delim = st.session_state.choice.split(" ")[1][1:-1]
        df = pd.read_csv(file, sep=delim, engine="python")
        file_info = {"Filename": file.name, "FileType": file.type, "FileSize": file.size}
        pr = ProfileReport(df, explorative=True)
        st.write(file_info)
        st_profile_report(pr)
        get_summary_stats(df)


   with st.form(key="file_upload"):
       data_file = st.file_uploader("Upload CSV File", type=['csv'], key="upload")
       seperators = [" ", "pipe (|)", r"tab (\t)", "comma (,)", "semicolon (;)"]
       choice = st.selectbox("Select File Seperator", seperators, key="choice")
       submit_file_btn = st.form_submit_button(label='Submit', on_click=profiler)


if __name__ =="__main__":
    file_handler()

Clicking stats_btn reruns the whole app, which I don’t want.

Furthermore, the profiler function does not work outside the file_handler function.
That is, a form can live inside one function and contain some inputs, but those inputs are then restricted to the local scope of that function.

I cannot access st.session_state.upload outside file_handler.

Hi,

The reason this isn’t working as expected is that you’re rendering the second form from inside the first form’s callback, and you’re also doing the actual work inside those callbacks. I think best practice is to use forms only to collect field values (so that Streamlit doesn’t rerun while you adjust the field widgets) and then, once the form submit button has been pressed, to run your operations with those values. E.g. the profiler function should be called after the form has been submitted and has bound its field values.

Since pandas-profiling is quite heavyweight, I recommend holding the profile report in session state while you interact with the data frame that was used to generate it. Otherwise you’ll lose the report between Streamlit reruns and it will have to be regenerated each time, which takes a while.

Here’s a complete rewrite of your example demonstrating the way that I’d write it:

pandas_profiler.py

import streamlit as st
import pandas_profiling
from streamlit_pandas_profiling import st_profile_report
import pandas as pd

st.set_page_config(
    page_title='Data Profiler',
    layout='wide',
    page_icon='🔍'
)

state = st.session_state
if 'profile_report' not in state:
    state['profile_report'] = None

def generate_profile_report(data_file, delimiter, minimal):
    file_info = {"Filename": data_file.name, "FileType": data_file.type, "FileSize": data_file.size}
    st.write(file_info)

    df = pd.read_csv(data_file, sep=delimiter, engine="python")
    pr = df.profile_report(lazy=True, minimal=minimal)

    state.profile_report = {'data': df, 'pr': pr}

def file_handler():
    seperators = {" ": " ", "pipe (|)": "|", r"tab (\t)": "\t", "comma (,)":",", "semicolon (;)":";"}
    file_upload_form = st.form(key="file_upload")
    with file_upload_form:
        data_file = st.file_uploader("Upload CSV File", type=['csv'], key="upload")
        delimiter = seperators[st.selectbox("Select delimiter", seperators.keys(), key="delims")]
        minimal = st.checkbox('Minimal report', value=True)
        if file_upload_form.form_submit_button(label='Submit') and data_file:
            generate_profile_report(data_file, delimiter, minimal)

    if (data_file and delimiter and state.profile_report):
        data = st.session_state.profile_report['data']
        pr = st.session_state.profile_report['pr']
        col = st.selectbox("Select column for summary stats", options=['']+list(data.columns))
        if col != '':
            st.write(pd.DataFrame(data[col].unique(), columns=[col]).T)
        st_profile_report(pr)
    else:
        st.info('Please upload a CSV data file, select a delimiter and hit submit.')
        state.profile_report = None

if __name__ =="__main__":
    file_handler()

See gist for the downloadable source file.

HTH,
Arvindra

Leave aside “optimizing” pandas profiling rendering.

I want the forms to be interdependent, as that is a legitimate use case!

The user uploads a dataset using one form.
A second form then pops up after the first one is submitted, containing the column names of the dataset in a select box (i.e. the select box options in the second form should be df.columns).

If you follow the rules for forms that I described above and pass values between the interdependent forms using session state, it should work. Go ahead and give it a go.

Hey Arvindra,
THANK YOU SO MUCH BRAH! I finally understand how to “chain” forms. Here is my code:

from pandas_profiling import ProfileReport
import streamlit as st
from streamlit_pandas_profiling import st_profile_report
import pandas as pd


def summary(col, df):
    st.write(col)
    st.write(df)

def profiler(file, delim):
    file = st.session_state.upload
    delimiter = st.session_state.delim.split(" ")[1][1:-1]
    df = pd.read_csv(file, sep=delimiter, engine="python")
    file_info = {"Filename": file.name, "FileType": file.type, "FileSize": file.size}
    st.write(file_info)

    with st.form(key="col-stats"):
        cols = [val for val in df.columns]
        dropdown_choice = st.selectbox("Select column for clustering", cols, key="col_name")
        submit_btn = st.form_submit_button(label = "get Col summary", on_click=summary, args=(st.session_state.col_name, df))


def data_uploader_form():
   with st.form(key="file_upload"):
       data_file = st.file_uploader("Upload CSV File", type=['csv'], key="upload")
       seperators = ["pipe (|)", r"tab (\t)", "comma (,)", "semicolon (;)"]
       choice = st.selectbox("Select File Seperator", seperators, key="delim")
       submit_file_btn = st.form_submit_button(label='Profile Data', on_click=profiler, args=(st.session_state.upload, st.session_state.delim))

if __name__ =="__main__":
    data_uploader_form()

I’ll obviously be adding page configuration such as theme, layout etc. But the main goal of the app is:
The user uploads data, receives a generic profile, and then selects a column and a distance measure (Levenshtein, trigram, cosine etc.).
The app takes that column and distance measure, runs agglomerative clustering, and outputs a dataframe/file. The output can be used to merge duplicates: suppose a column called City has the values “chicago”, “chicago, illinois” and “chicaaagoo”. One might then want to merge these 3 values into one, say “chicago” or whatever.
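
To make the clustering step concrete, here is a small pure-Python sketch of the idea. It is only an illustration under stated assumptions: it uses Levenshtein distance normalized by the longer string, and it swaps in single-linkage grouping with a fixed cutoff as a simple stand-in for a full agglomerative clustering routine. The function names and the 0.35 threshold are hypothetical choices, not part of the app described above.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Edit distance computed with the classic two-row dynamic programme."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def normalized_distance(a: str, b: str) -> float:
    """Levenshtein distance scaled to [0, 1] by the longer string's length."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def cluster_values(values, threshold=0.35):
    """Group values so that any pair within `threshold` ends up in one cluster."""
    unique = list(dict.fromkeys(values))  # drop exact duplicates, keep order
    parent = {v: v for v in unique}

    def find(v):
        # Follow parent links to the cluster representative.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for a, b in combinations(unique, 2):
        if normalized_distance(a, b) <= threshold:
            parent[find(b)] = find(a)  # merge the two clusters

    groups = {}
    for v in unique:
        groups.setdefault(find(v), []).append(v)
    return list(groups.values())


print(cluster_values(["chicago", "chicago, illinois", "chicaaagoo"]))
```

With this cutoff, “chicago” and “chicaaagoo” land in the same group (distance 3 over 10 characters), while “chicago, illinois” stays on its own (distance 10 over 17 characters). Merging all three would need a looser cutoff or a different measure, which is one reason to let the user pick the distance measure.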