FacetGrid with Slider

Hi all,

I am new to streamlit and want to use it to visualize a series of distributions over time. I’ve encoded the “time” variable in a slider, and the visualization is a FacetGrid from seaborn. However, when I implemented this, the UI was very slow. I used st.cache on all the data loading and processing functions, so the only thing that should happen when the slider value changes is the generation of the FacetGrid itself. I’ve attached a gif showing the issue.

The dataset in question is around 36k records. I am curious whether I am doing something fundamentally wrong, as I had thought this kind of use case is exactly what streamlit is for. I would greatly appreciate anyone’s help!

Hi @ArvindR -

From your gif, it does look like the data is being loaded multiple times, which would certainly slow things down. Can you post the code you are running?

Best,
Randy

@randyzwitch here is the code; I swapped out the function bodies to make it more readable. I used @st.cache on each function, though, so I’m not sure why it would reload the data every time.

import numpy as np
import seaborn as sns
import streamlit as st

@st.cache
def load_data():
    # Return data loaded from CSV files in a dict
    ...

# Combine the datasets and standardize the column names
@st.cache
def combine_yearly_data(loaded_data):
    # Combine data from CSV files into a single dataframe
    ...

@st.cache
def filter_out_bad_responses(data):
    # Filter out bad data
    ...

@st.cache
def filter_to_top_metros(data, num_metros=25):
    # Further filter data
    ...

# Prepare data for visualization
data = load_data()
combined_data = combine_yearly_data(data)
filtered_data = filter_out_bad_responses(combined_data)
top_metros_prepared_data = filter_to_top_metros(filtered_data)

year = st.slider("Year", 2007, 2017, step=2)
vac_stats = top_metros_prepared_data[top_metros_prepared_data['YEAR'] == str(year)]\
                .groupby('METRO')['VACMONTHS']\
                .agg(["count","median"])

bins = np.arange(0,26,1)
g = sns.FacetGrid(
    top_metros_prepared_data[top_metros_prepared_data['YEAR'] == str(year)],
    col='METRO', col_wrap=5, col_order=vac_stats.index.to_list()
)
g = g.map(sns.distplot, 'VACMONTHS', bins=bins)\
        .set(xlim=(0, 25))\
        .set_titles("{col_name}")\
        .set_axis_labels("Months vacant")
for axis in g.axes:
    sample_size = vac_stats['count'][axis.title.get_text()]
    median = vac_stats['median'][axis.title.get_text()]
    axis.set_title(f"{axis.title.get_text()}|Med:{int(median)}|Sample:{sample_size}")

g.fig.suptitle(f"{year} Vacancy Distributions", size=16)
g.fig.subplots_adjust(top=0.94)
st.pyplot()

Nothing immediately sticks out here, so this is probably one of those things where, if I were going to solve it, I’d start by benchmarking each step. For example, I’m not familiar with how fast a FacetGrid renders, so I don’t know whether that’s an expensive call or not (a 25-way set of plots seems involved).
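A rough sketch of what I mean by benchmarking - the timed helper below is just something I made up for illustration (not a Streamlit API); you’d wrap each of your pipeline calls in it and it writes each step’s duration to the sidebar:

import time

import streamlit as st

def timed(label, func, *args, **kwargs):
    # Hypothetical helper: run one step and report how long it took
    start = time.time()
    result = func(*args, **kwargs)
    st.sidebar.text(f"{label}: {time.time() - start:.2f}s")
    return result

data = timed("load_data", load_data)
combined_data = timed("combine_yearly_data", combine_yearly_data, data)
filtered_data = timed("filter_out_bad_responses", filter_out_bad_responses, combined_data)
top_metros_prepared_data = timed("filter_to_top_metros", filter_to_top_metros, filtered_data)

That should at least tell you whether the time is going into the data steps or into the plotting.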

You could also consider pre-computation to speed things up further, since your group-by appears to be constant (groupby('METRO')['VACMONTHS']) and your metrics are fixed (["count", "median"]).
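For example (just a sketch, assuming the column names from your snippet), you could compute the stats for every year once in a cached function and then index into the result when the slider changes:

@st.cache
def precompute_vac_stats(data):
    # Compute count/median of VACMONTHS per (YEAR, METRO) once, up front
    return data.groupby(['YEAR', 'METRO'])['VACMONTHS'].agg(["count", "median"])

all_vac_stats = precompute_vac_stats(top_metros_prepared_data)
vac_stats = all_vac_stats.loc[str(year)]  # same shape as your per-year groupby

That removes the per-interaction group-by entirely, though if the FacetGrid rendering itself is the bottleneck it won’t buy you much.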

@randyzwitch thanks for the tips - I’ll try those out. One last thing, though - I was curious why the UI said load_data and combine_yearly_data were running each time, even though they were cached? I assumed it was a UI bug, since print statements from those functions were not being executed.

Well yes, that’s the other question. I don’t think this is a bug per se, but figuring out why it’s saying that would be my first step. Without having the full code and dataset to explore, it’s hard for me to say what the issue might be.
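One quick way to confirm whether the cached functions are genuinely re-executing (rather than the UI just flashing the “Running…” message) is to add a visible side effect inside one of them - a sketch with a placeholder body, since I don’t have your loading code:

import time

import streamlit as st

@st.cache(suppress_st_warning=True)
def load_data():
    # If the cache is working, this warning should only appear on the first run;
    # later slider changes should return the cached result without re-running this body.
    st.warning(f"load_data executed at {time.strftime('%H:%M:%S')}")
    # ... load the CSV files into a dict (placeholder) ...
    return {}

If the timestamp changes on every slider move, the functions really are re-running; if it stays the same, they’re being served from cache and what you’re seeing is just the status message, which would line up with your print statements not firing.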