Streamlit app for weather data analysis

Hi all,
I’m working on a Streamlit app that compares baseline weather files (TMYx) with future morphed climate scenarios (RCP 2050 / 2080) for building performance analysis.

https://eetra-future-weather-app.streamlit.app/

What the app does

  • Parses EPW weather files (Italy, roughly 3,500 files total)

  • Builds:

    • Hourly temperature datasets

    • Daily statistics

    • Monthly summaries

  • Compares:

    • TMYx baseline

    • TMYx variants (2004–2018, 2007–2021, etc.)

    • Morphed RCP scenarios (rcp26, rcp45, rcp85 for 2050/2080)

  • Displays:

    • Regional and national maps

    • Percentile-based temperature deltas

    • Location-level comparisons

    • Interactive charts

All heavy preprocessing (EPW parsing, aggregation, pairing baseline ↔ RCP) is now done offline via Python scripts.
The Streamlit app should ideally only read precomputed parquet files.


Current data structure

Per location, I now generate a single parquet file containing:

  • Baseline TMYx

  • All RCP/year scenarios for that baseline

Folder structure:

data/
  04__italy_tmy_fwg_parquet/
      AB/
      BC/
      ...

Each parquet contains:

  • Hourly dry-bulb temperature

  • Daily stats

  • Monthly stats

  • Scenario metadata


Current performance challenges

Despite precomputing:

  • First load still feels heavy

  • Map rendering (many points) can lag

  • Switching between scenarios sometimes triggers noticeable recalculation

  • Cached functions sometimes invalidate more than expected

The app uses:

  • @st.cache_data

  • Parquet (pyarrow)

  • Pandas

  • Plotly

  • Folium for maps


Questions

  1. Best practices for loading large parquet datasets in Streamlit?

    • Should I pre-split more aggressively (e.g. per region only)?

    • Is DuckDB a better backend than Pandas for this use case?

  2. Map performance:

    • Better approach than Folium for 150–200 markers?

    • Should I pre-aggregate geojson layers?

  3. Caching strategy:

    • Is it better to cache whole DataFrames or pre-serialized lightweight objects?

    • Should I move more logic into st.session_state instead of cache_data?

  4. General architectural advice:

    • Is there a better pattern for large scenario-based analytical apps?

    • Would Snowflake / MotherDuck / DuckDB significantly improve performance?

Welcome to the community and thanks for the detailed question! :rocket: For large, scenario-based analytical apps like yours, the following best practices tend to help:

1. Loading Large Parquet Datasets:
Pre-splitting data by region or scenario can reduce memory usage and speed up load times, as you only read what’s needed. Using DuckDB to query Parquet files directly (without loading full DataFrames into memory) is often faster and more efficient than Pandas for large datasets. According to Streamlit Docs, you can cache query results with @st.cache_data and set a TTL to avoid stale data.

2. Map Performance:
Folium can lag with many markers. For 150–200 points, consider Plotly’s scatter_mapbox or scatter_geo, which render all points as a single trace and are noticeably more responsive in Streamlit. Pre-aggregating or simplifying geojson layers also helps.

3. Caching Strategy:
Cache only what you need: prefer lightweight, pre-serialized objects or query results over entire DataFrames where possible. Use @st.cache_data for data and @st.cache_resource for connections or models. Avoid over-caching, as it increases memory usage and can trigger unexpected invalidations.

4. Session State vs. Cache:
Use st.session_state for user-specific, session-persistent variables (like UI state or selections), not for large data. Use @st.cache_data for shared, immutable data.

5. General Architecture:
DuckDB (or MotherDuck for cloud) is well suited to querying Parquet files on demand and can outperform Pandas for large analytical workloads. Snowflake is powerful but likely overkill unless you need enterprise-scale features.

Would you like a step-by-step example of integrating DuckDB with Streamlit for this use case?
