SSG-dataset: popularity of static site generators on GitHub (Hugo, Gatsby, Jekyll, Sphinx, etc.)

I created a multipage version of a Streamlit app that shows several charts about the popularity of static site generators (SSGs) such as Hugo, Gatsby, Jekyll, and Sphinx.

App: https://ssg-dataset.streamlit.app
Repo: GitHub - epogrebnyak/ssg-dataset: Open reproducible dataset on static site generators (SSG) popularity.

The dataset is collected from the GitHub API and contains repo names, stars, forks, issues, and creation and modification dates. A Python package processes this data and produces a CSV file in the project repo. The app reads the data and lays out several charts: by programming language, issues vs. forks, the number of years each project has been running, and a list of projects without recent commits.
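
As a rough illustration of the fields mentioned above, the sketch below flattens one repository payload from the GitHub REST API into a CSV-ready row. The keys (full_name, stargazers_count, forks_count, open_issues_count, created_at, pushed_at) are ones the API returns for a repository. The function name summarize_repo, the choice of pushed_at as the "modify" date, and the sample payload are assumptions for illustration, not the project's actual code.

```python
from datetime import datetime

GITHUB_TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # e.g. "2013-07-04T15:26:26Z"


def parse_time(value: str) -> datetime:
    """Parse a GitHub API timestamp string into a datetime."""
    return datetime.strptime(value, GITHUB_TIME_FORMAT)


def summarize_repo(payload: dict) -> dict:
    """Pick the popularity-related fields out of one repository payload."""
    return {
        "repo": payload["full_name"],
        "stars": payload["stargazers_count"],
        "forks": payload["forks_count"],
        # Note: GitHub counts open pull requests in this number as well.
        "open_issues": payload["open_issues_count"],
        "created_at": parse_time(payload["created_at"]),
        # Assumption: the time of the last push serves as the "modified" date.
        "modified_at": parse_time(payload["pushed_at"]),
    }


# Invented sample values for a hypothetical repository, not real data.
SAMPLE_PAYLOAD = {
    "full_name": "example/some-ssg",
    "stargazers_count": 1200,
    "forks_count": 150,
    "open_issues_count": 35,
    "created_at": "2015-03-01T10:00:00Z",
    "pushed_at": "2023-01-15T08:30:00Z",
}

row = summarize_repo(SAMPLE_PAYLOAD)
```

A list of such rows can then be written out with csv.DictWriter or turned into a pandas DataFrame.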

The multipage option is great: it helps to focus on each part of the content, and the pages are shorter and each covers a single topic.

Here are some things I learned while creating the app:

  1. I used ChatGPT to come up with an icon for the page, which really saved time.
    I asked which icons I could use and, where an icon looked poor on the actual site,
    asked again for more options. Much more satisfying than browsing emoji tables for a fun
    pictogram.

  2. I used two approaches for badges:

    • For the badge showing the number of SSGs in the dataset, I ended up creating it locally with the npm package badge-maker. I also had some code using pybadges, but badge-maker let me do it in a one-liner (see the badge command in the justfile).
    • I propagate the Poetry project version number to a GitHub tag and make a release badge from this tag through the regular Shields API.

  3. Someone mentioned that it is nice to use GitHub colors for programming languages, which I did (the palette() function).
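
For the badges in item 2, here is a minimal sketch of how Shields badge URLs can be put together. It is not the badge-maker command from the justfile: the static badge route is shown as a hosted alternative to generating the SVG locally. The helper names static_badge_url and release_badge_url and the example label are hypothetical; the URL routes and the escaping rules (a doubled dash or underscore stands for a literal one) follow the Shields documentation as I understand it.

```python
from urllib.parse import quote

SHIELDS = "https://img.shields.io"


def _escape(text: str) -> str:
    """Escape one segment of a static badge path.

    Shields uses '-' as the separator, so a literal dash is written as '--'
    and a literal underscore as '__'. Spaces are percent-encoded.
    """
    text = text.replace("-", "--").replace("_", "__")
    return quote(text, safe="")


def static_badge_url(label: str, message: str, color: str = "blue") -> str:
    """URL of a static badge such as 'SSG count | 42'."""
    return f"{SHIELDS}/badge/{_escape(label)}-{_escape(message)}-{color}"


def release_badge_url(owner: str, repo: str) -> str:
    """URL of a badge showing the latest GitHub release of a repository."""
    return f"{SHIELDS}/github/v/release/{owner}/{repo}"


# The count here is a placeholder, not the actual size of the dataset.
count_badge = static_badge_url("SSG count", "42")
release_badge = release_badge_url("epogrebnyak", "ssg-dataset")
```

Either URL can be dropped into a README image link or passed to st.image or st.markdown in the app.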

Open question: is st.session_state OK for storing a dataset?

  • I used st.session_state to pass the CSV data between pages, initialised on the home page. The state does not really change; it is just a way to share my dataset between pages. The only discomfort is that when the browser points to a sub-page after a restart, there is an error about no data being available. Perhaps this is not a likely scenario for a hosted page (I assume the user always arrives at the home page first).

  • My initial attempt was to use data.py and import it as a data store in every page, which did not work well. I had to figure out how to make it discoverable by the home page and the other pages, make the data actually persist, and guarantee I’m not reading the CSV repeatedly. I could not resolve that quickly, and st.session_state seemed a quicker solution.
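
One possible way around the "no data available" error on a sub-page is a get-or-load guard that every page calls instead of assuming the home page ran first. The sketch below is an assumption about how this could look, not the app's code: get_or_load and load_rows are hypothetical names, and a plain dict stands in for st.session_state (which supports the same `in` check and item assignment) so that the snippet runs without Streamlit.

```python
from typing import Any, Callable, MutableMapping


def get_or_load(
    state: MutableMapping[str, Any],
    key: str,
    loader: Callable[[], Any],
) -> Any:
    """Return state[key], calling loader() first if the key is missing."""
    if key not in state:
        state[key] = loader()
    return state[key]


calls = []  # records how many times the loader actually ran


def load_rows() -> list:
    """Hypothetical loader; a real app would read the CSV file here."""
    calls.append("load")
    return [{"repo": "example/some-ssg", "stars": 1200}]  # invented row


# A plain dict stands in for st.session_state in this sketch.
fake_session_state: dict = {}

rows_on_subpage = get_or_load(fake_session_state, "df", load_rows)
rows_on_homepage = get_or_load(fake_session_state, "df", load_rows)
```

In a page script the call would be get_or_load(st.session_state, "df", load_rows), so a visitor landing directly on a sub-page triggers the load instead of an error. The data is still stored once per session, though.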

Footer:

  • If I had access to the page tree, I think a footer on each page suggesting which section to read next would make sense.
  • A global header and footer could also be useful, but they are probably not a priority for Streamlit as a library.

Hi @epogrebnyak, as someone who has spent much more time comparing SSGs than actually building things with them, I love this app!

To answer your open question – it is OK, but probably not the best choice in the case of your app. There are definitely cases where st.session_state makes sense for a dataset, but that’s primarily if you expect each of your users to have a different, custom dataset (for example, if they are adding custom filters). Since session_state is per-session, a static dataset that everyone uses will end up being saved separately in each user’s session_state.

In this case, I would recommend instead doing exactly what you first tried – use st.cache_data and an imported function in each of your pages. You already have data.py with get_meta and get_data functions, so I would just import those in each page and either call the functions whenever you need the data or assign the results to a global variable in each script.

For a specific example, the top of Static_site_generators.py becomes:

import altair as alt
import pandas as pd
import requests
import streamlit as st

from data import get_data, get_meta

df = get_data()
meta = get_meta()

Since you’re ultimately using those variables anyway, and they’re not changing per-session, there’s no reason to store them in session_state – just use the same get_data and get_meta methods on every page, and keep the results in a global variable.


To make a more complete example, here’s a PR: “Use data.py to import common functions” (Pull Request #32 in epogrebnyak/ssg-dataset). Feel free to ignore any parts of it, but it shows how everything that was previously accessed through session_state can instead be imported from data.py.


Thank you so much for the PR, now merged. It feels like a bit of magic that data.py in the root of the app is visible to modules in pages/, which probably goes against the conventional logic of Python imports.

@st.experimental_memo works; I’m now reading about its benefits over @st.cache.

Never thought I’d be at a point to say “the contributed code used memoization” in a project; this is way too fancy, but true. :) Thanks again for your code and explanation!
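
For anyone curious about what memoization means here, a minimal sketch with the standard library's functools.lru_cache, used as a stand-in for Streamlit's caching decorators (which do more, for example working across reruns and sessions). The CSV text and the shape of get_data are invented for illustration; this is not the code from the PR.

```python
import csv
import io
from functools import lru_cache

# Invented CSV content standing in for the project's data file.
CSV_TEXT = "repo,stars\nexample/alpha,10\nexample/beta,20\n"


@lru_cache(maxsize=None)
def get_data() -> tuple:
    """Parse the CSV once; later calls return the cached result."""
    reader = csv.DictReader(io.StringIO(CSV_TEXT))
    # A tuple is returned so the cached value cannot be mutated by callers.
    return tuple((row["repo"], int(row["stars"])) for row in reader)


first = get_data()   # parses the CSV (a cache miss)
second = get_data()  # served from the cache (a cache hit)
```

Calling get_data.cache_info() shows the hits and misses, which is a quick way to confirm that the CSV is parsed only once no matter how many pages ask for it.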

