In the time I’ve worked at Streamlit, I’ve seen hundreds of impressive data apps ranging from computer vision applications to public health tracking of COVID-19 and even simple children’s games. I believe the growing popularity of Streamlit comes from the fast, iterative workflows through the Streamlit “magic” functionality and auto-reloading the front-end upon saving your Python script. Write some code, hit ‘Save’ in your editor, then visually inspect the correctness of each code change. And with the unveiling of Streamlit sharing for easy deployment of Streamlit apps, you can go from idea to coding to deploying your app in just minutes!
Once you've created a Streamlit app, you can use automated testing to future-proof it against regressions. In this post, I'll be showing how to programmatically validate that a Streamlit app is unchanged visually using the Python package SeleniumBase.
Case Study: streamlit-folium
To demonstrate how to create automated visual tests, I’m going to use the streamlit-folium GitHub repo, a Streamlit Component I created for the Folium Python library for leaflet.js. Visual regression tests help detect when the layout or content of an app changes, without requiring the developer to visually inspect the output by hand each time a line of code changes in their Python library. Visual regression tests also help with cross-browser compatibility of your Streamlit apps and provide advance warning about new browser versions affecting how your app is displayed.
Baseline image of streamlit-folium test application
Setting Up A Test Harness
The streamlit-folium test harness has three files:
The first step is to create a Streamlit app using the package to be tested and use that to set the baseline. We can then use SeleniumBase to validate that the structure and visual appearance of the app remain unchanged relative to the baseline.
This post focuses on describing test_package.py since it’s the file that covers how to use SeleniumBase and OpenCV for Streamlit testing.
Defining Test Success
There are several ways to think about what constitutes looking the same in terms of testing. I chose the following three principles for testing my streamlit-folium package:
- The Document Object Model (DOM) structure (but not necessarily the values) of the page should remain the same
- For values such as headings, test that those values are exactly equal
- Visually, the app should look the same
I decided to take these less strict definitions of “unchanged” for testing streamlit-folium, as the internals of the Folium package itself appear to be non-deterministic. That is, the same Python code will create the same-looking image, but the generated HTML will be different.
Testing Using SeleniumBase
SeleniumBase is an all-in-one framework written in Python that wraps the Selenium WebDriver project for browser automation. SeleniumBase has two methods that we can use for the first and second testing principles listed above: check_window, which tests the DOM structure, and assert_text, which ensures a specific piece of text is shown on the page.
To check the DOM structure, we first need a baseline, which we can generate using the check_window function. The check_window function has two behaviors, based on the required name argument:
- If a folder <name> within the visual_baseline/<Python file>.<test function name> path does not exist, this folder will be created with all of the baseline files
- If the folder does exist, then SeleniumBase will compare the current page against the baseline at the specified accuracy level
You can see an example of calling check_window and the resulting baseline files in the streamlit-folium repo. In order to keep the baseline constant between runs, I committed these files to the repo; if I were to make any substantive changes to the app I am testing (app_to_test.py), I would need to remember to set the new baseline or the tests would fail.
With the baseline folder now present, running check_window runs the comparison test. I chose to run the test at Level 2, with the level definitions as follows:
- Level 1 (least strict): HTML tags are compared to tags_level1.txt
- Level 2: HTML tags and attribute names are compared to tags_level2.txt
- Level 3 (most strict): HTML tags, attribute names and attribute values are compared to tags_level3.txt
As mentioned in the “Defining Test Success” section, I run the check_window function at Level 2, because the Folium library adds a GUID-like id value to the attribute values in the HTML, so the tests will never pass at Level 3 because the attribute values are always different between runs.
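To see why a changing id breaks Level 3 but not Level 2, here is a small standard-library illustration of the three kinds of comparison. It is not SeleniumBase's implementation, and the two HTML snippets and their id values are made up for the example.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Collect start tags at one of three levels of detail."""

    def __init__(self, level: int):
        super().__init__()
        self.level = level
        self.tags = []

    def handle_starttag(self, tag, attrs):
        if self.level == 1:
            # Level 1: tag names only
            self.tags.append((tag,))
        elif self.level == 2:
            # Level 2: tag names plus attribute names
            self.tags.append((tag, tuple(name for name, _ in attrs)))
        else:
            # Level 3: tag names, attribute names and attribute values
            self.tags.append((tag, tuple(attrs)))


def structure(html: str, level: int) -> list:
    collector = TagCollector(level)
    collector.feed(html)
    return collector.tags


# Hypothetical output of two runs: only the generated id value differs.
run_1 = '<div id="map_3f9a" class="folium-map"><h1>streamlit-folium</h1></div>'
run_2 = '<div id="map_b71c" class="folium-map"><h1>streamlit-folium</h1></div>'

print(structure(run_1, 2) == structure(run_2, 2))  # True
print(structure(run_1, 3) == structure(run_2, 3))  # False
```

The attribute names (id, class) are stable, so the Level 2 comparison passes; the id values are not, so the Level 3 comparison fails.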
For the second test principle (“check certain values are equal”), the assert_text method is very easy to run:
This function checks that the exact text “streamlit-folium” is present in the app, and the test passes because it’s the value of the H1 heading in this example.
Testing Using OpenCV
While checking the DOM structure and presence of a piece of text provides some useful information, my true acceptance criterion is that the visual appearance of the app doesn’t change from the baseline. In order to test that the app is visually the same down to the pixel, we can use the save_screenshot method from SeleniumBase to capture the current visual state of the app and compare it to the baseline using the OpenCV package:
Using OpenCV, the first step is to read in the baseline image and the current snapshot, then check that the pictures are the same size (the shape comparison checks that the NumPy ndarrays of pixels have the same dimensions). Assuming the pictures are the same size, we can then use the subtract function from OpenCV to calculate the per-element difference between pixels by channel (blue, green and red). If all three channels have no differences, then we know that the visual representation of the Streamlit app is identical between runs. One caveat: for 8-bit images, subtract saturates at zero, so a pixel that is brighter in the second image than in the first produces no difference; OpenCV's absdiff function reports differences in both directions.
Automating Tests Using GitHub Actions
With our SeleniumBase and OpenCV code set up, we can now feel free to make changes to our Streamlit Component (or other Streamlit apps) and not worry about things breaking unintentionally. In my single-contributor project, it’s easy to enforce running the tests locally, but with tools such as GitHub Actions available for free for open-source projects, setting up a Continuous Integration pipeline guarantees the tests are run for each commit.
The streamlit-folium repo has a workflow run_tests_each_PR.yml defined that does the following:
By having this workflow defined in your repo, and required status checks enabled on GitHub, every pull request will now have the following status check appended to the bottom, letting you know the status of your changes:
Writing Tests Saves Work In The Long Run
Having tests in your codebase has numerous benefits. As explained above, automating visual regression tests allows you to maintain an app without having to have a human in the loop looking for changes. Writing tests is also a great signal to potential users that you care about stability and long-term maintainability of your projects. Not only is it easy to write tests for a Streamlit app and have them run automatically on each GitHub commit, but the extra work of adding tests to your Streamlit project will save you time in the long run.
Have questions about this post or Streamlit in general? Stop by the Streamlit Community forum, start a discussion, meet other Streamlit enthusiasts, find a collaborator in the Streamlit Component tracker or share your Streamlit project! There are plenty of ways to get involved in the Streamlit community and we look forward to welcoming you 🎈
This is a companion discussion topic for the original entry at https://blog.streamlit.io/testing-streamlit-apps-using-seleniumbase/