Question: example of big application?

Does anyone have a project that handles large amounts of data that I can see in action?
Or an example of a large company using Streamlit?

Hello @Newton_Miotto, welcome to the community!

What exactly are you looking for when you say "big application"?
I could point you to pymedphys/pymedphys on GitHub ("A community effort to develop an open standard library for Medical Physics in Python") by @SimonBiggs, which I think is a good example of Streamlit usage as part of a big project, but I’m not sure it will answer your doubts about using Streamlit in a big application :confused:

Best,
Fanilo


Hi @Newton_Miotto,

Welcome to the Streamlit community :slightly_smiling_face: :tada:.

Thanks @andfanilo for the ping :slightly_smiling_face:.

We have a Streamlit application that we host across three Radiation Oncology clinics in Australia. There are data pipelines running external to Streamlit which collect treatment delivery data and planned delivery data. You asked for an example with “large amounts of data”, so let me detail a rough idea of scale.

We currently have four Elekta Linear Accelerators (Linacs), which are the machines we use to deliver life-saving radiation to our patients. These machines have MultiLeaf Collimators (MLCs), which are used to dynamically shape the primary radiation as the machine rotates around the patient.

To get an idea here is a video of those MLCs:

We have four of these Linear Accelerators across 3 sites. For every patient delivery at each of these sites the machines save a treatment record file (TRF). These files contain a record of the 160 MLC positions at a time resolution of 25 Hz. A treatment delivery generally takes 1-2 minutes and we treat a new patient every 15 minutes. We then collect these TRF files and have been collecting them for at least the last two years.
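To put the figures above into numbers, here is a rough back-of-the-envelope sketch. It only uses the values quoted in this post (160 MLC positions, 25 Hz, 1-2 minutes per delivery); the function and constant names are made up for illustration, and the real TRF files will hold more than just leaf positions.

```python
# Figures taken from the post; everything else is derived arithmetic.
MLC_POSITIONS = 160            # MLC positions recorded per sample
SAMPLE_RATE_HZ = 25            # samples per second
DELIVERY_SECONDS = (60, 120)   # a delivery generally takes 1-2 minutes


def positions_per_delivery(seconds: int) -> int:
    """Number of individual MLC position values logged in one delivery."""
    return MLC_POSITIONS * SAMPLE_RATE_HZ * seconds


low, high = (positions_per_delivery(s) for s in DELIVERY_SECONDS)
print(f"{low:,} to {high:,} MLC position values per delivery")
```

So a single delivery yields on the order of a few hundred thousand position values, before counting anything else stored in the file.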


We have then built a Streamlit application that searches that data and provides meaningful comparisons between the reported MLC positions on the Linac and the planned MLC positions saved within the planning software. We also provide options to compare this data to: our SQL clinical database, the DICOM file format, as well as a live data stream that is being recorded from the Linac.
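As a very simplified illustration of the kind of comparison described above, here is a sketch that finds the largest planned-versus-reported difference for each leaf. This is not the PyMedPhys implementation: the function name and the data layout (lists of samples, each a list of leaf positions already resampled onto common time points) are assumptions made for the example.

```python
from typing import List, Sequence


def max_leaf_deviation(
    planned: Sequence[Sequence[float]],
    reported: Sequence[Sequence[float]],
) -> List[float]:
    """Return, per leaf, the largest absolute planned-vs-reported difference.

    Assumes both inputs hold the same number of samples, with each sample
    listing leaf positions in the same units and the same leaf order.
    """
    if len(planned) != len(reported):
        raise ValueError("planned and reported need the same number of samples")

    n_leaves = len(planned[0]) if planned else 0
    worst = [0.0] * n_leaves

    for plan_sample, report_sample in zip(planned, reported):
        if len(plan_sample) != n_leaves or len(report_sample) != n_leaves:
            raise ValueError("every sample needs the same number of leaves")
        for leaf, (p, r) in enumerate(zip(plan_sample, report_sample)):
            worst[leaf] = max(worst[leaf], abs(p - r))

    return worst


# Two samples, two leaves (hypothetical values).
planned = [[0.0, 10.0], [1.0, 11.0]]
reported = [[0.0, 10.5], [1.25, 11.0]]
print(max_leaf_deviation(planned, reported))
```

A result like this could then be shown in a Streamlit page as a table or chart, which is where the interactive searching comes in.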

To build confidence that this application will continue working consistently through code changes and dependency upgrades, we have implemented end-to-end testing with Cypress. A video of the Cypress tests that run in continuous integration for every code commit can be seen at:

We pin the Streamlit version explicitly within our setup.py, and when updating the version we make sure the above tests pass. We pin the rest of our dependencies with a poetry.lock file (https://github.com/pymedphys/pymedphys/blob/main/poetry.lock) using the brilliant poetry tool.

You can install the application yourself by running:

pip install "pymedphys[user]==0.35.0"
pymedphys gui

There is also a version of this application hosted online:

https://app.pymedphys.com/?app=metersetmap


If you’re interested in more, here is a presentation where I demo the application (it’s at about the 5 1/2 minute mark):

And here is another Streamlit post where I gave a broader overview of the details:


I would like to add, I have confidence using Streamlit because I know that if there is something that Streamlit doesn’t do right now, I have the freedom to make it happen.

A bugbear of ours is that we couldn’t readily download large files within Streamlit as the main library currently stands. So… we fixed it. We have it fixed for our use case within PyMedPhys, and the process of getting that merged into the official Streamlit library is underway:

Also, my favorite part of Streamlit is the following:

…they can understand the inner workings… but also, make changes, and make new applications. It’s amazing. Streamlit gives me and my team superpowers. I hope your team can fly with Streamlit too :slightly_smiling_face:.

Cheers,
@SimonBiggs


@Newton_Miotto, what metric would you like a response to provide to indicate “amounts of data”?
MB per load/save?
MB/sec for the duration of an operation with life-cycle described in units of time?

@SimonBiggs mentioned an application that operates on data produced at perhaps (at least) 320 bytes per system per 0.04 seconds (~8 KB/sec), multiplied by the number of machines at the facility. To compare against that “data stream”, the application needs to load information that is typically 100 KB to 1000 KB in size, but which only needs to be retrieved once every few minutes (15 minutes in the example given). Those data streams do not operate at a 100% duty cycle: @SimonBiggs described 1-2 minutes of delivery out of every 15 minutes, so roughly a 7% to 13% duty cycle for a given machine. The machines do not operate in synchrony with each other, but those fifteen-minute time slices tend to line up with the clock for patient and clinical staff scheduling purposes, so the data streams are likely bursty, with a lot of overlap, rather than evenly distributed.
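For anyone who wants to check the arithmetic, here is a small sketch. It takes the 320 bytes per sample estimate at face value (that figure is a guess, not a measured TRF size) along with the 25 Hz rate and the 1-2 minutes in every 15 described earlier in the thread; the variable names are just for illustration.

```python
BYTES_PER_SAMPLE = 320       # rough estimate for one sample of MLC positions
SAMPLE_RATE_HZ = 25          # one sample every 0.04 seconds
SLOT_MINUTES = 15            # a new patient roughly every 15 minutes
DELIVERY_MINUTES = (1, 2)    # a delivery generally takes 1-2 minutes

# Sustained rate while a single machine is actually delivering.
bytes_per_second = BYTES_PER_SAMPLE * SAMPLE_RATE_HZ

# Fraction of each 15-minute slot during which data is being produced.
duty_cycles = [minutes / SLOT_MINUTES for minutes in DELIVERY_MINUTES]

# Total bytes of position data produced by one delivery.
bytes_per_delivery = [bytes_per_second * minutes * 60 for minutes in DELIVERY_MINUTES]

print(f"{bytes_per_second} bytes/sec while delivering")
print("duty cycle: " + " to ".join(f"{d:.0%}" for d in duty_cycles))
print("per delivery: " + " to ".join(f"{b:,} bytes" for b in bytes_per_delivery))
```

Under those assumptions, a single machine produces on the order of half a megabyte to a megabyte of position data per delivery.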

Another Streamlit application in PyMedPhys (more a demonstration of a library function in action) uploads a related set of data: a CT scan, typically containing several hundred “slices” of ~512 KB each, plus a handful of other data/objects/files, including some of the kind mentioned by @SimonBiggs in his example. The upload will typically range from 100 MB to 300 MB. The application performs the library function and then downloads a similar quantity of data, broken into zipped chunks that fit within the 50 MB limit (that was in Streamlit at the time). The data is pseudonymised (the goal is to protect patient privacy while still allowing data sharing of complex sets of objects whose relationships to each other must be maintained).
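To give a feel for the "zipped chunks under a size limit" idea, here is a minimal sketch using only the Python standard library. It is not how PyMedPhys does it; `batch_by_size` and `zip_batches` are hypothetical helpers, and the limit is applied to the uncompressed size as a conservative stand-in for the final zip size (zip overhead can still push an archive of incompressible data slightly over the limit).

```python
import io
import zipfile
from typing import Dict, Iterator, List


def batch_by_size(files: Dict[str, bytes], limit_bytes: int) -> Iterator[List[str]]:
    """Greedily group file names so each group's raw size stays within the limit."""
    batch: List[str] = []
    batch_size = 0
    for name, data in files.items():
        if len(data) > limit_bytes:
            raise ValueError(f"{name} alone exceeds the limit")
        # Start a new batch when adding this file would overflow the current one.
        if batch and batch_size + len(data) > limit_bytes:
            yield batch
            batch, batch_size = [], 0
        batch.append(name)
        batch_size += len(data)
    if batch:
        yield batch


def zip_batches(files: Dict[str, bytes], limit_bytes: int) -> List[bytes]:
    """Build one in-memory zip archive per batch and return their bytes."""
    archives = []
    for batch in batch_by_size(files, limit_bytes):
        buffer = io.BytesIO()
        with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
            for name in batch:
                archive.writestr(name, files[name])
        archives.append(buffer.getvalue())
    return archives


# Tiny stand-in files; a real limit would be something like 50 * 1024 * 1024.
example_files = {"a.dcm": b"x" * 40, "b.dcm": b"y" * 40, "c.dcm": b"z" * 10}
print(list(batch_by_size(example_files, limit_bytes=60)))
```

Each element returned by `zip_batches` could then be offered as its own download, which keeps any one download under the limit.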

The PyMedPhys examples operate on data of substantial size, but it’s not the same as drinking from a firehose in the style of dynamic analysis of global social media communication streams.

Thanks for your attention, guys. This was exactly what I was looking for. My company is about to start a new stage of development, and I needed to know whether Streamlit could handle it. Based on these answers, I believe it can.

Thanks again everyone!
