Big telescope data

My data, looking for pulsars in the Andromeda Galaxy using the Green Bank radio telescope, is 8 TB in 10 GB chunks of 8-bit unsigned integers. I work on multicore machines with 64 GB or 128 GB of RAM.

I currently use NumPy and read the 10 GB chunks in as raw binary, so there is no inflation: each sample stays one byte in memory. I use NumPy operations as much as possible, but sometimes use Numba and multitasking to make use of all the CPU cores.
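For readers following along, here is a minimal sketch of that "no inflation" approach, assuming each chunk is a headerless file of raw uint8 samples (the file name, the random stand-in data, and the block size are hypothetical). np.fromfile keeps one byte per sample in RAM, and reducing block by block keeps any widened temporaries small instead of creating a full float64 copy of the chunk. np.memmap(path, dtype=np.uint8, mode="r") is an alternative when only part of a chunk is needed.

```python
import os
import tempfile

import numpy as np


def load_chunk(path):
    """Read a raw chunk of 8-bit samples: one byte on disk -> one byte in RAM."""
    return np.fromfile(path, dtype=np.uint8)


def blockwise_mean(samples, block=1 << 20):
    """Mean of a uint8 array without building a full float64 copy.

    Each block is summed into a uint64 accumulator, so temporary memory
    is bounded by the block size rather than the chunk size.
    """
    total = 0
    for start in range(0, samples.size, block):
        total += int(samples[start:start + block].sum(dtype=np.uint64))
    return total / samples.size


# Small stand-in for one 10 GB chunk (hypothetical file, random data).
rng = np.random.default_rng(0)
fake = rng.integers(0, 256, size=10_000, dtype=np.uint8)
path = os.path.join(tempfile.mkdtemp(), "chunk.bin")
fake.tofile(path)

chunk = load_chunk(path)
print(chunk.dtype, chunk.nbytes, blockwise_mean(chunk, block=4096))
```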

Streamlit looks interesting. How big can a Streamlit cache be? Can its contents be preserved exactly, in binary?

I’ve been making graphics with matplotlib directly. I’m exploring seaborn and plotly.

How can I migrate to Streamlit incrementally?

My data, looking for pulsars in the Andromeda Galaxy using the Green Bank radio telescope

:astonished: So cool!!! :star_struck:

Streamlit looks interesting. How big can a streamlit cache be?

The cache has no formal size limit; it is bounded only by what fits in memory. That is rarely a problem, but with 10 GB chunks on 64 GB or 128 GB machines it may actually be limiting in your case :smiley:
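To put numbers on that, here is a back-of-the-envelope sketch using a hypothetical helper (not a Streamlit API). It treats 1 GB as 10^9 bytes and ignores everything else the process needs unless you pass a reserve. A 10 GB uint8 chunk fits six times into 64 GB, while converting even one chunk to float64 (8 bytes per sample) needs 80 GB.

```python
GB = 10**9  # decimal gigabytes, matching "10 GB chunks"


def cache_budget(ram_bytes, chunk_samples, itemsize=1, reserve_bytes=0):
    """Number of whole chunks that fit in RAM at once (hypothetical helper).

    itemsize is bytes per sample after any dtype conversion:
    1 for uint8 as read from disk, 4 for float32, 8 for float64.
    reserve_bytes is memory set aside for the OS, Streamlit, and your code.
    """
    usable = ram_bytes - reserve_bytes
    return max(usable // (chunk_samples * itemsize), 0)


CHUNK = 10 * GB  # 10 GB of uint8 = 10**10 one-byte samples

print(cache_budget(64 * GB, CHUNK))               # 6
print(cache_budget(128 * GB, CHUNK))              # 12
print(cache_budget(128 * GB, CHUNK, itemsize=4))  # 3 (float32)
print(cache_budget(64 * GB, CHUNK, itemsize=8))   # 0 (a float64 chunk is 80 GB)
print((8000 * GB) // CHUNK)                       # 800 chunks in the 8 TB dataset
```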

We also have a “save to disk” mode for caching (st.cache(persist=True)), but it saves to both disk and memory, so it does not reduce memory use. It is useful when you want to reuse the cache after killing or restarting your Streamlit server.
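A minimal sketch of a cached chunk loader, assuming a current Streamlit release, where st.cache has been superseded by st.cache_data (which accepts persist="disk" and max_entries). The cap of 6 entries is a hypothetical choice sized for 64 GB, and the guarded import falls back to functools.lru_cache (memory only) when Streamlit is not installed. Note that st.cache_data pickles return values and hands each caller a fresh copy, so a cached chunk can be held in memory more than once; st.cache_resource returns the same object without copying but does not persist to disk.

```python
import functools
import os
import tempfile

import numpy as np

try:
    import streamlit as st

    # persist="disk" pickles results to disk as well as keeping them in
    # memory; max_entries=6 is a hypothetical cap chosen to fit 64 GB.
    cache = st.cache_data(persist="disk", max_entries=6)
except ImportError:
    # Stand-in so the sketch also runs where Streamlit is not installed.
    cache = functools.lru_cache(maxsize=6)


@cache
def load_chunk(path):
    """Read one raw uint8 chunk; later calls with the same path hit the cache."""
    return np.fromfile(path, dtype=np.uint8)


# Tiny stand-in for a real chunk file (hypothetical name, random data).
path = os.path.join(tempfile.mkdtemp(), "chunk_000.bin")
np.random.default_rng(0).integers(0, 256, size=4096, dtype=np.uint8).tofile(path)

first = load_chunk(path)   # reads the file
second = load_chunk(path)  # served from the cache
```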

For more info, see https://streamlit.io/docs/api.html#streamlit.cache

If none of this works for your use case, let me know! We’ll create a feature request.

Can it be preserved in binary exactly?

The in-memory cache is just a key-value store whose values are exactly the return values of your function, saved by reference. So they sit in memory exactly as your cached function returned them, with no copy or conversion.

The on-disk cache uses pickle.
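To see what "exactly" means in each case, here is a sketch that uses a plain dict as a hypothetical stand-in for the in-memory store and the standard pickle module for the disk path. The in-memory value is the very same object; the pickled one is a separate copy with identical dtype, shape, and bytes. Pickle protocol 4 and later supports objects larger than 4 GB, which matters for 10 GB chunks.

```python
import pickle

import numpy as np

# A plain dict standing in for the in-memory cache (hypothetical model).
memory_cache = {}

samples = np.arange(256).astype(np.uint8).reshape(16, 16)
memory_cache["chunk_0"] = samples

# In memory: the same object comes back, no copy and no conversion.
print(memory_cache["chunk_0"] is samples)  # True

# On disk: pickle round-trips the array to an equal but separate copy.
restored = pickle.loads(pickle.dumps(samples, protocol=pickle.HIGHEST_PROTOCOL))
print(restored is samples)                       # False
print(restored.dtype, restored.shape)            # uint8 (16, 16)
print(restored.tobytes() == samples.tobytes())   # True
```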

I’ve been making graphics with matplotlib directly. How can I migrate to Streamlit incrementally?

If you’re using normal Python scripts (rather than notebooks), migrating to Streamlit is mostly a matter of replacing plt.show() with st.pyplot(fig), passing in the figure you created, so it should be quite simple.

If you’re using notebooks, add st.pyplot(fig) right after your plotting commands to show the figure. (Notebooks display figures automatically at the end of each cell, but Streamlit apps are plain Python files, so we stick to the explicit, pure-Python way of doing things.)
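A minimal sketch of the script-side change, assuming you launch it with streamlit run your_script.py (the file name is hypothetical). The guarded import is an illustrative convenience for migrating one script at a time, not something Streamlit requires: if Streamlit is not installed, the same file falls back to plt.show(). The random array is placeholder data.

```python
import matplotlib.pyplot as plt
import numpy as np

try:
    import streamlit as st
except ImportError:  # lets the same file run as a plain script
    st = None


def plot_mean_profile(samples):
    """Build and return the figure; the caller decides where it is shown."""
    fig, ax = plt.subplots()
    ax.plot(samples.mean(axis=0))
    ax.set_xlabel("Column index")
    ax.set_ylabel("Mean sample value")
    return fig


# Placeholder data standing in for a slice of one chunk.
data = np.random.default_rng(0).integers(0, 256, size=(100, 64), dtype=np.uint8)
fig = plot_mean_profile(data)

if st is not None:
    st.pyplot(fig)  # Streamlit: takes the place of plt.show()
else:
    plt.show()      # Streamlit not installed: ordinary Matplotlib display
```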

More info at https://streamlit.io/docs/api.html#streamlit.pyplot

I’m exploring seaborn and plotly.

Seaborn is built on top of Matplotlib, so you can display Seaborn plots with st.pyplot() as well.

As for Plotly, you can pass Plotly figures and graph objects into st.plotly_chart(the_obj) and they’ll show up in Streamlit.
https://streamlit.io/docs/api.html#streamlit.plotly_chart
