Big telescope data

dsaroff · December 5, 2019, 6:13pm

My data, from looking for pulsars in the Andromeda Galaxy with the Green Bank radio telescope, is 8TB in 10GB chunks. It is stored as 8-bit unsigned integers. I work on multicore machines with 64GB or 128GB of RAM.

I currently use NumPy and read the 10GB chunks in as raw binary, so there is no size inflation. I use NumPy operators as much as possible, but sometimes use Numba and multitasking to utilize all the CPU cores.
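Here is a minimal sketch of the read step described above. It assumes each chunk is a headerless flat file of uint8 samples. That layout and the helper names `read_chunk` and `open_chunk_memmap` are hypothetical. `np.fromfile` maps one byte to one array element, so the array is the same size as the bytes read. `np.memmap` is an alternative that pages data in lazily instead of loading the whole chunk into RAM.

```python
import numpy as np


def read_chunk(path, offset_bytes=0, count=-1):
    """Read raw 8-bit unsigned samples from a headerless binary file (assumed layout).

    Each byte becomes one np.uint8 element, so the array occupies exactly
    as many bytes as were read from disk (no inflation).
    """
    return np.fromfile(path, dtype=np.uint8, count=count, offset=offset_bytes)


def open_chunk_memmap(path):
    """Memory-map a chunk read-only; pages are loaded from disk only when accessed."""
    return np.memmap(path, dtype=np.uint8, mode="r")
```

With `count` and `offset_bytes`, a 10GB chunk can also be processed in smaller slices. Use this if a full chunk plus intermediate results would not fit in RAM.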

Streamlit looks interesting. How big can a streamlit cache be? Can it be preserved in binary exactly?

I’ve been making graphics with matplotlib directly. I’m exploring seaborn and plotly.

How can I migrate to Streamlit incrementally?

thiago · December 6, 2019, 12:26am

My data, looking for pulsars in the Andromeda Galaxy using the Green Bank radio telescope

So cool!!!

Streamlit looks interesting. How big can a streamlit cache be?

The cache has no formal limit; it just depends on what fits in memory. This usually isn't a problem, but in your case it may actually be too limiting.

We also have a “save to disk” mode for caching (with persist=True), but it actually saves to both disk and memory. This is useful when you want to reuse the cache even after killing / restarting your Streamlit server.
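To make the "memory plus disk" behavior concrete, here is a toy decorator. It keeps results in a dict and also pickles them to a directory, so a fresh process can reload them instead of recomputing. This is an illustration only, not Streamlit's implementation. In Streamlit itself you would use `st.cache` with `persist=True` as described above. The names `two_layer_cache` and `cache_dir` are hypothetical.

```python
import functools
import hashlib
import os
import pickle


def two_layer_cache(cache_dir):
    """Toy memoizer: store results in a dict AND pickle them to cache_dir.

    Illustrative only; this is NOT how Streamlit implements its cache.
    """
    os.makedirs(cache_dir, exist_ok=True)

    def decorator(func):
        memory = {}  # per-process, in-memory layer

        @functools.wraps(func)
        def wrapper(*args):
            # Key on the function name plus its (picklable) positional arguments.
            key = hashlib.sha256(pickle.dumps((func.__qualname__, args))).hexdigest()
            if key in memory:
                return memory[key]  # memory hit: the same object, no copy
            path = os.path.join(cache_dir, key + ".pkl")
            if os.path.exists(path):
                with open(path, "rb") as f:
                    value = pickle.load(f)  # disk hit, e.g. after a restart
            else:
                value = func(*args)  # miss: compute and persist
                with open(path, "wb") as f:
                    pickle.dump(value, f)
            memory[key] = value  # a disk-loaded value also occupies RAM
            return value

        return wrapper

    return decorator
```

A disk-backed entry still occupies RAM once it is loaded. Persistence helps with restarts, not with fitting more data into memory.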

For more info, see https://streamlit.io/docs/api.html#streamlit.cache

If none of this works for your use case, let me know! We’ll create a feature request.

Can it be preserved in binary exactly?

The in-memory cache is just a key-value store whose values are exactly the return values of your function, saved by reference. So they are stored in memory exactly as you returned them from the cached function.

The on-disk cache uses pickle.
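Here is a quick check of the "binary exactly" question for the on-disk path, assuming NumPy arrays like the uint8 chunks above. A pickle round trip preserves dtype, shape, and every byte, with only a small header overhead. The last lines contrast this with an in-memory store, which hands back the very same object. `pickle_roundtrip` and the random stand-in data are illustrative.

```python
import pickle

import numpy as np


def pickle_roundtrip(obj):
    """Serialize and deserialize an object, as a pickle-based disk cache would."""
    return pickle.loads(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))


# Stand-in for telescope data: one million random 8-bit samples.
samples = np.random.default_rng(0).integers(0, 256, size=1_000_000, dtype=np.uint8)

restored = pickle_roundtrip(samples)
# Same dtype and shape, byte-for-byte identical contents, but a new object.
print(restored.dtype, restored.shape, restored.tobytes() == samples.tobytes())

# Pickled size is the raw data plus a small header (extra bytes printed here).
print(len(pickle.dumps(samples, protocol=pickle.HIGHEST_PROTOCOL)) - samples.nbytes)

# A dict-based in-memory store just keeps a reference to the original object.
memory_store = {"chunk": samples}
print(memory_store["chunk"] is samples)
```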

I’ve been making graphics with matplotlib directly. How can I migrate to Streamlit incrementally?

If you're using normal Python scripts (rather than notebooks), migrating to Streamlit is just a matter of replacing plt.show() with st.pyplot(), so it should be quite simple.

And if you're using notebooks, you just need to add st.pyplot() right after your plotting command to show the plot. (Notebooks do this automatically at the end of each cell. Streamlit runs pure Python files, so it sticks to the pure-Python way of doing things.)
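Here is a minimal before/after sketch of that migration, assuming a script that histograms 8-bit samples. The data and `plot_sample_histogram` are illustrative. The sketch passes the figure to `st.pyplot` explicitly rather than relying on the current global figure. The `ImportError` guard exists only so the sketch also runs as plain Python. Under `streamlit run`, the figure appears in the app.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend: Streamlit renders the figure, no window needed
import matplotlib.pyplot as plt
import numpy as np

try:
    import streamlit as st
except ImportError:  # only so this sketch also runs without Streamlit installed
    st = None


def plot_sample_histogram(samples):
    """Histogram of 8-bit sample values, one bin per possible value 0..255."""
    fig, ax = plt.subplots()
    ax.hist(samples, bins=256, range=(0, 256))
    ax.set_xlabel("sample value")
    ax.set_ylabel("count")
    return fig


samples = np.random.default_rng(0).integers(0, 256, size=100_000, dtype=np.uint8)
fig = plot_sample_histogram(samples)

# Before: plt.show()
# After: hand the figure to Streamlit instead
if st is not None:
    st.pyplot(fig)
```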

More info at https://streamlit.io/docs/api.html#streamlit.pyplot

I’m exploring seaborn and plotly.

Seaborn is just a skin on top of Matplotlib so you can use it with st.pyplot() as well.

As for Plotly, you can pass Plotly figures and graph objects into st.plotly_chart(the_obj) and they’ll show up in Streamlit.
https://streamlit.io/docs/api.html#streamlit.plotly_chart
