Hi,
I have recently been working on a Streamlit app for bioinformatics workflows. The app works well in local environments, but I'm struggling to find a suitable deployment strategy because of the size of the input data. Currently, it browses the host filesystem to access input files directly, and these can total tens of GB per sample.
Typical Streamlit options seem unfit for this project.
- Creating a Docker image is technically feasible, relying on st.file_uploader() for file transfer. This approach is appropriate for small-to-medium-sized files, but it has been reported as unstable for big datasets (memory, timeouts, browser limitations). An alternative would be to mount the entire host filesystem as a container volume, but that is an insecure strategy.
- Deploying the app to Streamlit Community Cloud is impractical for the same reason: its file size constraints.
What would be an adequate architecture to manage big input files in this scenario?
Thanks in advance!
Pablo
Hey Pablo, thanks for sharing your challenge. Bioinformatics data can get massive, and deploying Streamlit apps for these workflows is definitely tricky!
For large files (tens of GB), st.file_uploader() is not recommended: as you've noticed, it runs into browser, memory, and timeout limitations, partly because uploaded files are held in the app's memory. Streamlit Community Cloud and similar platforms also have strict resource limits (e.g., on the order of 1 GB of RAM, and a 200 MB upload limit by default via the server.maxUploadSize setting) and are not designed for persistent or large-scale storage. Mounting the whole host filesystem as a container volume is insecure and does not scale to multi-user or cloud deployments.
The usual practice is to store large files externally (e.g., AWS S3, Google Cloud Storage, or similar object storage) and have your app access them through secure APIs or presigned URLs. Users upload files directly to cloud storage, and your Streamlit app processes them by referencing their cloud locations, which avoids both the upload bottleneck and the security risk. This pattern is commonly recommended for big-data workflows in Streamlit and other cloud environments.
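To make the presigned URL idea a bit more concrete, here is a minimal sketch, assuming AWS S3 and the boto3 client. The helper names (object_key, presigned_upload_url), the "uploads/" prefix, and the bucket name are hypothetical placeholders, not anything from your app. The only library call is boto3's generate_presigned_url, and the client is passed in so the helpers can be exercised without AWS credentials.

```python
from pathlib import PurePosixPath
from typing import Any


def object_key(sample_id: str, filename: str) -> str:
    """Build an object key under a per-sample prefix (hypothetical layout).

    Only the final path component of `filename` is kept, so a name such as
    "../../etc/passwd" cannot escape the sample's prefix.
    """
    safe_name = PurePosixPath(filename).name
    if not safe_name:
        raise ValueError("filename has no usable name component")
    return f"uploads/{sample_id}/{safe_name}"


def presigned_upload_url(
    s3_client: Any, bucket: str, key: str, expires_s: int = 3600
) -> str:
    """Return a time-limited URL that lets a client PUT one object to S3.

    `s3_client` is expected to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    return s3_client.generate_presigned_url(
        ClientMethod="put_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_s,
    )


# Usage sketch (requires boto3 and AWS credentials; bucket name is made up):
#   import boto3
#   s3 = boto3.client("s3")
#   key = object_key("sample01", "reads_R1.fastq.gz")
#   url = presigned_upload_url(s3, "my-bioinfo-bucket", key)
#   # Hand `url` to the user; they upload with an HTTP PUT, bypassing Streamlit.
```

One caveat: a single S3 PUT is capped at 5 GB, so for files in the tens of GB you would need a multipart upload (or a client such as the AWS CLI, which handles that for you) rather than one presigned PUT. The app then only ever stores and passes around the bucket and key.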
If you need to process files on-premises, consider running your Streamlit app on a secure internal server with direct access to the data, or use a hybrid approach where users upload files to a secure network share or object storage, and the app reads from there. For cloud deployments, always use external storage and avoid direct filesystem access or large file uploads through the browser. This architecture is robust, secure, and scalable for bioinformatics workflows with large datasets.
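For the on-premises or network-share variant, a rough sketch of how the app could read from a single dedicated data directory instead of the whole host filesystem is shown below. It uses only the Python standard library. The function names (resolve_under_root, iter_chunks) and the /data/samples path are hypothetical. The idea is to mount just that directory (ideally read-only), reject any path that resolves outside it, and stream files in chunks so a multi-GB file never has to fit in memory.

```python
from pathlib import Path
from typing import Iterator


def resolve_under_root(root: Path, user_path: str) -> Path:
    """Resolve `user_path` relative to `root`, refusing anything outside it.

    Resolving first collapses ".." segments and symlinks, so a request like
    "../../etc/passwd" is rejected instead of silently followed.
    """
    root = root.resolve()
    candidate = (root / user_path).resolve()
    if candidate != root and root not in candidate.parents:
        raise ValueError(f"path escapes the data root: {user_path!r}")
    return candidate


def iter_chunks(path: Path, chunk_size: int = 8 * 1024 * 1024) -> Iterator[bytes]:
    """Yield the file's bytes in fixed-size chunks (the last one may be shorter)."""
    with path.open("rb") as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break
            yield chunk


# Usage sketch (the root path is made up; in Docker it would be the one
# mounted volume, e.g. `-v /srv/samples:/data/samples:ro`):
#   DATA_ROOT = Path("/data/samples")
#   path = resolve_under_root(DATA_ROOT, "sample01/reads_R1.fastq.gz")
#   total_bytes = sum(len(chunk) for chunk in iter_chunks(path))
```

In the Streamlit UI, the relative path could come from a text input or a select box populated by listing the data root, so users pick existing files rather than uploading them.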