Hi Streamlit community,
I’m building a Streamlit app that lets users upload a full-record GenBank file and explore its content (gene sequences, protein sequences, etc.) using Biopython. Everything works perfectly except when I try to create an st.download_button() to download the whole genome sequence or the sequence(s) of specific gene(s). Then I get the following error:
RuntimeError: Invalid binary data format: <class 'int'>
File "C:\Users\PC\anaconda3\lib\site-packages\streamlit\script_runner.py", line 379, in _run_script
File "C:\Users\PC\Desktop\gb_reader\app.py", line 64, in <module>
File "C:\Users\PC\anaconda3\lib\site-packages\streamlit\elements\button.py", line 223, in download_button
File "C:\Users\PC\anaconda3\lib\site-packages\streamlit\elements\button.py", line 265, in _download_button
File "C:\Users\PC\anaconda3\lib\site-packages\streamlit\elements\button.py", line 373, in marshall_file
raise RuntimeError("Invalid binary data format: %s" % type(data))
Here’s a simple example of my code:
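import io

import streamlit as st
from Bio import SeqIO

st.title('Welcome to the GenBank File Reader')

# create the uploader (extensions are listed without spaces or a leading dot)
file_uploader = st.file_uploader('Please upload a full-record GenBank file', type=['gb'])

# convert the uploaded GenBank file into something Streamlit can work with
if file_uploader is not None:
    # parse the uploaded bytes as a single GenBank record
    seq_object = SeqIO.read(io.StringIO(file_uploader.getvalue().decode('utf-8')), 'genbank')
    # use the Biopython function to write the record to a FASTA file on disk
    # (SeqIO.write returns the number of records written, an int)
    download_file = SeqIO.write(seq_object, 'sequence.fasta', 'fasta')
    # create the Streamlit download button; this is the call that raises the error above
    st.download_button(label='Download', data=download_file, file_name=None, mime='application/octet-stream')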
I don’t know anything about biology, but I would expect that download_file=SeqIO.write(seq_object, 'sequence.fasta', 'fasta') writes the file to the local filesystem rather than giving download_file the bytes that st.download_button requires. SeqIO.write actually returns the number of records it wrote, which is why the error reports <class 'int'>.
I would try something like:
# SeqIO.write has already created sequence.fasta; reopen it in binary mode
with open('sequence.fasta', mode='rb') as f:
    st.download_button(label='Download', data=f, file_name='sequence.fasta', mime='application/octet-stream')
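The int in the traceback is what SeqIO.write hands back: a count of records written, not the file contents. The sketch below mimics that contract with a hypothetical plain-Python stand-in, write_fasta_like_seqio (not part of Biopython), and shows the usual workaround: write into an in-memory io.StringIO buffer and give the buffer's contents to the download button. The (id, sequence) tuples are illustrative.

```python
import io


def write_fasta_like_seqio(records, handle):
    """Hypothetical stand-in for Bio.SeqIO.write(records, handle, "fasta").

    Like SeqIO.write, it writes to the handle and returns the number of
    records written (an int), not the text it produced.
    """
    count = 0
    for record_id, sequence in records:
        handle.write(f">{record_id}\n{sequence}\n")
        count += 1
    return count


records = [("geneA", "ATGAAATAG"), ("geneB", "ATGCCCTGA")]

buffer = io.StringIO()
result = write_fasta_like_seqio(records, buffer)

# Passing `result` to st.download_button would fail: it is just a count.
print(type(result).__name__, result)

# The FASTA text lives in the buffer; encode it if bytes are wanted.
download_data = buffer.getvalue().encode("utf-8")
print(download_data.decode("utf-8"), end="")
```

With Biopython itself, SeqIO.write also accepts an open handle in place of a filename, so the same idea should work without touching disk: create buffer = io.StringIO(), call SeqIO.write(seq_object, buffer, 'fasta'), then pass buffer.getvalue() as data to st.download_button. Nothing is written to a shared file, so each download comes from that user's own script run.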
Your solution works, but not quite as I expected: download_file=SeqIO.write(seq_object, 'sequence.fasta', 'fasta') creates a file named ‘sequence.fasta’ on my local filesystem, and then st.download_button downloads that file. Unfortunately, the Biopython function SeqIO.write is the only solution here, because it not only writes the sequence file but also converts the GenBank format to FASTA.
Will your solution still work after I deploy my app?
I built a section in my Streamlit app that lets users filter all the genes in the GenBank file and download the sequences of the filtered genes. Using SeqIO.write(filtered_genes, ‘selected_genes.fasta’, ‘fasta’), a FASTA file named ‘selected_genes.fasta’ containing the sequences of the filtered genes is generated in my app's data folder, which means that each time a user filters genes, ‘selected_genes.fasta’ is overwritten. Is this possible after the app is deployed?
I’m not sure I understand the distinction you are making here. The line SeqIO.write(seq_object, 'sequence.fasta', 'fasta') writes the file; the lines I provided then open that file as bytes (which is what st.download_button expects as its data) and pass it to the download button. Are you saying that the roundtrip doesn’t work, i.e. that sequence.fasta from the download button doesn’t match the file written by SeqIO?
Actually, the sequence.fasta file from the download button and the file written by SeqIO.write() are the same, but the content of ‘sequence.fasta’ changes each time a user uploads a different GenBank file.
Here is the idea I want to check:
I’m going to create an empty file named “sequence.fasta” and deploy it along with the main application file. Each time a user uploads a new GenBank file, SeqIO.write() writes the new data to that file (in the GitHub repo), then the with open() block reads ‘sequence.fasta’ with the new data and passes it to st.download_button() as bytes.
Is this kind of file exchange between Streamlit and GitHub possible? To be honest, I’ve never tested it before.
Please see the figure below for a better understanding of my idea.
Based on the Streamlit top-down execution model, you are going to write over the same file every time. So yes, it would work each time within a session.
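One caveat with a fixed filename: all sessions on a Streamlit server share the same filesystem, so if two users are active at once, whoever writes sequence.fasta last wins. The sketch below simulates this sequentially in plain Python, contrasting one shared path with a uniquely named file per call from tempfile.mkstemp. The helper names and FASTA strings are hypothetical.

```python
import os
import tempfile


def write_shared(path, fasta_text):
    """Write to one fixed path: every session using this path shares the file."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(fasta_text)


def write_private(directory, fasta_text):
    """Write to a new, uniquely named file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".fasta", dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(fasta_text)
    return path


def read_text(path):
    with open(path, encoding="utf-8") as f:
        return f.read()


with tempfile.TemporaryDirectory() as workdir:
    # Two uploads written to the same fixed name: the second overwrites the first.
    shared_path = os.path.join(workdir, "sequence.fasta")
    write_shared(shared_path, ">upload_1\nAAAA\n")
    write_shared(shared_path, ">upload_2\nCCCC\n")
    shared_result = read_text(shared_path)

    # Two uploads each get their own file, so neither loses its data.
    path_1 = write_private(workdir, ">upload_1\nAAAA\n")
    path_2 = write_private(workdir, ">upload_2\nCCCC\n")
    private_results = (read_text(path_1), read_text(path_2))

print(shared_result, end="")
print(private_results)
```

Unique files still have to be cleaned up eventually, which is one reason keeping the FASTA text in memory is often simpler.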
However, your question about having GitHub as a step in the middle is a whole different thing. Yes, it is possible to write to GitHub, but it’s generally not a workflow that people consider, as GitHub is for version control instead of file storage.
Depending on how many users you are expecting, you could do something as simple as writing to Google Drive or Amazon S3, or possibly use a database where each file is written with some additional metadata such as the username, timestamp, etc. But how to keep track of individual users doing things, possibly concurrently, isn’t a solution I can provide in a forum thread, due to the considerable complexity involved.
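As a minimal illustration of the database idea, the sketch below uses Python's built-in sqlite3 with an in-memory database; a deployed app would more likely point at a hosted database, since a local file may not survive a redeploy. The table name fasta_files, its columns and the helper functions are hypothetical. Each saved FASTA text carries a username and a UTC timestamp, and each user retrieves only their own most recent file.

```python
import sqlite3
from datetime import datetime, timezone


def save_fasta(conn, username, fasta_text):
    """Store one generated FASTA file together with who made it and when."""
    conn.execute(
        "INSERT INTO fasta_files (username, created_at, content) VALUES (?, ?, ?)",
        (username, datetime.now(timezone.utc).isoformat(), fasta_text),
    )
    conn.commit()


def latest_fasta(conn, username):
    """Return the most recent FASTA text saved by a user, or None."""
    row = conn.execute(
        # Order by the autoincrement id, since two saves may share a timestamp.
        "SELECT content FROM fasta_files WHERE username = ? ORDER BY id DESC LIMIT 1",
        (username,),
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fasta_files ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "username TEXT NOT NULL, "
    "created_at TEXT NOT NULL, "
    "content TEXT NOT NULL)"
)

save_fasta(conn, "alice", ">geneA\nATGAAATAG\n")
save_fasta(conn, "bob", ">geneB\nATGCCCTGA\n")
save_fasta(conn, "alice", ">geneC\nATGGGGTAA\n")

print(latest_fasta(conn, "alice"), end="")
```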
Thank you so much for your feedback.
Your explanation was very clear.
First, instead of using SeqIO.write() from Biopython, I will try to do it manually. If that doesn’t work, I will try using Google Drive as file storage.
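For the manual route, FASTA is simple enough to build as a string: one '>' header line per record, followed by the sequence wrapped at a fixed width (60 characters here, a common convention). The sketch below assumes the records have already been reduced to hypothetical (header, sequence) pairs; since st.download_button accepts a str as data, the result can be passed to it directly without writing any file.

```python
def to_fasta(records, width=60):
    """Format (header, sequence) pairs as FASTA text.

    Each record becomes a '>' header line followed by the sequence split
    into lines of at most `width` characters.
    """
    lines = []
    for header, sequence in records:
        lines.append(f">{header}")
        for start in range(0, len(sequence), width):
            lines.append(sequence[start:start + width])
    # Terminate every line with a newline; an empty input gives "".
    return "".join(line + "\n" for line in lines)


genes = [("geneA partial", "ATG" + "A" * 70 + "TAG"), ("geneB", "ATGCCCTGA")]
fasta_text = to_fasta(genes)
print(fasta_text, end="")
```

In the app, the pairs could plausibly come from the parsed record, for example (feature.qualifiers["gene"][0], str(feature.extract(record.seq))) for each gene feature, assuming those features carry a "gene" qualifier. Filtering genes then becomes filtering that list before calling to_fasta.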