Summary
I'm trying to take a user-uploaded PDF, edit it based on user input, and then offer several different copies for download. Using PyMuPDF, I got this working as a plain local script (no Streamlit), but I'm having trouble with file handling now that it runs in Streamlit.
I'm currently running this on localhost, in case that matters.
Steps to reproduce
Code snippet:
For user to upload PDF:
script = st.file_uploader("Upload Script (.pdf only)",type="pdf")
Edit and make download button:
if script is not None:
    with fitz.open(stream=script.read(), filetype="pdf") as pdf_file:
        pdf_page_count = pdf_file.page_count
        for page in range(pdf_page_count):
            page_obj = pdf_file[page]
            content_of_page = pdf_file.get_page_text(page)
            match_word = character_list[0]
            content_of_page = page_obj.get_text("words", sort=False)  # get rect for all words
            for word in content_of_page:
                if word[4] == match_word:
                    rect_comp = fitz.Rect(word[0], word[1], word[2], word[3])
                    highlight = page_obj.add_highlight_annot(rect_comp)
                    highlight.set_colors(stroke=[0, 1, 0.8])
                    highlight.update()
        st.download_button(
            label="Download Script",
            data=pdf_file,
            file_name="Highlighted Script",
            mime="application/octet-stream"
        )
Expected behavior:
When I run a modified version of the above script from the command prompt, it works and writes out a highlighted script (or several scripts, by running the highlight function once for each character/actor pair) using pdf_file.save().
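For the "one copy per character/actor pair" goal, one possible shape is sketched below. It assumes the upload is read once into bytes and that each copy is rebuilt from those untouched bytes, so highlights from one character never leak into the next copy. The names `build_highlighted_copies` and `highlight` are hypothetical, not part of Streamlit or PyMuPDF; `highlight` stands in for whatever function opens the PDF, adds the annotations for one character, and returns the result as bytes.

```python
from typing import Callable, Dict, Iterable


def build_highlighted_copies(
    original_pdf: bytes,
    characters: Iterable[str],
    highlight: Callable[[bytes, str], bytes],
) -> Dict[str, bytes]:
    """Return {download file name: PDF bytes}, one entry per character.

    `highlight` is a hypothetical callable supplied by the app: it receives the
    original PDF bytes and a character name and returns the edited PDF as bytes.
    """
    copies: Dict[str, bytes] = {}
    for name in characters:
        # Every copy starts from the untouched upload, never from a previous copy.
        copies[f"{name}_script.pdf"] = highlight(original_pdf, name)
    return copies


# Possible use inside the Streamlit app (not executed here; `highlight_pdf` is
# a hypothetical function wrapping the PyMuPDF loop from the snippet above):
#
# original = script.getvalue()  # full bytes of the uploaded file
# copies = build_highlighted_copies(original, character_list, highlight_pdf)
# for file_name, data in copies.items():
#     st.download_button(
#         label=f"Download {file_name}",
#         data=data,
#         file_name=file_name,
#         mime="application/pdf",
#         key=file_name,  # each button needs a distinct key
#     )
```

Passing the highlighting step in as a callable keeps the looping logic independent of PyMuPDF, which makes it easy to check on its own.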
Actual behavior:
The above code gives the following error after a PDF is uploaded:
RuntimeError: Invalid binary data format: <class 'fitz.fitz.Document'>
Traceback:
  File "C:\Users\ME\AppData\Local\Programs\Python\Python310\lib\site-packages\streamlit\runtime\scriptrunner\script_runner.py", line 565, in _run_script
    exec(code, module.__dict__)
  File "C:\Users\ME\Desktop\Python Projects\highlighter_web.py", line 98, in <module>
    st.download_button(
  File "C:\Users\ME\AppData\Local\Programs\Python\Python310\lib\site-packages\streamlit\runtime\metrics_util.py", line 332, in wrapped_func
    result = non_optional_func(*args, **kwargs)
  File "C:\Users\ME\AppData\Local\Programs\Python\Python310\lib\site-packages\streamlit\elements\button.py", line 311, in download_button
    return self._download_button(
  File "C:\Users\ME\AppData\Local\Programs\Python\Python310\lib\site-packages\streamlit\elements\button.py", line 355, in _download_button
    marshall_file(
  File "C:\Users\ME\AppData\Local\Programs\Python\Python310\lib\site-packages\streamlit\elements\button.py", line 487, in marshall_file
    raise RuntimeError("Invalid binary data format: %s" % type(data))
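The last frame shows marshall_file rejecting the type of `data`: st.download_button accepts a str, bytes, or a file object, and a fitz Document is none of these. A minimal sketch of one likely way around it, assuming PyMuPDF's `Document.tobytes()` (the in-memory counterpart to `save()`) is available in the installed version; the helper name `document_to_bytes` is made up for illustration:

```python
def document_to_bytes(doc) -> bytes:
    """Serialize an open PyMuPDF document, annotations included, to bytes.

    `doc` is expected to be a fitz Document; only its tobytes() method is used.
    """
    # tobytes() writes the PDF to memory instead of to a file on disk.
    return doc.tobytes()


# Possible use in place of data=pdf_file, while the document is still open
# (not executed here):
#
# st.download_button(
#     label="Download Script",
#     data=document_to_bytes(pdf_file),
#     file_name="Highlighted Script.pdf",
#     mime="application/pdf",
# )
```

The call has to happen before the `with` block closes the document. Giving the file name a .pdf extension and using the application/pdf MIME type should also help the browser treat the download as a PDF.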
Debug info
- Streamlit version: 1.22.0
- Python version: 3.10.4
- OS version: Win10
- Browser version: Brave 1.51.110 (up to date)
Additional information
I seem to be able to manipulate the uploaded PDF in some ways (like in this post), but things seem to fall apart when it comes time to download.
Thanks in advance for your help!