PDF Reader problems

Are there different types of PDF file?

I’ve created a simple Streamlit PDF reader app, using the file_uploader. This works some of the time and displays the PDF. To do this I’m using PDF display code I found in the forum, slightly modified. The code I’m using is:

import streamlit as st
import base64

image = st.sidebar.file_uploader("Please browse for a pdf file")
st.sidebar.write("Only one file at a time!")


if image is not None:

    fn = image.name  

    
    if fn[-4:] ==".pdf" or fn[-4:] == ".PDF"  :
        
        with open(fn,"rb") as f:
            base64_pdf = base64.b64encode(f.read()).decode('utf-8')
        pdf_display = f'<embed src="data:application/pdf;base64,{base64_pdf}" width="800" height="1000" type="application/pdf"></embed>'
        # pdf_display = f'<iframe src="data:application/pdf;base64,{base64_pdf}" width="800" height="1000" type="application/pdf"></iframe>'
        st.markdown(pdf_display, unsafe_allow_html=True)
    
    else:
        "Invalid File type (must be .pdf)  "

    #  *****************************  Function SHOW_PDF to read PDFs
        # def show_pdf(file_path):
        #     with open(file_path,"rb") as f:
        #         base64_pdf = base64.b64encode(f.read()).decode('utf-8')
        #     pdf_display = f'<iframe src="data:application/pdf;base64,{base64_pdf}" width="800" height="800" type="application/pdf"></iframe>'
        #     st.markdown(pdf_display, unsafe_allow_html=True)
        
        #  *****************************

END OF CODE

With some files I get a File Not Found error & no display. This is the error message in Terminal:

    2023-01-13 16:51:08.507 Uncaught app exception
    Traceback (most recent call last):
      File "/Users/timkendal/Desktop/Python Projects/Streamlit_testing/lib/python3.11/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 565, in _run_script
        exec(code, module.__dict__)
      File "/Users/timkendal/Desktop/Python Projects/Streamlit_testing/pdf_reader.py", line 15, in <module>
        with open(fn,"rb") as f:
             ^^^^^^^^^^^^^
    FileNotFoundError: [Errno 2] No such file or directory: 'Hidcote Map.pdf'

It is always the same files that fail to display, and they are actually found, as the name appears under the file browser (in the Sidebar in my case). All the files I’m testing with are in the same folder. The code and the files are all local.

You will see that there are 2 lines in the code which are identical except that one uses ‘embed’ and the other ‘iframe’. I found a post that said one of these worked where the other did not. Neither works for me.

The salmon coloured error box in the Browser is this:

              FileNotFoundError: [Errno 2] No such file or directory: 'Worcs Beacon Path.pdf'
              
              Traceback:
              
              ```
              File "/Users/timkendal/Desktop/Python Projects/Streamlit_testing/lib/python3.11/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 565, in _run_script
                  exec(code, module.__dict__)
              File "/Users/timkendal/Desktop/Python Projects/Streamlit_testing/pdf_reader.py", line 15, in <module>
                  with open(fn,"rb") as f:
                       ^^^^^^^^^^^^^
              ```

In summary, why will the code above read and display only some PDF files?

Debug info

  • Streamlit version: 1.17.0 (updated to this today. Previous version had the same problem.)
  • Python version: 3.11.1
  • Using Conda? PipEnv? PyEnv? Pex? NO
  • OS version: MacOS Catalina 10.15.7
  • Browser version: Chrome Version 108.0.5359.124

Requirements file none

I hope someone can help here! It is frustrating!

Tim Kendal

Calling file_uploader() won’t create a file in the server’s filesystem; instead it returns a file-like object from which you can read the bytes directly.

So do not try to open a file and then read it, just use image.read() as the argument to b64encode.

Thanks Goyo. I’m afraid your reply doesn’t help me much - I’m not an expert on these things, and I don’t know how to implement what you say.

If you could suggest exactly what my code should be, that would be great.

The point here is that the code I have works with some pdf files but not others. As I said in the question, why do some pdfs display as expected but others don’t?

All the pdfs I have tried display properly with Adobe Reader etc

I could upload 2 files (1 working & 1 not) if it would help but the uploader won’t let me send a pdf for some reason (file names except .jpg greyed out)

Thanks again for your reply

Do not try to open the file and use this instead:

base64_pdf = base64.b64encode(image.read()).decode('utf-8')
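For the whole script, a minimal sketch along those lines might look like the following. The helper names is_pdf_name and pdf_iframe_html are made up for illustration, the iframe line is the one from the original post, and getvalue() is used in place of read() because it returns all the bytes regardless of the current read position.

```python
import base64


def is_pdf_name(filename):
    """Case-insensitive check that the file name ends in .pdf."""
    return filename.lower().endswith(".pdf")


def pdf_iframe_html(pdf_bytes, width=800, height=1000):
    """Build the iframe tag from raw PDF bytes (no file on disk is opened)."""
    base64_pdf = base64.b64encode(pdf_bytes).decode("utf-8")
    return (
        f'<iframe src="data:application/pdf;base64,{base64_pdf}" '
        f'width="{width}" height="{height}" type="application/pdf"></iframe>'
    )


def main():
    # Imported here so the two helpers above can be tried without Streamlit.
    import streamlit as st

    image = st.sidebar.file_uploader("Please browse for a pdf file")
    st.sidebar.write("Only one file at a time!")

    if image is not None:
        if is_pdf_name(image.name):
            # image is already a file-like object holding the uploaded bytes.
            st.markdown(pdf_iframe_html(image.getvalue()), unsafe_allow_html=True)
        else:
            st.write("Invalid File type (must be .pdf)")


# In the Streamlit script itself, call main() as the last line.
```

This only addresses the FileNotFoundError from open(fn, "rb"); whether a given browser then renders the data URL is a separate question.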

Thanks Goyo

Not working yet, but I’ll keep trying!

As a new user of Streamlit, I think it’s disappointing that displaying a pdf file is so complicated!

It is not that complicated and you got that part right anyway. You seem to be struggling with file / data management in python instead.

Take a look at the example I just deployed.

https://display-pdf.streamlit.app/

Thanks Goyo

I’ve tried your code and of course it works - BUT only with some pdf files!! This was my original problem.

It’s always the same files that don’t display, and they all display with Adobe Reader, and that’s why I asked if all pdfs were the same

I attach 2 sample pdfs to this email (I can’t see how to upload files in the forum)

The one that works is Blenheim Palace Map.pdf. The other one, Worcs Beacon Path.pdf, does not display, though it is loaded and the name appears on-screen.

Thanks again for your trouble

(Attachment Blenheim Palace Map.pdf is missing)

(Attachment Worcs Beacon Path.pdf is missing)

Just had an auto response from Streamlit saying my 2 pdfs are not authorised so have been rejected. Presumably on security grounds.

Not sure how to get round this, or if you want to see them!

Github, Google Drive, OneDrive…

Google Drive:
https://drive.google.com/file/d/18gOWImu4O9VnrRwwraUav3IbS8ZdLHCw/view?usp=share_link

https://drive.google.com/file/d/18gIq3XAS8LFxfqvYiL3ny_KVQ7LBEryc/view?usp=share_link

I hope these work for you!

Nope. I get access denied. When sharing the file, grant General access with Viewer permissions to Anyone with the link.

Thanks Goyo - sorry to mess you about

I’ve revisited Google & got new links (may be the same as the old ones), with sharing now correct (I Hope!)

https://drive.google.com/file/d/18gIq3XAS8LFxfqvYiL3ny_KVQ7LBEryc/view?usp=share_link

https://drive.google.com/file/d/18gOWImu4O9VnrRwwraUav3IbS8ZdLHCw/view?usp=share_link

I hope this works this time!

Works for me using Gnome Web. Maybe it is a browser issue?

Having a very similar issue here. The PDF will actually not display at all in Chrome, but works just fine in Safari and Firefox

Thanks Jordan. I’ve only tried Chrome so far as a browser, but I’ll give Safari a go, and report back

Just tried Safari & it doesn’t work either - shame. I’ve also tried Opera - same result.

It’s perhaps worth mentioning again that it is only some pdfs that fail to display, not all. Indeed most work ok.

Thank you for your wonderful work, but unfortunately it only works well in Firefox, and fails in Chrome.

Yes, it seems to fail with Edge too. There are many reports in the forum about these browsers blocking the display of pdfs. I don’t know how to work around that.

I have tried the embed, iframe and object tags. All three work well in Firefox, but fail in Chrome when the file is larger than 1 MB. It gives me a headache.
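One thing that may be relevant to the size issue: base64 turns every 3 bytes into 4 characters, so the data URL is about a third larger than the PDF itself. A quick sketch for checking the length of the URL the browser actually receives (data_url_length is a made-up helper name, and the 1 MB figure is only the one reported above, not a documented Chrome limit):

```python
import base64

PREFIX = "data:application/pdf;base64,"


def data_url_length(pdf_bytes):
    """Number of characters in the data URL built from pdf_bytes."""
    return len(PREFIX) + len(base64.b64encode(pdf_bytes))


one_mb = bytes(1024 * 1024)  # 1 MiB of zero bytes standing in for a PDF
print(data_url_length(one_mb))
```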

After a day of hard work, I finally found a way. Passing the whole PDF to the browser as a base64 string is very wasteful.

Using a URL like src="http://localhost:8900/filename.pdf" is much lighter. To do that, we can store the PDF file uploaded by the user directly on disk:

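import os

import streamlit as st

uploaded_file = st.sidebar.file_uploader(
    label="Upload PDF files", type=["pdf"], accept_multiple_files=False
)

# Nothing uploaded yet: reset the state, prompt the user and stop this run.
if not uploaded_file:
    st.session_state.clear()
    st.info("Please upload your PDF file on the left")
    st.stop()

@st.cache_resource(ttl="1h")
def configure_PDF(file):
    # Save the uploaded bytes into the folder that the HTTP server will serve.
    with open(os.path.join("/your_path/user_upload_pdfs", file.name), "wb") as f:
        f.write(file.getvalue())

# Write the current upload to disk.
configure_PDF(uploaded_file)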

The next step is to start an HTTP server locally by running:

python -m http.server 8900

at /your_path/user_upload_pdfs.

Then the files can be accessed at 'http://localhost:8900/filename.pdf'.
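If running a separate command is inconvenient, the same kind of server can be started from Python's standard library in a background thread. This is only a sketch: serve_directory is a made-up name, and port 8900 and the folder are the placeholders used above.

```python
import functools
import threading
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def serve_directory(directory, port=8900):
    """Serve the files in `directory` on 127.0.0.1:port from a daemon thread."""
    handler = functools.partial(SimpleHTTPRequestHandler, directory=directory)
    server = ThreadingHTTPServer(("127.0.0.1", port), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server  # call server.shutdown() to stop it


# Example (placeholder path from the post above):
# server = serve_directory("/your_path/user_upload_pdfs")
```

Binding to 127.0.0.1 keeps the server reachable only from the same machine. Since a Streamlit script reruns on every interaction, the server would need to be created only once, for example inside a function decorated with st.cache_resource.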

Finally, use any element such as "embed", "iframe" or "object" to view the PDF in the browser, writing HTML like:

<embed src="http://localhost:8900/{file.name}" type="application/pdf" width="1000px" height="1100px">
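One detail to watch: file names such as 'Worcs Beacon Path.pdf' contain spaces, which should be percent-encoded in the URL. A small sketch of building the tag (embed_tag is a made-up helper name, and the base URL is the local server assumed above):

```python
import html
from urllib.parse import quote


def embed_tag(filename, base_url="http://localhost:8900",
              width="1000px", height="1100px"):
    """Return an <embed> tag pointing at the served copy of `filename`."""
    # quote() percent-encodes spaces and other unsafe characters in the name.
    url = f"{base_url}/{quote(filename)}"
    return (
        f'<embed src="{html.escape(url)}" type="application/pdf" '
        f'width="{width}" height="{height}">'
    )


print(embed_tag("Worcs Beacon Path.pdf"))
```

The resulting string can be passed to st.markdown(..., unsafe_allow_html=True) in the same way as in the earlier posts.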

In a production environment where privacy matters, the HTTP server that temporarily stores the PDF files will need further development.
