Embed pdf files that are larger than 2MB

Don-Yin · September 18, 2021, 4:14pm

Hi all,

I am trying to embed a PDF viewer using:

def displayPdf(self):
    # Read the PDF and base64-encode it so it can be inlined as a data URI.
    with open(self.pathDisplayPdf, "rb") as pdfFile:
        base64Pdf = base64.b64encode(pdfFile.read()).decode("utf-8")
    pdfDisplay = StrParser.getEmbeddedPdf(base64Pdf=base64Pdf, height=1000)
    self.placeHolder.markdown(pdfDisplay, unsafe_allow_html=True)

where getEmbeddedPdf is defined as:

def getEmbeddedPdf(base64Pdf, height: int):
    return (
        f'<embed src="data:application/pdf;base64,{base64Pdf}" width="100%" height="{height}" type="application/pdf">'
    )

This method works for smaller files (under roughly 1.5-2MB) but fails to load larger ones (over roughly 2MB). The screen just turns black.

So my questions are:

Is there a more elegant way to embed a PDF that is not limited by size?
What could be the problem with my current method?

humbly,
Don
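
One possible explanation for the size threshold, offered as an assumption rather than a confirmed diagnosis: base64 encoding expands data by a factor of 4/3, so a 1.5MB PDF becomes a data URI of roughly 2MB, and a browser may refuse to load a data URI beyond some length. The sketch below uses only the standard library to estimate the encoded size before embedding. The 2 MiB value in `ASSUMED_DATA_URI_LIMIT` and all function names are illustrative; they are inferred from the behaviour reported here, not taken from Streamlit or browser documentation.

```python
import base64

# Assumed threshold, inferred from the behaviour reported in this thread,
# not from any browser or Streamlit specification.
ASSUMED_DATA_URI_LIMIT = 2 * 1024 * 1024

DATA_URI_PREFIX = "data:application/pdf;base64,"


def data_uri_length(raw_size: int) -> int:
    """Length in characters of the data URI for a PDF of raw_size bytes."""
    # base64 emits 4 characters for every 3 input bytes, rounded up.
    encoded = 4 * ((raw_size + 2) // 3)
    return len(DATA_URI_PREFIX) + encoded


def build_embed_html(pdf_bytes: bytes, height: int) -> str:
    """Build the same kind of <embed> tag as getEmbeddedPdf, from raw bytes."""
    payload = base64.b64encode(pdf_bytes).decode("ascii")
    return (
        f'<embed src="{DATA_URI_PREFIX}{payload}" '
        f'width="100%" height="{height}" type="application/pdf">'
    )


def fits_assumed_limit(pdf_bytes: bytes) -> bool:
    """True if the data URI stays within the assumed limit."""
    return data_uri_length(len(pdf_bytes)) <= ASSUMED_DATA_URI_LIMIT
```

With these numbers, a 1.5 MiB file already encodes to slightly more than 2 MiB, which is consistent with the range where loading starts to fail.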

fredzannarbor · September 18, 2021, 9:41pm

I do a lot of work with big PDFs. It might be easier to split your big PDFs into single-page files and have the getEmbeddedPdf function rotate through them. Splitting is easy using Acrobat or pdftk (which can do it on the fly as you prepare your view).

Don-Yin · September 19, 2021, 1:08pm

Thanks for your reply!
Following your suggestion, I tried this:

self.doc = fitz.open(self.pathToPdf)

and

def getPdfContent(self, page=1):
    # Keep only the requested page, then serialise the document to bytes.
    self.doc.select([page])
    return self.doc.write()

and passed the page-specific bytes to:

def displayPdf(self):
    base64Pdf = base64.b64encode(self.getPdfContent()).decode("utf-8")
    pdfDisplay = StrParser.getEmbeddedPdf(base64Pdf=base64Pdf, height=1000)
    st.markdown(pdfDisplay, unsafe_allow_html=True)

But again, this worked as expected with smaller PDFs, showing only the specified page, but not with larger ones. Larger files still don’t load.
I guess I could save the page on its own as a temporary PDF file on the local drive and read its content, but for various reasons I’d rather avoid doing so.
I think there is probably a timeout mechanism in loading PDFs, and I am trying to figure out how to overcome it.

Again thanks,
Don
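
A possible reason the single-page output is still too large, stated as an assumption about the code above rather than something confirmed in this thread: `select()` changes which pages the document references, but serialising without garbage collection can leave the objects of the dropped pages in the output. A minimal sketch of an alternative, assuming a recent PyMuPDF with the snake_case API (`insert_pdf`, `tobytes`), copies one page into a new, empty document and serialises it with garbage collection enabled. The names `to_zero_based` and `single_page_pdf_bytes` are hypothetical.

```python
def to_zero_based(page_number: int, page_count: int) -> int:
    """Convert a 1-based page number to PyMuPDF's 0-based page index."""
    if not 1 <= page_number <= page_count:
        raise ValueError(f"page {page_number} is outside 1..{page_count}")
    return page_number - 1


def single_page_pdf_bytes(path: str, page_number: int = 1) -> bytes:
    """Return a standalone PDF that contains only the requested page."""
    import fitz  # PyMuPDF; imported lazily so the helper above has no dependency

    src = fitz.open(path)
    dst = fitz.open()  # new, empty PDF
    try:
        index = to_zero_based(page_number, src.page_count)
        # Copy just this one page into the empty document.
        dst.insert_pdf(src, from_page=index, to_page=index)
        # garbage=4 removes unreferenced and duplicate objects;
        # deflate=True compresses uncompressed streams.
        return dst.tobytes(garbage=4, deflate=True)
    finally:
        dst.close()
        src.close()
```

PyMuPDF page indices are 0-based, so `select([1])` keeps the second page of the document, not the first. The helper above makes that conversion explicit.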

fredzannarbor · September 19, 2021, 3:34pm

Yes, I don’t think PDFs “stream”; you need to hold the whole structure in memory somehow. I fully understand that splitting the PDFs into many small page files is ugly, but it would probably work. You could also turn each page into an image.
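
The image suggestion could look roughly like the following sketch, which assumes a recent PyMuPDF (`get_pixmap`, `Pixmap.tobytes`) and renders one page to PNG bytes. The function names and the default of 144 dpi are illustrative choices, not something specified in this thread.

```python
PDF_POINTS_PER_INCH = 72  # PDF user space is measured in points, 72 per inch


def zoom_for_dpi(dpi: int) -> float:
    """Scale factor that renders a PDF page at the requested resolution."""
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    return dpi / PDF_POINTS_PER_INCH


def page_as_png(path: str, page_index: int = 0, dpi: int = 144) -> bytes:
    """Render one page (0-based index) of the PDF at path to PNG bytes."""
    import fitz  # PyMuPDF; imported lazily so zoom_for_dpi has no dependency

    doc = fitz.open(path)
    try:
        zoom = zoom_for_dpi(dpi)
        # A scaling matrix controls the rendering resolution.
        pix = doc[page_index].get_pixmap(matrix=fitz.Matrix(zoom, zoom))
        return pix.tobytes("png")
    finally:
        doc.close()
```

In a Streamlit app the result could be shown with `st.image(page_as_png(path))`, which avoids the `<embed>` tag and the data URI altogether, at the cost of losing text selection and the PDF viewer's own controls.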

ZKLO · May 4, 2023, 2:40pm

Did you find a solution to this problem?

I’m currently stuck at the 2MB limit with PDFs as well.
