Seeking Solutions for Opening PDFs at a Specific Page in Streamlit Apps

Hi there,

I’m currently working on a Streamlit application that involves displaying PDF files as a reference within the app. A crucial requirement for this application is the ability to open these PDF files directly at a specific page, enhancing the user experience by directing them immediately to the relevant content.

Here’s the challenge: I’ve successfully embedded PDFs into the app using base64 encoding and an <iframe>, but I’ve hit a snag when it comes to opening the PDFs at a predetermined page. The current implementation looks something like this:

import streamlit as st
import base64

def displayPDF(file):
    with open(file, "rb") as f:
        base64_pdf = base64.b64encode(f.read()).decode("utf-8")
    pdf_display = f'<iframe src="data:application/pdf;base64,{base64_pdf}" width="700" height="1000" type="application/pdf"></iframe>'
    st.markdown(pdf_display, unsafe_allow_html=True)

This method successfully displays the PDF, but lacks the functionality to jump to a specific page.

My question to the community: Has anyone tackled a similar challenge or have insights on how to open a PDF at a specific page within a Streamlit app? I’m looking for solutions or workarounds that could be implemented to achieve this functionality, preferably without needing to extract individual pages as separate files.

Any suggestions, code snippets, or guidance on this matter would be greatly appreciated. I’m eager to learn from your experiences and find a solution that could benefit not only my project but others facing similar hurdles.

Thank you in advance for your time and help!

Best regards,

Luca

Search for PyPDF2 and PyMuPDF. Both libraries can split PDF files, and I believe there are others.

@LucaVA,

I am also interested in that functionality. Most browsers will open a PDF at a specific page when the page is given in the URL. For example, if you go to https://arxiv.org/pdf/2401.00107.pdf#page=4, you will automatically land on page 4. So, in the simplest implementation, your url_name would be something like this: http://yourdomain.com/path_to_pdfs/{source["file_name"]}#page={source["page"]}. I am not sure if it is possible to use it in Streamlit though. I am wondering if it could be combined with an HTML iframe, as mentioned here: Rendering PDF on UI - #21 by Marinaobdulia.
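One way to try the `#page=` idea with an iframe is sketched below. This is only a minimal sketch under a few assumptions: the PDF is reachable at a URL the browser can load, the browser's built-in PDF viewer honors the `#page=N` fragment (that is viewer behavior, not something Streamlit controls), and the `pdf_iframe_html` helper is a hypothetical name that just formats a string.

```python
from html import escape


def pdf_iframe_html(pdf_url: str, page: int, width: int = 700, height: int = 1000) -> str:
    """Build an <iframe> tag that asks the browser's PDF viewer to open at `page`.

    `page` is 1-based, matching the `#page=N` convention in the arXiv example.
    This helper is hypothetical; it only formats a string.
    """
    if page < 1:
        raise ValueError("page must be 1 or greater")
    # Drop any existing fragment so there is exactly one "#page=N" at the end.
    base_url = pdf_url.split("#", 1)[0]
    src = escape(f"{base_url}#page={page}", quote=True)
    return (
        f'<iframe src="{src}" width="{width}" height="{height}" '
        f'type="application/pdf"></iframe>'
    )


html_snippet = pdf_iframe_html("https://arxiv.org/pdf/2401.00107.pdf", page=4)
print(html_snippet)
```

In the app, the resulting string could be passed to `st.markdown(html_snippet, unsafe_allow_html=True)`, as in the original post. Whether a given site allows its PDFs to be framed depends on that site's response headers, so this is more likely to work with PDFs you host yourself.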

I like pypdf for this. Here’s a quick and dirty solution:

import pypdf

def slice_pdf(local_filepath, page_num_start, page_num_end):
    """Slice an existing PDF and return a PdfWriter, which you'll need to save and then display.

    Page numbers are zero-based, and page_num_end is exclusive.
    """
    pdf_reader = pypdf.PdfReader(local_filepath)
    pdf_writer = pypdf.PdfWriter()

    # Validate the start/end page numbers relative to each other and the PDF length
    if not 0 <= page_num_start < page_num_end <= len(pdf_reader.pages):
        raise ValueError("Invalid page range")

    # Copy only the requested pages to a new PDF
    for page in range(page_num_start, page_num_end):
        pdf_writer.add_page(pdf_reader.pages[page])
    return pdf_writer
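To display the sliced result without saving a separate file to disk, the writer's output could be kept in memory and embedded the same way as in the original post. The following is a minimal sketch, assuming the bytes come from something like `pdf_writer.write(buffer)` with an `io.BytesIO` buffer; the `pdf_bytes_to_iframe` name is hypothetical.

```python
import base64


def pdf_bytes_to_iframe(pdf_bytes: bytes, width: int = 700, height: int = 1000) -> str:
    """Embed in-memory PDF bytes in an <iframe> via a base64 data URI."""
    base64_pdf = base64.b64encode(pdf_bytes).decode("utf-8")
    return (
        f'<iframe src="data:application/pdf;base64,{base64_pdf}" '
        f'width="{width}" height="{height}" type="application/pdf"></iframe>'
    )


# Placeholder bytes stand in for a real PDF here; in the app they would be
# the contents of the buffer the PdfWriter wrote into.
iframe_html = pdf_bytes_to_iframe(b"%PDF-1.4 placeholder")
print(iframe_html[:60])
```

The returned string could then go to `st.markdown(iframe_html, unsafe_allow_html=True)`. Because the slice starts at the page of interest, the viewer opens on the relevant content without relying on a `#page=` fragment.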
