Streamlit Highlight Text in PDF

Hi,

I have a Streamlit App where a PDF is rendered. The function to display the PDF on a specific page looks like this:

import base64

import streamlit as st

def displayPDF(file, page):
    # Read the PDF from its file path and base64-encode it
    with open(file, "rb") as f:
        base64_pdf = base64.b64encode(f.read()).decode('utf-8')

    # Embed the PDF in an HTML iframe, opened at the requested page
    pdf_display = f'<iframe src="data:application/pdf;base64,{base64_pdf}#page={page}" width="100%" height="300" type="application/pdf"></iframe>'

    # Display the iframe
    st.markdown(pdf_display, unsafe_allow_html=True)
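If the PDF bytes are already in memory (for example from `st.file_uploader`, or after modifying the document), it can help to split the HTML generation from the file reading. The sketch below builds the same `<iframe>` markup from raw bytes; `pdf_iframe_html` is a hypothetical helper name, not part of Streamlit.

```python
import base64


def pdf_iframe_html(pdf_bytes: bytes, page: int, height: int = 300) -> str:
    """Build the same <iframe> markup as displayPDF, but from raw PDF bytes."""
    # Base64-encode the PDF so it can live inside a data: URI
    b64 = base64.b64encode(pdf_bytes).decode("utf-8")
    # #page=N is a PDF open parameter; whether it is honoured depends on the
    # browser's built-in PDF viewer
    return (
        f'<iframe src="data:application/pdf;base64,{b64}#page={page}" '
        f'width="100%" height="{height}" type="application/pdf"></iframe>'
    )
```

In the app this would be used as `st.markdown(pdf_iframe_html(data, page), unsafe_allow_html=True)`.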

Now I want to also highlight or mark some text in the rendered PDF. How can I do this?

I already tried changing the iframe code to this:
pdf_display = F'<iframe src="data:application/pdf;base64,{base64_pdf}#page={page}&#search=%22Einleitung%22" width="100%" height="300" type="application/pdf"></iframe>'

So I added “&#search=” to the URL, but this did not work. Also, I don’t want to highlight only one word; I would like to highlight a whole passage on that page of the document.

Any ideas how to make this work?


Here is an idea: use PyMuPDF (imported as fitz) to render the page as an image and draw rectangles around the text matches.

import streamlit as st
import fitz

with st.sidebar:
    original_doc = st.file_uploader(
        "Upload PDF", accept_multiple_files=False, type="pdf"
    )
    text_lookup = st.text_input("Look for", max_chars=50)

if original_doc:
    with fitz.open(stream=original_doc.getvalue()) as doc:
        page_number = st.sidebar.number_input(
            "Page number", min_value=1, max_value=doc.page_count, value=1, step=1
        )
        page = doc.load_page(page_number - 1)

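        if text_lookup:
            # search_for returns one rectangle per match on this page
            areas = page.search_for(text_lookup)

            # Draw a rectangle annotation around every match
            for area in areas:
                page.add_rect_annot(area)

        # Render the page (including any annotations) as a PNG and show it,
        # so the page is visible even before a search term is entered
        pix = page.get_pixmap(dpi=120).tobytes()
        st.image(pix, use_column_width=True)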
