Extracting Text from a PDF Image Using Drawable Canvas

Okello · May 15, 2025, 6:00pm

Hi all!

I’m building a Streamlit app that allows users to upload a PDF and interactively extract specific fields (like name, email, etc.). I use PyMuPDF (fitz) for reading and locating text, and PIL to render a zoomed image of the page with rectangles highlighting fields.

Here’s what works so far:

I extract text fields using regex.
I find the bounding box of labels using page.search_for("Field Label").
I render the PDF page as an image and draw rectangles around the label positions.
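
A minimal sketch of the steps above, for context. The names `scale_boxes` and `render_with_highlights`, the first-page-only handling, and the default zoom of 2.0 are illustrative assumptions, not the actual app code. The PyMuPDF and PIL imports sit inside the function so the scaling helper can be tried on its own.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


def scale_boxes(boxes: List[Box], zoom: float) -> List[Box]:
    """Convert boxes from PDF points to pixels of an image rendered at `zoom`."""
    return [(x0 * zoom, y0 * zoom, x1 * zoom, y1 * zoom) for x0, y0, x1, y1 in boxes]


def render_with_highlights(pdf_path: str, label: str, zoom: float = 2.0):
    """Render the first page and outline every occurrence of `label` (hypothetical helper)."""
    import fitz  # PyMuPDF
    from PIL import Image, ImageDraw

    doc = fitz.open(pdf_path)
    page = doc[0]  # assumption: only the first page is handled here

    # search_for returns a list of fitz.Rect in PDF point coordinates
    rects = page.search_for(label)

    # Render the page at the chosen zoom; the pixmap is RGB by default
    pix = page.get_pixmap(matrix=fitz.Matrix(zoom, zoom))
    img = Image.frombytes("RGB", (pix.width, pix.height), pix.samples)

    # Draw the label boxes, scaled from points to image pixels
    draw = ImageDraw.Draw(img)
    pdf_boxes = [(r.x0, r.y0, r.x1, r.y1) for r in rects]
    for box in scale_boxes(pdf_boxes, zoom):
        draw.rectangle(box, outline="red", width=2)
    return img
```

The key detail is that search_for works in PDF points while the image is in pixels, so every box has to be multiplied by the same zoom used for rendering.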

Is there a way in Streamlit to allow users to click directly on the canvas or image to extract the text within the rectangle?

Is this possible using st.canvas, streamlit-drawable-canvas, or another method?
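
One possible direction, sketched under assumptions: if the canvas component reports a drawn or selected rectangle as a dict with `left`, `top`, `width`, and `height` in image pixels (and optional `scaleX`/`scaleY`, as fabric.js-style objects do), that rectangle could be mapped back to PDF points and passed to PyMuPDF as a clip region. The dict layout and the helper names `canvas_object_to_pdf_rect` and `extract_text_in_rect` are hypothetical; it also assumes the canvas is displayed at the same pixel size as the rendered image.

```python
from typing import Dict, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in PDF points


def canvas_object_to_pdf_rect(obj: Dict[str, float], zoom: float) -> Box:
    """Map a canvas rectangle (image pixels) back to PDF point coordinates.

    Assumes `obj` has left/top/width/height keys; scaleX/scaleY default to 1.
    """
    width = obj["width"] * obj.get("scaleX", 1)
    height = obj["height"] * obj.get("scaleY", 1)
    x0 = obj["left"] / zoom
    y0 = obj["top"] / zoom
    x1 = (obj["left"] + width) / zoom
    y1 = (obj["top"] + height) / zoom
    return (x0, y0, x1, y1)


def extract_text_in_rect(pdf_path: str, page_number: int, rect: Box) -> str:
    """Return the text PyMuPDF finds inside `rect` on the given page."""
    import fitz  # PyMuPDF

    doc = fitz.open(pdf_path)
    page = doc[page_number]
    # get_text accepts a clip rectangle that limits extraction to that area
    return page.get_text("text", clip=fitz.Rect(*rect)).strip()
```

For example, a rectangle at left=100, top=50 with width=200 and height=40 on an image rendered at zoom 2.0 would map to the PDF rectangle (50, 25, 150, 45).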

If anyone has done something similar or has ideas, I’d really appreciate the input. Happy to share more code if needed!

Thanks in advance!
