Extracting Text from a PDF Image Using Drawable Canvas

Hi all! 👋

I’m building a Streamlit app that lets users upload a PDF and interactively extract specific fields (such as name and email). I use PyMuPDF (fitz) to read and locate text, and PIL to draw rectangles highlighting the fields on a zoomed image of the page.

Here’s what works so far:

  • I extract text fields using regex.

  • I find the bounding boxes of the labels with page.search_for("Field Label").

  • I render the PDF page as an image and draw rectangles around the label positions.
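A minimal sketch of those three steps, assuming PyMuPDF (`fitz`) and Pillow are installed. The field patterns and the helper names `extract_fields`, `to_pixels` and `render_with_boxes` are hypothetical placeholders rather than the app's actual code, and the pixel scaling assumes an unrotated page rendered with `fitz.Matrix(zoom, zoom)`:

```python
import re

# Hypothetical patterns for illustration; adapt them to the real field labels.
FIELD_PATTERNS = {
    "name": re.compile(r"Name:\s*(.+)"),                   # value after "Name:"
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),  # first email address
}


def extract_fields(page_text):
    """Return {field: value or None} using the first match of each pattern."""
    fields = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(page_text)
        if match is None:
            fields[field] = None
        elif pattern.groups:  # pattern has a capture group: use it
            fields[field] = match.group(1).strip()
        else:                 # otherwise use the whole match
            fields[field] = match.group(0)
    return fields


def to_pixels(rect, zoom):
    """Scale a PDF rect (x0, y0, x1, y1) in points to pixels of a page image
    rendered with fitz.Matrix(zoom, zoom). Assumes an unrotated page."""
    x0, y0, x1, y1 = rect
    return (x0 * zoom, y0 * zoom, x1 * zoom, y1 * zoom)


def render_with_boxes(pdf_bytes, labels, page_no=0, zoom=2.0):
    """Render one page to a PIL image and outline every hit of each label."""
    import fitz  # PyMuPDF; imported here so the helpers above need no extra packages
    from PIL import Image, ImageDraw

    with fitz.open(stream=pdf_bytes, filetype="pdf") as doc:
        page = doc[page_no]
        pix = page.get_pixmap(matrix=fitz.Matrix(zoom, zoom))  # RGB, no alpha by default
        img = Image.frombytes("RGB", (pix.width, pix.height), pix.samples)
        draw = ImageDraw.Draw(img)
        for label in labels:
            for rect in page.search_for(label):  # fitz.Rect objects in PDF points
                draw.rectangle(to_pixels(rect, zoom), outline="red", width=2)
    return img
```

In the Streamlit script this could be called as `render_with_boxes(uploaded.getvalue(), ["Name", "Email"])`, where `uploaded` is the file returned by `st.file_uploader`, and the result shown with `st.image`. The `fitz` and PIL imports live inside `render_with_boxes` so the pure helpers can be tested without a PDF.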

Is there a way in Streamlit to let users click directly on the canvas or image and extract the text within the selected rectangle?

  • Is this possible with streamlit-drawable-canvas (its st_canvas component) or some other method?
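A possible starting point, as an untested sketch: streamlit-drawable-canvas's `st_canvas` in `"rect"` mode with the zoomed page as `background_image`, then each drawn rectangle converted from canvas pixels back to PDF points and passed as `clip` to PyMuPDF's `page.get_text()`. The helpers `canvas_rects_to_pdf`, `extract_text_in_rects` and `run_app` are hypothetical, and the conversion assumes an unrotated page and a canvas created at the image's own size (otherwise pass `display_scale`). Since the component is third-party, it is worth checking that `background_image` works with the installed Streamlit version.

```python
def canvas_rects_to_pdf(json_data, zoom, display_scale=1.0):
    """Convert rectangles drawn on the canvas into PDF-space (x0, y0, x1, y1).

    zoom: the fitz.Matrix zoom used to render the page image.
    display_scale: rendered image width / canvas width (1.0 when the canvas
    is created with the same width and height as the image).
    """
    factor = display_scale / zoom  # canvas pixels -> PDF points
    rects = []
    for obj in (json_data or {}).get("objects", []):
        if obj.get("type") != "rect":
            continue  # ignore anything that is not a rectangle
        # Fabric.js keeps the unscaled width/height; scaleX/scaleY change
        # if the user resizes the shape afterwards in "transform" mode.
        width = obj["width"] * obj.get("scaleX", 1)
        height = obj["height"] * obj.get("scaleY", 1)
        x0, y0 = obj["left"] * factor, obj["top"] * factor
        rects.append((x0, y0, x0 + width * factor, y0 + height * factor))
    return rects


def extract_text_in_rects(pdf_bytes, rects, page_no=0):
    """Return the text PyMuPDF finds inside each PDF-space rect."""
    import fitz  # PyMuPDF

    with fitz.open(stream=pdf_bytes, filetype="pdf") as doc:
        page = doc[page_no]
        return [page.get_text("text", clip=fitz.Rect(r)).strip() for r in rects]


def run_app(pdf_bytes, page_image, zoom=2.0):
    """Call from a Streamlit script; page_image is the zoomed PIL rendering."""
    import streamlit as st
    from streamlit_drawable_canvas import st_canvas

    result = st_canvas(
        fill_color="rgba(255, 0, 0, 0.1)",
        stroke_width=2,
        stroke_color="#ff0000",
        background_image=page_image,
        drawing_mode="rect",       # user drags a rectangle on the page
        width=page_image.width,    # same size as the image, so display_scale=1
        height=page_image.height,
        key="pdf_canvas",
    )
    # json_data is None until the canvas has reported something back.
    rects = canvas_rects_to_pdf(result.json_data, zoom)
    for i, text in enumerate(extract_text_in_rects(pdf_bytes, rects), start=1):
        st.text_area(f"Selection {i}", text)
```

For a single click rather than a dragged box, `drawing_mode="point"` is another option; the clicked position would need the same canvas-to-PDF conversion, followed by a check of which `search_for` rectangle contains it.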

If anyone has done something similar or has ideas, I’d really appreciate the input. Happy to share more code if needed!

Thanks in advance 🙏