Extracting data from PDF display

Learner12 · February 1, 2024, 11:30pm

Hi everyone,

I’m trying to display a fillable PDF form on a webpage using Streamlit and extract the filled fields from the form. I’m currently using the following code to display the PDF file in an iframe using st.markdown:

import base64
import io
import json

import PyPDF2
import streamlit as st

# Read the template from disk; these bytes are what gets embedded in the page
with open("template.pdf", "rb") as f:
    pdf_data = f.read()

# Embed the PDF in an iframe as a base64 data URI
b64 = base64.b64encode(pdf_data).decode("utf-8")
pdf_display = f'<iframe src="data:application/pdf;base64,{b64}" width="700" height="1000" type="application/pdf"></iframe>'
st.markdown(pdf_display, unsafe_allow_html=True)

if st.button("extract"):
    # PdfReader expects a path or a file-like object, so wrap the raw bytes
    pdf_reader = PyPDF2.PdfReader(io.BytesIO(pdf_data))
    # get_fields() returns None when the PDF has no form fields
    fields = pdf_reader.get_fields() or {}

    # Convert the fields to a dictionary of field name -> value
    fields_dict = {name: field.get("/V") for name, field in fields.items()}

    # Save the fields as a JSON file
    with open("fields.json", "w") as f:
        json.dump(fields_dict, f)

However, I’m having trouble extracting the filled fields from the PDF form. I believe this is because the PDF file is being displayed in an iframe using st.markdown, and the filled fields are not being captured in the pdf_data variable.
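A minimal sketch of why that happens, assuming the iframe's src is built exactly as in the code above: the data URI is a static copy of the bytes read from disk, so decoding it only ever gives the template back. The helper names `make_data_uri` and `bytes_from_data_uri` and the placeholder bytes are made up for illustration.

```python
import base64


def make_data_uri(pdf_bytes: bytes) -> str:
    """Build the same kind of data URI that the iframe's src attribute holds."""
    b64 = base64.b64encode(pdf_bytes).decode("utf-8")
    return f"data:application/pdf;base64,{b64}"


def bytes_from_data_uri(uri: str) -> bytes:
    """Decode the payload after the first comma back into raw bytes."""
    _, _, payload = uri.partition(",")
    return base64.b64decode(payload)


# Placeholder bytes standing in for template.pdf (not a real PDF)
template_bytes = b"%PDF-1.7 placeholder"
uri = make_data_uri(template_bytes)
round_tripped = bytes_from_data_uri(uri)
```

Here `round_tripped` is identical to `template_bytes`. Whatever is typed into the form lives in the browser's PDF viewer and never reaches this string or the `pdf_data` variable on the Python side.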

I would appreciate any help on how to extract the filled fields from the PDF form displayed in the iframe using Streamlit or any other tools. If there are any other potential solutions besides Streamlit, I would be happy to explore those as well.

Thank you in advance for your help!

I hope this helps! Let me know if you have any further questions.

The app is currently running locally and will be deployed later.
Streamlit 1.31.0 and Python 3.11.7

sahirmaharaj · February 3, 2024, 4:20pm

Hello @Learner12,

Here’s a basic example of how you might start converting a PDF form into a Streamlit web form:

import streamlit as st
import json

# Example form fields
name = st.text_input("Name")
age = st.number_input("Age", step=1)
gender = st.selectbox("Gender", ["Male", "Female", "Other"])
feedback = st.text_area("Feedback")

if st.button("Submit"):
    form_data = {
        "Name": name,
        "Age": age,
        "Gender": gender,
        "Feedback": feedback
    }
    
    # Process the data as needed
    st.write("Form Submitted Successfully!")
    st.json(form_data)
    
    # Optionally, save the data to a file
    with open("form_data.json", "w") as f:
        json.dump(form_data, f)

Hope this helps!

Kind Regards,
Sahir

P.S. Let’s connect on LinkedIn!

Learner12 · February 5, 2024, 2:30pm

I am trying to use a fillable PDF form because there are a lot of PDFs, and each form has additional information that goes with the input fields, so I can’t simply use st.text_input() for all the form fields.
I need some way to either extract the data from the iframe once the form has been filled out, or download the form when the user clicks a button and then extract the fields from the downloaded file with PyPDF2. The problem is that when I try to download the iframe content, I get the original empty form instead of the filled one.

This is what I tried to capture the data from the iframe:

const iframe = document.querySelector('iframe');

// Use the `removeAttribute()` method to remove the `allow-scripts` attribute
// iframe.removeAttribute('allow-scripts');

// Use the `removeAttribute()` method to remove the `allow-same-origin` attribute
// iframe.removeAttribute('allow-same-origin');

const pdfUrl = iframe.src;

// Use the `fetch` method to download the PDF file as a blob
fetch(pdfUrl)
    .then(response => response.blob())
    .then(blob => {
        // Replace the filename with the name you want to give to the PDF file
        const filename = 'filled_form.pdf';

        // Use the `URL.createObjectURL` method to create a URL for the blob
        const url = URL.createObjectURL(blob);

        // Use the `download` attribute to download the PDF file
        const link = document.createElement('a');
        link.download = filename;
        link.href = url;
        link.click();

        // Use the `URL.revokeObjectURL` method to release the URL
        URL.revokeObjectURL(url);
    });

Thanks!
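One possible workaround, sketched under the assumption that the browser's PDF viewer lets the user save a filled copy of the form: have the user upload that saved file and read the fields from the uploaded bytes rather than from the template. The function names `field_values`, `extract_fields`, and `upload_and_extract` are hypothetical, and this has not been tested against the forms in this thread.

```python
import io
import json


def field_values(fields):
    """Map each field name to its /V entry as a plain string (None if unset).

    `fields` is expected to look like the result of PdfReader.get_fields():
    a dict of dict-like field objects, or None when the PDF has no form.
    """
    if not fields:
        return {}
    result = {}
    for name, field in fields.items():
        value = field.get("/V")
        result[name] = None if value is None else str(value)
    return result


def extract_fields(pdf_bytes):
    """Read form field values from the bytes of a filled PDF."""
    # Imported here so field_values can be used without PyPDF2 installed
    from PyPDF2 import PdfReader

    reader = PdfReader(io.BytesIO(pdf_bytes))
    return field_values(reader.get_fields())


def upload_and_extract():
    """Hypothetical Streamlit wiring: upload the filled copy, then show and save its fields."""
    import streamlit as st

    uploaded = st.file_uploader("Upload the filled form", type="pdf")
    if uploaded is not None:
        data = extract_fields(uploaded.getvalue())
        st.json(data)
        with open("fields.json", "w") as f:
            json.dump(data, f)
```

Calling `upload_and_extract()` at the bottom of the app script would add the uploader to the page. Values are converted with `str()` so that they can be written as JSON; checkbox states, for example, would then appear as strings such as "/Yes".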
