How to upload a pdf file in streamlit

Gyanaranjan_pathi · April 7, 2020, 9:49am

How can I upload a .pdf file in Streamlit and then process it further to extract information?

andfanilo · April 7, 2020, 11:28am

Hello @Gyanaranjan_pathi, welcome to the Streamlit forums

For the upload itself, you can use Streamlit’s file_uploader to display a file uploader in your app, like so:

import streamlit as st

uploaded_file = st.file_uploader('Choose your .pdf file', type="pdf")
if uploaded_file is not None:
    df = extract_data(uploaded_file)

Your uploaded PDF will then be available in the uploaded_file variable as an in-memory file-like object (a BytesIO-style binary buffer in current Streamlit versions, rather than a StringIO). To extract data from the PDF, you need a Python library that can read a PDF from a file-like object.
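One practical consequence of working with a file-like object, sketched here with a plain `io.BytesIO` standing in for the uploaded file (an assumption for illustration; the helper name `read_all` is hypothetical): reading moves the stream position, so rewind with `seek(0)` before handing the same upload to a second parser.

```python
import io


def read_all(file_like):
    """Return the full contents of a binary file-like object,
    regardless of how much of it has already been read."""
    file_like.seek(0)        # rewind to the start
    data = file_like.read()  # read everything
    file_like.seek(0)        # leave it rewound for the next consumer
    return data


# io.BytesIO stands in for the object returned by st.file_uploader
fake_upload = io.BytesIO(b"%PDF-1.4 example bytes")
fake_upload.read()                # a first consumer exhausts the stream
assert fake_upload.read() == b""  # nothing left without rewinding
assert read_all(fake_upload) == b"%PDF-1.4 example bytes"
```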

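I used pdfplumber to extract tables from PDFs in one of my Streamlit apps. pdfplumber.load accepts a file-like object (newer pdfplumber releases use pdfplumber.open instead, as shown further down this thread), so you can do: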

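import pdfplumber

def extract_data(feed):
    data = []
    # pdfplumber.open accepts a path or a file-like object;
    # older releases exposed file-object loading as pdfplumber.load
    with pdfplumber.open(feed) as pdf:
        for page in pdf.pages:
            # extract_tables() returns a list of tables for the page
            data.append(page.extract_tables())
    return None # build more code to return a dataframe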

There are several other libraries, such as camelot, tabula-py, or pdfminer.six. I had to test a few for my use case before settling on pdfplumber, so you may need to try several too, depending on the information you need to extract!
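As a rough sketch of the "build more code" step, assuming the first row of each extracted table is its header (that is an assumption about your PDFs, and `tables_to_records` is a hypothetical helper name), the nested lists collected per page can be flattened into row dictionaries:

```python
def tables_to_records(pages_tables):
    """Flatten per-page table output into a list of row dicts.

    `pages_tables` has one entry per page; each entry is a list of tables;
    each table is a list of rows; each row is a list of cell values.
    The first row of every table is treated as its header (an assumption).
    """
    records = []
    for tables in pages_tables:
        for table in tables:
            if len(table) < 2:
                continue  # header only (or empty): no data rows to keep
            header, *rows = table
            for row in rows:
                records.append(dict(zip(header, row)))
    return records


# Hand-written sample in the same nested shape: one page, one table
sample = [[[["name", "qty"], ["apple", "3"], ["pear", None]]]]
print(tables_to_records(sample))
# prints [{'name': 'apple', 'qty': '3'}, {'name': 'pear', 'qty': None}]
```

The resulting list of dicts can be passed to pandas.DataFrame if you want a dataframe.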

Hope this helps

Gyanaranjan_pathi · April 7, 2020, 2:50pm

Thank you @andfanilo

santosh_boina · December 30, 2020, 5:42pm

@andfanilo, I came across this discussion while looking into PDF file upload and analysis. I am working with PDF files using ‘pdfminer.six’. I could not find anything in the documentation about loading a file (the st.file_uploader object) the way you described for pdfplumber.

Any suggestions on handling PDF files with the pdfminer.six library in a Streamlit app would be very helpful. Thanks :)

andfanilo · December 31, 2020, 4:27pm

Don’t have a lot of experience with pdfminer.six but at least the following seems to work with Streamlit’s file uploader:

import pdfminer
from pdfminer.high_level import extract_pages
import streamlit as st

st.write(pdfminer.__version__)  

uploaded_file = st.file_uploader("Choose a file", "pdf")
if uploaded_file is not None:
    for page_layout in extract_pages(uploaded_file):
        for element in page_layout:
            st.write(element)

Hope this can serve as a good starting point.

Fanilo

ANAND_VERMA · March 5, 2021, 2:25am

How can I extract images from a PDF that consists of images?

santosh_boina · May 27, 2021, 2:08pm

Hi @andfanilo, I am working on a use case that extracts tabular data from PDF files, using camelot as the table extraction library. How can I take the PDF uploaded through st.file_uploader() and pass it to camelot? As far as I understand from the camelot documentation, camelot.read_pdf() only accepts a file path as input.

andfanilo · May 27, 2021, 3:24pm

Hello @santosh_boina

If it absolutely requires a file path and not a file-like object, you could try writing the uploaded file to a temporary folder, giving camelot the path to that file, and then deleting the temporary file at the end of the job. You can copy the following bit of code:
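A minimal sketch of this temporary-file approach, using only the standard library. The helper name `as_temp_pdf` is hypothetical, and the camelot and Streamlit calls appear only in comments, on the assumption from the question above that camelot.read_pdf takes a file path:

```python
import io
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def as_temp_pdf(file_like):
    """Write a file-like object's bytes to a temporary .pdf file,
    yield its path, and delete the file afterwards."""
    tmp = tempfile.NamedTemporaryFile(suffix=".pdf", delete=False)
    try:
        tmp.write(file_like.read())
        tmp.close()  # flush to disk so other libraries can open the path
        yield tmp.name
    finally:
        tmp.close()  # harmless if already closed
        os.remove(tmp.name)


# In a Streamlit app this would look roughly like (not run here):
#     uploaded_file = st.file_uploader("Choose a file", type="pdf")
#     if uploaded_file is not None:
#         with as_temp_pdf(uploaded_file) as path:
#             tables = camelot.read_pdf(path)

# Stand-in demonstration with an in-memory buffer
with as_temp_pdf(io.BytesIO(b"%PDF-1.4 demo")) as path:
    assert os.path.exists(path)
assert not os.path.exists(path)
```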

Hope this gets you started!
Fanilo

okld · May 28, 2021, 1:49pm

Hello @santosh_boina, you might have better luck with this bit of code instead, which fixes a bug in the former:

Tarun_Aswini · February 4, 2022, 2:37pm

How to render a pdf file in streamlit?

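import base64
import streamlit as st

def show_pdf(file_path):
    with open(file_path, "rb") as f:
        base64_pdf = base64.b64encode(f.read()).decode('utf-8')
    # the HTML embed markup (typically an <iframe> with a base64 data URI) belongs in this string
    pdf_display = F''
    st.markdown(pdf_display, unsafe_allow_html=True)
    print('Done')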

show_pdf('C:/Users/Tarun/Downloads/SOFTWARE ENGINEERING NOTES.pdf')

I tried this, but the PDF is not displayed and there is no error message either. Please help…
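For reference, a sketch of the kind of markup this approach usually relies on: the PDF bytes are base64-encoded and embedded as a data URI in an iframe. The helper name `pdf_iframe_html` and the default width and height are assumptions for illustration, and browser handling of data URIs inside iframes varies, so this is not guaranteed to render everywhere.

```python
import base64


def pdf_iframe_html(pdf_bytes, width=700, height=1000):
    """Build an <iframe> tag that embeds PDF bytes as a base64 data URI."""
    encoded = base64.b64encode(pdf_bytes).decode("utf-8")
    return (
        f'<iframe src="data:application/pdf;base64,{encoded}" '
        f'width="{width}" height="{height}" type="application/pdf"></iframe>'
    )


# In a Streamlit app this would be used roughly as (not run here):
#     with open(file_path, "rb") as f:
#         st.markdown(pdf_iframe_html(f.read()), unsafe_allow_html=True)
```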

Vital-Fernandez · December 23, 2022, 8:14am

Did you find a solution?

MagicDash91 · March 17, 2023, 2:44pm

no, it doesn’t work

dylanxia2017 · April 18, 2023, 1:57pm

well, pdfplumber.load just doesn’t work

shailesh_sharma · January 2, 2024, 1:38pm

You can simply upload a pdf file and open it using pdfplumber

import pdfplumber
import streamlit as st

uploaded_file = st.file_uploader("Choose a file")
if uploaded_file is not None:
    st.success("Uploaded the file")
    with pdfplumber.open(uploaded_file) as file:
        all_pages = file.pages
        st.write(all_pages[0].extract_text()) # you can print and check the data from any page in pdf

imuroo · May 28, 2024, 2:15pm

Here is an example of how I used the pymupdf4llm library to convert the contents of a PDF file into markdown format. I hope this helps!

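import os
import tempfile

import pymupdf4llm
import streamlit as st

# pdf file upload
pdf_file = st.file_uploader('Upload a PDF file', type=['pdf'])

if pdf_file is not None:
    bytes_data = pdf_file.read()

    # create a temporary file; it is closed when the with block exits
    with tempfile.NamedTemporaryFile(suffix='.pdf', delete=False) as tmp_file:
        tmp_file.write(bytes_data)
        temp_file_path = tmp_file.name

    try:
        # convert only after the file is closed so the data is flushed to disk
        md_text = pymupdf4llm.to_markdown(temp_file_path)
    finally:
        os.remove(temp_file_path)  # delete=False means we clean up ourselves
    st.markdown(md_text)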
