How do I upload a .pdf file in Streamlit and then process it further to extract the information?
Hello @Gyanaranjan_pathi, welcome to the Streamlit forums
- For the upload, you can use Streamlit's st.file_uploader to display a file uploader in your app, like this:
import streamlit as st

uploaded_file = st.file_uploader('Choose your .pdf file', type="pdf")
if uploaded_file is not None:
    df = extract_data(uploaded_file)
- Your uploaded PDF will then be available as a file-like object in the uploaded_file variable (in current Streamlit versions it is an UploadedFile, a subclass of BytesIO). To extract data from the PDF, you will need a Python library that can read a PDF from a file-like object.
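One thing to watch when trying several parsers on the same upload: reading a file-like object moves its position, so the next reader may see an empty stream. Here is a minimal sketch of rewinding it, assuming the upload behaves like a binary file-like object. The helper name read_pdf_bytes is hypothetical, and io.BytesIO stands in for the object returned by st.file_uploader:

```python
import io


def read_pdf_bytes(file_like):
    """Return the full content of a binary file-like object and rewind it."""
    file_like.seek(0)          # start from the beginning, whatever was read before
    data = file_like.read()
    file_like.seek(0)          # rewind so another library can read from the start
    return data


# Stand-in for the uploaded file (assumption: it is a binary file-like object)
fake_upload = io.BytesIO(b"%PDF-1.4 example bytes")
header = read_pdf_bytes(fake_upload)[:5]
```

PDF files start with the bytes %PDF-, so the first five bytes are also a cheap sanity check before handing the stream to a parser.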
I used pdfplumber to extract tables from PDFs in one of my Streamlit apps.
pdfplumber.open (called pdfplumber.load in older releases) accepts a file-like object, so you can do:
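import pdfplumber

def extract_data(feed):
    data = []
    # pdfplumber.open replaces the older pdfplumber.load
    with pdfplumber.open(feed) as pdf:
        pages = pdf.pages
        for p in pages:
            # extract_tables() returns a list of tables for the page
            data.append(p.extract_tables())
    return None  # build more code to return a dataframe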
There are several other libraries, such as camelot, tabula-py or pdfminer.six. I had to test a few for my use case before going with pdfplumber, so you may need to try several too, depending on the info you need to extract!
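To give an idea of the "return a dataframe" part, here is a rough sketch rather than the code from my app. It assumes every table in the PDF shares the header row of the first table, which may not hold for your documents. The helper name tables_to_rows is hypothetical, and pdfplumber and pandas are imported inside extract_data so the helper can be tried on its own:

```python
def tables_to_rows(pages_tables):
    """Flatten per-page lists of tables into (header, rows).

    Assumption: the first row of the first table is the header, and
    later tables either repeat that header or continue the data.
    """
    header, rows = None, []
    for tables in pages_tables:      # one list of tables per page
        for table in tables:         # each table is a list of rows
            if not table:
                continue
            if header is None:
                header = table[0]
            # Drop the header row when a table repeats it
            body = table[1:] if table[0] == header else table
            rows.extend(body)
    return header, rows


def extract_data(feed):
    import pdfplumber
    import pandas as pd

    with pdfplumber.open(feed) as pdf:
        pages_tables = [page.extract_tables() for page in pdf.pages]
    header, rows = tables_to_rows(pages_tables)
    return pd.DataFrame(rows, columns=header)
```

In the app you could then show the result with st.dataframe(df). Cells come back as strings (or None for empty cells), so numeric columns will probably need converting afterwards.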
Hope this helps
Thank you @andfanilo