How to use PyMuPDF to read a PDF after uploading it via st.file_uploader()?

Hello @Soumyadip_Sarkar, I think you were missing the read() call, which returns the uploaded file's contents as bytes that PyMuPDF can then consume.
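To illustrate what read() does here: as far as I know, the object returned by st.file_uploader() behaves like an io.BytesIO, so the rough sketch below uses a plain io.BytesIO as a stand-in (that is an assumption, made so it runs without Streamlit, and the placeholder bytes are not a real PDF). It shows that read() hands back bytes and that a second read() on the same object returns nothing unless you rewind it or use getvalue().

```python
import io

# Stand-in for the object returned by st.file_uploader();
# assumed to behave like io.BytesIO. The content is a placeholder, not a real PDF.
fake_upload = io.BytesIO(b"%PDF-1.4 placeholder bytes")

first = fake_upload.read()      # raw bytes, the kind of value passed to fitz.open(stream=...)
second = fake_upload.read()     # read position is now at the end, so this is empty

fake_upload.seek(0)             # rewind before reading again
rewound = fake_upload.read()

whole = fake_upload.getvalue()  # full contents, regardless of the read position
```

So if the same upload has to be read more than once in a script run, either seek(0) first or use getvalue() instead of read().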

For future reference, the following works:

import fitz
import streamlit as st

uploaded_pdf = st.file_uploader("Load pdf: ", type=['pdf'])

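if uploaded_pdf is not None:
    # read() returns the upload as bytes, which fitz.open() accepts via stream=
    with fitz.open(stream=uploaded_pdf.read(), filetype="pdf") as doc:
        text = ""
        for page in doc:
            # get_text() is the current name; older PyMuPDF releases called it getText()
            text += page.get_text()
        st.write(text)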

I’m not sure the fitz.open() context manager always closes the file: I got an "AttributeError: 'Document' object has no attribute 'isClosed'" error with it, so I also tried closing the document manually:

import fitz
import streamlit as st

uploaded_pdf = st.file_uploader("Load pdf: ", type=['pdf'])

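if uploaded_pdf is not None:
    doc = fitz.open(stream=uploaded_pdf.read(), filetype="pdf")
    text = ""
    for page in doc:
        # get_text() is the current name; older PyMuPDF releases called it getText()
        text += page.get_text()
    st.write(text)
    # close the document explicitly instead of relying on the context manager
    doc.close()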