Hello!
I am trying to add a PDF upload option in Streamlit v1.3.0 as part of my NLP project, written in Python 3.9.5 (JupyterLab kernel).
This is the Python code I currently use to read the PDF file from disk:
import PyPDF2 as pdf
file = open(r"C:\Users\<path>\Documents\ebook.pdf", 'rb')
pdf_reader = pdf.PdfFileReader(file)
text=''
for i in range(0,pdf_reader.numPages):
    pageObj = pdf_reader.getPage(i)
    text=text+pageObj.extractText()
print(text)
With my current Streamlit code, I’m able to upload the PDF file:
import PyPDF2 as pdf
import streamlit as st

uploaded_file = st.file_uploader("Choose a file", type="pdf")
if uploaded_file is not None:
    pdf_reader = pdf.PdfFileReader(uploaded_file)
    text=''
    for i in range(0,pdf_reader.numPages):
        pageObj = pdf_reader.getPage(i)
        text=text+pageObj.extractText()
    st.write(text)
The issue is that I’m not sure how to apply the raw string literal and the binary read mode during this PDF upload, i.e. the ‘r’ prefix and the ‘rb’ mode used with Python’s open function:
file = open(r"C:\Users\<path>\Documents\ebook.pdf", 'rb')
Any idea on how to achieve this in streamlit pdf reading?
I searched this forum and found this post helpful, but it does not say much about string literals.
Based on what I see, the PdfFileReader class (link) and the st.file_uploader widget (link) have no parameters (if I’m not wrong) for specifying ‘r’ or ‘rb’.
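To make the question concrete, here is a minimal sketch of what I understand ‘r’ and ‘rb’ to do in plain Python, using only the standard library. The path `C:\Users\me\ebook.pdf`, the file name `sample.bin`, the placeholder bytes and the helper `read_all` are all made up for illustration. `io.BytesIO` stands in for the uploaded file, on my assumption (from reading the Streamlit docs) that st.file_uploader returns a BytesIO-like object; I have not verified how PdfFileReader treats it beyond my snippet above.

```python
import io
import os
import tempfile

# The r prefix only changes how the literal is parsed in source code.
# Both spellings below produce the same str object at runtime.
# (Without r or doubled backslashes, "\U" would be read as a Unicode escape.)
raw_path = r"C:\Users\me\ebook.pdf"        # hypothetical path
escaped_path = "C:\\Users\\me\\ebook.pdf"  # same path, escaped by hand
same_string = raw_path == escaped_path


def read_all(stream):
    """Hypothetical helper: read every byte from a binary file-like object."""
    return stream.read()


payload = b"%PDF-1.4 placeholder bytes"  # made-up content, not a real PDF

# Case 1: a file on disk, opened with mode 'rb', yields bytes.
with tempfile.TemporaryDirectory() as tmp_dir:
    disk_path = os.path.join(tmp_dir, "sample.bin")
    with open(disk_path, "wb") as out_file:
        out_file.write(payload)
    with open(disk_path, "rb") as in_file:
        from_disk = read_all(in_file)

# Case 2: an in-memory binary stream has no path and no open() mode,
# yet reading it returns the same bytes.
in_memory = io.BytesIO(payload)
from_memory = read_all(in_memory)

print(same_string, from_disk == from_memory)
```

If that assumption holds, the uploaded object would already behave like a file opened in binary mode, and the ‘r’ prefix would not come into play because no path literal is written in the code. I would appreciate confirmation of whether this reading is correct.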
In this case, I’m unsure how to continue. It would be quite beneficial to understand more about this subject. Any assistance or pointers are greatly appreciated!
Thank you,