How to enable raw string literal 'r' and binary format 'rb' during pdf upload/read?

billysilly · April 1, 2022, 12:07pm

Hello!

I am trying to create a PDF upload option in Streamlit v1.3.0 as part of my NLP project, written in Python 3.9.5 (JupyterLab kernel).

My Python code is like below to read the pdf file:

import PyPDF2 as pdf

file = open(r"C:\Users\<path>\Documents\ebook.pdf", 'rb')
pdf_reader = pdf.PdfFileReader(file)

text=''
for i in range(0,pdf_reader.numPages):
    pageObj = pdf_reader.getPage(i)
    text=text+pageObj.extractText()
print(text)

With my current streamlit code, I’m able to upload the pdf file:

import streamlit as st
import PyPDF2 as pdf

uploaded_file = st.file_uploader("Choose a file", type="pdf")

if uploaded_file is not None:
    pdf_reader = pdf.PdfFileReader(uploaded_file)

    text=''
    for i in range(0,pdf_reader.numPages):
        pageObj = pdf_reader.getPage(i)
        text=text+pageObj.extractText()
    st.write(text)

The issue is that I’m not sure how to apply the raw string literal and the binary read mode during this PDF upload/read, i.e. the ‘r’ prefix and the ‘rb’ mode used with Python’s open function:

file = open(r"C:\Users\<path>\Documents\ebook.pdf", 'rb')

Any idea on how to achieve this in streamlit pdf reading?

I tried searching this forum for the same and found this post helpful. But it does not mention much about string literals.

Based on what I see, neither the PdfFileReader class nor the st.file_uploader widget has a parameter (if I’m not wrong) that corresponds to ‘r’ or ‘rb’.

In this case, I’m unsure how to continue. It would be quite beneficial to understand more about this subject. Any assistance or pointers are greatly appreciated!

Thank you,

billysilly · April 1, 2022, 6:07pm

I found the solution to the raw string literal issue (but not the binary format one). We just have to use ‘/’ in the path instead of a raw string with backslashes, and my guess is that the Streamlit file uploader widget takes care of this internally.

file = open("C:/Users/path/ebook.pdf", 'rb')

The question about reading in binary format is still open.
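For context, the r prefix only changes how Python parses a string literal in source code. It is not a property of the resulting string, so there is nothing to enable on a file that arrives through an upload widget, where no path literal is typed at all. A minimal standard-library sketch of that point follows; the paths are made-up placeholders, not the ones from this thread.

```python
from pathlib import PureWindowsPath

# Three ways of writing the same hypothetical Windows path in source code.
raw = r"C:\Users\demo\ebook.pdf"         # raw literal: backslashes are kept as typed
escaped = "C:\\Users\\demo\\ebook.pdf"   # normal literal: each backslash escaped
forward = "C:/Users/demo/ebook.pdf"      # forward slashes, no escaping needed

# The raw and escaped literals produce identical str objects at runtime.
same_literal = raw == escaped

# Windows path handling treats "/" and "\" as equivalent separators.
same_path = PureWindowsPath(raw) == PureWindowsPath(forward)

print(same_literal, same_path)
```

Without the r prefix or the doubled backslashes, a literal containing \U would be rejected by the parser as a malformed Unicode escape, which is the usual reason the prefix appears in Windows paths.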

randyzwitch · April 1, 2022, 8:57pm

Hi @billysilly -

In the examples section of st.file_uploader, you’ll see various ways of using the data provided after it is uploaded. If I were to guess, I suspect the real issue is that you need to call pdf_reader = pdf.PdfFileReader(uploaded_file.getvalue()), which will provide the data as raw bytes.

Best,
Randy
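A note on the suggestion above: getvalue() returns a plain bytes object, while PdfFileReader appears to expect either a path or a file-like object that supports read and seek. The Streamlit docs describe UploadedFile as a subclass of BytesIO, so it should already behave like a file opened in 'rb' mode, and raw bytes can be wrapped in io.BytesIO when a stream is needed. The sketch below uses only the standard library, with io.BytesIO standing in for the uploaded file and placeholder bytes standing in for a real PDF; both are assumptions for illustration.

```python
import io
import os
import tempfile

# Placeholder content; this is not a valid PDF.
payload = b"%PDF-1.4 placeholder bytes, not a real PDF"

# Case 1: a file on disk opened in binary mode, as in the local script.
with tempfile.NamedTemporaryFile(delete=False, suffix=".pdf") as tmp:
    tmp.write(payload)
with open(tmp.name, "rb") as disk_file:
    from_disk = disk_file.read()
os.remove(tmp.name)

# Case 2: an in-memory binary stream, standing in for Streamlit's UploadedFile.
uploaded_like = io.BytesIO(payload)
from_memory = uploaded_like.read()

# Case 3: raw bytes (what getvalue() returns) wrapped back into a stream.
raw_bytes = uploaded_like.getvalue()
rewrapped = io.BytesIO(raw_bytes)

# Both sources yield the same bytes; only the stream has file methods.
print(from_disk == from_memory)
print(hasattr(raw_bytes, "seek"), hasattr(rewrapped, "seek"))
```

Under these assumptions, passing uploaded_file itself to PdfFileReader, or io.BytesIO(uploaded_file.getvalue()), would be the in-memory equivalent of open(path, 'rb').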

billysilly · April 1, 2022, 8:59pm

Thanks Randy @randyzwitch! I’ll go through the docs and try your suggestion. I will update you soon.

system · April 1, 2023, 8:59pm

This topic was automatically closed 365 days after the last reply. New replies are no longer allowed.
