Expected str, bytes or os.PathLike object, not UploadedFile for PDF file

Saksham · March 3, 2021, 5:40pm

I am trying to extract text from a PDF by converting it into an image, where the PDF is the input file.
In Streamlit, when I try to import the PDF and call the function:

    import streamlit as st

    uploaded_file = st.file_uploader('Import PDF from local', type='pdf')
    if uploaded_file is not None:
        text = pic_to_text(uploaded_file)
        st.success(text)

I get the error: expected str, bytes or os.PathLike object, not UploadedFile
I don't want to extract the pages or text from the PDF first; I want to pass the PDF directly as the input to my function.
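The message is the TypeError Python raises when an in-memory file object is handed to something that expects a filesystem path, such as the built-in open(). A minimal reproduction, assuming Streamlit's UploadedFile behaves like an in-memory io.BytesIO object; the UploadedFile class below is a hypothetical stand-in, not Streamlit's real class:

```python
import io


class UploadedFile(io.BytesIO):
    """Hypothetical stand-in for Streamlit's UploadedFile (an in-memory file object)."""


def reproduce_error(file_obj):
    """Hand a file object to open(), which expects a path, and return the error text."""
    try:
        open(file_obj, "rb")  # open() wants a str, bytes or os.PathLike path
    except TypeError as exc:
        return str(exc)
    return None


message = reproduce_error(UploadedFile(b"%PDF-1.4 dummy bytes"))
print(message)
```

Running this should print the same message as above, ending with the stand-in's class name.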

Marisa_Smith · March 4, 2021, 6:33pm

Hey @Saksham,

First, welcome to the Streamlit Community!!!

Can you post a link to your code? Also, you're passing your uploaded file into a function that I can't see. I imagine that somewhere in that function you're passing the uploaded file directly into something that expects a "str", "bytes" or a path.
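Following that diagnosis, one way to let a function accept either a path or an upload is to normalize its input to raw bytes at the top. This is a sketch under the assumption that the upload is a readable file-like object; as_pdf_bytes and PdfSource are hypothetical names, not part of Streamlit or any Google API:

```python
import io
import os
from typing import BinaryIO, Union

# Anything the helper accepts: a path, raw PDF bytes, or an open binary file object.
PdfSource = Union[str, os.PathLike, bytes, bytearray, BinaryIO]


def as_pdf_bytes(source: PdfSource) -> bytes:
    """Return the PDF's raw bytes, whatever form the caller passed it in.

    - bytes/bytearray are treated as the file contents (not as a path)
    - str/os.PathLike are treated as a filesystem path and opened
    - anything with .read() (e.g. an uploaded file) is rewound and read
    """
    if isinstance(source, (bytes, bytearray)):
        return bytes(source)
    if isinstance(source, (str, os.PathLike)):
        with open(source, "rb") as fh:
            return fh.read()
    if hasattr(source, "read"):
        if hasattr(source, "seek"):
            source.seek(0)  # an earlier read() may have left the cursor at the end
        return source.read()
    raise TypeError(f"unsupported PDF source: {type(source).__name__}")
```

The function receiving the upload could then work on as_pdf_bytes(infile) and hand those bytes to a converter that accepts bytes (for example pdf2image's convert_from_bytes, if that library is the one in use).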

Check out this discussion where someone has a similar issue (docs typo already fixed):

Happy Streamlit-ing!
Marisa

Saksham · March 5, 2021, 5:34am

Hi @Marisa_Smith,

Thank you so much for your response. Please find below the code where I am trying to pass the PDF directly.

    import os

    from google.cloud import vision


    def pic_to_text(infile):
        """Detects text in an image file

        ARGS
        infile: path to image file

        RETURNS
        String of text detected in image
        """
        # read_pdf is my own helper (not shown); it returns a memoryview
        infile = read_pdf(infile)
        os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "servicekey.json"

        # Instantiates a client
        client = vision.ImageAnnotatorClient()

        # Converts the memoryview into raw bytes for the API
        content = infile.tobytes()

        image = vision.Image(content=content)

        # For dense text, use document_text_detection
        # For less dense text, use text_detection
        response = client.text_detection(image=image, image_context={"language_hints": ["en"]})
        text = response.text_annotations[0].description
        # print("Detected text: {}".format(text))
        return text

    from google.cloud import translate


    def translate_text(text, source_language_code, target_language_code):
        """Translates text to a given language

        ARGS
        text: String of text to translate
        source_language_code: language of input text
        target_language_code: language of output text

        RETURNS
        String of translated text
        """

        # Instantiates a client
        client = translate.TranslationServiceClient()

        # Designates the data center location that you want to use
        location = "us-central1"
        project_id = "testprojectincloud"

        parent = f"projects/{project_id}/locations/{location}"

        result = client.translate_text(
            request={
                "parent": parent,
                "contents": [text],
                "mime_type": "text/plain",  # mime types: text/plain, text/html
                "source_language_code": source_language_code,
                "target_language_code": target_language_code,
            }
        )

        # Extract translated text from API response (one translation per input string)
        return result.translations[0].translated_text

In the code above, I pass the PDF file in, convert the PDF to an image and get a memory view, convert that into bytes, and pass it to the Google Cloud Vision API to extract the text from the image.
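If read_pdf (the poster's helper, not shown in the thread) only accepts a filesystem path, a common workaround is to write the upload to a temporary file and pass that path instead. A minimal sketch using only the standard library; upload_to_temp_path is a hypothetical helper name, and the caller is responsible for deleting the file afterwards:

```python
import io
import os
import tempfile


def upload_to_temp_path(uploaded, suffix: str = ".pdf") -> str:
    """Write an uploaded file-like object to a temp file and return its path.

    The caller must delete the file (os.remove) when done with it.
    """
    # getvalue() returns the whole buffer regardless of the cursor position
    data = uploaded.getvalue() if hasattr(uploaded, "getvalue") else uploaded.read()
    fd, path = tempfile.mkstemp(suffix=suffix)
    with os.fdopen(fd, "wb") as fh:
        fh.write(data)
    return path


# Usage in the Streamlit app (sketch):
#     path = upload_to_temp_path(uploaded_file)
#     try:
#         text = pic_to_text(path)
#     finally:
#         os.remove(path)

if __name__ == "__main__":
    demo = io.BytesIO(b"%PDF-1.4 demo")
    p = upload_to_temp_path(demo)
    try:
        with open(p, "rb") as fh:
            print(fh.read() == demo.getvalue())
    finally:
        os.remove(p)
```

Writing to disk costs an extra copy, but it lets path-only libraries work unchanged.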
