PDF2Image and Poppler Problem

Python Version: 3.9.6
Streamlit Version: 1.32.1

Hello, I need help debugging a pdf2image and Poppler problem. My code is on GitHub and, to my knowledge, I have set everything up correctly: my Streamlit app successfully displays my PDF files as images when I run it locally. However, when I tried deploying it, I got these errors from the "Manage App" tab.

Traceback (most recent call last):
...
FileNotFoundError: [Errno 2] No such file or directory: 'poppler-24.02.0/Library/bin/pdfinfo'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
...
pdf2image.exceptions.PDFInfoNotInstalledError: Unable to get page count. Is poppler installed and in PATH?

I have looked at previous posts regarding this problem and did all the following:

  1. Putting pdf2image and the other libraries that I need under requirements.txt
  2. Putting packages.txt in the same directory as my script and writing poppler-utils inside.
  3. Specifying poppler_path = r'poppler-24.02.0/Library/bin' under convert_from_path.
  4. In addition to 3, I added the bin directory to my PATH system variable.

Even then, I am still getting the error above. Please help me resolve this problem and let me know if you need any additional information. Thank you very much for your help and time!

Here are the imports and the code that I used:

from pdf2image import convert_from_path
from pdf2image.exceptions import (
    PDFInfoNotInstalledError,
    PDFPageCountError,
    PDFSyntaxError
)
...
def pdf_to_images(pdf_path):
    # Convert PDF to a list of Pillow images at 500 DPI
    pop_path = r'poppler-24.02.0/Library/bin'
    images = convert_from_path(pdf_path, dpi=500, poppler_path=pop_path)
    return images

From the docs:

If packages.txt exists in the root directory of your repository we automatically detect it, parse it, and install the listed packages.

It is not obvious to me that packages.txt in any other location will be taken into account. The logs should tell you whether poppler-utils was installed or not.

That path looks wrong to me. How did you come to that?

The os.environ["PATH"] system variable? The same directory as in item 3? I don't think you need items 3 and 4. The poppler executables should be in the right place already.


Yes, I set those just to make sure I had not missed anything. Also, I just checked my logs and copied the relevant text below:

Successfully installed MarkupSafe-2.1.5 altair-5.2.0 attrs-23.2.0 blinker-1.7.0 cachetools-5.3.3 certifi-2024.2.2 charset-normalizer-3.3.2 click-8.1.7 gitdb-4.0.11 gitpython-3.1.42 idna-3.6 jinja2-3.1.3 jsonschema-4.21.1 jsonschema-specifications-2023.12.1 markdown-it-py-3.0.0 mdurl-0.1.2 numpy-1.26.4 packaging-23.2 pandas-2.2.1 pdf2image-1.17.0 pillow-10.2.0 plotly-5.19.0 poppler-utils-0.1.0 protobuf-4.25.3 pyarrow-15.0.1 pydeck-0.8.1b0 pygments-2.17.2 python-dateutil-2.9.0.post0 pytz-2024.1 referencing-0.33.0 requests-2.31.0 rich-13.7.1 rpds-py-0.18.0 six-1.16.0 smmap-5.0.1 streamlit-1.32.1 streamlit_lottie-0.0.5 tenacity-8.2.3 toml-0.10.2 toolz-0.12.1 tornado-6.4 typing-extensions-4.10.0 tzdata-2024.1 urllib3-2.2.1 watchdog-4.0.0

So this should mean that poppler-utils is successfully installed and available to my app, right?

My bad, I typed in the wrong path. I have edited my post to reflect the correct path that I specified.

packages.txt

The docs suggest it should be in the root directory, but you said you put it in the same directory as the script. These are not necessarily the same place.

poppler-utils

Putting it in requirements.txt should install the PyPI package. Putting it in packages.txt should install the Debian package (as long as the file is in the right place).

The text you pasted from the logs shows that the PyPI package was installed, but it says nothing about the Debian package. In any case, you need to figure out which one you actually need.
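
One way to tell the two apart at runtime is to check whether the pdfinfo executable, which the traceback shows pdf2image trying to run, is reachable on PATH. The following is only a minimal sketch: the helper names find_tool and describe_poppler are made up for illustration, and it assumes the Poppler command-line tools accept a -v flag for printing their version.

```python
import shutil
import subprocess
from typing import Optional


def find_tool(name: str = "pdfinfo") -> Optional[str]:
    """Return the full path of an executable found on PATH, or None."""
    return shutil.which(name)


def describe_poppler(name: str = "pdfinfo") -> str:
    """Summarize whether the given command-line tool is reachable."""
    exe = find_tool(name)
    if exe is None:
        return f"{name} not found on PATH"
    # Assumption: the tool accepts -v. The version text may be written
    # to stderr rather than stdout, so both streams are captured.
    proc = subprocess.run(
        [exe, "-v"], capture_output=True, text=True, check=False, timeout=10
    )
    lines = (proc.stdout + proc.stderr).strip().splitlines()
    version = lines[0] if lines else "version unknown"
    return f"{name} found at {exe}: {version}"
```

Calling st.write(describe_poppler()) near the top of the deployed app would show whether the executable is visible to the app, independently of which Python packages pip installed.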

poppler_path

The path still looks wrong to me. And I still don't know how you arrived at it.

PATH system variable

It is a bit unclear what you are doing here, but it doesn't look right either.

If I had to guess…

  • You need the Debian package (put poppler-utils in packages.txt)
  • You don't need the PyPI package (don't put poppler-utils in requirements.txt)
  • You don't need to explicitly pass poppler_path to convert_from_path or set any system variables.
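
As a rough illustration of that guess, and assuming the local Windows setup still relies on the bundled Poppler folder while the deployed app gets Poppler from packages.txt, a sketch like the following passes poppler_path only when that folder actually exists. The helper name resolve_poppler_path is hypothetical and not part of pdf2image; the folder name is the one quoted earlier in the thread.

```python
import os
from typing import Optional

# Folder quoted in the thread; assumed to exist only on the local machine.
LOCAL_POPPLER_DIR = "poppler-24.02.0/Library/bin"


def resolve_poppler_path(local_dir: str = LOCAL_POPPLER_DIR) -> Optional[str]:
    """Return local_dir if it contains a pdfinfo binary, otherwise None.

    None tells pdf2image to look for the Poppler tools on PATH instead.
    """
    for name in ("pdfinfo", "pdfinfo.exe"):
        if os.path.isfile(os.path.join(local_dir, name)):
            return local_dir
    return None


def pdf_to_images(pdf_path, dpi=500):
    """Convert a PDF into a list of Pillow images."""
    # Imported here so the path helper can be used without pdf2image.
    from pdf2image import convert_from_path

    return convert_from_path(
        pdf_path, dpi=dpi, poppler_path=resolve_poppler_path()
    )
```

With this shape, the same script can run locally with the downloaded Poppler build and on the deployed app with the system-wide install, without hard-coding a path that only exists on one of them.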

Never mind, I finally figured it out and got pdf2image and poppler to work when I deployed my app.

I did put the packages.txt (which contains poppler-utils) in the root directory of my repository along with my script, so everything is good there.

Previously, when I was working on this, I kept getting the exception:

Unable to get page count. Is poppler installed and in PATH?

The solution I found was to add the bin folder of the poppler build I downloaded to my system PATH variable. Initially, that did not work, so I also had to pass poppler_path when I called the convert_from_path function, like so:

convert_from_path('a_file_name.pdf', poppler_path=r'poppler-24.02.0\Library\bin')
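
A side note on Windows-style paths like this one: in an ordinary (non-raw) Python string literal, \b is the backspace escape, so a path ending in \bin can silently turn into something else. The short sketch below only illustrates that behavior with a shortened path; the variable names are made up for the example.

```python
from pathlib import PureWindowsPath

plain = "Library\bin"    # "\b" is parsed as a single backspace character
raw = r"Library\bin"     # the r prefix keeps the backslash literally
forward = "Library/bin"  # forward slashes need no escaping at all

print(repr(plain))
print(repr(raw))
# On Windows both separators name the same location.
print(PureWindowsPath(forward) == PureWindowsPath(raw))
```

Using a raw string or forward slashes avoids the problem on Windows. Either way, a folder from a Windows download will not exist on the Linux machine that runs the deployed app.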

My code ran without errors locally, but I ran into the issues I mentioned when I tried to deploy my app. It turns out my deployed app did not recognize the poppler_path I passed to convert_from_path. So, I had to get my system PATH variable working, and once it did, my app finally ran without errors.

Nonetheless, thank you very much for your help!
