When deployed I get the following two errors involving script_runner.py and import pyPDF2

I’m running into two problems. I’m working on a file uploader, and during the build I get the error “‘NoneType’ object has no attribute ‘seek’”, even though my code works and I get the output I am looking for.
The second error appears when I deploy the app: File “/home/appuser/venv/lib/python3.7/site-packages/streamlit/script_runner.py”, line 354, in _run_script exec(code, module.__dict__)
and
File “/app/293pending_cases/pending_cases.py”, line 4, in
import docx2txt
However, I have removed the docx2txt import and I am still getting this error. When it does clear, it’s followed by “import pyPDF not found”… I am not sure where I am going wrong. Any advice is greatly appreciated.

I can tell you that I have successfully deployed apps with both docx2txt and pyPDF, so it is not a fundamental problem with the libraries. Most likely a missing from X import Y statement. If you post the import code and the full error message we can probably help more.

This is in the build…
AttributeError: ‘NoneType’ object has no attribute ‘seek’

Traceback:

File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/streamlit/script_runner.py", line 354, in _run_script
    exec(code, module.__dict__)
File "/Users/hector/codeup-data-science/293rd_pending_cases/pending_cases.py", line 29, in <module>
    pdf_raw_text = read_pdf(pdf_file)
File "/Users/hector/codeup-data-science/293rd_pending_cases/pending_cases.py", line 8, in read_pdf
    pdfReader = PdfFileReader(file) #reads pdf
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/PyPDF2/pdf.py", line 1084, in __init__
    self.read(stream)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/PyPDF2/pdf.py", line 1689, in read
    stream.seek(-1, 2)

and this is when I deploy…
Well, I am now getting a “Mismatched workspace name and repository owner” error when I go to deploy… this thing is frustrating.

I hope this helps…
import streamlit as st
import pandas as pd
from PyPDF2 import PdfFileReader
import re
import gspread

def read_pdf(file):
    pdfReader = PdfFileReader(file)  # reads pdf
    count = pdfReader.numPages  # counts the number of pages
    content = " "  # placeholder for pdf content
    for i in range(count):  # for loop to extract text from all pages
        page = pdfReader.getPage(i)  # gets the page at index i
        content += page.extractText()  # extracts text from iterated pages

    return content

Here is the PyPDF error I was talking about…

  File "/app/293rd_pending_cases/pending_cases.py", line 3, in <module>
    from PyPDF2 import PdfFileReader
ModuleNotFoundError: No module named 'PyPDF2'

You need to do import PyPDF2 before the from statement.

The mismatched workspace/repository issue has to do with your GitHub configuration; it sounds like an issue with who owns the repository.

Unfortunately, adding the import ahead of the from statement didn’t work. I was in the wrong workspace when trying to deploy…

Hi @Hector_Rodriguez_Jr, welcome to the Streamlit community! :partying_face: :wave:

ModuleNotFoundError: No module named ‘PyPDF2’

The issue is that your repository does not contain a requirements file with your Python dependencies. As such, Streamlit Cloud has not installed packages like PyPDF2, docx2txt, gspread, etc, that your app uses.

Read our documentation on App dependencies and a knowledge base article on the ModuleNotFoundError.

You have the option of manually creating a requirements.txt file and including a Python package on each line. Take care to use the package name as it appears on PyPI. E.g. scikit-learn, not sklearn.
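As a rough illustration of the manual route, here is a minimal sketch that turns a list of import names into requirements.txt lines. The helper requirements_lines and the PYPI_NAMES mapping are hypothetical names made up for this example; the mapping only covers the sklearn case mentioned above and would need extending for any other package whose import name differs from its PyPI name.

```python
# Hypothetical mapping from import name to PyPI package name.
# Only the sklearn / scikit-learn case from the post above is included.
PYPI_NAMES = {"sklearn": "scikit-learn"}


def requirements_lines(import_names):
    """Return sorted, de-duplicated requirements.txt lines for the given imports."""
    packages = {PYPI_NAMES.get(name, name) for name in import_names}
    return sorted(packages, key=str.lower)


# Import names taken from the app in this thread.
lines = requirements_lines(["streamlit", "pandas", "PyPDF2", "gspread", "docx2txt"])
print("\n".join(lines))
```

The printed lines can be pasted into a requirements.txt file in the repository. This sketch does not pin versions.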

Alternatively, you can automate the creation of a requirements.txt using pipreqs. Run:

pipreqs /path/to/293rd_pending_cases/

It will create a requirements.txt file for you that you can upload to GitHub.

When I run the above command upon cloning your repo, it creates a requirements file with the following entries:

gspread==5.1.1
streamlit==1.4.0
docx2txt==0.8
df2gspread==1.0.4
pandas==1.2.5
PyPDF2==1.26.0

Hope this helps!

Best, :balloon:
Snehan


I had read something to this effect, but wasn’t sure about the requirements.txt file. I am going to try this. Thank you!


Dude you are the MAN!!!
Thank you!


See https://stackoverflow.com/a/46659678/1024200. I have moved to pymupdf.

I may have to do that…thanks

I have another question. How do I get rid of this error?

File "/home/appuser/venv/lib/python3.7/site-packages/streamlit/script_runner.py", line 354, in _run_script
    exec(code, module.__dict__)
File "/app/293rd_pending_cases/pending_cases.py", line 33, in <module>
 
    self.read(stream)
File "/home/appuser/venv/lib/python3.7/site-packages/PyPDF2/pdf.py", line 1689, in read
    stream.seek(-1, 2)

I think it means PyPDF is trying to access a non-existent file object.
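One common way this happens in a file-uploader app is that st.file_uploader returns None until a file has been chosen, so on the first run the reader is handed None instead of a stream. Below is a minimal sketch of a guard for that case. It assumes this is the cause here, which the thread does not confirm. The names read_if_uploaded and fake_reader are hypothetical and are not part of Streamlit or PyPDF2; fake_reader stands in for read_pdf so the sketch runs without PyPDF2 installed.

```python
import io


def read_if_uploaded(uploaded_file, reader):
    """Call reader(uploaded_file) only when a file is actually present.

    Hypothetical helper for illustration only.
    """
    if uploaded_file is None:  # nothing uploaded yet, so there is no stream to seek
        return None
    return reader(uploaded_file)


def fake_reader(stream):
    """Stand-in for read_pdf: rewinds the stream and returns its text."""
    stream.seek(0)
    return stream.read().decode("utf-8")


print(read_if_uploaded(None, fake_reader))                       # None
print(read_if_uploaded(io.BytesIO(b"page text"), fake_reader))   # page text
```

In the Streamlit script itself, the equivalent is to check that pdf_file is not None before calling read_pdf(pdf_file).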

It was. I found it… I have looked at this code over and over and it was sitting right in front of my face!

Next problem: I am getting a FileNotFoundError: [Errno 2] No such file or directory: for my JSON file