Hi everybody, this is a sort of how-to question.
I tried to use PyMuPDF to read a PDF after uploading it via st.file_uploader(), but it gives me this error:
RuntimeError: cannot open <streamlit.uploaded_file_manager.UploadedFile object at 0x0000021208CC1CA8>: Invalid argument
Traceback:
  File "d:\users\user\anaconda3\lib\site-packages\streamlit\script_runner.py", line 324, in _run_script
    exec(code, module.__dict__)
  File "D:\Documents\My_projects\Project Resume Analyzer\resume_st.py", line 230, in <module>
    main()
  File "D:\Documents\My_projects\Project Resume Analyzer\resume_st.py", line 197, in main
    txt = read_pdf_with_fitz(docx_file)
  File "D:\Documents\My_projects\Project Resume Analyzer\resume_st.py", line 94, in read_pdf_with_fitz
    with fitz.open(file) as doc:
  File "C:\Users\USER\AppData\Roaming\Python\Python37\site-packages\fitz\fitz.py", line 3523, in __init__
    _fitz.Document_swiginit(self, _fitz.new_Document(filename, stream, filetype, rect, width, height, fontsize))
Code:
    import fitz  # this is PyMuPDF
    import streamlit as st

    def read_pdf_with_fitz(file):
        # This is the call that raises the RuntimeError shown above.
        with fitz.open(file) as doc:
            text = ""
            for page in doc:
                text += page.getText()
        return text

    pdf = st.file_uploader("", type=['pdf'])
    result = read_pdf_with_fitz(pdf)
PS: It's not the exact code, but it's pretty much it, and the error comes from the fitz.open() line.
Yes, I know I can use PyPDF2 or pdfplumber for this, and I am currently using pdfplumber to read the file. I prefer PyMuPDF because my project is related to NLP: the other packages extract the PDF text badly, so I don't get the desired output, while PyMuPDF gives me better results. So if anybody can show me how to read a PDF file with PyMuPDF after uploading it, that would be very helpful 🙏.
Python version: 3.7
Streamlit version: 0.69.2
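
One possible approach, as a minimal sketch rather than a confirmed fix: the traceback suggests that fitz.open() treats its first positional argument as a filename, which would explain why passing the UploadedFile object fails. PyMuPDF can also open a document from raw bytes via the stream and filetype keyword arguments, so the uploaded file's contents can be read into memory and handed over that way. The helper name uploaded_file_to_bytes is hypothetical and only for illustration, and the sketch assumes the object returned by st.file_uploader() behaves like a seekable binary file.

```python
def uploaded_file_to_bytes(uploaded):
    """Return the full contents of a seekable binary file-like object.

    Hypothetical helper: assumes the uploaded object supports
    seek() and read(), as io.BytesIO does.
    """
    uploaded.seek(0)  # rewind in case the object was already read
    return uploaded.read()


def read_pdf_with_fitz(uploaded):
    """Extract text from an uploaded PDF using PyMuPDF."""
    # Imported here so the helper above can be used without PyMuPDF.
    import fitz  # this is PyMuPDF

    data = uploaded_file_to_bytes(uploaded)
    # Pass the bytes as a stream instead of a filename, and state
    # the file type explicitly because there is no extension to infer.
    with fitz.open(stream=data, filetype="pdf") as doc:
        text = ""
        for page in doc:
            # getText() as in the original post; newer PyMuPDF
            # releases name this method get_text().
            text += page.getText()
    return text
```

In the app this would be called on the result of st.file_uploader("", type=['pdf']), after checking that the result is not None, since no file is available until the user has uploaded one.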