Hi everybody, this is a how-to question.
I tried to use PyMuPDF to read a PDF after uploading it via st.file_uploader(), but it's giving me this error:
```
RuntimeError: cannot open <streamlit.uploaded_file_manager.UploadedFile object at 0x0000021208CC1CA8>: Invalid argument

  File "d:\users\user\anaconda3\lib\site-packages\streamlit\script_runner.py", line 324, in _run_script
    exec(code, module.__dict__)
  File "D:\Documents\My_projects\Project Resume Analyzer\resume_st.py", line 230, in <module>
    main()
  File "D:\Documents\My_projects\Project Resume Analyzer\resume_st.py", line 197, in main
    txt = read_pdf_with_fitz(docx_file)
  File "D:\Documents\My_projects\Project Resume Analyzer\resume_st.py", line 94, in read_pdf_with_fitz
    with fitz.open(file) as doc:
  File "C:\Users\USER\AppData\Roaming\Python\Python37\site-packages\fitz\fitz.py", line 3523, in __init__
    _fitz.Document_swiginit(self, _fitz.new_Document(filename, stream, filetype, rect, width, height, fontsize))
```
```python
import streamlit as st
import fitz  # this is pymupdf

def read_pdf_with_fitz(file):
    with fitz.open(file) as doc:
        text = ""
        for page in doc:
            text += page.getText()
    return text

pdf = st.file_uploader("", type=['pdf'])
result = read_pdf_with_fitz(pdf)
```
PS: This is not the exact code, but it's close. The error comes from the fitz.open() line.
Yes, I know I can use PyPDF2 or pdfplumber for this, and I am in fact using pdfplumber to read the file right now. But I prefer PyMuPDF because my project is NLP-related: the other packages extract the text poorly, so I don't get the desired output, whereas PyMuPDF gives me better results. So if anybody can show me how to read a PDF file with PyMuPDF after uploading it, that would be very helpful 🙏.
streamlit version: 0.69.2