Hello community,
I am having a dependency issue in my app: https://bex-rag-tutorial.streamlit.app/
The app works fine most of the time, but one critical feature relies on RapidOCR to read text from images, which requires the rapidocr-onnxruntime package. I have listed it in the requirements.txt file, and the app logs show that it is installed during the build. But when I upload an image and click “Process file”, I get this error (the short version):
ImportError: `rapidocr-onnxruntime` package not found, please install it with
`pip install rapidocr-onnxruntime`
Here is the full traceback:
ImportError: libGL.so.1: cannot open shared object file: No such file or directory
During handling of the above exception, another exception occurred:
────────────────────── Traceback (most recent call last) ───────────────────────
/home/adminuser/venv/lib/python3.12/site-packages/streamlit/runtime/scriptru
nner/exec_code.py:85 in exec_func_with_error_handling
/home/adminuser/venv/lib/python3.12/site-packages/streamlit/runtime/scriptru
nner/script_runner.py:576 in code_to_exec
/mount/src/rag_tutorial_hackernoon/app.py:37 in <module>
34 │ │ │ │
35 │ │ │ │ try:
36 │ │ │ │ │ # Process the document
❱ 37 │ │ │ │ │ chunks = process_document(uploaded_file.name)
38 │ │ │ │ │
39 │ │ │ │ │ # Create RAG chain
40 │ │ │ │ │ st.session_state.rag_chain = create_rag_chain(chunk
/mount/src/rag_tutorial_hackernoon/src/document_processor.py:16 in
process_document
13 │ if source.lower().endswith(".pdf"):
14 │ │ return process_pdf(source)
15 │ elif source.lower().endswith((".png", ".jpg", ".jpeg")):
❱ 16 │ │ return process_image(source)
17 │ else:
18 │ │ raise ValueError(f"Unsupported file type: {source}")
19
/mount/src/rag_tutorial_hackernoon/src/document_processor.py:44 in
process_image
41 │ # Extract text from image using OCR
42 │ with open(source, "rb") as image_file:
43 │ │ image_bytes = image_file.read()
❱ 44 │ extracted_text = extract_from_images_with_rapidocr([image_bytes])
45 │ documents = [Document(page_content=extracted_text, metadata={"sourc
46 │ return split_documents(documents)
47
/home/adminuser/venv/lib/python3.12/site-packages/langchain_community/docume
nt_loaders/parsers/pdf.py:70 in extract_from_images_with_rapidocr
67 │ try:
68 │ │ from rapidocr_onnxruntime import RapidOCR
69 │ except ImportError:
❱ 70 │ │ raise ImportError(
71 │ │ │ "`rapidocr-onnxruntime` package not found, please install
72 │ │ │ "`pip install rapidocr-onnxruntime`"
73 │ │ )
────────────────────────────────────────────────────────────────────────────────
ImportError: `rapidocr-onnxruntime` package not found, please install it with
`pip install rapidocr-onnxruntime`
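The first error in the traceback mentions libGL.so.1 rather than a missing package, so the "package not found" message may be hiding the underlying cause. Here is a minimal sketch for surfacing the original import error, assuming it is run in the same environment as the app. `describe_import_failure` is a hypothetical helper name used only for illustration; it is not part of Streamlit, LangChain, or RapidOCR.

```python
import importlib


def describe_import_failure(module_name: str) -> str:
    """Try to import a module and report the original error, if any.

    Hypothetical diagnostic helper: it returns the underlying ImportError
    text instead of replacing it with a generic "package not found" hint.
    """
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        # Keep the exception type and message exactly as Python raised them.
        return f"{type(exc).__name__}: {exc}"
    return "ok"


if __name__ == "__main__":
    # Import the package directly, bypassing the wrapper's try/except.
    print(describe_import_failure("rapidocr_onnxruntime"))
```

If the package is installed but a shared library it depends on cannot be loaded, this should report that library error directly rather than the generic install hint.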
Any ideas on how to solve it?
My Python version is 3.9.19 and Streamlit is 1.37.1.