Error when loading PDF file with LlamaIndex

Amit_Indap · September 12, 2023, 5:17pm

Summary

I am trying to load a PDF file with llama-index, but I think it can’t create the directories it needs to index the file when run on streamlit-cloud.

Steps to reproduce

**Code**

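# Imports assumed for a llama-index release from around the time of this post
from pathlib import Path

import streamlit as st
from llama_index import ServiceContext, VectorStoreIndex, download_loader
from llama_index.llms import OpenAI

@st.cache_resource(show_spinner=False)
def load_data():
    with st.spinner(text="Loading and indexing PDF! This should take 1-2 minutes."):
        PDFReader = download_loader("PDFReader")

        loader = PDFReader(custom_path="local_dir") # tried this custom_path solution, didn't work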
        data_file = Path(__file__).parent / "data" / "myfile.pdf"
        docs = loader.load_data(file=data_file)
        service_context = ServiceContext.from_defaults(llm=OpenAI(model="gpt-3.5-turbo", temperature=0.5, system_prompt="You are an expert on the LFJCC biographies and your job is to answer biographical questions. Assume that all questions are related to the LFJCC Board biographies. Keep your answers  based on facts – do not hallucinate features."))
        index = VectorStoreIndex.from_documents(docs, service_context=service_context)
        return index

index = load_data()

Expected behavior:
Load and index the PDF

Actual behavior:

Throws error:

File "/home/adminuser/venv/lib/python3.11/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 552, in _run_script
    exec(code, module.__dict__)
File "/mount/src/llama_jcc_app/pdf_app.py", line 57, in <module>
    index = load_data()
            ^^^^^^^^^^^
File "/home/adminuser/venv/lib/python3.11/site-packages/streamlit/runtime/caching/cache_utils.py", line 211, in wrapper
    return cached_func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/adminuser/venv/lib/python3.11/site-packages/streamlit/runtime/caching/cache_utils.py", line 242, in __call__
    return self._get_or_create_cached_value(args, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/adminuser/venv/lib/python3.11/site-packages/streamlit/runtime/caching/cache_utils.py", line 266, in _get_or_create_cached_value
    return self._handle_cache_miss(cache, value_key, func_args, func_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/adminuser/venv/lib/python3.11/site-packages/streamlit/runtime/caching/cache_utils.py", line 320, in _handle_cache_miss
    computed_value = self._info.func(*func_args, **func_kwargs)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/mount/src/llama_jcc_app/pdf_app.py", line 48, in load_data
    PDFReader = download_loader("PDFReader")
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/adminuser/venv/lib/python3.11/site-packages/llama_index/readers/download.py", line 117, in download_loader
    os.makedirs(dirpath)
File "<frozen os>", line 225, in makedirs

Debug info

streamlit version 1.26.0
python version 3.11.4

Requirements file

streamlit
openai
llama-index
nltk

Additional information

I tried this solution about adding the custom_path param, but it didn’t work. The app works fine on localhost.

Also, I can’t see the logs because the ‘Manage App’ link is missing from the lower right. But I’m pretty sure the function cannot create the directories on the streamlit-cloud path.

Caroline · September 12, 2023, 9:41pm

Hey @Amit_Indap,

Unfortunately, I ran into a similar issue trying to use LlamaHub connectors with Community Cloud – it generally won’t work because your app won’t be able to download the file to the working directory of Community Cloud. You’d need to either run the app locally or host it with another platform in order to successfully use LlamaHub connectors, sadly.
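
One way to confirm where the app is allowed to create directories is a small probe like the sketch below. It is not from the thread: `can_write_to` is a hypothetical helper, and the candidate directories in the loop are only guesses at useful places to check. It performs the same kind of operation (creating a subdirectory) that fails in `os.makedirs` in the traceback above.

```python
import os
import tempfile


def can_write_to(directory):
    """Return (True, "") if a subdirectory can be created in directory,
    otherwise (False, reason)."""
    try:
        # mkdtemp creates a uniquely named subdirectory inside `directory`.
        probe = tempfile.mkdtemp(dir=directory)
    except OSError as exc:
        return False, f"{type(exc).__name__}: {exc}"
    os.rmdir(probe)  # clean up the empty probe directory
    return True, ""


if __name__ == "__main__":
    # Candidate locations are assumptions; adjust them for your deployment.
    for candidate in (os.getcwd(), tempfile.gettempdir(), os.path.expanduser("~")):
        ok, reason = can_write_to(candidate)
        print(candidate, "writable" if ok else f"not writable ({reason})")
```

In a Streamlit app, the same results could be shown with `st.write` instead of `print`, which helps when the logs are not reachable.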

Amit_Indap · September 12, 2023, 9:55pm

Hi @Caroline, thanks for the reply. That’s a bummer. I recently followed this blog post about using LlamaIndex to make a chatbot for the Streamlit docs, and it deployed fine. It uses SimpleDirectoryReader.

I’m still very new to LLMs and streamlit, but maybe I can try SimpleDirectoryReader instead?

Amit_Indap · September 13, 2023, 3:04am

Got it to work with SimpleDirectoryReader and including pypdf in requirements.txt!

Caroline · September 13, 2023, 12:57pm

Yes, you can definitely use SimpleDirectoryReader if you just store the data for the knowledge base in the app repo itself. Glad to hear it’s working!
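
For readers landing here, a minimal sketch of the SimpleDirectoryReader approach follows. The exact code that fixed the app was not posted, so this is an assumption: it uses the top-level `llama_index` imports of releases from around the time of this thread, expects `pypdf` in requirements.txt, and needs an OpenAI API key configured for the default embeddings. `list_pdfs` and `build_index` are hypothetical helper names.

```python
from pathlib import Path


def list_pdfs(data_dir):
    """Return the PDF files found directly inside data_dir, sorted by name."""
    return sorted(p for p in Path(data_dir).iterdir() if p.suffix.lower() == ".pdf")


def build_index(data_dir):
    """Index every file in data_dir with SimpleDirectoryReader.

    Nothing is downloaded at runtime, so no loader directory has to be created.
    """
    # Imported lazily so list_pdfs can be used without llama-index installed.
    from llama_index import SimpleDirectoryReader, VectorStoreIndex

    if not list_pdfs(data_dir):
        raise FileNotFoundError(f"No PDF files found in {data_dir}")
    docs = SimpleDirectoryReader(input_dir=str(data_dir)).load_data()
    return VectorStoreIndex.from_documents(docs)
```

In the app from the original post this would be called as `build_index(Path(__file__).parent / "data")` inside the `@st.cache_resource` function, with the PDF committed to the repo under `data/`.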

blackary · September 15, 2023, 1:24pm

I think you can actually get it to work by passing `custom_path="."` when calling download_loader(), because that will just write to the current directory, rather than trying to write to wherever it writes by default.
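
A rough sketch of that suggestion, untested on Community Cloud: it assumes a llama-index release from around the time of this thread, where `download_loader` accepts a `custom_path` keyword. `resolve_custom_path` and `get_pdf_reader` are hypothetical helper names.

```python
from pathlib import Path


def resolve_custom_path(custom_path="."):
    """Return custom_path as an absolute directory path, creating it if missing."""
    path = Path(custom_path).resolve()
    path.mkdir(parents=True, exist_ok=True)
    return str(path)


def get_pdf_reader(custom_path="."):
    """Download the PDFReader loader into custom_path instead of the default location."""
    # Lazy import: assumes download_loader is importable from the top-level
    # llama_index package, as in the original post.
    from llama_index import download_loader

    return download_loader("PDFReader", custom_path=resolve_custom_path(custom_path))
```

Note that `custom_path` goes to `download_loader()` here, whereas the code in the original post passed it to the `PDFReader(...)` constructor, which runs only after the download step has already failed.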

Caroline · September 15, 2023, 2:24pm

Oh that’s awesome, I’ll have to try that!
