Summary
I am trying to load a PDF file with llama-index, but I think it can't create the directories it needs when indexing the file on Streamlit Cloud.
Steps to reproduce
**Code**
```python
@st.cache_resource(show_spinner=False)
def load_data():
    with st.spinner(text="Loading and indexing PDF! This should take 1-2 minutes."):
        PDFReader = download_loader("PDFReader")
        loader = PDFReader(custom_path="local_dir")  # tried this custom_path solution, didn't work
        data_file = Path(__file__).parent / "data" / "myfile.pdf"
        docs = loader.load_data(file=data_file)
        service_context = ServiceContext.from_defaults(
            llm=OpenAI(
                model="gpt-3.5-turbo",
                temperature=0.5,
                system_prompt=(
                    "You are an expert on the LFJCC biographies and your job is to "
                    "answer biographical questions. Assume that all questions are "
                    "related to the LFJCC Board biographies. Keep your answers based "
                    "on facts – do not hallucinate features."
                ),
            )
        )
        index = VectorStoreIndex.from_documents(docs, service_context=service_context)
        return index

index = load_data()
```
Expected behavior:
Load and index the PDF
Actual behavior:
Throws this error (the traceback is cut off before the final exception line):
```
  File "/home/adminuser/venv/lib/python3.11/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 552, in _run_script
    exec(code, module.__dict__)
  File "/mount/src/llama_jcc_app/pdf_app.py", line 57, in <module>
    index = load_data()
            ^^^^^^^^^^^
  File "/home/adminuser/venv/lib/python3.11/site-packages/streamlit/runtime/caching/cache_utils.py", line 211, in wrapper
    return cached_func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/adminuser/venv/lib/python3.11/site-packages/streamlit/runtime/caching/cache_utils.py", line 242, in __call__
    return self._get_or_create_cached_value(args, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/adminuser/venv/lib/python3.11/site-packages/streamlit/runtime/caching/cache_utils.py", line 266, in _get_or_create_cached_value
    return self._handle_cache_miss(cache, value_key, func_args, func_kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/adminuser/venv/lib/python3.11/site-packages/streamlit/runtime/caching/cache_utils.py", line 320, in _handle_cache_miss
    computed_value = self._info.func(*func_args, **func_kwargs)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mount/src/llama_jcc_app/pdf_app.py", line 48, in load_data
    PDFReader = download_loader("PDFReader")
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/adminuser/venv/lib/python3.11/site-packages/llama_index/readers/download.py", line 117, in download_loader
    os.makedirs(dirpath)
  File "<frozen os>", line 225, in makedirs
```
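For context, here is a minimal sketch (my own illustration, not code from the report) of one way `os.makedirs(dirpath)` can fail at that frozen-`os` frame: called without `exist_ok=True`, it raises `FileExistsError` if the directory already exists, e.g. on a re-run after a previous download; a `PermissionError` on a read-only parent directory is the other likely failure mode on a hosted filesystem.

```python
import os
import tempfile

# Illustrative path; "llamahub_modules" is just a stand-in name.
target = os.path.join(tempfile.mkdtemp(), "llamahub_modules")

os.makedirs(target)  # first call: creates the directory
try:
    os.makedirs(target)  # second call: directory already exists
    raised = False
except FileExistsError:
    raised = True

print(raised)  # True
```

`os.makedirs(target, exist_ok=True)` would succeed in both calls, which is why the bare call inside `download_loader` is fragile across repeated runs.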
Debug info
Streamlit version: 1.26.0
Python version: 3.11.4
Requirements file
```
streamlit
openai
llama-index
nltk
```
Additional information
I tried the suggested solution of adding the custom_path param, but it didn't work. The app runs fine on localhost.
Also, I can't check the logs because the 'Manage app' link that should appear in the lower right is missing. Still, I'm fairly sure the function cannot create the directories it needs on the Streamlit Cloud filesystem.
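One workaround worth trying (a sketch based on my assumption about the cause, not something verified in the report): if your llama-index version's `download_loader` accepts a `custom_path` argument directly, point it at a directory that is guaranteed writable, such as the system temp dir, and create it up front with `exist_ok=True`:

```python
import tempfile
from pathlib import Path

# "llamahub_modules" is an illustrative name; any writable directory works.
loader_dir = Path(tempfile.gettempdir()) / "llamahub_modules"
loader_dir.mkdir(parents=True, exist_ok=True)  # exist_ok: safe across Streamlit reruns

# Then, inside load_data(), something like:
# PDFReader = download_loader("PDFReader", custom_path=str(loader_dir))
```

This sidesteps both failure modes: the temp dir is writable on hosted platforms, and `exist_ok=True` keeps cached reruns from tripping over an already-created directory.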