I built a chatbot primarily from this (extremely helpful) Streamlit blog post. However, my custom data source is my PhD dissertation (a 227-page PDF). I changed the chat_mode to openai so it could pull from both the internet and my dissertation. It seems like the app has not ingested the ent…

Adding a long PDF as a custom data source

asehmi December 14, 2023, 10:49pm 8

I think LlamaIndex’s VectorStoreIndex will do the document chunking for you, although you may wish to experiment with different chunking strategies. Also take a look at this, which has an ingestion pipeline (run externally from the command line) that would be useful in your case because it includes topic analysis. You may be able to combine ideas from it, my app, and your own.
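
To get a feel for what "experimenting with chunking" means before touching the index settings, here is a minimal plain-Python sketch of fixed-size chunking with overlap. It is not LlamaIndex's own splitter; the function name `chunk_words` and its `chunk_size` and `overlap` parameters are made up for illustration, and it counts words rather than tokens.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 20) -> list[str]:
    """Split text into chunks of at most `chunk_size` words.

    Consecutive chunks share `overlap` words, so a sentence that straddles
    a boundary still appears whole in at least one chunk (if it is short enough).
    Hypothetical helper for illustration only, not a LlamaIndex API.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current window has reached the end of the text,
        # otherwise the loop would emit a trailing chunk of pure overlap.
        if start + chunk_size >= len(words):
            break
    return chunks


if __name__ == "__main__":
    # Stand-in text; in practice this would be the text extracted from the PDF.
    sample = " ".join(f"word{i}" for i in range(1000))
    for size, lap in [(100, 0), (100, 20), (300, 50)]:
        pieces = chunk_words(sample, chunk_size=size, overlap=lap)
        print(f"chunk_size={size}, overlap={lap}: {len(pieces)} chunks")
```

Smaller chunks tend to give more precise retrieval hits but less context per hit, while larger chunks do the opposite, so for a long document like a dissertation it is worth trying a few settings and checking which one answers your test questions best.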

If you’re just experimenting, then you have more than enough to get going. If you want to dive deeper, then your next stop should be spacy-llm.

