And I’m not seeing any error, but nothing is uploaded to my bucket. Running with debug enabled doesn’t show anything new. I’ve used this bucket multiple times, so I know it’s fine.
I was able to make it work with S3, but I don’t understand how to “load” the stored documents after closing and reopening the app.
While recording a video, I got the following error:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 565, in _run_script
exec(code, module.__dict__)
File "/Users/canalescl/personal/replit/ask-my-pdf/src/gui.py", line 251, in <module>
b_ask()
File "/Users/canalescl/personal/replit/ask-my-pdf/src/gui.py", line 168, in b_ask
summary = ss['index']['summary']
File "/usr/local/lib/python3.10/site-packages/streamlit/runtime/state/session_state_proxy.py", line 89, in __getitem__
return get_session_state()[key]
File "/usr/local/lib/python3.10/site-packages/streamlit/runtime/state/safe_session_state.py", line 110, in __getitem__
return self._state[key]
File "/usr/local/lib/python3.10/site-packages/streamlit/runtime/state/session_state.py", line 438, in __getitem__
raise KeyError(_missing_key_error_message(key))
KeyError: 'st.session_state has no key "index". Did you forget to initialize it? More info: https://docs.streamlit.io/library/advanced-features/session-state#initialization'
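The traceback shows `b_ask` reading `ss['index']['summary']` before the key exists. A minimal sketch of the usual fix is to initialize the key up front and guard the handler; here a plain dict stands in for `st.session_state` so the pattern is easy to see (the key names `"index"` and `"summary"` come from the traceback, the rest is illustrative):

```python
# Stand-in for st.session_state: a plain dict with the same access pattern.
session_state = {}

def init_state(state):
    # Initialize the "index" key once, before any handler reads it.
    state.setdefault("index", None)

def b_ask(state):
    # Guard instead of raising KeyError when no document has been indexed yet.
    if state.get("index") is None:
        return "no index loaded - upload a PDF first"
    return state["index"]["summary"]

init_state(session_state)
print(b_ask(session_state))   # -> no index loaded - upload a PDF first
session_state["index"] = {"summary": "one-page summary text"}
print(b_ask(session_state))   # -> one-page summary text
```

In the real app the same guard would live in `gui.py`, with `st.session_state` in place of the dict and an `st.error(...)` call instead of the returned string.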
The Redis URL can carry all the important connection parameters (host, port, password, TLS/SSL, db-id). But Redis is not required to run the project: when STATS_MODE is not set to REDIS, usage stats are aggregated in a Dict object (not persistent). I’ll add this information to the readme.
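For illustration, here is a rough sketch of how those parameters can be pulled out of a single Redis URL with the standard library (the hostname and password are made up; redis client libraries typically do this parsing for you via a `from_url`-style constructor):

```python
from urllib.parse import urlparse

def parse_redis_url(url):
    # Illustrative only: extract host, port, password, TLS flag and db-id
    # from a redis:// or rediss:// style URL.
    u = urlparse(url)
    return {
        "host": u.hostname,
        "port": u.port or 6379,               # default Redis port
        "password": u.password,
        "ssl": u.scheme == "rediss",          # rediss:// implies TLS
        "db": int(u.path.lstrip("/") or 0),   # path component is the db id
    }

print(parse_redis_url("rediss://:s3cret@cache.example.com:6380/2"))
```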
I have another question. For example, say I have a 1000-page PDF. If I’m using an S3 bucket, I think that when I open the app again I need to re-load the entire document into memory, and I guess it’s downloading everything from S3 every time.
So, how big would the performance improvement be with a vector database like Pinecone or Chroma?
For a single document - not that big, really. You can expect savings of <3s on startup (S3 fetch + deserialization) and <1s per query (assuming 4 fragments per page). The reason I’m using S3 here is that it’s fast enough and dirt cheap (250GB → $5/month).
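If the repeated S3 fetch on startup ever becomes the bottleneck, one option is a small local cache keyed by object name, so only the first run pays the network round-trip. This is a sketch, not the project’s actual code: `fetch` stands in for the real S3 download (e.g. a boto3 `get_object` call), and the cache directory name is made up:

```python
import pickle
import pathlib
import tempfile

# Hypothetical local cache directory for deserialized indexes.
CACHE_DIR = pathlib.Path(tempfile.gettempdir()) / "askmypdf_cache"
CACHE_DIR.mkdir(exist_ok=True)

def load_index(name, fetch):
    # fetch() stands in for the S3 download; the result is pickled locally
    # so later startups skip both the network fetch and a second download.
    path = CACHE_DIR / f"{name}.pkl"
    if path.exists():
        return pickle.loads(path.read_bytes())
    obj = fetch()
    path.write_bytes(pickle.dumps(obj))
    return obj
```

The trade-off is staleness: if the object in S3 changes, the cached copy must be invalidated (for example by comparing the S3 ETag before trusting the local file).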
Feedback score - the number of feedback actions (thumbs up / down) performed by the user (if an API key is provided) or by the community (the default “user”).
Community tokens - a measure of the budget that I provide (on my OpenAI API key) for people without their own OpenAI API key, so they can test/use the app. Currently it’s the equivalent of 1,000,000 gpt-3.5-turbo tokens per day.
This is really nice. Thank you for sharing the code. What do you think about PDFs with mixed data, for example text and tables? If I paste the CSV directly into my prompt I can query it, but tables / CSV inside a PDF are not being detected. I think there needs to be a way to combine sparse and dense embeddings, maybe.
When running the app within Streamlit, the community version tab does not show up; only the “Enter your OpenAI API key” prompt does. Do you have to set OPENAI_KEY to your OpenAI key within advanced settings? Is there something to edit with COMMUNITY_USER?