And I’m not seeing any error, but nothing is uploaded to my bucket. Running with debug enabled doesn’t show anything new. I’ve used this bucket multiple times, so I know it’s fine.
I was able to make it work with S3, but I don’t understand how to “load” the stored documents after closing and reopening the app.
While recording a video, I got the following error:
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 565, in _run_script
exec(code, module.__dict__)
File "/Users/canalescl/personal/replit/ask-my-pdf/src/gui.py", line 251, in <module>
b_ask()
File "/Users/canalescl/personal/replit/ask-my-pdf/src/gui.py", line 168, in b_ask
summary = ss['index']['summary']
File "/usr/local/lib/python3.10/site-packages/streamlit/runtime/state/session_state_proxy.py", line 89, in __getitem__
return get_session_state()[key]
File "/usr/local/lib/python3.10/site-packages/streamlit/runtime/state/safe_session_state.py", line 110, in __getitem__
return self._state[key]
File "/usr/local/lib/python3.10/site-packages/streamlit/runtime/state/session_state.py", line 438, in __getitem__
raise KeyError(_missing_key_error_message(key))
KeyError: 'st.session_state has no key "index". Did you forget to initialize it? More info: https://docs.streamlit.io/library/advanced-features/session-state#initialization'
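The traceback shows `b_ask` reading `ss['index']['summary']` before the key exists. A minimal sketch of the usual fix is to initialize the key up front and guard the handler; here a plain dict stands in for `st.session_state` so the pattern is easy to see (the key names `"index"` and `"summary"` come from the traceback, the rest is illustrative):

```python
# Stand-in for st.session_state: a plain dict with the same access pattern.
session_state = {}

def init_state(state):
    # Initialize the "index" key once, before any handler reads it.
    state.setdefault("index", None)

def b_ask(state):
    # Guard instead of raising KeyError when no document has been indexed yet.
    if state.get("index") is None:
        return "no index loaded - upload a PDF first"
    return state["index"]["summary"]

init_state(session_state)
print(b_ask(session_state))   # -> no index loaded - upload a PDF first
session_state["index"] = {"summary": "one-page summary text"}
print(b_ask(session_state))   # -> one-page summary text
```

In the real app the same guard would live in `gui.py`, with `st.session_state` in place of the dict and an `st.error(...)` call instead of the returned string.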
The Redis URL can carry all the important connection parameters (host, port, password, TLS/SSL, db-id). But Redis is not required to run the project: when STATS_MODE is not set to REDIS, usage stats are aggregated in a Dict object (not persistent). I’ll add this information to the readme.
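For illustration, here is a rough sketch of how those parameters can be pulled out of a single Redis URL with the standard library (the hostname and password are made up; redis client libraries typically do this parsing for you via a `from_url`-style constructor):

```python
from urllib.parse import urlparse

def parse_redis_url(url):
    # Illustrative only: extract host, port, password, TLS flag and db-id
    # from a redis:// or rediss:// style URL.
    u = urlparse(url)
    return {
        "host": u.hostname,
        "port": u.port or 6379,               # default Redis port
        "password": u.password,
        "ssl": u.scheme == "rediss",          # rediss:// implies TLS
        "db": int(u.path.lstrip("/") or 0),   # path component is the db id
    }

print(parse_redis_url("rediss://:s3cret@cache.example.com:6380/2"))
```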
I have another question. For example, say I have a 1000-page PDF. If I’m using an S3 bucket, I think that when I open the app again I need to re-load the entire document into memory, and I guess it’s downloading everything from S3 every time.
So, how big would the performance improvement be with a vector database like Pinecone or Chroma?
For a single document - not that big, really. You can expect savings of <3s on startup (S3 fetch + deserialization) and <1s per query (assuming 4 fragments per page). The reason I’m using S3 here is that it’s fast enough and dirt cheap (250GB → $5/month).
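If the repeated S3 fetch on startup ever becomes the bottleneck, one option is a small local cache keyed by object name, so only the first run pays the network round-trip. This is a sketch, not the project’s actual code: `fetch` stands in for the real S3 download (e.g. a boto3 `get_object` call), and the cache directory name is made up:

```python
import pickle
import pathlib
import tempfile

# Hypothetical local cache directory for deserialized indexes.
CACHE_DIR = pathlib.Path(tempfile.gettempdir()) / "askmypdf_cache"
CACHE_DIR.mkdir(exist_ok=True)

def load_index(name, fetch):
    # fetch() stands in for the S3 download; the result is pickled locally
    # so later startups skip both the network fetch and a second download.
    path = CACHE_DIR / f"{name}.pkl"
    if path.exists():
        return pickle.loads(path.read_bytes())
    obj = fetch()
    path.write_bytes(pickle.dumps(obj))
    return obj
```

The trade-off is staleness: if the object in S3 changes, the cached copy must be invalidated (for example by comparing the S3 ETag before trusting the local file).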
Feedback score - the number of feedback actions (thumbs up / down) performed by the user (if an API key is provided) or by the community (the default “user”).
Community tokens - a measure of the budget that I provide (on my OpenAI API key) for people without their own OpenAI API key, so they can test/use the app. Currently it’s the equivalent of 1,000,000 gpt-3.5-turbo tokens per day.
This is really nice. Thank you for sharing the code. What do you think about PDFs with mixed data, for example text and tables? If I paste the CSV directly into my prompt I can query it, but tables / CSV inside a PDF are not being detected. I think there needs to be a way to combine sparse and dense embeddings, maybe.
When running the app within Streamlit, the community version tab does not show up; only the “Enter your OpenAI API key” prompt does. Do you have to set OPENAI_KEY to your OpenAI key within advanced settings? Is there something to edit with COMMUNITY_USER?