Ask-my-pdf -> Q&A for PDF files using OpenAI API

I just did that:

export S3_PREFIX=""
export S3_REGION="us-east-1"
export S3_BUCKET="my-bucket"
export S3_SECRET="XXXXXXXXXXXXXXXXXXX"
export S3_KEY="YYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYYY"
export S3_URL="https://s3.amazonaws.com"

I’m not seeing any error, but nothing is uploaded to my bucket. Enabling debug doesn’t show anything new. I’ve used this bucket multiple times, so I know it works.
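
One quick sanity check is to confirm that the Python process actually sees these variables, since variables exported in one shell are not visible to a Streamlit process started from another. This is only a sketch: it assumes the app reads exactly the names exported above, and the helper `missing_s3_settings` is hypothetical, not part of ask-my-pdf.

```python
import os

# Names copied from the export lines above; S3_PREFIX may legitimately be empty,
# so it is not treated as required here (an assumption).
REQUIRED = ("S3_REGION", "S3_BUCKET", "S3_SECRET", "S3_KEY", "S3_URL")


def missing_s3_settings(env=None):
    """Return the required S3 variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED if not env.get(name)]


if __name__ == "__main__":
    missing = missing_s3_settings()
    print("missing:", missing if missing else "none")
```

Running it from the same shell that starts the app should print `missing: none`; any name it lists was not visible to the process.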

In Streamlit Cloud you can set environment variables during the deployment in the “Advanced settings”.

I’ve tested it only with Digital Ocean. I cannot promise when I’ll be able to test it with AWS.

But when running ask-my-pdf locally you can use LocalStorage. Just change line 58 in gui.py:

Oh, I forgot about that.

I was able to make it work with S3, but I don’t understand how to “load” the stored documents when I closed it and opened it again.
I recorded a video, I got the following error:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 565, in _run_script
    exec(code, module.__dict__)
  File "/Users/canalescl/personal/replit/ask-my-pdf/src/gui.py", line 251, in <module>
    b_ask()
  File "/Users/canalescl/personal/replit/ask-my-pdf/src/gui.py", line 168, in b_ask
    summary = ss['index']['summary']
  File "/usr/local/lib/python3.10/site-packages/streamlit/runtime/state/session_state_proxy.py", line 89, in __getitem__
    return get_session_state()[key]
  File "/usr/local/lib/python3.10/site-packages/streamlit/runtime/state/safe_session_state.py", line 110, in __getitem__
    return self._state[key]
  File "/usr/local/lib/python3.10/site-packages/streamlit/runtime/state/session_state.py", line 438, in __getitem__
    raise KeyError(_missing_key_error_message(key))
KeyError: 'st.session_state has no key "index". Did you forget to initialize it? More info: https://docs.streamlit.io/library/advanced-features/session-state#initialization'

Ok, so around 1:15 you only inspected the available files but you actually haven’t selected your file - just click on the name.

I will disable the “get answer” button when no file is loaded.
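
A minimal sketch of such a guard, assuming the loaded document is stored under the `index` key of the session state (as the traceback above suggests). The helper name `index_loaded` is hypothetical and this is not necessarily how ask-my-pdf implements it.

```python
from collections.abc import Mapping


def index_loaded(state: Mapping) -> bool:
    """True once a file has been selected and its index stored under 'index'."""
    return "index" in state and state["index"] is not None


# In a Streamlit script this could drive the button's `disabled` flag, e.g.:
#   import streamlit as st
#   st.button("get answer", disabled=not index_loaded(st.session_state))
```

Checking for the key before reading it avoids the `KeyError` raised by `ss['index']['summary']` when no file has been selected.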

Hi, thanks for your help. I was able to make it work.

You are moving very fast, so I’m trying to catch you. I saw your updated README.md but can’t see any reference to REDIS_URL.

Also, don’t you need a password for it?

The Redis URL can contain all the important connection parameters (host, port, password, TLS/SSL, db id), so no separate password setting is needed. But Redis is not required to run the project. When STATS_MODE is not set to REDIS, usage stats are aggregated in a Dict object (not persistent). I’ll add this information to the README.
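
To illustrate how one URL carries all of these parameters, here is a small sketch that splits a Redis URL with the standard library. The example URL, host and password are made up, and `describe_redis_url` is a hypothetical helper, not part of ask-my-pdf.

```python
from urllib.parse import urlparse


def describe_redis_url(url: str) -> dict:
    """Split a redis:// or rediss:// URL into its connection parameters."""
    parts = urlparse(url)
    return {
        "tls": parts.scheme == "rediss",          # rediss:// means TLS/SSL
        "host": parts.hostname,
        "port": parts.port or 6379,               # 6379 is the default Redis port
        "password": parts.password,
        "db": int(parts.path.lstrip("/") or 0),   # path segment is the db id
    }


if __name__ == "__main__":
    # Placeholder URL for illustration only.
    print(describe_redis_url("rediss://:s3cret@redis.example.com:6380/2"))
```

Client libraries usually accept such a URL directly (for example `redis.Redis.from_url(url)` in redis-py), so the manual parsing above is only meant to show what the URL encodes.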

Cool, it’s working !!

I have another question. Say I have a 1000-page PDF. If I’m using an S3 bucket, I think that when I open the app again I need to reload the entire document into memory, so I guess it downloads everything from S3 every time.
So how big would the performance improvement be with an embedding database like Pinecone or Chroma?

For a single document - not that big really. You can expect a saving of <3s on startup (S3 fetch + deserialization) and <1s on each query (assuming 4 fragments per page). The reason why I’m using S3 here is that it’s fast enough and dirt-cheap (250GB → $5/month).
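
A rough back-of-the-envelope sketch of why a single document stays small. The 4 fragments per page and the 250GB for $5/month price come from the reply above; the embedding size of 1536 float32 values per fragment is an assumption (the thread does not say which embedding model is used), and both function names are hypothetical.

```python
def index_size_bytes(pages, fragments_per_page=4, dims=1536, bytes_per_float=4):
    """Approximate size of the raw embedding vectors for one document.

    dims=1536 and bytes_per_float=4 are assumptions, not values from the thread.
    """
    return pages * fragments_per_page * dims * bytes_per_float


def monthly_storage_cost(size_gb, usd_per_gb=5 / 250):
    """Storage cost at the quoted rate of 250 GB for $5 per month."""
    return size_gb * usd_per_gb


if __name__ == "__main__":
    size = index_size_bytes(1000)  # the 1000-page PDF from the question
    print(f"{size / 1e6:.1f} MB of vectors")
    print(f"${monthly_storage_cost(size / 1e9):.4f} per month to store")
```

Under these assumptions the vectors for a 1000-page PDF come to a few tens of megabytes, which is consistent with a fetch and deserialization time of a few seconds.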

Ask-My-PDF now uses gpt-3.5-turbo and allows the user to switch between models.
I’ve also added the option to provide feedback about the answer.

You can now try ask-my-pdf without an API key.
Every day, a pool of 1,000,000 community tokens is available to everyone.

What are the feedback score and community tokens?

Feedback score - number of feedback actions (thumb up / down) performed by the user (if API key is provided) or by the community (the default “user”).

Community tokens - a measure of the budget that I provide (on my OpenAI API key) for people without an OpenAI API key, so they can test/use the app. Currently it’s the equivalent of 1,000,000 gpt-3.5-turbo tokens per day.

I love this. I am wondering if it is possible to run this script and get responses from the terminal, without a web browser? Could someone help me out?

I’m looking to develop an app in a startup situation and am seeking your participation. Would you be interested?

Daniel

This is really nice. Thank you for sharing the code. What do you think of PDFs with mixed data, for example text and tables? If I paste the CSV directly into my prompt I can query it, but tables / CSV in a PDF are not being detected. I think there needs to be a way to identify sparse and dense embeddings, maybe.

Nice project!
A few questions:

  • is it possible to hide the API key section entirely and just use the key provided as a Streamlit secret?
  • is it possible to hide the feedback part?
  • is it possible to hide the left panel?
  • how about implementing an OCR library for PDFs?
    Thanks!

When running the app within Streamlit, the community version tab does not show up; only the “Enter your OpenAI API key” one does. Do you have to set OPENAI_KEY to your OpenAI key in the advanced settings? Is there something to edit with COMMUNITY_USER?
