Ask-my-pdf -> Q&A for PDF files using OpenAI API

A technical demonstration that integrates Streamlit and GPT-3 to create a question-answering system for PDF files. It is designed specifically for answering questions about board game rules, but it should handle other use cases as well.

Requires an OpenAI API key.

I’ve updated my app with a small improvement to the HyDE (Hypothetical Document Embeddings) technique.

More details below:


The GitHub repository for “ask-my-pdf” has been made public and can be accessed at mobarski/ask-my-pdf on GitHub (“Question answering system for PDF files”).

I’ve tried it, and the results are excellent. Much better than mmz-001/knowledge_gpt on GitHub (“Accurate answers and instant citations for your documents”).

I’d like to know what is needed to get data persistence. I think it’s really important, and I’d like to know what the technical challenges of implementing it are.

I’d like to have something like that to process all my sources of knowledge and have something like a Jarvis, a personal AI assistant, so I’m curious.

This is just great. I developed board games and tried uploading a sample PDF file, and it worked.

Also, I’m impressed with the quality of the results from RALM with HyDE.
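For readers new to HyDE, here is a minimal sketch of the basic idea: instead of embedding the question itself, you ask the model to draft a hypothetical answer, embed that draft, and use it to retrieve the closest chunks. This is plain HyDE, not the modified version used in the app, and `generate`, `embed`, and `hyde_search` are hypothetical names. In practice the two callables would wrap completion and embedding API calls.

```python
import math
from typing import Callable, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hyde_search(
    question: str,
    chunks: List[str],
    chunk_vectors: List[Sequence[float]],
    generate: Callable[[str], str],          # hypothetical: question -> drafted answer
    embed: Callable[[str], Sequence[float]],  # hypothetical: text -> embedding vector
    top_k: int = 3,
) -> List[str]:
    # Step 1: draft a hypothetical answer (it may be wrong; only its wording matters).
    hypothetical_answer = generate(question)
    # Step 2: embed the draft instead of the original question.
    query_vector = embed(hypothetical_answer)
    # Step 3: rank stored chunks by similarity to the draft's embedding.
    ranked = sorted(
        zip(chunks, chunk_vectors),
        key=lambda pair: cosine(query_vector, pair[1]),
        reverse=True,
    )
    return [chunk for chunk, _ in ranked[:top_k]]
```

The retrieved chunks would then be placed in the prompt as context for the final answer. The intuition is that a drafted answer tends to share more vocabulary with the rule text than a short question does.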

Data persistence is the most requested feature :slight_smile: The main challenge here is just my time management, but still I should be able to add this within ~1 week.

Let me know if I can help with any testing.

Can’t wait to play around with the app and code during the weekend :slight_smile:

Work in progress preview:

You can now save your indices :slight_smile: No additional credentials are required: a key derived from your API key is used to encrypt your content and to keep it separate from other users’ content.
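To illustrate the idea of deriving keys from the API key, here is a minimal sketch using only the standard library. It is an assumption about how such a scheme could look, not the app's actual code: `derive_keys`, the salt labels, and the iteration count are all hypothetical. One derived value names the user's storage prefix, and a second, independent value would be handed to a cipher for encryption.

```python
import hashlib
from typing import Tuple


def derive_keys(api_key: str, iterations: int = 100_000) -> Tuple[str, bytes]:
    """Derive a storage prefix and an encryption key from an API key.

    Hypothetical scheme: the salts below are fixed labels chosen for this sketch.
    """
    secret = api_key.encode("utf-8")
    # Different labels give unrelated outputs, so the storage prefix
    # (visible in object names) is not the encryption key itself.
    user_id = hashlib.pbkdf2_hmac(
        "sha256", secret, b"user-namespace", iterations
    ).hex()[:16]
    encryption_key = hashlib.pbkdf2_hmac(
        "sha256", secret, b"content-encryption", iterations
    )  # 32 bytes, suitable as input to a symmetric cipher
    return user_id, encryption_key
```

Because the derivation is deterministic, the same API key always maps to the same prefix and key, which is why no extra login is needed. A different API key yields a different prefix, so users' indices do not collide.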

Checking it …

I just created an issue on GitHub.

Also, I’m curious about where you are putting your S3 credentials.

I’m using environment variables (see storage.py)
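As a rough illustration of that pattern, the sketch below reads S3-style settings from environment variables and fails early if any are missing. The variable names (`EXAMPLE_S3_KEY` and so on) and the function `load_s3_settings` are hypothetical. The names the app really uses are in storage.py.

```python
import os
from typing import Dict, Mapping

# Hypothetical variable names, chosen for this sketch only.
REQUIRED_VARS = ("EXAMPLE_S3_KEY", "EXAMPLE_S3_SECRET", "EXAMPLE_S3_BUCKET")
OPTIONAL_ENDPOINT_VAR = "EXAMPLE_S3_ENDPOINT"  # for S3-compatible providers


def load_s3_settings(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Collect storage settings from the environment, failing on missing values."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    settings = {name: env[name] for name in REQUIRED_VARS}
    # The endpoint is optional; an empty string means "use the provider default".
    settings[OPTIONAL_ENDPOINT_VAR] = env.get(OPTIONAL_ENDPOINT_VAR, "")
    return settings
```

Keeping credentials out of the source tree this way means the public repository never contains secrets, and each deployment can supply its own values.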

Thank you. Are DigitalOcean Spaces buckets cheaper, in your opinion?

But how do you set environment variables in the Streamlit Cloud environment?