DocDocGo: chatbot does "infinite" web research, creates KBs from websites or your files

Hi all!

I made an app that can do in-depth Internet research for you, generating a report with every research iteration, then answer any follow-up questions using all of the fetched sites as a knowledge base (KB).

It also allows you to create a KB out of your local files and chat with them. If you have some dense documentation files that you want to ask questions about, just upload them and chat away!


By default the bot operates using GPT 3.5, but you can enter your own OpenAI API key and then have the option to switch to GPT 4. Note that all users on the default key share the KBs they create, whereas your own key automatically serves as your unique user ID, so your subsequent KBs won’t be visible to other users.

Not sure how to get started? No problem, just ask DocDocGo itself! It comes prepackaged with a default KB of its own docs, so it knows how it should be used.

This is a work in progress, and there’s definitely lots of room for improvement. Any feedback is always welcome. Links:


Very cool!


Thanks! I recently redesigned the algorithm for combining results from many sites into a report - now the way to do “infinite” research is:

/research auto [number of desired iterations]

Or, as before, ask the bot itself for help using it :grin:

Thanks for sharing!


So much there. Great app. Thanks for sharing.


Thanks, @asehmi @WendyOngg, and please stay tuned for more improvements of the “infinite” research functionality and other features!


Promised improvements delivered - a new version of DocDocGo is out, with a much improved RAG algorithm and a more intuitive “infinite” research command.

“Infinite” research
The previous “/research auto N” command still works, but it led to some confusion around the right N to use. So now we have the more straightforward “/research deeper” command, which simply doubles the number of websites the bot “reads” and incorporates into its final report.

So the simplest way to do web research is now as follows:
a. “/research legal arguments for and against disqualifying Trump”
b. “/research deeper” to double the number of sources OR “/research deeper N” to double them N times
c. Read the final report and chat with the resulting knowledge base, e.g. "Which legal scholars have argued for disqualification?"
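To make the doubling concrete, here is a small sketch of how the number of sources grows with repeated “/research deeper” runs. The function name and the starting count of 5 sources are assumptions made for illustration, not values taken from DocDocGo itself.

```python
def sources_after_deeper(initial_sources: int, times: int) -> int:
    """Return the source count after doubling it `times` times.

    Each "/research deeper" doubles the number of sources, so running
    "/research deeper N" corresponds to a 2**N-fold increase.
    """
    if initial_sources < 0 or times < 0:
        raise ValueError("initial_sources and times must be non-negative")
    return initial_sources * 2 ** times


if __name__ == "__main__":
    # Hypothetical starting point: the first report used 5 sources.
    for n in range(4):
        print(f"/research deeper {n}: {sources_after_deeper(5, n)} sources")
```

Because the growth is exponential, even a small N increases the amount of fetched content substantially, so it is probably worth starting with N = 1 or 2.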

Improved RAG
I have revamped the algorithm that selects the document excerpts to submit to the LLM. The new algorithm is a custom version of parent document retrieval with some twists I came up with, which results in more accurate answers and a reduced chance that important information is missed.
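For readers unfamiliar with the general idea of parent document retrieval, here is a minimal sketch of the textbook technique. It is not DocDocGo's custom algorithm, whose details are not described in this thread. Small "child" chunks are matched against the query, but the larger "parent" chunks they came from are what would be sent to the LLM. All names are hypothetical, and the word-overlap score is a stand-in for the embedding similarity a real system would use.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Child:
    text: str
    parent_id: int  # index of the parent chunk this child was cut from


def split_words(text: str, size: int) -> List[str]:
    """Split text into consecutive chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def build_index(
    documents: List[str], parent_size: int = 40, child_size: int = 10
) -> Tuple[List[str], List[Child]]:
    """Cut documents into parent chunks, then cut each parent into children."""
    parents: List[str] = []
    children: List[Child] = []
    for doc in documents:
        for parent in split_words(doc, parent_size):
            parent_id = len(parents)
            parents.append(parent)
            children.extend(
                Child(text, parent_id) for text in split_words(parent, child_size)
            )
    return parents, children


def overlap_score(query: str, text: str) -> int:
    """Toy relevance score: number of distinct words shared with the query."""
    return len(set(query.lower().split()) & set(text.lower().split()))


def retrieve_parents(
    query: str, parents: List[str], children: List[Child], top_k: int = 2
) -> List[str]:
    """Rank children by relevance, then return their de-duplicated parents."""
    ranked = sorted(
        children, key=lambda c: overlap_score(query, c.text), reverse=True
    )
    parent_ids: List[int] = []
    for child in ranked:
        if overlap_score(query, child.text) == 0:
            break  # remaining children share no words with the query
        if child.parent_id not in parent_ids:
            parent_ids.append(child.parent_id)
        if len(parent_ids) == top_k:
            break
    return [parents[i] for i in parent_ids]


if __name__ == "__main__":
    docs = ["alpha beta gamma delta", "chroma stores vectors on disk"]
    parents, children = build_index(docs, parent_size=4, child_size=2)
    print(retrieve_parents("where are vectors", parents, children, top_k=1))
```

Matching on small chunks keeps the search precise, while returning the parent gives the LLM enough surrounding context to answer accurately.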

Give it a try and remember - you can always ask the bot itself for help navigating all its features and commands (as long as the default collection docdocgo-documentation is selected).


[Edit: You explained this in the .env file example] I’m a little confused as to what the Docker container is running. Can you explain please?

I have Python 3.9, so I can’t run the Streamlit app. It seems your Streamlit app uses the pipe operator for union types (X | Y), which requires Python 3.10+, and for dict merges. I will have to convert the type hints to typing.Union syntax.
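For anyone attempting the same conversion, here is a rough sketch of what it involves. The function and variable names are made up for illustration and do not come from the DocDocGo codebase. Pipe-style union annotations (PEP 604) need Python 3.10+, while the dict merge operator (PEP 584) is already available in 3.9, so only the annotations strictly need rewriting there.

```python
from typing import Dict, Optional, Union

# Python 3.10+ spelling (fails at import time on 3.9):
#     def parse_port(value: int | str | None) -> int | None: ...


def parse_port(value: Union[int, str, None]) -> Optional[int]:
    """Same signature written with typing.Union / typing.Optional."""
    if value is None:
        return None
    return int(value)


# Hypothetical settings dicts, used only to demonstrate merging.
defaults: Dict[str, Union[str, float]] = {"model": "gpt-3.5-turbo", "temperature": 0.3}
overrides: Dict[str, Union[str, float]] = {"temperature": 0.0}

# `defaults | overrides` works on Python 3.9+; dict unpacking is the
# equivalent that also runs on older versions.
settings = {**defaults, **overrides}

if __name__ == "__main__":
    print(parse_port("8080"), settings)
```

Adding `from __future__ import annotations` at the top of a module is another option, since it defers evaluation of annotations, but it may not help if a library inspects the type hints at runtime.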


Looks like you already found the answer, but yes, the Docker container is running the Chroma database. However, I made this optional: if you copy .env.example to .env and fill in your OpenAI API key (the only required value, though a couple of others are recommended), you can run the app without Chroma running in a container; the database will just live on your local drive.

I am using 3.11 throughout, even outside the Streamlit component (which is optional). But I think if you make the modifications you mentioned then it may work with 3.9. I have not tested that though.


Yes, it worked! It’s a very nice application that I’m enjoying using. I spent a whole evening with it researching dynamic time warping. Once I get my head around its design it’ll help me add conversational elements to apps that I’m working on. I particularly like your short commands to switch modes and the care you took to manage mal events (formatting, exceptions, etc.), which are so common in LLM (networked) apps. Many thanks for sharing.


@asehmi I wasn’t familiar with dynamic time warping. It sounded very science-fictiony, but I had ChatGPT cure me of my ignorance. I’m glad you’re finding the research feature useful!


A new relatively simple but useful feature is out - you can now:

  • get a summary of a website/PDF (run “/summarize https://example.com”)
  • ask follow-up questions
  • add another website/PDF to the knowledge base, and another… (run /summarize again, or “/ingest https://example2.com” if you don’t want a summary)

When you are done, just rename the resulting collection to whatever you want or delete it:

  • /db rename my-cool-collection OR
  • /db delete --current

Announcing a new awesome update: you can now ask DocDocGo to redo research reports, if the initial format or content wasn’t what you had in mind, while still keeping all of the content it has ingested into a KB.

This is really useful, because one annoyance with the “infinite” research has been that DDG finds and ingests lots of useful information, but writes a report that’s too long/short, or has a different format from what you had in mind, or emphasizes the wrong information.

Previously the only recourse was to start a new research from scratch, but now you can just quickly rewrite the report(s) using the already ingested content, which is ~10x faster:

  1. Run /research claims Putin made in interview with Tucker Carlson - your original research query (just an example)
  2. Run /research view stats - to review your query and auto-generated report type
  3. Run /research set-report-type Numbered list, with brief description of claim, fact-check, and URL of source - set your custom report format
  4. Run /research startover - to quickly rewrite the report using already fetched content
  5. (Optional) Run /re deeper N - “infinite” research feature to expand the number of sources 2^N-fold

Another new feature is the ability to ingest into an existing collection - useful if you want to supplement a KB with your local docs or specific websites.

Remember, if you don’t know how to do something or forgot a command, just tell the bot what you want to do, prefixed with /help, e.g.:

/help how can I ingest more content into a collection?

A new feature has been added: now you can create a shareable link to your collection. This opens up some cool possibilities, for example:

  1. You can create and share an instant support bot with a knowledge base from whatever docs you provide.
  2. You can collaborate on a research project with others.

To illustrate 1, I created a “talking resume” for myself in less than a minute, available at this share link. Here’s how you can do something like that:

  • Go to https://docdocgo.streamlit.app and run /in (or /ingest)
  • Use the uploader widget to browse to your docs (e.g. your resume PDF), select one or more, then click “Upload”.
  • Rename the resulting collection: /db rename my-talking-resume
  • Create a shareable link: /share editor pwd someRandomPwd123

You are done! You’ll get a shareable link that you can give to anyone, and they’ll be able to ask questions about your docs.

P.S. Currently the only option is to give editor access to your collection. Read-only sharing is coming soon.


Update: the P.S. above is no more, you can now share a collection in view-only mode. That way, you can, for example, create a support chatbot and share it with your customers, friends, or anyone else, without fear that anybody can “mess” with it.

All of the steps are exactly as in the previous post, except for the final step, where you can now use this command to create a shareable link with read-only access:

/share viewer pwd someRandomPwd4242

As an example, here’s a shareable link for a collection that gives the bot knowledge of hundreds of recent AI papers, based on ingesting Dair AI’s amazing ML Papers of the Week:

http://docdocgo.streamlit.app/?collection=u-yNp05g-ai-papers&access_code=godairai

Use this link and you will be able to chat with the bot about AI papers but not modify the collection in any way. To check your access level, you can use the command /db status. Finally, to find out about other sharing options, just type /share or, as usual, ask the bot itself with /help <your question>.
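The example link above carries two query parameters, collection and access_code. The sketch below shows how such a link could be assembled or taken apart with Python's standard library. The helper names are hypothetical, and building a URL this way does not grant any access by itself: the access code still has to be registered with the /share command first.

```python
from typing import Tuple
from urllib.parse import parse_qs, urlencode, urlsplit


def build_share_link(
    collection: str,
    access_code: str,
    base: str = "https://docdocgo.streamlit.app/",
) -> str:
    """Assemble a link with the same query parameters as the example above."""
    query = urlencode({"collection": collection, "access_code": access_code})
    return f"{base}?{query}"


def parse_share_link(url: str) -> Tuple[str, str]:
    """Extract (collection, access_code) from a share link."""
    query = parse_qs(urlsplit(url).query)
    return query["collection"][0], query["access_code"][0]


if __name__ == "__main__":
    link = "http://docdocgo.streamlit.app/?collection=u-yNp05g-ai-papers&access_code=godairai"
    print(parse_share_link(link))
    print(build_share_link("my-talking-resume", "someRandomPwd123"))
```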


Some updates for both users and developers.

For users:

  • migrated database to AWS - which significantly sped up responses
  • reorganized and improved documentation - better responses when you ask the bot for /help

For developers:

  • there’s now a standalone REST API server implemented with FastAPI (hosted in container on AWS)
  • there’s also a simple reference Next.js frontend to show how to interact with the API (live demo, code)
  • expanded and reorganized Developer Documentation
  • you can now ask the bot questions about building with DocDocGo by switching to the developer-docs collection

This app does most of what I want to do with RAG. Thanks for all the hard work writing the dev docs and sharing it! :100: :heavy_multiplication_x: :balloon:


Thank you @StreamlitTeam for your posts about DocDocGo on x.com and LinkedIn, I really appreciate it and hope more people give DDG a try and share their feedback - negative feedback is especially welcome since it will help me improve DDG!

Also, thank you to @asehmi for the many great suggestions over the last few weeks!

Announcing New Feature - /research heatseek

This is a different way to do web research - for when you need to find a site that has just the right answer/content. Sometimes the info you need isn’t immediately found on Google - “/research heatseek” can save you lots of time sifting through website after website. For example:

/re heatseek 3 Find code example showing how to update React state in shadcn Slider

This will kick off 3 iterations of research (each iteration goes through about 5 websites) looking for what you need.

Try it out and share your feedback!


New feature fresh out of the oven: /export your conversation history

You can now download your chat history (for the current session) as a Markdown file. You can even ingest it back into DocDocGo (which could be useful when doing /research).

To export your conversation, use the command:

  • /ex chat <number of past messages> (or /export instead of /ex)

If the number of past messages is not specified, the entire conversation will be exported.

Exporting collections is in the works!

P.S. Since there are a lot of commands in DDG, remember that it’s “self-aware”, i.e. you can always type /help how do I do X or any other question about DDG itself and it will explain and give you the options for relevant commands.


A few recent updates:

1. The addition of “i’m also a good gpt2 chatbot”

The “gpt-4o” model has been added to the list of available models - of course! :grin:

2. Option to search for the collection

With the increased number of users, there are now too many public collections to comfortably list in alphabetical order, so you can now search by name:

  • /db list bla: list your collections whose names contain “bla”
  • /db list bla*: list collections whose names start with “bla”
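The difference between the two patterns can be sketched as follows. This is only an illustration of "contains" versus "starts with" matching: the function names are hypothetical, and whether DocDocGo matches case-sensitively is an assumption here.

```python
from typing import List


def name_matches(name: str, pattern: str) -> bool:
    """A trailing "*" means prefix match; otherwise match anywhere in the name."""
    if pattern.endswith("*"):
        return name.startswith(pattern[:-1])
    return pattern in name


def filter_collections(names: List[str], pattern: str) -> List[str]:
    """Return the matching collection names in alphabetical order."""
    return sorted(name for name in names if name_matches(name, pattern))


if __name__ == "__main__":
    # Hypothetical collection names.
    names = ["blabla-notes", "my-bla-research", "ai-papers"]
    print(filter_collections(names, "bla"))
    print(filter_collections(names, "bla*"))
```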

3. Exporting chat history in reverse order

As @asehmi pointed out, in some cases it’s more useful to save the chat history in reverse chronological order. This can now be done with:

  • /ex chat <optional number of latest messages> reverse
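As a rough picture of what such an export does, the sketch below renders the latest messages of a chat as Markdown, optionally newest first. The function name, the (role, text) message representation, and the Markdown layout are all assumptions, since the actual file format DocDocGo produces is not described in this thread.

```python
from typing import List, Optional, Tuple

Message = Tuple[str, str]  # (role, text), a hypothetical representation


def export_chat(
    messages: List[Message],
    last_n: Optional[int] = None,
    reverse: bool = False,
) -> str:
    """Render the last `last_n` messages (all if None) as Markdown text."""
    if last_n is not None and last_n < 0:
        raise ValueError("last_n must be non-negative")
    if last_n is None:
        selected = list(messages)
    else:
        # Slice from the end; max() guards against last_n > len(messages).
        selected = messages[max(len(messages) - last_n, 0):]
    if reverse:
        selected = selected[::-1]  # newest message first
    return "\n\n".join(f"**{role}**: {text}" for role, text in selected)


if __name__ == "__main__":
    chat = [("user", "hi"), ("bot", "hello"), ("user", "bye")]
    print(export_chat(chat, last_n=2, reverse=True))
```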

4. Updated docs.

Both the regular README and the Developer Guide have been updated in line with the newest features and with some additional improvements.

The new docs have been “fed” to DocDocGo to make sure it remains “self-aware” and can answer any usage-related questions (e.g. /help How to search for my collection). To ask a dev-related question, first switch to the “developer-docs” collection (/db use developer-docs) and then ask away!

P.S.

Personally, I have been using the /research heatseek mode more than the regular /research mode recently. It’s been a great help for quickly finding specific short answers, with links to the sources in case I want more details.
