Hi Dimitry @ReasonMeThis,
This is very nice indeed… I could feel it in my bones that you were about to release something cool! I have the same issue of an ever-expanding list of personal collections so will appreciate the collection name search. I love the chat export capability!
The dev guide is super-useful… which reminds me I ought to finish my “chat with DDG using email” prototype, that leverages DDG’s API.
Now that you have Omni in there, I assume at some point it might be possible to support multi-media (multimodal) content ingest.
A simple use case example is: upload an image PDF or picture and send to gpt-4o
to extract text or a description of the image; then vectorize this into a DDG collection. Admittedly, pure text extraction from images can be done with OCR tooling, but the more general “chat with a picture” would be a super-interesting feature. E.g. the feature could be used to upload a screenshot of an Excel chart or any data chart prior to having a conversation with it.
I don’t think many changes would be required in DDG to be honest. You just need to distinguish between image and text docs (unless LangChain will do that), and submit the b64-encoded image to gpt-4o
with a custom prompt to analyze the doc and extract text and data.
@FareedKhan-dev posted this “chat with graph” app a few months ago. It uses Gemini, but is an example of chatting with a data chart (i.e. a graph).
@Charly_Wargnier delved into some other use cases for MM LLMs like Omni.
Thanks for regularly updating DocDocGo… I use my own version of it all the time and try to merge your changes ASAP.
Cheers,
Arvindra