🚀 RAGxplorer - Explore the embeddings of your RAG Documents! (GPT 4 + ChromaDB + Sentence Transformers)

gabrielc · January 11, 2024, 4:09pm

Meet RAGxplorer - an interactive tool to visualize retrieval techniques and diagnose which chunks are being retrieved. I hope this application is useful, especially for those who are learning RAG and exploring the embedding space of their documents.

Application

With RAGxplorer, you can:

Upload your own PDF
Configure the chunk size and chunk overlap
Visualise where your query is in the embedding space, and which chunks are a top-k match
[NEW!] Experiment with Query Expansion techniques (e.g. multi-questions, hypothetical answer/HyDE)
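
For readers new to these terms, here is a minimal sketch of what "chunk size", "chunk overlap" and "top-k match" mean. It is not the app's actual code: the function names `chunk_text`, `cosine_similarity` and `top_k` are hypothetical, chunking is done on characters for simplicity, and the vectors in the usage note are toy values rather than real embeddings.

```python
import math


def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into windows of chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    (Hypothetical helper for illustration, not the app's splitter.)
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], chunk_vecs: list[list[float]], k: int) -> list[int]:
    """Return the indices of the k chunk vectors most similar to the query."""
    ranked = sorted(
        range(len(chunk_vecs)),
        key=lambda i: cosine_similarity(query_vec, chunk_vecs[i]),
        reverse=True,
    )
    return ranked[:k]
```

With these helpers, `chunk_text("abcdefghij", 4, 2)` gives four overlapping windows, and `top_k([1.0, 0.0], [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]], 2)` ranks the second and third toy vectors as the closest matches.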

Code is here:

Any feedback would be most appreciated!

ai-builder · January 11, 2024, 4:25pm

Very cool! How did you store your API key?

gabrielc · January 11, 2024, 4:28pm

Thanks!

To clarify, this version is not using any APIs yet. For now, the embedding model is hosted within the streamlit app & the vector db is in memory.

The vector db is chroma - which is great.

On API keys, you may find this blog post useful: 8 tips for securely using API keys

bidihi4671 · January 12, 2024, 1:19am

Good job!

The course looks very interesting too - thanks for sharing.

JacksonChin · January 14, 2024, 1:36am

this is useful! hope to see the “bring your own embeddings” implemented soon!

Chinzzzz · January 14, 2024, 9:13am

great name, and great app!

gabrielc · January 14, 2024, 11:19am

Coming real soon!

tim_c · January 14, 2024, 2:35pm

Does it support word doc?

gabrielc · January 15, 2024, 3:44am

Nope! The app only takes in a PDF (and it has to be a PDF with text, not a scanned document).

gabrielc · January 16, 2024, 4:37pm

New update!

I also refactored the code entirely for more modularity. The code is also now released under the MIT License - please feel free to fork it.

RaymondNg · January 17, 2024, 1:32am

Thanks for sharing your work!

WenJie1997 · January 17, 2024, 11:28am

Great repo! the code’s more modular compared to most streamlit apps i’ve seen

gabrielc · January 17, 2024, 1:09pm

New features - text-embedding-ada-002 and gte-large embedding models!

Guna_Sekhar_Venkata · January 18, 2024, 1:26pm

@JacksonChin I'm currently working with a tokenization model for my mother tongue to create my own embeddings

gabrielc · January 18, 2024, 1:47pm

Nice! More multilingual tokenizers and language/embedding models will be great = )

gabrielc · January 18, 2024, 5:16pm

To share more about what’s going on under the hood, the app uses UMAP to project the high-dimensional embeddings into a 2D space for visualization.

https://umap-learn.readthedocs.io/en/latest/#

gabrielc · January 20, 2024, 12:24am

Once again, thank you to the streamlit team for hosting this competition.

Here’s the repo. I hope this helps folks building RAG applications!

For now, the best experience is to clone the repo and run this locally!

The vector database is an in-memory one (Chroma), and I suspect that many people using the app, or quitting it while the vector database is being built, is causing some issues. Some tweaks to the control flow and deployment may resolve that.

Any feedback for improvement would be most appreciated. I will still continue to work on this.

system · January 22, 2024, 12:25am

This topic was automatically closed 2 days after the last reply. New replies are no longer allowed.
