Build Accurate Multimodal Search for Slides Using Hybrid Indexes

Hi Streamlit Community,

I’m sharing a showcase project on building a multimodal search service using GPT-4o, Pathway, and Streamlit, featuring metadata extraction and hybrid indexing. This approach helps retrieve relevant information accurately from presentations and PDFs.

Project Link:

How it Works:

The Slides AI Search App connects to various local or cloud repositories, then transforms and indexes slides for efficient querying or multimodal RAG use cases. It supports both closed and open-source LLMs; this showcase uses OpenAI’s GPT-4o model.

This resource demonstrates how to build accurate search powered by hybrid indexes, which can also be extended to multimodal RAG pipelines. A key additional benefit is simpler ETL: indexes update automatically as changes occur in your presentation repository.
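
One common way to build a hybrid index is to run a keyword retriever and a vector retriever side by side and merge their rankings, for example with reciprocal rank fusion (RRF). The sketch below is a stdlib-only illustration of that idea under stated assumptions, not Pathway's implementation: the slide IDs and texts are hypothetical, keyword_rank() is a simple term-count stand-in for BM25, and embed() uses character trigrams in place of OpenAI embeddings.

```python
import math
from collections import Counter

# Hypothetical slide corpus: slide id -> text extracted from the slide.
SLIDES = {
    "q3-review-04": "quarterly revenue growth chart for the sales team",
    "roadmap-02": "product roadmap timeline with upcoming milestones",
    "arch-07": "system architecture diagram showing data ingestion and indexing",
}


def tokenize(text):
    return text.lower().split()


def keyword_rank(query, docs):
    """Rank docs by how many of their tokens appear in the query (BM25 stand-in)."""
    terms = set(tokenize(query))
    scores = {d: sum(t in terms for t in tokenize(text)) for d, text in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)


def embed(text):
    """Toy embedding: character-trigram counts (stand-in for a real embedder)."""
    padded = f"  {text.lower()}  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def vector_rank(query, docs):
    """Rank docs by cosine similarity between query and doc embeddings."""
    q = embed(query)
    scores = {d: cosine(q, embed(text)) for d, text in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse rankings: each doc scores sum(1 / (k + rank)) over every ranking."""
    fused = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return [doc_id for doc_id, _ in fused.most_common()]


def hybrid_search(query, docs, top_k=2):
    """Combine keyword and vector rankings into one hybrid result list."""
    rankings = [keyword_rank(query, docs), vector_rank(query, docs)]
    return reciprocal_rank_fusion(rankings)[:top_k]


print(hybrid_search("architecture diagram", SLIDES))
```

RRF is used here because it only needs rank positions, so the keyword and vector scores never have to be put on a common scale; k=60 is a commonly used default.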

Data Ingestion:
The application reads slide files (PPTX and PDF) from local directories or integrates with Google Drive and Microsoft SharePoint.
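
For the local-directory case, ingestion starts with picking out the slide files. A minimal stdlib sketch, assuming a plain local folder (the function name is hypothetical; Google Drive and SharePoint would go through their own connectors and are not shown):

```python
from pathlib import Path

# Slide formats the app reads, per the post.
SLIDE_SUFFIXES = {".pptx", ".pdf"}


def find_slide_files(root):
    """Recursively collect PPTX and PDF files (case-insensitive suffix) under root."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in SLIDE_SUFFIXES
    )
```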

Parsing and Indexing:

  • Parsing: Uses Pathway’s SlideParser with a detailed schema (parse_schema.yaml) to parse images, charts, diagrams, and other visual elements, and to extract metadata from this unstructured content, such as category, tags, title, main color, language, and whether the slide contains images.
  • Indexing: Embedded slide content is stored in Pathway’s vector store, optimized for incremental indexing and integrated with OpenAI’s embedder.
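
Extracted metadata is what lets a search narrow the candidate slides before (or alongside) similarity ranking. The sketch below is hypothetical: the field names mirror the metadata listed above, not the actual keys in parse_schema.yaml, and the filter is a plain-Python illustration rather than Pathway's filtering API.

```python
from dataclasses import dataclass, field


@dataclass
class SlideMetadata:
    # Hypothetical fields mirroring the metadata the parser extracts.
    title: str
    category: str
    language: str
    main_color: str
    has_images: bool
    tags: list[str] = field(default_factory=list)


def matches(meta, key, wanted):
    """A `tags` criterion matches if the tag is present; others need equality."""
    value = getattr(meta, key)
    return wanted in value if key == "tags" else value == wanted


def filter_slides(slides, **criteria):
    """Return ids of slides whose metadata satisfies every criterion."""
    return [sid for sid, meta in slides.items()
            if all(matches(meta, k, v) for k, v in criteria.items())]


# Hypothetical example data.
SLIDE_META = {
    "s1": SlideMetadata("Q3 revenue", "finance", "en", "blue", True, ["sales", "q3"]),
    "s2": SlideMetadata("Roadmap", "product", "en", "green", False, ["planning"]),
    "s3": SlideMetadata("Budget", "finance", "de", "blue", False, ["q3"]),
}

print(filter_slides(SLIDE_META, category="finance", has_images=True))
```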

UI:
The UI, built with Streamlit, is simple: it shows the status of the indexes, updated in real time.

Advantages:

  • Automated Index Updates: Indexes update automatically whenever slides are added, modified, or removed, ensuring the most current information is always available.
  • Improved Efficiency: Quickly find specific information without manually searching through numerous presentations, ideal for preparing presentations or reviewing past projects.
  • Enhanced Organization: Easily categorize and organize slides by topic, project, or other criteria.
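
In this app, automated index updates come from Pathway's incremental processing. To illustrate only the general idea (this is not how Pathway works internally), here is a hypothetical polling approach in plain Python: record a (modification time, size) signature per file, diff two snapshots, and re-index only the files that were added or modified, dropping the removed ones.

```python
from pathlib import Path


def snapshot(paths):
    """Map each file path to (mtime in ns, size) as a cheap change signature."""
    result = {}
    for p in paths:
        st = Path(p).stat()
        result[str(p)] = (st.st_mtime_ns, st.st_size)
    return result


def diff_snapshots(old, new):
    """Classify files as added, modified, or removed between two snapshots."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    modified = sorted(k for k in new.keys() & old.keys() if new[k] != old[k])
    return added, modified, removed


def apply_changes(index, old, new, load):
    """Update an in-memory index in place, re-loading only changed files.

    `load` is a hypothetical callback that would parse and embed one file.
    """
    added, modified, removed = diff_snapshots(old, new)
    for key in removed:
        index.pop(key, None)
    for key in added + modified:
        index[key] = load(key)
    return added, modified, removed


# Hypothetical snapshots: path -> (mtime, size).
old = {"a.pdf": (1, 10), "b.pptx": (1, 20), "c.pdf": (1, 30)}
new = {"a.pdf": (1, 10), "b.pptx": (2, 25), "d.pdf": (3, 5)}
index = {"a.pdf": "A", "b.pptx": "B-old", "c.pdf": "C"}
print(apply_changes(index, old, new, load=lambda path: f"parsed:{path}"))
```

Only the changed files are passed to `load`, which is the point of incremental indexing: re-parsing and re-embedding is the expensive step.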

Running the Application:
The application runs in a Docker container, which ensures a consistent environment and simplifies deployment.

Looking forward to your questions and feedback!
