Streamlit app for ingesting documents into Pinecone to build an AI-ready knowledge base

After having some difficulty finding an affordable no-code solution for ingesting documents into a Pinecone vector database for RAG, I decided to create one myself with the help of Gemini. The app chunks a wide variety of document types, embeds the chunks as vectors, and upserts them to Pinecone. It uses a semantic chunking strategy and attaches metadata optimized for semantic querying by AI. Because it relies on some heavy-duty libraries (like Unstructured and spaCy), I couldn’t get it to run on Community Cloud, so I’ve got it hosted over on Hugging Face Spaces. Check it out here: Pinecone Ingestor - a Hugging Face Space by Btran1291

*Pinecone and OpenAI API keys are required to use the app.