ChromaDB Knowledge Base Not Persisting in Streamlit Cloud Deployment

My RAG chatbot works perfectly on localhost but loses all of its knowledge when deployed to Streamlit Cloud. The deployed chatbot cannot answer questions that it answers correctly locally.

What I’ve Tried

  1. Checking if ChromaDB initializes in deployment:
import os

if os.environ.get('STREAMLIT_SHARING_MODE') == 'streamlit_app':
    logger.info("Detected Streamlit deployment environment")
    if os.path.exists(persist_directory):
        # Backup and recreate logic
        ...
  2. Verifying all dependencies are correct in requirements.txt:
chromadb==0.4.22
langchain-chroma==0.1.0
pysqlite3-binary>=0.5.0
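
On the dependency side, as far as I understand, pysqlite3-binary only takes effect if it is swapped in for the standard sqlite3 module before chromadb is imported. A minimal sketch of that swap, assuming it sits at the very top of the app's entry script; the ImportError fallback is my own addition so the same file still runs where the package is not installed:

```python
import sys

try:
    # pysqlite3-binary bundles its own, newer SQLite build.
    import pysqlite3

    # Alias it so that any later "import sqlite3" (including the one
    # inside chromadb) resolves to the bundled build.
    sys.modules["sqlite3"] = pysqlite3
except ImportError:
    # Package not installed (e.g. on a local machine): keep the
    # standard library module.
    pass

import sqlite3

# Log which SQLite version is actually in use.
print("SQLite version in use:", sqlite3.sqlite_version)
```

Comparing the printed version locally and in the deployment logs should at least show whether both environments run the same SQLite build.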

Questions

  1. Where should I store the ChromaDB files in Streamlit Cloud for persistence?
  2. Is there a way to initialize the knowledge base during deployment?
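
To make the second question concrete: if files written at runtime do not survive a redeploy (which I suspect is what is happening), the app would need to detect a missing database at startup and rebuild it. A minimal sketch of such a check, assuming Chroma 0.4.x keeps its data in a chroma.sqlite3 file inside the persist directory; knowledge_base_exists, ensure_knowledge_base, and the build callable are hypothetical names, not part of my current code:

```python
from pathlib import Path
from typing import Callable


def knowledge_base_exists(persist_directory: str) -> bool:
    """Return True if the directory already holds Chroma's SQLite file."""
    # Assumption: Chroma 0.4.x writes <persist_directory>/chroma.sqlite3.
    return (Path(persist_directory) / "chroma.sqlite3").is_file()


def ensure_knowledge_base(
    persist_directory: str, build: Callable[[str], None]
) -> bool:
    """Call build(persist_directory) only when no database is present.

    Returns True if a rebuild was triggered, False otherwise.
    """
    if knowledge_base_exists(persist_directory):
        return False
    # Create the directory first so the build step can write into it.
    Path(persist_directory).mkdir(parents=True, exist_ok=True)
    build(persist_directory)
    return True
```

Here build would be whatever function loads the source documents and writes them into Chroma. Logging the return value on every start would show whether the deployed app ever finds an existing database.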

Relevant Code

The key part handling ChromaDB initialization:

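# From retriever.py
from langchain_chroma import Chroma

try:
    # Open the vector store persisted at persist_directory
    vectorstore = Chroma(
        embedding_function=embeddings,
        persist_directory=persist_directory,
        client_settings=client_settings
    )
except Exception as e:
    # Note: vectorstore stays unassigned when this branch runs
    logger.warning(f"Failed to initialize Chroma with existing DB: {e}")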
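
One gap in the snippet above: when the constructor raises, only a warning is logged and vectorstore is never assigned. A minimal sketch of a fallback, where open_existing and rebuild are hypothetical callables (not from my code) that would wrap the Chroma(...) call and the ingestion step respectively:

```python
import logging
from typing import Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


def open_or_rebuild(open_existing: Callable[[], T], rebuild: Callable[[], T]) -> T:
    """Try to open the existing store; fall back to rebuilding it."""
    try:
        return open_existing()
    except Exception as e:
        # Same warning as before, but the caller now always gets a store back.
        logger.warning("Failed to initialize Chroma with existing DB: %s", e)
        return rebuild()
```

With this shape the failure would still show up in the logs, but the retriever would not end up without a vector store.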

Environment

  • Local: Works perfectly
  • Deployment: Streamlit Cloud
  • Python version: 3.9
  • ChromaDB version: 0.4.22
  • OS (local): macOS Sequoia 15.3.1

Any help would be greatly appreciated! :pray: