Grounded multi-doc Q&A made simple with AI21

streamlitbot · April 3, 2024, 5:36pm

In just a few steps, build a context-based question-answering app based on your own documents and powered by AI21’s RAG Engine and task-specific models

By Robbin Jang and Chen Wang

Posted in Tutorials, April 3 2024

At AI21, we’re working with developers across industries to power their question-answering apps with our task-specific models and RAG Engine. Most developers want to build apps that provide answers grounded in a specific body of knowledge, such as their company’s documentation. This approach is called Retrieval Augmented Generation (RAG). RAG reduces hallucinations by keeping answers context-specific.

Building a RAG-based Q&A app can be complicated. Developers need to complete multiple steps to stitch together a RAG pipeline, including:

Connecting to a knowledge base of your documents
Segmenting and embed those documents into a vector store
Setting up semantic search capabilities for sending queries to the vector store
Retrieving the relevant context from the vector store and pass it to an LLM for augmented response generation

AI21’s RAG Engine and Contextual Answers task-specific model (TSM) takes care of these steps. The RAG Engine allows developers to quickly deploy a performant, scalable RAG pipeline with no manual setup. Thanks to the Contextual Answers TSM, responses are grounded and accurate, and the cost is significantly lower than leading foundation models.

For developers looking to build production-ready RAG, this post will walk you through creating your own Q&A app powered by AI21’s RAG Engine and Contextual Answers TSM. A sandbox version of the app is available here.

In this post, we will:

Introduce the concept of a Task-Specific Model (TSM) and evaluate how AI21’s Contextual Answers TSM compares to other models for RAG-based Q&A
Explain how our RAG Engine provides a simple but powerful solution for deploying a RAG pipeline
Give you step-by-step instructions to build and run your own Q&A application powered by AI21

What is a TSM?

AI21’s Task-Specific Models (TSMs) are highly specialized small language models trained to excel in specific use cases. TSMs focus on common use cases like grounded Q&A, summarization, and text editing.

The Contextual Answers TSM offers multiple advantages for RAG Q&A apps:

Higher accuracy in specialized Q&A

The Contextual Answers TSM has a very low hallucination rate, especially compared to other foundation models. It is very good at spotting irrelevancy between the context and the question, and returning “Answers are not in the documents” instead of hallucinating an answer. Less prompt engineering is needed, which results in a better user experience.

The Contextual Answers TSM resulted in a 68% rate of “good” answers.

In fact, the Contextual Answers TSM outperformed Claude 2.1 in testing, with a 12% higher rate of “Good” answers. “Good” answers were defined as factually correct, fluent, and comprehensive.

Increased safety

Since the Contextual Answers TSM is trained for generating grounded responses, it’s more secure against prompt injection and other abuse from bad actors.

Lower latency and cost compared to other foundation models

Using foundation models like GPT-4 at scale can be extremely expensive. The Contextual Answers TSM is significantly cheaper to use and scale.

What is AI21’s RAG Engine?

Building a RAG pipeline may seem easy, but scaling your pipeline while maintaining performance is difficult. When developers use a variety of different models and processes, orchestration is challenging. Multiple areas impact RAG performance, including:

Efficiency of chunking and embedding processes
Support for multiple file types
Scaling as data volume increases

AI21's RAG Engine offers a scalable all-in-one solution accessible via API. The Contextual Answers TSM seamlessly plugs into the RAG Engine, creating an end-to-end pipeline.

Learn more about the RAG Engine in this blog post.

Architecture Diagram

You can check out a hosted version of the app on Streamlit Community Cloud here. Read on to learn how you can build you own.

Build your own Q&A app powered by a task-specific small language model

This app allows you to upload a PDF or .txt file and ask questions to your uploaded document as a knowledge base.

To build and run your own version of our application locally, clone the GitHub repo and use your AI21 credentials.

Request your AI21 Studio account: https://studio.ai21.com/login
Locate your AI21 API Key: https://studio.ai21.com/account/api-key
Create secrets.toml in a .streamlit folder and add your credentials, replacing <YOUR_API_KEY> with your AI21 API Key:

[api-keys]
ai21-api-key = "<YOUR_API_KEY>"

Install all dependencies listed in the requirements.txt, including AI21 Labs Python SDK
Run [Welcome.py](<http://Welcome.py>) by entering streamlit run Welcome.py in your console
The Welcome page of the AI21 Studio Streamlit app should be opened in your browser once you run the command above
Navigate to the Multi Document Q&A page

Breaking down the app

Import AI21 library and load credentials:

# import AI21 SDK library
from ai21 import AI21Client
# initiate AI21 client with your API key
client = AI21Client(api_key=st.secrets['api-keys']['ai21-api-key'])

Upload files to AI21 Document Library with document labels (optional):

# Streamlit file uploader widget
uploaded_files = st.file_uploader("choose .pdf/.txt file",
   accept_multiple_files=True,
   type=["pdf", "text", "txt"],
   key="a")
# upload the files to AI21 RAG Engine library
for uploaded_file in uploaded_files:
   client.library.files.create(file_path=uploaded_file, labels=label)

Ask questions against the uploaded documents with AI21 Contextual Answers model:

# Streamlit text input widget 
question = st.text_input(label="Question:", value="Your Question")
# send your question to AI21 Contextual Answers LLM and getting back the response
response = client.library.answer.create(question=question, label=label)

Wrapping up

We hope this guide makes it straightforward to build a grounded, contextual Q&A application. With this app, users can interact with complex bodies of knowledge to improve workflows like customer support, contract analysis, product optimization, and more.

For Streamlit users, AI21 is excited to offer an extended Starter Bundle that provides free credits for our AI21 Studio playground and a guided 1:1 session to discuss your use case and help you get started.

If you have any questions, please post them in the comments below, reach out to studio@ai21.com, or create a GitHub Issue.

This is a companion discussion topic for the original entry at https://blog.streamlit.io/ai21_grounded_multi_doc_q-a

asehmi · April 6, 2024, 1:16pm

Cool, I will have a look at this.
P.S. I have been using the logo “A12i” in my apps for a while now… unfortunate that it’s visually similar to “AI21”. (“A12i” == “Arvindra Sehmi”)

smach · April 7, 2024, 1:45pm

Can this app include citations and links to the specific document chunks used in the LLM’s answer? I couldn’t tell from the demo.
It’s a must for me that my users be able to easily check response accuracy by looking at source text used to generate responses. If they have to check the entire document themselves, the app is much less useful… (This is why I end up building my own apps instead of using a lot of otherwise elegant project templates)

system · October 4, 2024, 1:46pm

This topic was automatically closed 180 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
New noorbasha-git/rag_system Streamlit App Show the Community! streamlit-cloud , ai , rag	2	18	November 27, 2025
🚀 RAGxplorer - Explore the embeddings of your RAG Documents! (GPT 4 + ChromaDB + Sentence Transformers) Show the Community! session-state , file-upload , plotly , streamlit-cloud , llms , openai , build-with-streamlit	17	5299	January 20, 2024
Multi-Modal RAG ChatBot: Your AI-Powered Knowledge Assistant (Streamlit + MindsDB + LangChain + FAISS) Show the Community! streamlit-cloud , discussion , faiss , mindsdb	9	2378	January 11, 2025
A production-style RAG system with explicit routing between tool-based agents, vector retrieval, and general LLM inference. Show the Community! streamlit-cloud , llms , build-with-streamlit , search , chatbot , rag	0	57	December 18, 2025
Streamlit app to ingest documents to Pinecone to build AI-ready knowledge base Show the Community! openai , rag	0	24	January 9, 2026