Streaming response with Mistral AI chatbot (RAG)

Hello everyone.
I’m having trouble displaying a streaming reply in my Mistral AI chatbot.

The Mistral API docs give a working example (displaying the live response in the terminal):

from mistralai.client import MistralClient
from mistralai.models.chat_completion import ChatMessage

client = MistralClient(api_key=MISTRAL_API_KEY)
model = "open-mixtral-8x7b"

messages = [ChatMessage(role="user", content="write python program to find prime numbers")]

stream_response = client.chat_stream(model=model, messages=messages)

# Print each streamed token as it arrives.
for chunk in stream_response:
    print(chunk.choices[0].delta.content)

In my Streamlit RAG chatbot app, this would be:


stream_response = client.chat_stream(
    model=model,
    messages=messages
)

response = "test"
for chunk in stream_response:
    st.write_stream(chunk.choices[0].delta.content)

but this gave me the following error:
streamlit.errors.StreamlitAPIException: st.write_stream expects a generator or stream-like object as input not <class 'str'>. Please use st.write instead for this data type.

Has anyone ever used the Mistral AI API with a streaming response in a Streamlit chatbot?

Thank you in advance!


What happens if you follow the advice included in the error message?

That’s what I did.

The function client.chat_stream() gives a stream-like object, but it doesn’t seem to be recognized as such by Streamlit, hence my question.

The Mistral API docs don’t give any other function to display a streaming response.

I don’t know how well passing that object to write_stream would work, but that is not what you are doing. Maybe you should try that too.
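In other words, something like this (just a sketch; I don’t know whether the chunks would render as plain text):

st.write_stream(client.chat_stream(model=model, messages=messages))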

Yes, I’ve tried to do that but it doesn’t work, because to get a streaming response you have to iterate over the chunks returned by the object.

I don’t think it’s possible to do it as-is with Mistral AI, unfortunately…

So what exactly is wrong with using st.write as the error message suggests?

In my opinion, the error comes from the object itself.

OpenAI’s client.chat() function has a stream=True argument and returns a single “streamable” object.

Mistral displays a streaming response by looping over the chunks yielded by client.chat_stream().

Therefore, st.write_stream() can’t receive a streaming object from Mistral… because Mistral doesn’t provide one.

For the moment, I’m getting around the problem by displaying a non-streaming response.

But it is less visually captivating.
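Concretely, the workaround looks something like this (a sketch using the synchronous client.chat() call, with the same model and messages as before):

# Non-streaming workaround (sketch): wait for the full completion, then display it once.
chat_response = client.chat(model=model, messages=messages)
st.write(chat_response.choices[0].message.content)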

You mean this?

for chunk in stream_response:
    st.write(chunk.choices[0].delta.content)

Unfortunately no, this gave me one error per token ^^
I don’t understand why this solution doesn’t work.

I don’t understand it either. What does “one error per token” mean?

To display the streamed response, Mistral’s API uses a for loop that iterates over all the chunks generated by the client.chat_stream() object.

There are therefore hundreds of chunks for one response.

And there are also hundreds of errors generated when I use the following solution:

stream_response = client.chat_stream(model, messages)
for chunk in stream_response:
    st.write(chunk.choices[0].delta.content)

I deduced that Streamlit was unable to display a streaming response from the Mistral API.

It finally works!!

Here is the final code. Huge thanks to @Intelligent_Bit3942 for his working example, which I adapted to my case.

from mistralai.client import MistralClient
from mistralai.models.chat_completion import ChatMessage
import streamlit as st
import json
import faiss
import numpy as np

model = "open-mixtral-8x7b"
mistral_api_key = st.secrets["MISTRAL_API_KEY"]
client = MistralClient(api_key=mistral_api_key)

st.title("Assistant ChatBot catalogue 2024")

def load_json(rep: str):
    # Load the JSON catalogue, closing the file handle when done.
    with open(rep, encoding='UTF-8') as f:
        return json.load(f)

def split_chunk(data, chunk_size):
    # Serialize each catalogue entry, then group the entries into chunks of `chunk_size`.
    data_str = [json.dumps(entry) for entry in data]
    chunks = [data_str[i:i + chunk_size] for i in range(0, len(data_str), chunk_size)]
    print(f"Nb. chunks = {len(chunks)}")
    return chunks
    
def get_text_embedding(input):
    # Embed a single text with mistral-embed and return its vector.
    embeddings_batch_response = client.embeddings(
        model='mistral-embed',
        input=input
    )
    return embeddings_batch_response.data[0].embedding

def load_vector_db(text_embedded):
    d = text_embedded.shape[1]
    index = faiss.IndexFlatL2(d)
    index.add(text_embedded)
    return index

def find_similar_chunk(index, question_embeddings, chunks):
    D, I = index.search(question_embeddings, k=2) # distance, index
    return [chunks[i] for i in I.tolist()[0]]

def prompt_chat(retrieved_chunk, question):
    return f"""
    Les informations contextuelles sont les suivantes.
    ---------------------
    {retrieved_chunk}
    ---------------------
    Compte tenu des informations contextuelles et sans connaissances préalables,
    réponds en français à la question suivante de manière concise.
    Utilise des listes pour plus de lisibilité. 
    Question: {question}
    Réponse:
    """

# Load the data
data = load_json('catalogue_2024.json')
chunks = split_chunk(data, 3)
text_embeddings = np.load("catalogue_embeddings.npy")
index = load_vector_db(text_embeddings)

if "messages" not in st.session_state:
    st.session_state["messages"] = [{"role": "assistant", "content": "Comment puis-je vous aider?"}]
    st.session_state["History"] = []
    st.session_state.History.append(ChatMessage(role="assistant", content="Comment puis-je vous aider?"))

for msg in st.session_state.messages:
    st.chat_message(msg["role"]).write(msg["content"])

if prompt := st.chat_input():
    question_embeddings = np.array([get_text_embedding(prompt)])
    retrieved_chunk = find_similar_chunk(index, question_embeddings, chunks)
    p = prompt_chat(retrieved_chunk=retrieved_chunk, question=prompt)

    st.session_state.messages.append({"role": "user", "content": prompt})
    st.session_state.History.append(ChatMessage(role="user", content=p))
    st.chat_message("user").write(prompt)

    with st.chat_message("assistant"):
        message_placeholder = st.empty()
        full_response = ""
        for response in client.chat_stream(
            model=model,
            messages=st.session_state.History[1:]
        ):
            full_response += (response.choices[0].delta.content or "")
            message_placeholder.markdown(full_response + "|")
        
        message_placeholder.markdown(full_response)
        
        st.session_state.History.append(ChatMessage(role="assistant", content=full_response))
        st.session_state.messages.append({"role": "assistant", "content": full_response})
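
One note: catalogue_embeddings.npy is assumed to already exist. It can be pre-computed from the same chunks with a one-off script along these lines (a sketch; joining each chunk’s entries into a single string is my assumption):

# One-off pre-computation (sketch): embed each chunk with mistral-embed and
# save the matrix that load_vector_db() expects.
chunk_embeddings = np.array([get_text_embedding(" ".join(c)) for c in chunks])
np.save("catalogue_embeddings.npy", chunk_embeddings)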

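And to come back to the original question: the manual placeholder loop above could probably be replaced by st.write_stream, by wrapping the chunk iterator in a generator that yields only the text deltas (a sketch, not what the code above uses):

# Sketch: adapt Mistral's chunk stream into a plain-text generator for st.write_stream.
def stream_text(chat_stream):
    for chunk in chat_stream:
        yield chunk.choices[0].delta.content or ""

with st.chat_message("assistant"):
    full_response = st.write_stream(
        stream_text(client.chat_stream(model=model, messages=st.session_state.History[1:]))
    )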