Streaming response with Mistral AI chatbot (RAG)

Hello everyone.
I’m having trouble displaying a streaming reply in my Mistral AI chatbot.

The Mistral API docs give a working example (displaying the live response in the terminal):

from mistralai.client import MistralClient
from mistralai.models.chat_completion import ChatMessage

client = MistralClient(api_key=MISTRAL_API_KEY)
model = "open-mixtral-8x7b"

messages = [ChatMessage(role="user", content="write python program to find prime numbers")]

stream_response = client.chat_stream(model=model, messages=messages)

# Print each streamed token as it arrives.
for chunk in stream_response:
    print(chunk.choices[0].delta.content)

In my Streamlit RAG chatbot app, this would be:


stream_response = client.chat_stream(
    model=model,
    messages=messages
)

response = "test"
for chunk in stream_response:
    st.write_stream(chunk.choices[0].delta.content)

but this gave me the following error:
streamlit.errors.StreamlitAPIException: st.write_stream expects a generator or stream-like object as input not <class 'str'>. Please use st.write instead for this data type.

Has anyone ever used the Mistral AI API with a streaming response in a Streamlit chatbot?

Thank you in advance!


What happens if you follow the advice included in the error message?

That’s what I did.

The function client.chat_stream() gives a stream-like object, but it doesn’t seem to be recognized as such by Streamlit, hence my question.

The Mistral API docs don’t give any other function to display a streaming response.

I don’t know how well passing that object to write_stream would work, but that is not what you are doing. Maybe you should try that too.
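In other words, something like this (just a sketch; I don’t know whether the chunks would render as plain text):

st.write_stream(client.chat_stream(model=model, messages=messages))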

Yes, I’ve tried to do that but it doesn’t work, because to get a streaming response you have to iterate over the chunks returned by the object.

I don’t think it’s possible to do it as-is with Mistral AI, unfortunately…

So what exactly is wrong with using st.write as the error message suggests?

In my opinion, the error comes from the object itself.

OpenAI’s client.chat() function has a stream=True argument and returns a single “streamable” object.

Mistral displays a streaming response by looping over the chunks yielded by client.chat_stream().

Therefore, st.write_stream() can’t receive a streaming object from Mistral… because Mistral doesn’t provide one.

For the moment, I’m getting around the problem by displaying a non-streaming response.

But it is less visually captivating.
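Concretely, the workaround looks something like this (a sketch using the synchronous client.chat() call, with the same model and messages as before):

# Non-streaming workaround (sketch): wait for the full completion, then display it once.
chat_response = client.chat(model=model, messages=messages)
st.write(chat_response.choices[0].message.content)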

You mean this?

for chunk in stream_response:
    st.write(chunk.choices[0].delta.content)

Unfortunately no, this gave me one error per token ^^
I don’t understand why this solution doesn’t work.

I don’t understand it either. What does “one error per token” mean?

To display the streamed response, Mistral’s API uses a for loop that iterates over all the chunks generated by the client.chat_stream() object.

There are therefore hundreds of chunks for one response.

And there are also hundreds of errors generated when I use the following solution:

stream_response = client.chat_stream(model, messages)
for chunk in stream_response:
    st.write(chunk.choices[0].delta.content)

I deduced that Streamlit was unable to display a streaming response from the Mistral API.

It finally works!!

Here is the final code. Huge thanks to @Intelligent_Bit3942 for his working example, which I adapted to my case.

from mistralai.client import MistralClient
from mistralai.models.chat_completion import ChatMessage
import streamlit as st
import json
import faiss
import numpy as np

model = "open-mixtral-8x7b"
mistral_api_key = st.secrets["MISTRAL_API_KEY"]
client = MistralClient(api_key=mistral_api_key)

st.title("Assistant ChatBot catalogue 2024")

def load_json(rep: str):
    # Load the JSON catalogue, closing the file handle when done.
    with open(rep, encoding='UTF-8') as f:
        return json.load(f)

def split_chunk(data, chunk_size):
    # Serialize each catalogue entry, then group the entries into chunks of `chunk_size`.
    data_str = [json.dumps(entry) for entry in data]
    chunks = [data_str[i:i + chunk_size] for i in range(0, len(data_str), chunk_size)]
    print(f"Nb. chunks = {len(chunks)}")
    return chunks
    
def get_text_embedding(input):
    # Embed a single text with mistral-embed and return its vector.
    embeddings_batch_response = client.embeddings(
        model='mistral-embed',
        input=input
    )
    return embeddings_batch_response.data[0].embedding

def load_vector_db(text_embedded):
    d = text_embedded.shape[1]
    index = faiss.IndexFlatL2(d)
    index.add(text_embedded)
    return index

def find_similar_chunk(index, question_embeddings, chunks):
    D, I = index.search(question_embeddings, k=2) # distance, index
    return [chunks[i] for i in I.tolist()[0]]

def prompt_chat(retrieved_chunk, question):
    return f"""
    Les informations contextuelles sont les suivantes.
    ---------------------
    {retrieved_chunk}
    ---------------------
    Compte tenu des informations contextuelles et sans connaissances préalables,
    réponds en français à la question suivante de manière concise.
    Utilise des listes pour plus de lisibilité. 
    Question: {question}
    Réponse:
    """

# Load the data
data = load_json('catalogue_2024.json')
chunks = split_chunk(data, 3)
text_embeddings = np.load("catalogue_embeddings.npy")
index = load_vector_db(text_embeddings)

if "messages" not in st.session_state:
    st.session_state["messages"] = [{"role": "assistant", "content": "Comment puis-je vous aider?"}]
    st.session_state["History"] = []
    st.session_state.History.append(ChatMessage(role="assistant", content="Comment puis-je vous aider?"))

for msg in st.session_state.messages:
    st.chat_message(msg["role"]).write(msg["content"])

if prompt := st.chat_input():
    question_embeddings = np.array([get_text_embedding(prompt)])
    retrieved_chunk = find_similar_chunk(index, question_embeddings, chunks)
    p = prompt_chat(retrieved_chunk=retrieved_chunk, question=prompt)

    st.session_state.messages.append({"role": "user", "content": prompt})
    st.session_state.History.append(ChatMessage(role="user", content=p))
    st.chat_message("user").write(prompt)

    with st.chat_message("assistant"):
        message_placeholder = st.empty()
        full_response = ""
        for response in client.chat_stream(
            model=model,
            messages=st.session_state.History[1:]
        ):
            full_response += (response.choices[0].delta.content or "")
            message_placeholder.markdown(full_response + "|")
        
        message_placeholder.markdown(full_response)
        
        st.session_state.History.append(ChatMessage(role="assistant", content=full_response))
        st.session_state.messages.append({"role": "assistant", "content": full_response})
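
One note: catalogue_embeddings.npy is assumed to already exist. It can be pre-computed from the same chunks with a one-off script along these lines (a sketch; joining each chunk’s entries into a single string is my assumption):

# One-off pre-computation (sketch): embed each chunk with mistral-embed and
# save the matrix that load_vector_db() expects.
chunk_embeddings = np.array([get_text_embedding(" ".join(c)) for c in chunks])
np.save("catalogue_embeddings.npy", chunk_embeddings)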

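And to come back to the original question: the manual placeholder loop above could probably be replaced by st.write_stream, by wrapping the chunk iterator in a generator that yields only the text deltas (a sketch, not what the code above uses):

# Sketch: adapt Mistral's chunk stream into a plain-text generator for st.write_stream.
def stream_text(chat_stream):
    for chunk in chat_stream:
        yield chunk.choices[0].delta.content or ""

with st.chat_message("assistant"):
    full_response = st.write_stream(
        stream_text(client.chat_stream(model=model, messages=st.session_state.History[1:]))
    )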