Streamlit Chatbot: Token Streaming

Hi,

I created a Streamlit chatbot and now I want to enable token streaming.
I am loading an LLM with LangChain and LlamaCpp (from langchain.llms import LlamaCpp). At the moment, the output is only shown once the model has finished generating, but I want it to be streamed, so that tokens appear in the application as they are generated (e.g. like in ChatGPT).

How can I achieve this? I am not using an OpenAI model, I use open-source models like Llama2.

I did this using callbacks; see my code here:

Code
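import streamlit as st
from langchain.callbacks.base import BaseCallbackHandler
from langchain.schema import LLMResult


class StreamingChatCallbackHandler(BaseCallbackHandler):
    """Render LLM tokens into a Streamlit placeholder as they arrive."""

    def __init__(self):
        # Both attributes are reset in on_llm_start, once per generation.
        self.container = None
        self.text = ""

    def on_llm_start(self, *args, **kwargs):
        # Reserve a single-element placeholder that each update overwrites.
        self.container = st.empty()
        self.text = ""

    def on_llm_new_token(self, token: str, *args, **kwargs):
        # Append the new token and re-render the accumulated text.
        self.text += token
        self.container.markdown(
            body=self.text,
            unsafe_allow_html=False,
        )

    def on_llm_end(self, response: LLMResult, *args, **kwargs):
        # Replace the streamed text with the final generation.
        self.container.markdown(
            body=response.generations[0][0].text,
            unsafe_allow_html=False,
        )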

I pass this callback to my LangChain chain. I also set the parameter streaming=True when instantiating the LLM; note that not all models support it. See this code: my-superapp/src/generative_ai/large_language_models/chatbots/chatbot.py at main · daltunay/my-superapp · GitHub
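For the LlamaCpp case from the question, the wiring might look roughly like the sketch below. This is not taken from the linked repository. It assumes LangChain's LlamaCpp wrapper, which names the streaming option `streaming` and accepts a `callbacks` list. The helper name `build_streaming_llm`, its `llm_factory` parameter and the model path in the usage comment are hypothetical, introduced only for illustration.

```python
from typing import Any, Callable, Optional


def build_streaming_llm(
    model_path: str,
    handler: Any,
    llm_factory: Optional[Callable[..., Any]] = None,
) -> Any:
    """Create an LLM that forwards each generated token to `handler`.

    `llm_factory` is a hypothetical hook that lets another LLM class
    (or a test double) be substituted; by default LlamaCpp is used.
    """
    if llm_factory is None:
        # Imported lazily so this module loads even when
        # langchain / llama-cpp-python are not installed.
        from langchain.llms import LlamaCpp

        llm_factory = LlamaCpp

    return llm_factory(
        model_path=model_path,  # path to a local model file
        streaming=True,         # emit tokens one at a time
        callbacks=[handler],    # each token calls handler.on_llm_new_token
    )


# Possible usage inside the Streamlit script (the path is a placeholder):
#   handler = StreamingChatCallbackHandler()
#   llm = build_streaming_llm("path/to/model.gguf", handler)
#   llm.invoke("Tell me a joke")
```

Because the handler creates its st.empty() placeholder in on_llm_start, the streamed text appears wherever the script is at the moment generation begins, so the call would typically go inside the chat message container where the answer should be shown.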