Hi,
I created a Streamlit chatbot and now I want to enable token streaming.
I am loading a LLM with Langchain and LlamaCpp (from langchain.llms import LlamaCpp). At the moment, the output is only shown if the model has completed its generation, but I want it to be streamed, so the model generations are printed on the application (e.g. like in Chatgpt).
How can I achieve this? I am not using an OpenAI model, I use open-source models like Llama2.
I achieved this using callbacks; see my code below:
Code

```python
import streamlit as st
from langchain.callbacks.base import BaseCallbackHandler
from langchain.schema import LLMResult


class StreamingChatCallbackHandler(BaseCallbackHandler):
    def __init__(self):
        pass

    def on_llm_start(self, *args, **kwargs):
        # Create an empty Streamlit placeholder to stream tokens into
        self.container = st.empty()
        self.text = ""

    def on_llm_new_token(self, token: str, *args, **kwargs):
        # Append each new token and re-render the accumulated text
        self.text += token
        self.container.markdown(
            body=self.text,
            unsafe_allow_html=False,
        )

    def on_llm_end(self, response: LLMResult, *args, **kwargs):
        # Render the final, complete generation once streaming is done
        self.container.markdown(
            body=response.generations[0][0].text,
            unsafe_allow_html=False,
        )
```
I pass this callback handler to my LangChain chain. I also set the parameter streaming=True when instantiating the LLM, which is not supported by all models. See this code: my-superapp/src/generative_ai/large_language_models/chatbots/chatbot.py at main · daltunay/my-superapp · GitHub
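For reference, here is a minimal sketch of how the handler above could be wired into a Streamlit app, assuming the langchain.llms.LlamaCpp interface mentioned in the question. The model path is a placeholder, and StreamingChatCallbackHandler refers to the class defined in the code above:

```python
import streamlit as st
from langchain.llms import LlamaCpp

# Placeholder path: point this at your own local model file
llm = LlamaCpp(
    model_path="models/llama-2-7b-chat.gguf",
    streaming=True,  # emit tokens one by one instead of a single final string
)

st.title("Streaming chatbot")

prompt = st.chat_input("Ask something")
if prompt:
    with st.chat_message("user"):
        st.markdown(prompt)
    with st.chat_message("assistant"):
        # The handler creates its own st.empty() placeholder in on_llm_start,
        # so the tokens appear inside this assistant chat message as they arrive.
        handler = StreamingChatCallbackHandler()
        llm(prompt, callbacks=[handler])
```

The same handler can instead be passed to a chain call (e.g. `chain.run(prompt, callbacks=[handler])`) if you are not calling the LLM directly.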