Hi,
I created a Streamlit chatbot and now I want to enable token streaming.
I am loading a LLM with Langchain and LlamaCpp (from langchain.llms import LlamaCpp). At the moment, the output is only shown if the model has completed its generation, but I want it to be streamed, so the model generations are printed on the application (e.g. like in Chatgpt).
How can I achieve this? I am not using an OpenAI model, I use open-source models like Llama2.
I achieved this using callbacks; see my code below:
Code

```python
import streamlit as st
from langchain.callbacks.base import BaseCallbackHandler
from langchain.schema import LLMResult


class StreamingChatCallbackHandler(BaseCallbackHandler):
    def __init__(self):
        pass

    def on_llm_start(self, *args, **kwargs):
        # Create an empty Streamlit placeholder to stream tokens into
        self.container = st.empty()
        self.text = ""

    def on_llm_new_token(self, token: str, *args, **kwargs):
        # Append each new token and re-render the accumulated text
        self.text += token
        self.container.markdown(
            body=self.text,
            unsafe_allow_html=False,
        )

    def on_llm_end(self, response: LLMResult, *args, **kwargs):
        # Render the final, complete generation once streaming is done
        self.container.markdown(
            body=response.generations[0][0].text,
            unsafe_allow_html=False,
        )
```
I pass this callback handler to my LangChain chain. I also set the parameter streaming=True when instantiating the LLM, which is not supported by all models. See this code: my-superapp/src/generative_ai/large_language_models/chatbots/chatbot.py at main · daltunay/my-superapp · GitHub
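For reference, here is a minimal sketch of how the handler above could be wired into a Streamlit app, assuming the langchain.llms.LlamaCpp interface mentioned in the question. The model path is a placeholder, and StreamingChatCallbackHandler refers to the class defined in the code above:

```python
import streamlit as st
from langchain.llms import LlamaCpp

# Placeholder path: point this at your own local model file
llm = LlamaCpp(
    model_path="models/llama-2-7b-chat.gguf",
    streaming=True,  # emit tokens one by one instead of a single final string
)

st.title("Streaming chatbot")

prompt = st.chat_input("Ask something")
if prompt:
    with st.chat_message("user"):
        st.markdown(prompt)
    with st.chat_message("assistant"):
        # The handler creates its own st.empty() placeholder in on_llm_start,
        # so the tokens appear inside this assistant chat message as they arrive.
        handler = StreamingChatCallbackHandler()
        llm(prompt, callbacks=[handler])
```

The same handler can instead be passed to a chain call (e.g. `chain.run(prompt, callbacks=[handler])`) if you are not calling the LLM directly.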