Langchain stream

Display streaming output from LangChain in Streamlit

from langchain.callbacks.base import BaseCallbackHandler
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage
import streamlit as st

class StreamHandler(BaseCallbackHandler):
    def __init__(self, container, initial_text="", display_method='markdown'):
        self.container = container
        self.text = initial_text
        self.display_method = display_method

    def on_llm_new_token(self, token: str, **kwargs) -> None:
        self.text += token
        display_function = getattr(self.container, self.display_method, None)
        if display_function is not None:
            display_function(self.text)
        else:
            raise ValueError(f"Invalid display_method: {self.display_method}")

query = st.text_input("input your query", value="Tell me a joke")
ask_button = st.button("ask")

st.markdown("### streaming box")
chat_box = st.empty()
stream_handler = StreamHandler(chat_box, display_method='write')
chat = ChatOpenAI(max_tokens=25, streaming=True, callbacks=[stream_handler])

st.markdown("### together box")

if query and ask_button:
    response = chat([HumanMessage(content=query)])
    llm_response = response.content

Great. Thanks for sharing. Worked nicely.


Hey @goldengrape ,
how do I stream the output of a SequentialChain() that has two input variables, "context" and "query", on Streamlit?

chat_llm = AzureChatOpenAI(max_tokens=25)
memory = ConversationBufferWindowMemory(memory_key="chat_history", k=15, input_key="query", output_key="AIassistant")
context_chain = LLMChain(llm=chat_llm, prompt=context_prompt_template)
llm_chain = LLMChain(llm=chat_llm, prompt=prompt_template, output_key="AIassistant")
overall_chain = SequentialChain(chains=[context_chain, llm_chain], input_variables=["context", "query"], verbose=False, memory=st.session_state.entity_memory)

if user_input:
    res = lang_chain_load_retrieve(user_input, FAISS_DATABASE, API_KEY)
    context = ""
    for y in range(len(res)):
        context = context + "\n" + str(res[y])
    response = overall_chain({"context": context, "query": user_input})

Currently, I get ‘TypeError: Object of type StreamHandler is not JSON serializable’


In my experience, LangChain is a very complex, high-level abstraction. If you follow their examples exactly, it's easy to get good results, but if you try to modify something yourself, it often produces very complicated bugs, because too much information is hidden inside it.

Just by looking at this part of your code, I have no idea what is happening. Also, I haven’t obtained the Azure OpenAI API key yet, so I cannot test AzureChatOpenAI either.

If I were to debug it, I think I would need to first test if the response is being properly outputted when streaming is set to False.

Instead of using Streamlit and a custom stream_handler, I suggest using LangChain's built-in StreamingStdOutCallbackHandler to check whether the streaming output works correctly. Please refer to the following link for more information:
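As a quick sanity check, here is a minimal stand-in for that handler: it just appends each token and writes it to stdout as it arrives. The token list is made up for illustration; in real use you would pass LangChain's own handler via ChatOpenAI(streaming=True, callbacks=[StreamingStdOutCallbackHandler()]).

```python
import sys

# Minimal stand-in for LangChain's StreamingStdOutCallbackHandler:
# each streamed token is appended and written to stdout immediately.
class StdOutHandler:
    def __init__(self):
        self.text = ""

    def on_llm_new_token(self, token: str, **kwargs) -> None:
        self.text += token
        sys.stdout.write(token)
        sys.stdout.flush()

# Simulate a token stream (made-up tokens) to watch the callback fire per token.
handler = StdOutHandler()
for token in ["Why ", "did ", "the ", "chicken ", "cross ", "the ", "road?"]:
    handler.on_llm_new_token(token)
```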

If everything mentioned above is working fine, I noticed that the error message states: “Object of type StreamHandler is not JSON serializable.” It’s possible that the information returned by the AI is in JSON format. In that case, you might need to extract a specific part of the JSON, such as the text or token, and then pass it to the StreamHandler for processing. You can refer to the “output parser” reference for guidance:
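If that turns out to be the case, a small guard like the following could pull the displayable part out of a JSON string before handing it to the handler. This is only a sketch: the key name "text" is an assumption about the payload shape, not something confirmed by the error message.

```python
import json

# Hypothetical helper: if the chain hands back a JSON string instead of plain
# text, extract the assumed "text" field; otherwise pass the input through.
def extract_text(raw: str, key: str = "text") -> str:
    try:
        payload = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return raw  # already plain text
    if isinstance(payload, dict):
        return payload.get(key, raw)
    return raw
```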

Or, if your entire program's code is not very long, you may want to copy all the code along with the error messages into GPT-4 or Claude 100k and let it do the debugging.

In fact, I wrote this StreamHandler with the help of GPT-4: I gave GPT-4 the callback description page and let it come up with the handler. These models are pretty good at this.


It now supports not only streaming display but also synchronized voice reading.

Hey @goldengrape,
Thanks for your suggestion. I tried LangChain's built-in StreamingStdOutCallbackHandler to check whether the streaming output worked correctly, and I was able to stream the response in the terminal. But, as mentioned earlier, I was looking for a way to stream the output in Streamlit. I was able to do this by writing a custom stream handler (StreamlitCallbackHandler(BaseCallbackHandler)) and then attaching a callback_manager to the LLM before running the SequentialChain().

class StreamlitCallbackHandler(BaseCallbackHandler):
    def __init__(self, streamlit_text, res_box):
        self.streamlit_text = streamlit_text
        self.current_text = ""
        self.res_box = res_box
        self.llm_response_started = False

    def on_llm_new_token(self, token: str, *args, **kwargs):
        if not self.llm_response_started:
            self.llm_response_started = True
        self.current_text += token
        self.res_box.markdown(self.current_text)  # push the accumulated text to the placeholder

    # Provide empty implementations for the required methods ...

if user_input:
    # ...the same code as shared earlier
    res_box = st.empty()

    start_response = time.time()

    response_container = st.empty()

    text = random.choice(llm_pre_response_texts)
    callback_manager = CallbackManager([StreamlitCallbackHandler(response_container, res_box)])
    chat_llm.callback_manager = callback_manager
    llm_chain.callback_manager = callback_manager
    overall_chain.callback_manager = callback_manager

    print("Generating LLM response...")
    response = overall_chain({"context": context, "query": user_input})
    print("LLM response generated.")

@goldengrape Hi, would this work if I provide custom CSS to it? I was actually trying to implement a chatbot app, where I was using GitHub - AI-Yash/st-chat: Streamlit Component, for a Chatbot UI to create the chat UI. But I have a hard time integrating streaming support into this. Can somebody please let me know a way?

Hey, same here. Would be great to implement this streaming feature into streamlit_chat! Does anyone have an idea of how to do such a thing? All my previous attempts have failed so far.

With the latest (1.24) version of Streamlit, streaming is possible, but ONLY for some special cases like OpenAI's chat completion API. I am working on a Streamlit app that uses LangChain's RetrievalQAWithSourcesChain to answer questions from text documents.

Is there no possibility to add streaming with Streamlit + LangChain RetrievalQAWithSourcesChain ?


In the case of streaming, how can I count token usage and cost?

  1. I tried to extend OpenAICallbackHandler:
def on_llm_end(self, response: LLMResult, **kwargs: Any) -> None:
        """Collect token usage."""
        if response.llm_output is None:
            return None

but after the streaming, response.llm_output is None.

  2. I also tried the following, but it did not work:
with get_openai_callback() as cb:
    st_cb = StreamHandler(st.empty())
    response =, callbacks=[st_cb])

Could you please help?
Thanks in advance :blush:
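One workaround for the problem above: since response.llm_output is None when streaming is enabled, you can count tokens yourself in the callback, because on_llm_new_token fires once per streamed token. This is only a sketch: in real use it would subclass LangChain's BaseCallbackHandler and be passed via callbacks=[...]; it is a plain class here so the counting logic stands alone, and the per-1K price is a made-up example rate, not an official one.

```python
# Sketch: count completion tokens during streaming by counting callback calls.
class StreamingTokenCounter:
    def __init__(self, price_per_1k_tokens: float = 0.002):  # assumed example rate
        self.completion_tokens = 0
        self.price_per_1k_tokens = price_per_1k_tokens

    def on_llm_new_token(self, token: str, **kwargs) -> None:
        # Each callback invocation corresponds to one streamed token.
        self.completion_tokens += 1

    def cost(self) -> float:
        # Estimated completion cost under the assumed per-1K-token rate.
        return self.completion_tokens / 1000 * self.price_per_1k_tokens
```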

Hello, using st.markdown(var) in place of st.write(var) does it.