Streaming in chat but without typewriter effect

I’m using the latest Streamlit.

I made an app for chatting with a chatbot through the OpenAI API. I stream the output like this, which works fine with the typewriter effect:

            with st.chat_message("assistant"):
                stream = self.client.chat.completions.create(
                    model=self.model,
                    messages=messages,
                    stream=True,
                )
                response = st.write_stream(stream)

But it has some issues that users complain about once the app is deployed. One is that the formatting of the output sometimes breaks slightly; this is not a huge deal, since the problems are minimal, but it would be nice to avoid. The other is that, for some reason, on the server where the app is deployed the chat interface flickers with each character it writes. It gets super annoying when the output is long, because the chat area keeps jumping up and down with every character, and I have to look away until the answer completes to make it tolerable.

So I decided it’s better to drop the typewriter effect entirely to avoid this. I still need some kind of streaming, though, because otherwise the user has to sit and wait until the whole answer comes through, which can take a long time.

So I’m asking whether anyone here has experience streaming differently, for example by concatenating several chunks without losing the formatting of the answers. I tried concatenating chunks until a newline character is found, but some content, like code, loses its formatting. Has anyone implemented a different kind of streaming that doesn’t affect the formatting of the output?


Hi @Odrec

Instead of st.write_stream(), you could use st.markdown().
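
st.markdown() takes a string rather than a stream, so the chunks would need to be collected first. A minimal sketch of that idea, reusing the self.client and messages from the snippet above:

            with st.chat_message("assistant"):
                stream = self.client.chat.completions.create(
                    model=self.model,
                    messages=messages,
                    stream=True,
                )
                # Collect all chunk text, then render the full answer once
                response = ""
                for chunk in stream:
                    chunk_text = chunk.choices[0].delta.content
                    if chunk_text:
                        response += chunk_text
                st.markdown(response)

Note that this only renders after the stream has finished, so there is no intermediate output at all.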


Hi @dataprofessor, thanks for the help! I tried that:

            with st.chat_message("assistant"):
                stream = self.client.chat.completions.create(
                    model=self.model,
                    messages=messages,
                    stream=True,
                )
                response = st.markdown(stream)

But then I get this error, presumably because st.markdown() expects a string rather than the stream object, and it returns an element handle rather than the text, which then fails when the conversation is re-rendered:

2024-02-24 13:49:50.629 Uncaught app exception
Traceback (most recent call last):
  File "PycharmProjects/ai-portal/.venv311/lib/python3.11/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 535, in _run_script
    exec(code, module.__dict__)
  File "PycharmProjects/ai-portal/pages/chatbot_app.py", line 49, in <module>
    app.run()
  File "PycharmProjects/ai-portal/pages/chatbot_app.py", line 42, in run
    self.initialize_app()
  File "PycharmProjects/ai-portal/pages/chatbot_app.py", line 38, in initialize_app
    self.chat_manager.display_chat_interface()
  File "PycharmProjects/ai-portal/src/chatbot_utils.py", line 277, in display_chat_interface
    self._display_conversation()
  File "PycharmProjects/ai-portal/src/chatbot_utils.py", line 227, in _display_conversation
    st.chat_message("assistant").write(message)
  File "PycharmProjects/ai-portal/.venv311/lib/python3.11/site-packages/streamlit/runtime/metrics_util.py", line 397, in wrapped_func
    result = non_optional_func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "PycharmProjects/ai-portal/.venv311/lib/python3.11/site-packages/streamlit/elements/write.py", line 441, in write
    repr_html = arg._repr_html_()
                ^^^^^^^^^^^^^^^^^
  File "PycharmProjects/ai-portal/.venv311/lib/python3.11/site-packages/streamlit/delta_generator.py", line 352, in wrapper
    raise StreamlitAPIException(message)
streamlit.errors.StreamlitAPIException: `_repr_html_()` is not a valid Streamlit command.

And if I try doing something like this:

            response = ""
            partial_response = ""
            c = 0
            with st.chat_message("assistant"):
                stream = self.client.chat.completions.create(
                    model=self.model,
                    messages=messages,
                    stream=True,
                )
                for chunk in stream:
                    c += 1
                    chunk_text = chunk.choices[0].delta.content
                    if chunk_text:
                        partial_response += chunk_text
                    # Flush the buffer every 20 chunks, or at the end of the
                    # stream (the final delta has no content)
                    if (partial_response and c == 20) or chunk_text is None:
                        response += partial_response
                        st.markdown(partial_response)
                        c = 0
                        partial_response = ""

All the formatting gets messed up while the stream is in progress.
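
One way to sidestep the split-element problem would be to render into a single st.empty() placeholder and redraw the whole accumulated text on each flush, so the markdown is always re-parsed as one piece instead of being split across separate elements. A sketch of that idea, under the same assumptions as the snippets above:

            with st.chat_message("assistant"):
                stream = self.client.chat.completions.create(
                    model=self.model,
                    messages=messages,
                    stream=True,
                )
                placeholder = st.empty()  # one element, rewritten in place
                response = ""
                c = 0
                for chunk in stream:
                    chunk_text = chunk.choices[0].delta.content
                    if chunk_text:
                        response += chunk_text
                        c += 1
                    # Redraw the full text every 20 chunks rather than once
                    # per character
                    if c == 20:
                        placeholder.markdown(response)
                        c = 0
                placeholder.markdown(response)  # final render of the full text

Each update still re-renders the element, so it may not remove the flicker entirely, but it fires far less often than per-character streaming.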


Hi,
When using st.markdown(), could you also turn off streaming by setting stream=False?
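
In code, that suggestion would look roughly like this (same client and messages assumed); the trade-off is that nothing is shown until the whole answer arrives:

            with st.chat_message("assistant"):
                completion = self.client.chat.completions.create(
                    model=self.model,
                    messages=messages,
                    stream=False,
                )
                # With stream=False the full answer arrives in one object
                st.markdown(completion.choices[0].message.content)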


The solution was a bit involved, but this is how we managed to do streaming without the typewriter effect in the end. The deployed version looks much better now.

    @staticmethod
    def generate_response(stream):
        """
        Extracts the content from the stream of responses from the OpenAI API.

        Parameters:
            stream: The stream of responses from the OpenAI API.

        Yields:
            str or None: The content of each chunk's delta.
        """
        for chunk in stream:
            delta = chunk.choices[0].delta
            if delta:
                yield delta.content

    @staticmethod
    def concatenate_partial_response(partial_response):
        """
        Concatenates the chunks of the partial response into a single string,
        renders it with st.markdown(), and returns it.

        Parameters:
            partial_response (list): The chunks of the response from the OpenAI API.

        Returns:
            str: The concatenated response.
        """
        str_response = ""
        for i in partial_response:
            # Only string chunks are concatenated; None deltas are skipped
            if isinstance(i, str):
                str_response += i

        st.markdown(str_response)

        return str_response

    def get_response(self, prompt, description_to_use):
        """
        Sends a prompt to the OpenAI API and returns the API's response.

        Parameters:
            prompt (str): The user's message or question.
            description_to_use (str): Additional context or instructions to provide to the model.

        Returns:
            str: The response from the chatbot.
        """
        try:
            # Prepare the full prompt and messages with context or instructions
            messages = self._prepare_full_prompt_and_messages(prompt, description_to_use)

            # Send the request to the OpenAI API
            # Display assistant response in chat message container
            response = ""
            with st.chat_message("assistant"):
                stream = self.client.chat.completions.create(
                    model=self.model,
                    messages=messages,
                    stream=True,
                )
                partial_response = []
                code_block = False

                gen_stream = self.generate_response(stream)
                for chunk_content in gen_stream:
                    # If the chunk opens a code fence, keep consuming chunks
                    # until the fence closes, so the block is rendered whole
                    if chunk_content == '```':
                        partial_response.append(chunk_content)
                        code_block = True
                        while code_block:
                            try:
                                chunk_content = next(gen_stream)
                                partial_response.append(chunk_content)
                                # "`\n\n" is how the API happens to split the
                                # tail of the closing fence into a chunk
                                if chunk_content == "`\n\n":
                                    code_block = False
                                    str_response = self.concatenate_partial_response(partial_response)
                                    partial_response = []
                                    response += str_response
                            except StopIteration:
                                break

                    else:
                        # Outside a code block, buffer the chunk and flush the
                        # buffer at each newline so the markdown stays intact
                        partial_response.append(chunk_content)
                        if chunk_content and '\n' in chunk_content:
                            str_response = self.concatenate_partial_response(partial_response)
                            partial_response = []
                            response += str_response
            # If there is a partial response left, concatenate it and render it
            if partial_response:
                str_response = self.concatenate_partial_response(partial_response)
                response += str_response

            return response

        except Exception as e:
            print(f"An error occurred while fetching the OpenAI response: {e}")
            return "Sorry, I couldn't process that request."
