How to use Streaming response to a container WITHOUT using LangChain (AWS SageMaker Endpoint)

Hi,

I have a problem with a RAG application I built with Streamlit. I started with LangChain, but I'm now trying to build the application entirely without it.
My LLM is hosted as an AWS SageMaker endpoint. In Python I use the boto3 client to invoke the endpoint, but the TokenIterator doesn't return anything when used within a Streamlit application:

def call_llm(prompt, container):
    response = boto3_client.invoke_endpoint_with_response_stream(
            Arguments... (No errors here)
            )
    print(response) # Shows that i get a valid EventStream
    current_completion = ""
    for token in TokenIterator(response["Body"]):
        current_completion += token
        print(token) # Nothing happens here
        container.markdown(current_completion) # Nothing happens here either

The corresponding TokenIterator looks like this:

import io
import json

class TokenIterator:
    def __init__(self, stream):
        self.byte_iterator = iter(stream)
        self.buffer = io.BytesIO()
        self.read_pos = 0

    def __iter__(self):
        return self

    def __next__(self):
        while True:
            self.buffer.seek(self.read_pos)
            line = self.buffer.readline()
            if line and line[-1] == ord("\n"):
                self.read_pos += len(line) + 1
                full_line = line[:-1].decode("utf-8")
                # Drop the SSE "data:" prefix and surrounding whitespace before parsing
                line_data = json.loads(full_line.removeprefix("data:").strip())
                return line_data["token"]["text"]
            chunk = next(self.byte_iterator)
            self.buffer.seek(0, io.SEEK_END)
            self.buffer.write(chunk["PayloadPart"]["Bytes"])
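
To check the parsing logic without calling the endpoint or running Streamlit, it can help to feed the iterator a fake event stream. Below is a minimal sketch of a generator-based alternative to the class above, assuming the endpoint emits server-sent-event lines of the form `data:{"token": {"text": ...}}`. The names `iter_tokens` and `fake_stream` are hypothetical, and the fake events only imitate the `PayloadPart`/`Bytes` shape used above.

```python
import json


def iter_tokens(event_stream):
    """Yield token texts from a SageMaker-style event stream of SSE lines."""
    buffer = b""
    for event in event_stream:
        part = event.get("PayloadPart")
        if part is None:
            continue  # skip events that carry no payload bytes
        buffer += part["Bytes"]
        # Emit every complete line; keep a trailing partial line in the buffer.
        while b"\n" in buffer:
            raw_line, buffer = buffer.split(b"\n", 1)
            line = raw_line.decode("utf-8").strip()
            if not line.startswith("data:"):
                continue  # blank separator lines between events
            payload = json.loads(line[len("data:"):])
            yield payload["token"]["text"]


# Hypothetical stand-in for response["Body"]: the second event is split mid-line.
fake_stream = [
    {"PayloadPart": {"Bytes": b'data:{"token": {"text": "Hel"}}\n\ndata:{"tok'}},
    {"PayloadPart": {"Bytes": b'en": {"text": "lo"}}\n\n'}},
]

tokens = list(iter_tokens(fake_stream))
print(tokens)
```

Because a generator finishes cleanly when the underlying stream is exhausted, an empty stream simply yields no tokens, which makes "nothing is printed" easier to trace back to the endpoint response rather than to the parser.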

This approach works flawlessly in a pure Python script, but not in Streamlit. With LangChain, the same feature can be implemented with a custom StreamHandler that is passed a container to write to (as seen in this topic: Langchain stream).
However, since I don't want to use LangChain, I need another solution. Can someone please help me with this problem? It seems the LangChain callbacks do something different, but I don't understand what makes them work where my own script doesn't, especially since the TokenIterator itself doesn't seem to work within the Streamlit app.

The app currently runs locally, with Streamlit 1.28.0 and Python 3.11.
I'd be very thankful for some help!

I was able to make your example work.

The only thing missing was passing "stream": True in the request body sent to the endpoint:

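import json

# "stream": True asks the model server to send tokens as they are generated.
body = {"inputs": "what is life. Explain in 100 words", "parameters": {"max_new_tokens": 1000}, "stream": True}

# boto3_client and endpoint_name are assumed to be defined as in the question.
resp = boto3_client.invoke_endpoint_with_response_stream(
    EndpointName=endpoint_name,
    Body=json.dumps(body),
    ContentType="application/json",
)
event_stream = resp['Body']

# LineIterator is not defined in this thread; presumably it yields text pieces
# like the TokenIterator from the question.
current_completion = ""
for line in LineIterator(event_stream):
    current_completion += line
    print(line, end="")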
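
To write into a Streamlit container instead of printing, the same loop can re-render the accumulated text after every token. Here is a minimal sketch, assuming the container is any object with a `markdown()` method (for example a placeholder created with `st.empty()`). The helper names `build_streaming_body`, `render_stream`, and `FakeContainer` are hypothetical and only illustrate the pattern; the fake container lets the loop be exercised without Streamlit.

```python
import json


def build_streaming_body(prompt, max_new_tokens=1000):
    """Serialize a request body that asks the endpoint to stream tokens."""
    return json.dumps({
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens},
        "stream": True,
    })


def render_stream(tokens, container):
    """Re-render the growing completion into `container` after each token."""
    current_completion = ""
    for token in tokens:
        current_completion += token
        container.markdown(current_completion)
    return current_completion


class FakeContainer:
    """Test double that records markdown() calls instead of drawing them."""

    def __init__(self):
        self.calls = []

    def markdown(self, text):
        self.calls.append(text)


fake = FakeContainer()
result = render_stream(iter(["Life ", "is ", "short."]), fake)
print(fake.calls)
```

In the app itself, this would presumably be called as `render_stream(LineIterator(resp['Body']), st.empty())`, with `Body=build_streaming_body(prompt)` in the `invoke_endpoint_with_response_stream` call.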