How to use Streaming response to a container WITHOUT using LangChain (AWS SageMaker Endpoint)

Purity · December 28, 2023, 7:34pm

Hi,

I have a problem with my RAG application, which I built with Streamlit. I started with LangChain, but I’m currently trying to build the application entirely without it.
My LLM is hosted as an AWS SageMaker Endpoint. In Python I use the boto3 client to invoke the endpoint, but the TokenIterator doesn’t return anything when used within a Streamlit application:

def call_llm(prompt, container):
    response = boto3_client.invoke_endpoint_with_response_stream(
        # Arguments... (No errors here)
    )
    print(response)  # Shows that I get a valid EventStream
    current_completion = ""
    for token in TokenIterator(response["Body"]):
        current_completion += token
        print(token)  # Nothing happens here
        container.markdown(current_completion)  # Nothing happens here either

The corresponding TokenIterator looks like this:

import io
import json

class TokenIterator:
    def __init__(self, stream):
        self.byte_iterator = iter(stream)
        self.buffer = io.BytesIO()
        self.read_pos = 0

    def __iter__(self):
        return self

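    def __next__(self):
        while True:
            # Re-read from the last consumed position; readline() returns a
            # partial line if the newline has not arrived yet.
            self.buffer.seek(self.read_pos)
            line = self.buffer.readline()
            if line and line[-1] == ord("\n"):
                # The + 1 also skips the blank line after each event
                # (assumes events are separated by "\n\n").
                self.read_pos += len(line) + 1
                full_line = line[:-1].decode("utf-8")
                # Drop the literal "data:" prefix before parsing the JSON payload.
                line_data = json.loads(full_line.removeprefix("data:"))
                return line_data["token"]["text"]
            # No complete line buffered yet: append the next chunk and retry.
            # StopIteration raised by next() ends the iteration.
            chunk = next(self.byte_iterator)
            self.buffer.seek(0, io.SEEK_END)
            self.buffer.write(chunk["PayloadPart"]["Bytes"])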
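
For anyone who wants to check the parsing step without a live endpoint: below is a minimal sketch that feeds hand-built events through a generator version of the same idea. It assumes the endpoint emits lines of the form `data:{...}` separated by blank lines, and that each event is a dict with a `PayloadPart` key, as in the iterator above. The names `iter_tokens` and `fake_events` are made up for illustration and are not part of boto3 or Streamlit.

```python
import json


def iter_tokens(event_stream):
    """Yield token texts from SageMaker-style stream events (sketch)."""
    buffer = b""
    for event in event_stream:
        part = event.get("PayloadPart")
        if part is None:
            continue  # ignore events that carry no payload bytes
        buffer += part["Bytes"]
        # Only complete lines are parsed; a partial line stays in the buffer
        # until a later chunk delivers the rest of it.
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            text = line.decode("utf-8").strip()
            if not text.startswith("data:"):
                continue  # blank separator lines and anything unexpected
            payload = json.loads(text[len("data:"):])
            yield payload["token"]["text"]


# Hand-built events (assumption: this mirrors the endpoint's output format).
# The second token is deliberately split across two chunks.
fake_events = [
    {"PayloadPart": {"Bytes": b'data:{"token": {"text": "Hello"}}\n\n'}},
    {"PayloadPart": {"Bytes": b'data:{"token": {"te'}},
    {"PayloadPart": {"Bytes": b'xt": " world"}}\n\n'}},
]

tokens = list(iter_tokens(fake_events))
print(tokens)
```

If this yields tokens for fake events but the real stream yields nothing, the problem is more likely in what the endpoint sends than in the parsing or in Streamlit.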

This approach works flawlessly in a pure Python script, but not in Streamlit. With LangChain, this can be done with a custom StreamHandler that is passed a container to write to (as seen in this topic: Langchain stream).
However, since I don’t want to use LangChain, I need another solution. Can someone please help me with this problem? It seems the LangChain callbacks do something different, but I don’t understand what makes them work that doesn’t work in my own script, especially since the TokenIterator implementation itself doesn’t seem to work within the Streamlit app.
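
For comparison, a callback-style handler of that kind usually does little more than append each new token to a running string and redraw the container with the full text. A minimal sketch of that pattern without LangChain follows. `render_stream` and `FakeContainer` are hypothetical names; the fake container only stands in for a Streamlit placeholder so the snippet runs outside an app.

```python
def render_stream(tokens, container):
    """Append each token to the running text and redraw the container."""
    current_completion = ""
    for token in tokens:
        current_completion += token
        # Redraw with the full text so far, not just the newest token.
        container.markdown(current_completion)
    return current_completion


class FakeContainer:
    """Stand-in for a Streamlit placeholder; records every markdown() call."""

    def __init__(self):
        self.calls = []

    def markdown(self, text):
        self.calls.append(text)


fake = FakeContainer()
final_text = render_stream(["Str", "eam", "ing"], fake)
print(fake.calls)
```

In a Streamlit app, the container would typically be a placeholder such as `st.empty()`, and `tokens` would be the iterator over the endpoint's response stream. The loop itself is the same as in `call_llm` above, which suggests the rendering code is not what keeps tokens from appearing.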

The app is currently run locally, with Streamlit version 1.28.0 and Python 3.11.
I’d be very thankful for some help.

ferdy · December 31, 2023, 11:37am

Hello and welcome to the Streamlit family! We’re so glad you’re here. As you get started, do check out our thread Using Streamlit: How to Post a Question Effectively. It’s packed with tips and tricks for framing your questions in a way that’s both clear and engaging, helping you tap into the collective wisdom of our supportive and experienced community members.

Nikhil_Talreja · April 19, 2024, 3:15pm

So I was able to make your example work.

The only thing missing was passing "stream": True in the request body sent to the endpoint:

body = {"inputs": "what is life. Explain in 100 words", "parameters": {"max_new_tokens": 1000}, "stream": True}

resp = boto3_client.invoke_endpoint_with_response_stream(
    EndpointName=endpoint_name,
    Body=json.dumps(body),
    ContentType="application/json",
)
event_stream = resp['Body']

current_completion = ""
for line in LineIterator(event_stream):
    current_completion += line
    print(line, end="")
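
To make it harder to forget the flag, the request can be wrapped in a small helper. This is only a sketch: `invoke_streaming`, `FakeClient`, and the endpoint name `"my-endpoint"` are hypothetical, and the body layout (`inputs`, `parameters`, `stream`) is taken from the snippet above, so it may differ for other model containers. The fake client is there so the payload can be inspected without AWS credentials.

```python
import json


def invoke_streaming(client, endpoint_name, prompt, max_new_tokens=1000):
    """Invoke the endpoint with "stream": True and return the event stream."""
    body = {
        "inputs": prompt,
        "parameters": {"max_new_tokens": max_new_tokens},
        "stream": True,  # the flag that was missing in the original question
    }
    resp = client.invoke_endpoint_with_response_stream(
        EndpointName=endpoint_name,
        Body=json.dumps(body),
        ContentType="application/json",
    )
    return resp["Body"]


class FakeClient:
    """Stand-in for the boto3 runtime client; records the call arguments."""

    def __init__(self):
        self.last_kwargs = None

    def invoke_endpoint_with_response_stream(self, **kwargs):
        self.last_kwargs = kwargs
        return {"Body": iter([])}  # empty event stream


fake_client = FakeClient()
event_stream = invoke_streaming(fake_client, "my-endpoint", "what is life")
sent_body = json.loads(fake_client.last_kwargs["Body"])
print(sent_body["stream"])
```

With a real client, the returned stream would then be passed to the token iterator and rendered into the Streamlit container.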

system · October 16, 2024, 3:16pm

This topic was automatically closed 180 days after the last reply. New replies are no longer allowed.
