Caching stream response from LLM

How do you cache a response that is streamed from an LLM, and then displayed using st.write_stream?

The response object returned by the LLM and passed to st.write_stream is a stream, not data, so st.cache_data does not work.

Similarly, st.cache_resource doesn't seem to work: on re-run, the cached response renders blank, which I believe is because the cached stream object has already been consumed.
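The blank-on-rerun behavior can be reproduced in plain Python, with no Streamlit or OpenAI involved: a generator (which is what a streaming response behaves like) yields its items exactly once, so a cached reference to it produces nothing on a second pass. A minimal sketch:

```python
def make_stream():
    # Stand-in for an OpenAI streaming response: a one-shot generator.
    for chunk in ["Once", " upon", " a", " time"]:
        yield chunk

stream = make_stream()   # imagine this object being cached by st.cache_resource
first = list(stream)     # first run consumes the stream
second = list(stream)    # second run gets an already-exhausted generator: []
```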

Code to reproduce:

```python
import openai
import streamlit as st

@st.cache_resource()
def get_response_stream(system_prompt, prompt):
    openai.api_key = ""
    openai_model = "gpt-4o"
    temperature = 0

    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": prompt},
    ]
    response = openai.chat.completions.create(
        model=openai_model, messages=messages, temperature=temperature, stream=True
    )
    return response

system_prompt = "You are a helpful assistant"
prompt = "Write me a short story"

response = get_response_stream(system_prompt, prompt)
st.write_stream(response)
```

Hey @PeterM, yeah, I assume it's empty because the cached stream has already been exhausted by the end of the first run.
You could try wrapping the stream and caching the collected chunks manually, something like (untested):

```python
from typing import Dict

import openai
import streamlit as st

if "cached_stream" not in st.session_state:
    st.session_state.cached_stream: Dict[str, list] = {}

def cache_stream_generator(key, stream):
    # Pass each chunk through to the caller while recording it;
    # store the full list once the stream is exhausted.
    chunks = []
    for chunk in stream:
        chunks.append(chunk)
        yield chunk
    st.session_state.cached_stream[key] = chunks

def get_response_stream(system_prompt, prompt):
    key = f"{system_prompt},{prompt}"
    if key in st.session_state.cached_stream:
        return st.session_state.cached_stream[key]

    openai.api_key = ""
    openai_model = "gpt-4o"
    temperature = 0

    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": prompt},
    ]
    response = openai.chat.completions.create(
        model=openai_model, messages=messages, temperature=temperature, stream=True
    )

    return cache_stream_generator(key, response)

system_prompt = "You are a helpful assistant"
prompt = "Write me a short story"

response = get_response_stream(system_prompt, prompt)
st.write_stream(response)
```

Thanks! I thought this might be the way to go but was hoping for something more elegant. I’ll give it a shot!


This topic was automatically closed 180 days after the last reply. New replies are no longer allowed.