Streaming 2 LLM models in parallel

Hi! I want to build an app where a single user question is sent to 2 LLM APIs and the output is streamed side by side, for example with the gpt-3.5-turbo and gpt-4-turbo models. I want to hit both models' APIs concurrently and then stream the output from both models in parallel (maybe in 2 separate columns).


Interesting, have you made an attempt to solve it?

Sample output, left is 3.5, right is 4.

This is not using concurrency, just checking if it works.

This is great! I tried to solve this using threading and the output does stream, but I don't think threading is the most efficient way to solve this problem. The code is below; any help with it would be appreciated. Also, how can I integrate chat history into this code? (A rough idea I had is sketched after the code.)

import streamlit as st
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from streamlit.runtime.scriptrunner import add_script_run_ctx
from streamlit.runtime.scriptrunner.script_run_context import get_script_run_ctx
import threading
from langchain_anthropic import ChatAnthropic
from dotenv import load_dotenv

load_dotenv()

st.set_page_config("LLM TEST")
MODEL_CHOICES = ['gpt-3.5-turbo', 'gpt-4-turbo']
selected_models = st.multiselect("Select models:", MODEL_CHOICES, default=MODEL_CHOICES)

def call_chain(ctx, model_name, prompt):
    # Attach the script run context so Streamlit calls work from the worker thread.
    add_script_run_ctx(threading.current_thread(), ctx)
    try:
        if 'opus' in model_name:  # Claude models go through ChatAnthropic
            llm = ChatAnthropic(temperature=0, model_name=model_name)
        else:
            llm = ChatOpenAI(model=model_name, temperature=0, streaming=True)

        prompt_template = ChatPromptTemplate.from_messages(
            [
                ("system", "helpful assistant"),
                ("human", "{input}")
            ]
        )
        chain = prompt_template | llm
        for chunk in chain.stream({'input': prompt}):
            yield chunk.content
    except Exception as e:
        yield f"Error: {str(e)}"

def threading_output(prompt):
    ctx = get_script_run_ctx()
    # One column per selected model, or the whole page when only one model is chosen.
    if len(selected_models) == 1:
        cols = [st]
    else:
        cols = st.columns(len(selected_models))
    threads = []

    # Start one thread per model so both responses stream at the same time.
    for i, model in enumerate(selected_models):
        generator = call_chain(ctx, model, prompt)
        thread = threading.Thread(
            target=lambda gen=generator, col=cols[i]: give_output(gen, col, ctx),
            daemon=True,
        )
        threads.append(thread)
        thread.start()

    for thread in threads:
        thread.join()


def give_output(generator, col, ctx):
    # The worker thread needs the main script's run context before it can render anything.
    add_script_run_ctx(threading.current_thread(), ctx)
    col.write_stream(generator)

user_prompt = st.chat_input("Write a question")

if user_prompt:
    threading_output(user_prompt)
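
For the chat history part, one idea I had is to keep the history in st.session_state and feed it into the prompt through a MessagesPlaceholder. This is only a rough sketch that I have not tested with the code above, and the chat_history key and tuple format are just placeholders:

# Sketch: keep the conversation in session state and pass it into the chain.
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

if "chat_history" not in st.session_state:
    st.session_state.chat_history = []  # list of ("human"/"ai", text) tuples

prompt_template = ChatPromptTemplate.from_messages(
    [
        ("system", "helpful assistant"),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)
chain = prompt_template | llm

answer = ""
for chunk in chain.stream({"input": prompt, "chat_history": st.session_state.chat_history}):
    answer += chunk.content

# Remember the exchange for the next turn.
st.session_state.chat_history.append(("human", prompt))
st.session_state.chat_history.append(("ai", answer))

If the history is updated from inside the worker threads, several threads would be appending to session_state at once, so it may be simpler to collect each model's answer and update the history from the main thread after the threads finish.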


Can you please also provide the code for the above screenshot, @ferdy?

from openai import OpenAI
from streamlit import session_state as ss
import streamlit as st


st.set_page_config(layout='wide')


MODEL_OPTIONS = ['gpt-3.5-turbo-0125', 'gpt-3.5-turbo-16k-0613', 'gpt-4-turbo', 'gpt-4-0613']

if 'msg' not in ss:
    ss.msg = {'model1': [], 'model2': []}
if 'is_good_input' not in ss:
    ss.is_good_input = False


def submit_cb():
    ss.is_good_input = False
    if not ss.oaik:
        st.sidebar.warning('openai api key is missing')
    elif ss.mn1 == ss.mn2:
        st.sidebar.warning('models should not be the same')
    else:
        ss.is_good_input = True


def main():
    # Get option values.
    with st.sidebar:
        with st.form('form'):
            st.text_input('enter openai api key', type='password', key='oaik')
            st.selectbox('select model name #1', MODEL_OPTIONS, index=0, key='mn1')
            st.selectbox('select model name #2', MODEL_OPTIONS, index=2, key='mn2')
            st.slider('max tokens', value=64, min_value=16, max_value=128000, step=16, key='maxtoken')
            st.slider('temperature', value=0.5, min_value=0.0, max_value=2.0, step=0.1, key='temperature')
            st.form_submit_button('Submit', on_click=submit_cb)

    if not ss.is_good_input:
        st.stop()

    model1 = ss.mn1
    model2 = ss.mn2
    max_tokens = ss.maxtoken
    temperature = ss.temperature

    st.title(f"Chat with {model1} and {model2}")

    left, right = st.columns([1, 1], gap='large')

    with left:
        st.write(f'{model1}')

    with right:
        st.write(f'{model2}')

    client = OpenAI(api_key=ss.oaik)

    with left:
        for message in ss.msg['model1']:
            with st.chat_message(message["role"]):
                st.markdown(message["content"])

    with right:
        for message in ss.msg['model2']:
            with st.chat_message(message["role"]):
                st.markdown(message["content"])

    if prompt := st.chat_input("enter your prompt"):
        ss.msg['model1'].append({"role": "user", "content": prompt})
        ss.msg['model2'].append({"role": "user", "content": prompt})

        with left:
            with st.chat_message("user"):
                st.markdown(prompt)

            with st.chat_message("assistant"):
                stream = client.chat.completions.create(
                    model=model1,
                    temperature=temperature,
                    max_tokens=max_tokens,
                    messages=[
                        {"role": m["role"], "content": m["content"]}
                        for m in ss.msg['model1']
                    ],
                    stream=True,
                )

                response = st.write_stream(stream)
                ss.msg['model1'].append({"role": "assistant", "content": response})

        with right:
            with st.chat_message("user"):
                st.markdown(prompt)

            with st.chat_message("assistant"):
                stream2 = client.chat.completions.create(
                    model=model2,
                    temperature=temperature,
                    max_tokens=max_tokens,
                    messages=[
                        {"role": m["role"], "content": m["content"]}
                        for m in ss.msg['model2']
                    ],
                    stream=True,
                )

                response = st.write_stream(stream2)
                ss.msg['model2'].append({"role": "assistant", "content": response})


if __name__ == '__main__':
    main()

Is there a way to store the history in a database for each model respectively? That might be more effective than using session_state.

If it is just run locally, it would be easy to save it as JSON, CSV, DuckDB, SQLite, etc. If it is deployed on Streamlit Community Cloud, you can store it in Deta Space's Detabase, or in Deta Drive as a CSV, JSON, or SQLite file, or even in a Google Sheet. There are other free hosting options for saving it as well.
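
For the SQLite option, a minimal sketch could look like this (the file, table, and column names are arbitrary placeholders):

# Sketch: store each model's messages in a local SQLite file.
import sqlite3

conn = sqlite3.connect("chat_history.db")
conn.execute(
    """CREATE TABLE IF NOT EXISTS messages (
           model TEXT,
           role TEXT,
           content TEXT,
           created_at TEXT DEFAULT CURRENT_TIMESTAMP
       )"""
)

def save_message(model, role, content):
    conn.execute(
        "INSERT INTO messages (model, role, content) VALUES (?, ?, ?)",
        (model, role, content),
    )
    conn.commit()

def load_messages(model):
    rows = conn.execute(
        "SELECT role, content FROM messages WHERE model = ? ORDER BY rowid",
        (model,),
    ).fetchall()
    return [{"role": r, "content": c} for r, c in rows]

You could call save_message wherever the code above appends to ss.msg, and rebuild the two histories with load_messages at the start of the script.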

If you deploy this in the cloud and it is used by other users, be sure to ask for permission and be clear that you are saving their data or messages in a database, what the purpose is, and so on.


Thanks for answering the question. One final thing I want to know: I will be deploying this on a server, and currently I am using threads to do the work. Will threads be efficient, or is there something else that would do the job better? If so, can you please show it?

The current output with threads looks like this and is quite nice, but will it be efficient once it is deployed?

Create two versions, one with separate threads and one without, and compare their performance by observing the interaction. Try to compare the memory usage too; Community Cloud has a memory usage limit.
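
Another variant worth comparing keeps the API calls in worker threads but does all the Streamlit rendering from the main script thread through queues, so no ScriptRunContext handling is needed. This is just a sketch, assuming call_chain has been reworked into a plain generator of text chunks:

# Sketch: workers only push text chunks into queues; the main thread renders them.
import queue
import threading
import streamlit as st

def pump(gen, q):
    for chunk in gen:
        q.put(chunk)
    q.put(None)  # sentinel: this model is finished

def stream_side_by_side(models, prompt):
    cols = st.columns(len(models))
    placeholders = [col.empty() for col in cols]
    queues = [queue.Queue() for _ in models]

    for model, q in zip(models, queues):
        threading.Thread(target=pump, args=(call_chain(model, prompt), q), daemon=True).start()

    texts = [""] * len(models)
    done = [False] * len(models)
    while not all(done):
        for i, q in enumerate(queues):
            try:
                chunk = q.get(timeout=0.05)
            except queue.Empty:
                continue
            if chunk is None:
                done[i] = True
            else:
                texts[i] += chunk
                placeholders[i].markdown(texts[i])

Since every Streamlit call happens on the main thread, this sidesteps add_script_run_ctx entirely; whether it is lighter than your current version is exactly the kind of thing to check in the comparison above.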

To guarantee horizontal alignment, wrap the messages in a container; you can define a height.

Example:

with left:
    with st.chat_message("user"):
        with st.container(border=True, height=150):
            st.markdown(prompt)

Thanks for the help. I will let you know whether this works after deployment.
