Streaming 2 LLM models in parallel

Hi! I want to build an app where a single user question is sent to 2 LLM APIs and the output is streamed side by side, for example with the gpt-3.5-turbo and gpt-4-turbo models. I want to hit both models' APIs concurrently and then stream the output from both models in parallel (maybe in 2 separate columns).


Interesting, have you made an attempt to solve it?

Sample output, left is 3.5, right is 4.

This is not using concurrency, just checking if it works.

This is great! I tried to solve this using threading and the output does stream, but I don't think threading is the most efficient way to solve this problem. The code is below; any help with it would be appreciated. Also, how can I integrate chat history into this code? (A rough idea I had is sketched after the code.)

import streamlit as st
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from streamlit.runtime.scriptrunner import add_script_run_ctx
from streamlit.runtime.scriptrunner.script_run_context import get_script_run_ctx
import threading
from langchain_anthropic import ChatAnthropic
from dotenv import load_dotenv

load_dotenv()

st.set_page_config("LLM TEST")
MODEL_CHOICES = ['gpt-3.5-turbo', 'gpt-4-turbo']
selected_models = st.multiselect("Select models:", MODEL_CHOICES, default=MODEL_CHOICES)

def call_chain(ctx, model_name, prompt):
    # Attach the script run context so Streamlit calls work from the worker thread.
    add_script_run_ctx(threading.current_thread(), ctx)
    try:
        if 'opus' in model_name:  # Claude models go through ChatAnthropic
            llm = ChatAnthropic(temperature=0, model_name=model_name)
        else:
            llm = ChatOpenAI(model=model_name, temperature=0, streaming=True)

        prompt_template = ChatPromptTemplate.from_messages(
            [
                ("system", "helpful assistant"),
                ("human", "{input}")
            ]
        )
        chain = prompt_template | llm
        for chunk in chain.stream({'input': prompt}):
            yield chunk.content
    except Exception as e:
        yield f"Error: {str(e)}"

def threading_output(prompt):
    ctx = get_script_run_ctx()
    # One column per selected model, or the whole page when only one model is chosen.
    if len(selected_models) == 1:
        cols = [st]
    else:
        cols = st.columns(len(selected_models))
    threads = []

    # Start one thread per model so both responses stream at the same time.
    for i, model in enumerate(selected_models):
        generator = call_chain(ctx, model, prompt)
        thread = threading.Thread(
            target=lambda gen=generator, col=cols[i]: give_output(gen, col, ctx),
            daemon=True,
        )
        threads.append(thread)
        thread.start()

    for thread in threads:
        thread.join()


def give_output(generator, col, ctx):
    # The worker thread needs the main script's run context before it can render anything.
    add_script_run_ctx(threading.current_thread(), ctx)
    col.write_stream(generator)

user_prompt = st.chat_input("Write a question")

if user_prompt:
    threading_output(user_prompt)
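
For the chat history part, one idea I had is to keep the history in st.session_state and feed it into the prompt through a MessagesPlaceholder. This is only a rough sketch that I have not tested with the code above, and the chat_history key and tuple format are just placeholders:

# Sketch: keep the conversation in session state and pass it into the chain.
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder

if "chat_history" not in st.session_state:
    st.session_state.chat_history = []  # list of ("human"/"ai", text) tuples

prompt_template = ChatPromptTemplate.from_messages(
    [
        ("system", "helpful assistant"),
        MessagesPlaceholder("chat_history"),
        ("human", "{input}"),
    ]
)
chain = prompt_template | llm

answer = ""
for chunk in chain.stream({"input": prompt, "chat_history": st.session_state.chat_history}):
    answer += chunk.content

# Remember the exchange for the next turn.
st.session_state.chat_history.append(("human", prompt))
st.session_state.chat_history.append(("ai", answer))

If the history is updated from inside the worker threads, several threads would be appending to session_state at once, so it may be simpler to collect each model's answer and update the history from the main thread after the threads finish.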


Can you please also provide the code for the above screenshot, @ferdy?

from openai import OpenAI
from streamlit import session_state as ss
import streamlit as st


st.set_page_config(layout='wide')


MODEL_OPTIONS = ['gpt-3.5-turbo-0125', 'gpt-3.5-turbo-16k-0613', 'gpt-4-turbo', 'gpt-4-0613']

if 'msg' not in ss:
    ss.msg = {'model1': [], 'model2': []}
if 'is_good_input' not in ss:
    ss.is_good_input = False


def submit_cb():
    ss.is_good_input = False
    if not ss.oaik:
        st.sidebar.warning('openai api key is missing')
    elif ss.mn1 == ss.mn2:
        st.sidebar.warning('models should not be the same')
    else:
        ss.is_good_input = True


def main():
    # Get option values.
    with st.sidebar:
        with st.form('form'):
            st.text_input('enter openai api key', type='password', key='oaik')
            st.selectbox('select model name #1', MODEL_OPTIONS, index=0, key='mn1')
            st.selectbox('select model name #2', MODEL_OPTIONS, index=2, key='mn2')
            st.slider('max tokens', value=64, min_value=16, max_value=128000, step=16, key='maxtoken')
            st.slider('temperature', value=0.5, min_value=0.0, max_value=2.0, step=0.1, key='temperature')
            st.form_submit_button('Submit', on_click=submit_cb)

    if not ss.is_good_input:
        st.stop()

    model1 = ss.mn1
    model2 = ss.mn2
    max_tokens = ss.maxtoken
    temperature = ss.temperature

    st.title(f"Chat with {model1} and {model2}")

    left, right = st.columns([1, 1], gap='large')

    with left:
        st.write(f'{model1}')

    with right:
        st.write(f'{model2}')

    client = OpenAI(api_key=ss.oaik)

    with left:
        for message in ss.msg['model1']:
            with st.chat_message(message["role"]):
                st.markdown(message["content"])

    with right:
        for message in ss.msg['model2']:
            with st.chat_message(message["role"]):
                st.markdown(message["content"])

    if prompt := st.chat_input("enter your prompt"):
        ss.msg['model1'].append({"role": "user", "content": prompt})
        ss.msg['model2'].append({"role": "user", "content": prompt})

        with left:
            with st.chat_message("user"):
                st.markdown(prompt)

            with st.chat_message("assistant"):
                stream = client.chat.completions.create(
                    model=model1,
                    temperature=temperature,
                    max_tokens=max_tokens,
                    messages=[
                        {"role": m["role"], "content": m["content"]}
                        for m in ss.msg['model1']
                    ],
                    stream=True,
                )

                response = st.write_stream(stream)
                ss.msg['model1'].append({"role": "assistant", "content": response})

        with right:
            with st.chat_message("user"):
                st.markdown(prompt)

            with st.chat_message("assistant"):
                stream2 = client.chat.completions.create(
                    model=model2,
                    temperature=temperature,
                    max_tokens=max_tokens,
                    messages=[
                        {"role": m["role"], "content": m["content"]}
                        for m in ss.msg['model2']
                    ],
                    stream=True,
                )

                response = st.write_stream(stream2)
                ss.msg['model2'].append({"role": "assistant", "content": response})


if __name__ == '__main__':
    main()

Is there a way to store the history in a database for each model respectively? That might be more effective than using session_state.

If it is just run locally, it would be easy to save it as JSON, CSV, DuckDB, SQLite, etc. If it is deployed on Streamlit Community Cloud, you can store it in Deta Space's Detabase, or in Deta Drive as a CSV, JSON, or SQLite file, or even in a Google Sheet. There are other free hosting options for saving it as well.
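
For the SQLite option, a minimal sketch could look like this (the file, table, and column names are arbitrary placeholders):

# Sketch: store each model's messages in a local SQLite file.
import sqlite3

conn = sqlite3.connect("chat_history.db")
conn.execute(
    """CREATE TABLE IF NOT EXISTS messages (
           model TEXT,
           role TEXT,
           content TEXT,
           created_at TEXT DEFAULT CURRENT_TIMESTAMP
       )"""
)

def save_message(model, role, content):
    conn.execute(
        "INSERT INTO messages (model, role, content) VALUES (?, ?, ?)",
        (model, role, content),
    )
    conn.commit()

def load_messages(model):
    rows = conn.execute(
        "SELECT role, content FROM messages WHERE model = ? ORDER BY rowid",
        (model,),
    ).fetchall()
    return [{"role": r, "content": c} for r, c in rows]

You could call save_message wherever the code above appends to ss.msg, and rebuild the two histories with load_messages at the start of the script.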

If you deploy this in the cloud and it is used by other users, be sure to ask for permission and be clear that you are saving their data or messages in a database, what the purpose is, and so on.


Thanks for answering the question. One final thing I want to know: I will be deploying this on a server, and currently I am using threads to do the work. Will threads be efficient, or is there something else that would do the job better? If so, can you please show it?

The current output with threads looks like this and is quite nice, but will it be efficient once it is deployed?

Create two versions, one with separate threads and one without, and compare their performance by observing the interaction. Try to compare the memory usage too; Community Cloud has a memory usage limit.
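
Another variant worth comparing keeps the API calls in worker threads but does all the Streamlit rendering from the main script thread through queues, so no ScriptRunContext handling is needed. This is just a sketch, assuming call_chain has been reworked into a plain generator of text chunks:

# Sketch: workers only push text chunks into queues; the main thread renders them.
import queue
import threading
import streamlit as st

def pump(gen, q):
    for chunk in gen:
        q.put(chunk)
    q.put(None)  # sentinel: this model is finished

def stream_side_by_side(models, prompt):
    cols = st.columns(len(models))
    placeholders = [col.empty() for col in cols]
    queues = [queue.Queue() for _ in models]

    for model, q in zip(models, queues):
        threading.Thread(target=pump, args=(call_chain(model, prompt), q), daemon=True).start()

    texts = [""] * len(models)
    done = [False] * len(models)
    while not all(done):
        for i, q in enumerate(queues):
            try:
                chunk = q.get(timeout=0.05)
            except queue.Empty:
                continue
            if chunk is None:
                done[i] = True
            else:
                texts[i] += chunk
                placeholders[i].markdown(texts[i])

Since every Streamlit call happens on the main thread, this sidesteps add_script_run_ctx entirely; whether it is lighter than your current version is exactly the kind of thing to check in the comparison above.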

To guarantee horizontal alignment, wrap the messages in a container; you can define a height.

Example:

with left:
    with st.chat_message("user"):
        with st.container(border=True, height=150):
            st.markdown(prompt)

Thanks for the help. I will let you know whether this works after deployment.
