FAQ: How to build an Arctic chatbot

1. What is Arctic?

Arctic is a family of enterprise-grade, open-source foundation and embedding models built by Snowflake and released on April 25, 2024. For a deep dive on Arctic, check out the blog post from Snowflake AI Research.

2. Using Arctic

Arctic is available in two variants on the Hugging Face platform:

  1. Snowflake/snowflake-arctic-embed

    Arctic Embed is best suited to retrieval operations, such as RAG, and outputs similarity scores. The smallest Embed models (xs and s) can run on your laptop, and regular GPUs are enough for snowflake-arctic-embed-l and up.

  2. Snowflake/snowflake-arctic-instruct

    Arctic Instruct is a 480B-parameter foundation LLM. Owing to its large size, the Instruct model series requires specialized, dedicated hardware such as H100 GPUs. To access Arctic Instruct, see this guide for Snowflake Cortex, or Section 3 below for how to access Arctic Instruct via Replicate.
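The similarity scores that Arctic Embed returns are typically cosine similarities between embedding vectors, which is what makes the model useful for retrieval. A minimal sketch of how ranking by such scores works, using stand-in vectors rather than actual Arctic Embed outputs:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-in vectors; in practice these would come from an Arctic Embed model
query_vec = np.array([0.1, 0.8, 0.3])
doc_vecs = {
    "doc_a": np.array([0.1, 0.7, 0.4]),
    "doc_b": np.array([0.9, 0.1, 0.0]),
}

# Rank documents by similarity to the query; the best match is retrieved
scores = {name: cosine_similarity(query_vec, v) for name, v in doc_vecs.items()}
best = max(scores, key=scores.get)
```

In a RAG pipeline, the top-scoring documents are then passed as context to a generation model such as Arctic Instruct.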

3. Access Arctic Instruct via the Replicate API

You can access Arctic Instruct via robust cloud services such as Replicate or Snowflake Cortex. The advantage is that there are no servers to set up and maintain, you don’t need your own dedicated (and expensive) hardware, and you only pay for what you use.

Arctic is available for free on Snowflake Cortex for a limited time (until the end of May), and if you register for the Arctic and Streamlit Hackathon you can also get a $50 credit to use the Replicate API for your project.

The guide below focuses on Replicate. See this post for a Snowflake Cortex quickstart instead.

3.0. Register for Arctic Streamlit Hackathon

Until May 21st, 2024: Register for the Arctic and Streamlit Hackathon to get a $50 credit to use the Replicate API for your project.

Apply for your credit here and remember to register for the Hackathon first!

3.1. Get API token

To start using Arctic with Replicate, you’ll need to get your own Replicate API token, which is a simple 3-step process:

  1. Go to https://replicate.com/signin/.
  2. Sign in with your GitHub account.
  3. Proceed to the API tokens page and copy your API token.

3.2. Install Replicate

You can install the Replicate Python library from the command line as follows:

pip install replicate

3.3. Set API token

Next, set the REPLICATE_API_TOKEN environment variable:

export REPLICATE_API_TOKEN=<paste-your-token-here>
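If you would rather set the token from Python (for example, in a notebook) than from the shell, the same environment variable can be set via os.environ before the replicate client is used. The token value below is a placeholder:

```python
import os

# Placeholder value; substitute your actual Replicate API token
os.environ["REPLICATE_API_TOKEN"] = "r8_your_token_here"
```

The replicate library reads REPLICATE_API_TOKEN from the environment, so either approach works.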

3.4. Model inference

Import the replicate library:

import replicate

Finally, run the model inference via Replicate’s API:

for event in replicate.stream(
    "snowflake/snowflake-arctic-instruct",
    input={
        "top_k": 50,
        "top_p": 0.9,
        "prompt": "Generate a poem about the Python programming language.",
        "temperature": 0.2,
        "max_new_tokens": 512,
        "min_new_tokens": 0,
        "stop_sequences": "<|im_end|>",
        "prompt_template": "<|im_start|>system\nYou're a helpful assistant<|im_end|>\n<|im_start|>user\n{prompt}<|im_end|>\n\n<|im_start|>assistant\n",
        "presence_penalty": 1.15,
        "frequency_penalty": 0.2
    },
):
    print(str(event), end="")
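To make the prompt_template parameter concrete: Replicate substitutes your prompt into the {prompt} placeholder before the full string is sent to the model. A minimal sketch of that substitution, using the template string from the call above:

```python
# Template copied from the replicate.stream call above (ChatML-style turns)
prompt_template = (
    "<|im_start|>system\nYou're a helpful assistant<|im_end|>\n"
    "<|im_start|>user\n{prompt}<|im_end|>\n\n"
    "<|im_start|>assistant\n"
)

prompt = "Generate a poem about the Python programming language."

# Fill the placeholder to produce the string the model actually sees
rendered = prompt_template.format(prompt=prompt)
```

The rendered string ends with an open assistant turn, which is what cues the model to generate its reply; the stop_sequences value "<|im_end|>" tells it where to stop.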

4. Use Arctic in a Streamlit app

Feel free to use the Arctic chatbot app template to get started using Arctic in a Streamlit app.

4.1. Local development

To set up a local development environment, enter the following at the command line:

pip install streamlit replicate

4.2. Cloud development

You can easily set up a cloud environment by deploying to the Streamlit Community Cloud with the help of the Streamlit app template (read more here).

Add a requirements.txt file to your GitHub repo and include the following prerequisite libraries:

streamlit
replicate
transformers

4.3. Build the app

The Arctic chatbot app can be written in 95 lines of code:

import streamlit as st
import replicate
import os
from transformers import AutoTokenizer

# Set assistant icon to Snowflake logo
icons = {"assistant": "./Snowflake_Logomark_blue.svg", "user": "⛷️"}

# App title
st.set_page_config(page_title="Snowflake Arctic")

# Replicate Credentials
with st.sidebar:
    st.title('Snowflake Arctic')
    if 'REPLICATE_API_TOKEN' in st.secrets:
        #st.success('API token loaded!', icon='✅')
        replicate_api = st.secrets['REPLICATE_API_TOKEN']
    else:
        replicate_api = st.text_input('Enter Replicate API token:', type='password')
        if not (replicate_api.startswith('r8_') and len(replicate_api)==40):
            st.warning('Please enter your Replicate API token.', icon='⚠️')
            st.markdown("**Don't have an API token?** Head over to [Replicate](https://replicate.com) to sign up for one.")
        #else:
        #    st.success('API token loaded!', icon='✅')

    os.environ['REPLICATE_API_TOKEN'] = replicate_api
    st.subheader("Adjust model parameters")
    temperature = st.sidebar.slider('temperature', min_value=0.01, max_value=5.0, value=0.3, step=0.01)
    top_p = st.sidebar.slider('top_p', min_value=0.01, max_value=1.0, value=0.9, step=0.01)

# Store LLM-generated responses
if "messages" not in st.session_state.keys():
    st.session_state.messages = [{"role": "assistant", "content": "Hi. I'm Arctic, a new, efficient, intelligent, and truly open language model created by Snowflake AI Research. Ask me anything."}]

# Display or clear chat messages
for message in st.session_state.messages:
    with st.chat_message(message["role"], avatar=icons[message["role"]]):
        st.write(message["content"])

def clear_chat_history():
    st.session_state.messages = [{"role": "assistant", "content": "Hi. I'm Arctic, a new, efficient, intelligent, and truly open language model created by Snowflake AI Research. Ask me anything."}]
st.sidebar.button('Clear chat history', on_click=clear_chat_history)

st.sidebar.caption('Built by [Snowflake](https://snowflake.com/) to demonstrate [Snowflake Arctic](https://www.snowflake.com/blog/arctic-open-and-efficient-foundation-language-models-snowflake). App hosted on [Streamlit Community Cloud](https://streamlit.io/cloud). Model hosted by [Replicate](https://replicate.com/snowflake/snowflake-arctic-instruct).')

@st.cache_resource(show_spinner=False)
def get_tokenizer():
    """Get a tokenizer to make sure we're not sending too much text
    text to the Model. Eventually we will replace this with ArcticTokenizer
    """
    return AutoTokenizer.from_pretrained("huggyllama/llama-7b")

def get_num_tokens(prompt):
    """Get the number of tokens in a given prompt"""
    tokenizer = get_tokenizer()
    tokens = tokenizer.tokenize(prompt)
    return len(tokens)

# Function for generating Snowflake Arctic response
def generate_arctic_response():
    prompt = []
    for dict_message in st.session_state.messages:
        if dict_message["role"] == "user":
            prompt.append("<|im_start|>user\n" + dict_message["content"] + "<|im_end|>")
        else:
            prompt.append("<|im_start|>assistant\n" + dict_message["content"] + "<|im_end|>")
    
    prompt.append("<|im_start|>assistant")
    prompt.append("")
    prompt_str = "\n".join(prompt)
    
    if get_num_tokens(prompt_str) >= 3072:
        st.error("Conversation length too long. Please keep it under 3072 tokens.")
        st.button('Clear chat history', on_click=clear_chat_history, key="clear_chat_history")
        st.stop()

    for event in replicate.stream("snowflake/snowflake-arctic-instruct",
                           input={"prompt": prompt_str,
                                  "prompt_template": r"{prompt}",
                                  "temperature": temperature,
                                  "top_p": top_p,
                                  }):
        yield str(event)

# User-provided prompt
if prompt := st.chat_input(disabled=not replicate_api):
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user", avatar="⛷️"):
        st.write(prompt)

# Generate a new response if last message is not from assistant
if st.session_state.messages[-1]["role"] != "assistant":
    with st.chat_message("assistant", avatar="./Snowflake_Logomark_blue.svg"):
        response = generate_arctic_response()
        full_response = st.write_stream(response)
    message = {"role": "assistant", "content": full_response}
    st.session_state.messages.append(message)
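To illustrate what generate_arctic_response sends to the model, here is the same ChatML-style prompt construction applied, standalone and without Streamlit, to a short example history:

```python
# Example chat history in the same shape as st.session_state.messages
messages = [
    {"role": "assistant", "content": "Ask me anything."},
    {"role": "user", "content": "What is Snowflake Arctic?"},
]

prompt = []
for m in messages:
    if m["role"] == "user":
        prompt.append("<|im_start|>user\n" + m["content"] + "<|im_end|>")
    else:
        prompt.append("<|im_start|>assistant\n" + m["content"] + "<|im_end|>")

# Leave an open assistant turn for the model to complete
prompt.append("<|im_start|>assistant")
prompt.append("")
prompt_str = "\n".join(prompt)
```

Because the prompt string already carries the chat formatting, the app passes prompt_template as the identity template r"{prompt}" so Replicate does not wrap it a second time.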

5. Deploy the app

Host your app for free on Streamlit Community Cloud. These instructions are also available in our docs.

  1. Sign up for a Community Cloud account or log in at share.streamlit.io.
  2. Click “New app” from the upper-right corner of your workspace.
  3. Fill in your repo, branch, and file path. As a shortcut, you can also click “Paste GitHub URL” to paste a link directly to streamlit_app.py on GitHub.

6. Store your Replicate API token with Community Cloud secrets

Securely store your Replicate API token with Community Cloud’s secrets management feature. These instructions are also available in our docs.

6.1. Add secrets before deploying

  1. Before clicking “Deploy”, click “Advanced settings…”
  2. A modal will appear with an input box for your secrets.
  3. Provide your secrets in the “Secrets” field using TOML format. For example:
REPLICATE_API_TOKEN = "your API token here"

6.2. Add secrets after deploying

  1. Go to share.streamlit.io.
  2. Click the overflow menu icon (AKA hamburger icon) for your app.
  3. Click “Settings”.
  4. A modal will appear. Click “Secrets” on the left.
  5. After you edit your secrets, click “Save”. It might take a minute for the update to be propagated to your app, but the new values will be reflected when the app re-runs.
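For local testing before you deploy, the same secret can live in a .streamlit/secrets.toml file in your project folder, using the same TOML format:

```toml
# .streamlit/secrets.toml — keep this file out of your GitHub repo
REPLICATE_API_TOKEN = "your API token here"
```

Streamlit exposes the value through st.secrets, which is how the app code above picks it up.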

7. Congratulations!

Your Arctic chatbot app should look something like the following:
[Screen recording of the finished Arctic chatbot app]
