1. What is Arctic?
Arctic is a family of enterprise-grade, open-source foundation and embedding LLMs built by Snowflake, released on April 25, 2024. For a deep dive on Arctic, check out the blog from Snowflake AI Research.
2. Using Arctic
Arctic is available in two variants on the Hugging Face platform:
- Snowflake/snowflake-arctic-embed: Arctic Embed is best suited for retrieval operations, such as RAG, and outputs similarity scores. The smallest Embed models (xs and s) can run on your laptop, and you can use regular GPUs for snowflake-arctic-embed-l and up. See the short retrieval sketch after this list.
- Snowflake/snowflake-arctic-instruct: Arctic Instruct is a 480B parameter foundation LLM. Owing to its large size, the Instruct models require specialized, dedicated hardware such as H100 GPUs. To access Arctic Instruct, please see this Guide for Snowflake Cortex or Section 3 below for how to access it via Replicate.
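To illustrate what the Embed variant does, here is a minimal retrieval sketch. It assumes the sentence-transformers library can load the model directly and that cosine similarity is used as the relevance score; check the model card on Hugging Face for the recommended query prefix and usage before relying on this in production:
# Minimal retrieval sketch (assumes sentence-transformers can load the model;
# see the model card for the recommended query prefix and usage).
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("Snowflake/snowflake-arctic-embed-s")

query = "What is Snowflake Arctic?"
documents = [
    "Arctic is a family of open models built by Snowflake.",
    "Streamlit is a Python framework for building data apps.",
]

# Normalized embeddings make the dot product a cosine similarity score.
query_emb = model.encode([query], normalize_embeddings=True)
doc_embs = model.encode(documents, normalize_embeddings=True)

scores = (query_emb @ doc_embs.T)[0]
for doc, score in zip(documents, scores):
    print(f"{score:.3f}  {doc}")
Higher scores indicate documents that are more relevant to the query, which is exactly what a RAG pipeline needs for ranking retrieved passages.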
3. Access Arctic Instruct via the Replicate API
You can access Arctic Instruct via robust cloud services such as those hosted on Replicate or Snowflake Cortex. The advantage of this is that there are no servers to set up and maintain, you don't need your own dedicated (and expensive) hardware, and you only pay for what you use.
Arctic is available for free for a limited time on Snowflake Cortex (until the end of May), and if you register for the Arctic and Streamlit Hackathon you can also get a $50 credit to use the Replicate API for your project.
The guide below focuses on Replicate. See this post for a Snowflake Cortex quickstart instead.
3.0. Register for Arctic Streamlit Hackathon
Until May 21st, 2024: Register for the Arctic and Streamlit Hackathon to get a $50 credit to use the Replicate API for your project.
Apply for your credit here and remember to register for the Hackathon first!
3.1. Get API token
To start using Arctic with Replicate, you’ll need to get your own Replicate API token, which is a simple 3-step process:
- Go to https://replicate.com/signin/.
- Sign in with your GitHub account.
- Proceed to the API tokens page and copy your API token.
3.2. Install Replicate
You can install the Replicate Python library from the command line as follows:
pip install replicate
3.3. Set API token
Next, set the REPLICATE_API_TOKEN environment variable:
export REPLICATE_API_TOKEN=<paste-your-token-here>
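The export command above applies to macOS and Linux shells. If you're working in a Windows Command Prompt, you can set the same variable with:
set REPLICATE_API_TOKEN=<paste-your-token-here>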
3.4. Model inference
Import the replicate library:
import replicate
Finally, run the model inference via Replicate’s API:
for event in replicate.stream(
    "snowflake/snowflake-arctic-instruct",
    input={
        "top_k": 50,
        "top_p": 0.9,
        "prompt": "Generate a poem about the Python programming language.",
        "temperature": 0.2,
        "max_new_tokens": 512,
        "min_new_tokens": 0,
        "stop_sequences": "<|im_end|>",
        "prompt_template": "<|im_start|>system\nYou're a helpful assistant<|im_end|>\n<|im_start|>user\n{prompt}<|im_end|>\n\n<|im_start|>assistant\n",
        "presence_penalty": 1.15,
        "frequency_penalty": 0.2
    },
):
    print(str(event), end="")
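If you don't need token-by-token streaming, you can also make a blocking call. The following is a minimal sketch using replicate.run, assuming the model returns its output as an iterable of text chunks (as language models on Replicate typically do):
# Non-streaming sketch: replicate.run waits for the full generation to finish.
# Assumption: the output is an iterable of text chunks that can be joined.
output = replicate.run(
    "snowflake/snowflake-arctic-instruct",
    input={
        "prompt": "Generate a poem about the Python programming language.",
        "temperature": 0.2,
        "max_new_tokens": 512,
    },
)
print("".join(output))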
4. Use Arctic in a Streamlit app
Feel free to use the Arctic chatbot app template to get started using Arctic in a Streamlit app.
4.1. Local development
To set up a local coding environment, enter the following into a command line prompt:
pip install streamlit replicate
4.2. Cloud development
You can easily set up a cloud environment by deploying to the Streamlit Community Cloud with the help of the Streamlit app template (read more here).
Add a requirements.txt file to your GitHub repo and include the following prerequisite libraries:
streamlit
replicate
transformers
4.3. Build the app
The Arctic chatbot app can be written in 95 lines of code:
import streamlit as st
import replicate
import os
from transformers import AutoTokenizer

# Set assistant icon to Snowflake logo
icons = {"assistant": "./Snowflake_Logomark_blue.svg", "user": "⛷️"}

# App title
st.set_page_config(page_title="Snowflake Arctic")

# Replicate Credentials
with st.sidebar:
    st.title('Snowflake Arctic')
    if 'REPLICATE_API_TOKEN' in st.secrets:
        #st.success('API token loaded!', icon='✅')
        replicate_api = st.secrets['REPLICATE_API_TOKEN']
    else:
        replicate_api = st.text_input('Enter Replicate API token:', type='password')
        if not (replicate_api.startswith('r8_') and len(replicate_api)==40):
            st.warning('Please enter your Replicate API token.', icon='⚠️')
            st.markdown("**Don't have an API token?** Head over to [Replicate](https://replicate.com) to sign up for one.")
        #else:
        #    st.success('API token loaded!', icon='✅')

    os.environ['REPLICATE_API_TOKEN'] = replicate_api
    st.subheader("Adjust model parameters")
    temperature = st.sidebar.slider('temperature', min_value=0.01, max_value=5.0, value=0.3, step=0.01)
    top_p = st.sidebar.slider('top_p', min_value=0.01, max_value=1.0, value=0.9, step=0.01)

# Store LLM-generated responses
if "messages" not in st.session_state.keys():
    st.session_state.messages = [{"role": "assistant", "content": "Hi. I'm Arctic, a new, efficient, intelligent, and truly open language model created by Snowflake AI Research. Ask me anything."}]

# Display or clear chat messages
for message in st.session_state.messages:
    with st.chat_message(message["role"], avatar=icons[message["role"]]):
        st.write(message["content"])

def clear_chat_history():
    st.session_state.messages = [{"role": "assistant", "content": "Hi. I'm Arctic, a new, efficient, intelligent, and truly open language model created by Snowflake AI Research. Ask me anything."}]

st.sidebar.button('Clear chat history', on_click=clear_chat_history)
st.sidebar.caption('Built by [Snowflake](https://snowflake.com/) to demonstrate [Snowflake Arctic](https://www.snowflake.com/blog/arctic-open-and-efficient-foundation-language-models-snowflake). App hosted on [Streamlit Community Cloud](https://streamlit.io/cloud). Model hosted by [Replicate](https://replicate.com/snowflake/snowflake-arctic-instruct).')

@st.cache_resource(show_spinner=False)
def get_tokenizer():
    """Get a tokenizer to make sure we're not sending too much text
    to the Model. Eventually we will replace this with ArcticTokenizer
    """
    return AutoTokenizer.from_pretrained("huggyllama/llama-7b")

def get_num_tokens(prompt):
    """Get the number of tokens in a given prompt"""
    tokenizer = get_tokenizer()
    tokens = tokenizer.tokenize(prompt)
    return len(tokens)

# Function for generating Snowflake Arctic response
def generate_arctic_response():
    prompt = []
    for dict_message in st.session_state.messages:
        if dict_message["role"] == "user":
            prompt.append("<|im_start|>user\n" + dict_message["content"] + "<|im_end|>")
        else:
            prompt.append("<|im_start|>assistant\n" + dict_message["content"] + "<|im_end|>")

    prompt.append("<|im_start|>assistant")
    prompt.append("")
    prompt_str = "\n".join(prompt)

    if get_num_tokens(prompt_str) >= 3072:
        st.error("Conversation length too long. Please keep it under 3072 tokens.")
        st.button('Clear chat history', on_click=clear_chat_history, key="clear_chat_history")
        st.stop()

    for event in replicate.stream("snowflake/snowflake-arctic-instruct",
                                  input={"prompt": prompt_str,
                                         "prompt_template": r"{prompt}",
                                         "temperature": temperature,
                                         "top_p": top_p,
                                         }):
        yield str(event)

# User-provided prompt
if prompt := st.chat_input(disabled=not replicate_api):
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user", avatar="⛷️"):
        st.write(prompt)

# Generate a new response if last message is not from assistant
if st.session_state.messages[-1]["role"] != "assistant":
    with st.chat_message("assistant", avatar="./Snowflake_Logomark_blue.svg"):
        response = generate_arctic_response()
        full_response = st.write_stream(response)
    message = {"role": "assistant", "content": full_response}
    st.session_state.messages.append(message)
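Before deploying, you can preview the app locally. Assuming you saved the code above as streamlit_app.py (the file name referenced in the deployment step below), launch it from the command line:
streamlit run streamlit_app.py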
5. Deploy the app
Host your app for free on Streamlit Community Cloud. These instructions are also available in our docs.
- Sign up for a Community Cloud account or log in at share.streamlit.io.
- Click “New app” from the upper-right corner of your workspace.
- Fill in your repo, branch, and file path. As a shortcut, you can also click “Paste GitHub URL” to paste a link directly to streamlit_app.py on GitHub.
6. Store your Replicate API token with Community Cloud secrets
Securely store your Replicate API token with Community Cloud’s secrets management feature. These instructions are also available in our docs.
6.1. Add secrets before deploying
- Before clicking “Deploy”, click “Advanced settings…”
- A modal will appear with an input box for your secrets.
- Provide your secrets in the “Secrets” field using TOML format. For example:
REPLICATE_API_TOKEN = "your API token here"
6.2. Add secrets after deploying
- Go to share.streamlit.io.
- Click the overflow menu icon (AKA hamburger icon) for your app.
- Click “Settings”.
- A modal will appear. Click “Secrets” on the left.
- After you edit your secrets, click “Save”. It might take a minute for the update to be propagated to your app, but the new values will be reflected when the app re-runs.
7. Congratulations!
Your Arctic chatbot app should look something like the following: