Anyone using the new features from openai?

Excited about the new token limits and features like Assistants. I had been working on an app as a German tutor that runs a second column with translations of what the teacher is saying; those translations required a second call to OpenAI.

Now with the assistants and persistent threads it looks like the API can keep track of a lot of what I was managing in Python.

Working with files looks interesting too… generate a visualization and then send it for OpenAI to describe.

Still trying to wrap my head around function calling and the Code Interpreter!

A couple days later…

Ugh, a few config things to change, but I finally got streaming working again with the new GPT-4 Turbo model and the updated openai library. Next step is to try integrating an Assistant instead of the chat completion so I can access more of the new features.

Here are the changes I had to make for the updated openai library (was 1.2.2, now 1.6.1).
After “import openai” add

from openai import OpenAI
openai.api_key = st.secrets['openai']["OPENAI_API_KEY"] >> OpenAI.api_key = st.secrets['openai']["OPENAI_API_KEY"]

Read the key from whatever location you want… environment variable, secrets… but assign it to “OpenAI.api_key”

client = OpenAI(api_key=OpenAI.api_key)

To make the request
for response in openai.ChatCompletion.create(model="gpt-4",... >> for response in client.chat.completions.create(model="gpt-4-1106-preview",...

For streaming
full_response += response.choices[0].delta.get("content", "") >> full_response += response.choices[0].delta.content

In fact, lately I have been getting TypeError: can only concatenate str (not "NoneType") to str, so to get around this I use the following

if response.choices[0].delta.content:
    full_response += response.choices[0].delta.content
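That guard can be wrapped in a tiny helper so the loop body stays clean (`accumulate_delta` is a name I made up for illustration, not part of the library):

```python
def accumulate_delta(full_response: str, delta_content) -> str:
    """Append a streamed delta to the running response, skipping None.

    The final chunk's delta.content is None, which would raise a
    TypeError if concatenated to a str directly.
    """
    if delta_content:
        full_response += delta_content
    return full_response
```

In the streaming loop you would call it as `full_response = accumulate_delta(full_response, response.choices[0].delta.content)`.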

For single response
I think it is the same syntax as before

And if you are checking for errors
except openai.error.RateLimitError as error: >> except openai.RateLimitError as error:


With the latest openai updates, you can now get token counts at the end of a stream.

When you enable usage tracking in streaming, the last chunk includes the token counts. For example, here are the last two chunks from a stream with usage enabled.

choices=[Choice(delta=ChoiceDelta(content=None, function_call=None,
role=None, tool_calls=None), finish_reason='stop', index=0,
logprobs=None)], created=1717451397, model='gpt-4o-2024-05-13',
object='chat.completion.chunk', system_fingerprint='fp_319be4768e', …

choices=[], created=1717451397, model='gpt-4o-2024-05-13',
object='chat.completion.chunk', system_fingerprint='fp_319be4768e',
usage=CompletionUsage(completion_tokens=100, prompt_tokens=641, …

To enable it, add a stream_options parameter to chat.completions.create (not sure this would work if you are using the Assistants call)

stream_options={"include_usage": True},

This changes the previous fix I mentioned above for the None stream content… I added another nested if to identify the end of the stream and capture the usage results.

if response.choices:  # the final usage-only chunk has an empty choices list
  if response.choices[0].delta.content:
    full_response += response.choices[0].delta.content
if response.usage:
  completion_tokens = response.usage.completion_tokens
  prompt_tokens = response.usage.prompt_tokens
message_placeholder.write(full_response + "▌")
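You can check the end-of-stream handling without hitting the API by running the same accumulation logic against stand-in chunk objects (SimpleNamespace stand-ins shaped like the chunks above, not real API types):

```python
from types import SimpleNamespace

# Stand-in chunks: two content deltas, a finish chunk whose
# delta.content is None, and a final usage-only chunk with an
# empty choices list (the shape include_usage produces).
chunks = [
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content="Gut"))], usage=None),
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content="en Tag"))], usage=None),
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=None))], usage=None),
    SimpleNamespace(choices=[], usage=SimpleNamespace(completion_tokens=100, prompt_tokens=641)),
]

full_response = ""
completion_tokens = prompt_tokens = 0
for response in chunks:
    if response.choices:  # usage-only final chunk has no choices
        if response.choices[0].delta.content:
            full_response += response.choices[0].delta.content
    if response.usage:
        completion_tokens = response.usage.completion_tokens
        prompt_tokens = response.usage.prompt_tokens
```

After the loop, `full_response` holds the assembled text and the token counts come from the final chunk.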

There are also max_retries and timeout options that I am looking forward to trying. GitHub - openai/openai-python: The official Python library for the OpenAI API

Also, if you have not visited your OpenAI dev account recently, you can now create projects to organize your API keys. This is great because you can configure limits and budgets for the keys in a project, give keys names prefixed with test / prod, and select which models each API key can use. For me this helps organize costs better.

OpenAI changelogs
