GPT-4o mini is incredible

I’ve been doing a lot of testing with the new GPT-4o mini model, and I’m blown away. It’s hard to believe it’s cheaper than either GPT-3.5 Turbo or Claude 3 Haiku.

I updated all my toy Streamlit apps to use it, and they are all the better for it. Emily (https://emilytarot.com) can hold longer conversations, offer a more in-depth tarot session, and just feels a lot “smarter.” The stories generated at https://littlecattales.com definitely keep to the intended plot better, and the model is more reliable at producing them without errors in general. I can’t notice any difference after just a few attempts at https://thetroublewithbridges.com, but it’s faster and cheaper at least.
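For anyone curious what the update involved, here’s a minimal sketch of the kind of change, assuming a direct Chat Completions call (the function name and prompts are illustrative, not the actual app code):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_story(prompt: str) -> str:
    # Switching models is a one-line edit to the model parameter.
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # previously "gpt-3.5-turbo"
        messages=[
            {"role": "system", "content": "You write short cat tales that stick to the given plot."},
            {"role": "user", "content": prompt},
        ],
    )
    return response.choices[0].message.content
```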

I’ve also been testing it extensively in the context of a QA bot that must return reliable, accurate data, and it is performing really well under my particular conditions. Just a great model.

Anyone else tried it out yet?

I totally agree - for my use case it is just as fast and accurate as 4o. The most impressive part is the OpenAI API usage chart: my costs are 1/10th what they were with 4o.

As I’ve continued to build with and use 4o mini, I just continue to be blown away by the performance at this price point. It’s seriously opening a lot of doors for including AI features where they would have been cost-prohibitive before.

It’s very, very good at function calling and instruction following. I’m currently prototyping a natural language data prep feature for a client, and the accuracy that 4o-mini achieves in function calling is quite striking, particularly when compared to 3.5 Turbo. I’m still early in the prototyping phase, so I don’t have any quantitative assessments yet, but after a couple hundred manual tests I’ve only seen a handful of minor mistakes, even when dealing with pretty wide datasets and complex instructions requiring multiple function calls to achieve the requested data transformations. I also spent considerably more on 7 requests to 4o than on 158 requests to 4o-mini.
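To give a sense of the setup, here’s roughly what a single data-prep tool call looks like against the Chat Completions API. The filter_rows tool and its schema are simplified placeholders, not the client project’s actual functions:

```python
import json
from openai import OpenAI

client = OpenAI()

# One illustrative data-prep tool (placeholder schema).
tools = [
    {
        "type": "function",
        "function": {
            "name": "filter_rows",
            "description": "Keep only rows where a column matches a condition.",
            "parameters": {
                "type": "object",
                "properties": {
                    "column": {"type": "string"},
                    "operator": {"type": "string", "enum": ["==", "!=", ">", "<"]},
                    "value": {"type": "string"},
                },
                "required": ["column", "operator", "value"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Drop every order with status cancelled"}],
    tools=tools,
)

# The model answers with structured tool calls rather than prose; complex
# requests can come back as several calls chained together.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, json.loads(call.function.arguments))
```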

This is still by far my favorite model. I have plenty of custom apps and tools using LLMs now, but 4o mini has proven itself to be the most capable and cost-efficient. Here’s a few months of my usage on my note-taking / daily logging app.

I’ve built tools for it to do a number of things, and it’s just incredible at using them to accomplish the tasks I set it. It pretty much exclusively handles my calendar “stuff” now. I’ll put in an example snippet of how I’m interacting with it and what it does for me regularly. I’ve read plenty of arguments about how LLMs cannot perform true reasoning, but it sure looks awfully close to me…
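The calendar flow looks roughly like this. The create_event tool, its fields, and the stand-in calendar backend below are simplified placeholders rather than my real setup, but the pattern is roughly the same: the model picks a tool, my code executes it, and the result goes back so the model can confirm in plain language.

```python
import json
from openai import OpenAI

client = OpenAI()

# Placeholder calendar tool; the real tools talk to an actual calendar API.
tools = [
    {
        "type": "function",
        "function": {
            "name": "create_event",
            "description": "Add an event to the user's calendar.",
            "parameters": {
                "type": "object",
                "properties": {
                    "title": {"type": "string"},
                    "start": {"type": "string", "description": "ISO 8601 start time"},
                    "duration_minutes": {"type": "integer"},
                },
                "required": ["title", "start"],
            },
        },
    }
]

messages = [{"role": "user", "content": "Block out an hour for the dentist next Tuesday at 2pm"}]
response = client.chat.completions.create(model="gpt-4o-mini", messages=messages, tools=tools)
message = response.choices[0].message

if message.tool_calls:
    messages.append(message)  # keep the assistant turn that contains the tool call(s)
    for call in message.tool_calls:
        args = json.loads(call.function.arguments)
        event_id = "evt_123"  # stand-in for whatever the calendar backend returns
        messages.append({
            "role": "tool",
            "tool_call_id": call.id,
            "content": json.dumps({"status": "created", "id": event_id, **args}),
        })
    # Send the tool results back so the model can confirm what it scheduled.
    final = client.chat.completions.create(model="gpt-4o-mini", messages=messages, tools=tools)
    print(final.choices[0].message.content)
```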