“The king is dead”—Claude 3 surpasses GPT-4 on Chatbot Arena for the first time


On Tuesday, Anthropic’s Claude 3 Opus large language model (LLM) surpassed OpenAI’s GPT-4 (which powers ChatGPT) for the first time on Chatbot Arena, a popular crowdsourced leaderboard used by AI researchers to gauge the relative capabilities of AI language models. “The king is dead,” tweeted software developer Nick Dobos in a post comparing GPT-4 Turbo and Claude 3 Opus that has been making the rounds on social media. “RIP GPT-4.”

Since GPT-4 was added to Chatbot Arena around May 10, 2023 (the leaderboard launched May 3 of that year), variations of GPT-4 have consistently sat at the top of the chart until now, so its defeat in the Arena is a notable moment in the relatively short history of AI language models. One of Anthropic’s smaller models, Haiku, has also been turning heads with its performance on the leaderboard.

“For the first time, the best available models—Opus for advanced tasks, Haiku for cost and efficiency—are from a vendor that isn’t OpenAI,” independent AI researcher Simon Willison told Ars Technica. “That’s reassuring—we all benefit from a diversity of top vendors in this space. But GPT-4 is over a year old at this point, and it took that year for anyone else to catch up.”
