orrao.com

OpenAI’s o3 model aced a test of AI reasoning – but it’s still not AGI

December 21, 2024


OpenAI announced a breakthrough achievement for its new o3 AI model

Rock Tennis / Alamy

OpenAI’s new o3 artificial intelligence model has scored highly on a famous AI reasoning test called the ARC Challenge, prompting some AI fans to speculate that o3 has achieved artificial general intelligence (AGI). But while ARC Challenge organizers hailed o3’s achievement as a major milestone, they cautioned that it has not won the competition’s grand prize and is only a step on the road to AGI, the term for a hypothetical future AI with human-like intelligence.

The o3 model is the latest in a line of AI releases that follow on from the large language models that power ChatGPT. “This is a surprising and important step-function increase in AI capabilities, showing novel task adaptation ability never seen before in the GPT-family models,” said François Chollet, a Google engineer and the main creator of the ARC Challenge, in a blog post.

What did OpenAI’s o3 model actually do?

Chollet designed the Abstraction and Reasoning Corpus (ARC) Challenge in 2019 to test how well AIs can find the correct patterns linking pairs of colored grids. These visual puzzles are meant to make AIs demonstrate a form of general intelligence with basic reasoning abilities. But if enough computing power is thrown at the puzzles, even a non-reasoning program can solve them by brute force. To prevent this, the competition also requires official score submissions to stay within certain limits on computing power.
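To make the brute-force concern concrete, here is a minimal sketch in Python. It is not the ARC Challenge format or any competitor's method: the grids, the three primitive transformations and the `brute_force` helper are all hypothetical, chosen only to show how a program with no reasoning ability can find a rule by exhaustively trying combinations until one fits every example pair.

```python
from itertools import product

# Hypothetical primitives: each maps a grid (list of rows of color codes) to a new grid.
def flip_h(grid):
    return [row[::-1] for row in grid]

def flip_v(grid):
    return [row[:] for row in grid[::-1]]

def transpose(grid):
    return [list(col) for col in zip(*grid)]

PRIMITIVES = {"flip_h": flip_h, "flip_v": flip_v, "transpose": transpose}

def apply_program(names, grid):
    """Apply the named primitives to the grid, in order."""
    for name in names:
        grid = PRIMITIVES[name](grid)
    return grid

def brute_force(examples, max_depth=3):
    """Try every sequence of primitives up to max_depth; return the first that fits all pairs."""
    for depth in range(1, max_depth + 1):
        for names in product(PRIMITIVES, repeat=depth):
            if all(apply_program(names, inp) == out for inp, out in examples):
                return names
    return None

# Two made-up input/output pairs that share one hidden rule (a 180-degree rotation).
examples = [
    ([[1, 0], [0, 0]], [[0, 0], [0, 1]]),
    ([[1, 2], [3, 4]], [[4, 3], [2, 1]]),
]
program = brute_force(examples)
```

The search space grows exponentially with `max_depth` and with the number of primitives, which is why a cap on computing power limits how far this kind of blind search can go.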

OpenAI’s newly announced o3 model, due for release in early 2025, achieved an official score of 75.7 percent on the ARC Challenge’s “semi-private” test, which is used to rank competitors on a public leaderboard. The computing cost of the achievement was approximately $20 per visual puzzle task, which met the competition’s limit of less than $10,000 in total. However, the tougher “private” test used to determine grand prize winners has an even tighter limit on computing power, equivalent to spending just 10 cents on each task, and OpenAI did not meet it.

The o3 model also achieved an unofficial score of 87.5 percent by applying 172 times more computing power than it used for the official score. For comparison, a typical human score is 84 percent, and a score of 85 percent is enough to win the ARC Challenge’s $600,000 grand prize, provided the model also keeps its computing costs within the required limits.

But to reach the unofficial score, o3’s cost rose to thousands of dollars spent on solving each task. OpenAI requested that the challenge organizers not publish the exact computing costs.
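The figures above can be checked with a little arithmetic. The sketch below assumes, purely for illustration, that cost scales linearly with computing power; the exact costs were not published, so the high-compute result is an estimate and not a reported number. All variable names are hypothetical, and amounts are kept in whole cents to avoid rounding errors.

```python
# Figures quoted in the article, in US cents.
OFFICIAL_COST_PER_TASK = 2000      # about $20 per task for the official score
GRAND_PRIZE_LIMIT_PER_TASK = 10    # about 10 cents per task for the private test
COMPUTE_MULTIPLIER = 172           # extra computing power used for the unofficial score

# ASSUMPTION: cost grows in direct proportion to computing power.
estimated_high_compute_cost = OFFICIAL_COST_PER_TASK * COMPUTE_MULTIPLIER

# How far the official run was from the grand-prize budget, as a ratio.
official_over_limit = OFFICIAL_COST_PER_TASK // GRAND_PRIZE_LIMIT_PER_TASK

print(f"Estimated high-compute cost per task: ${estimated_high_compute_cost / 100:,.0f}")
print(f"Official run cost {official_over_limit}x the grand-prize limit per task")
```

Under that linear assumption the estimate comes to roughly $3,440 per task, consistent with the “thousands of dollars” described, while even the official run cost about 200 times the per-task budget allowed for the grand prize.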

Does this o3 achievement show that AGI has been reached?

No. The ARC Challenge organizers have specifically said that they do not consider beating this competition benchmark to be an indicator of having achieved AGI.

The o3 model also failed to solve more than 100 visual puzzle tasks, even when OpenAI applied a very large amount of computing power to the unofficial score, said Mike Knoop, an ARC Challenge organizer at software company Zapier, in a post on X.

In a post on Bluesky, Melanie Mitchell at the Santa Fe Institute in New Mexico wrote of o3’s progress on the ARC benchmark: “I think solving these tasks by brute-force compute defeats the original purpose.”

“While the new model is very impressive and represents a major milestone on the road to AGI, I don’t think this is AGI; there are still very simple ARC Challenge tasks that o3 cannot solve,” said Chollet in another post on X.

However, Chollet described how we might know when human-level intelligence has been demonstrated by some form of AGI. “You know AGI is here when the exercise of creating tasks that are easy for normal humans but difficult for AI becomes simply impossible,” he said in the blog post.

Thomas Dietterich at Oregon State University proposes another way to recognize AGI. “These architectures claim to include all the functional components necessary for human cognition,” he says. “By this measure, commercial AI systems lack episodic memory, planning, logical reasoning and, above all, metacognition.”

So what does a high o3 score really mean?

The high score for the o3 model comes as the tech industry and AI researchers have been reckoning with a slower pace of progress in the latest AI models in 2024, compared with the explosive developments of early 2023.

Although it did not win the ARC Challenge, o3’s high score indicates that AI models may beat the competition’s benchmark in the near future. Beyond o3’s unofficial high score, Chollet says many official low-compute submissions have already scored above 81 percent on the private evaluation test set.

Dietterich also thinks it’s a “very impressive leap in performance”. However, he notes that, without knowing more about how OpenAI’s o1 and o3 models work, it is impossible to assess how impressive the high score is. For example, if o3 was able to practice the ARC problems in advance, that would make its achievement easier. “We’ll have to wait for an open-source replication to understand the full significance of this,” says Dietterich.

The organizers of the ARC Challenge are already looking to launch a second, more difficult set of benchmark tests in 2025. They will also keep the ARC Prize 2025 challenge running until someone wins the grand prize and open-sources their solution.


