This is the most misunderstood graph in AI

This was certainly the case for Claude Opus 4.5, the latest version of Anthropic’s most powerful model, which was released in late November. In December, METR announced that Opus 4.5 appeared able to autonomously complete a task that would have taken a human about five hours, a significant improvement over what the exponential trend would have predicted. One Anthropic safety researcher tweeted that he would change the direction of his research in light of these findings; another employee at the company simply wrote: “Mom, come pick me up, I’m scared.”

Credit: METR.ORG

But the truth is more complicated than these dramatic responses might suggest. For one thing, METR’s estimates of the capabilities of specific models come with large error bars. As METR explicitly stated, given the uncertainties inherent in this method, it was impossible to know for sure.

“There are a bunch of ways people can read too much into a chart,” says Sidney von Arx, a member of the technical staff at METR.

Importantly, the METR chart does not measure AI capabilities broadly, nor does it claim to. To build the graph, METR tests models primarily on coding tasks, assessing the difficulty of each by measuring or estimating how long it takes humans to complete it, a metric not everyone accepts. Claude Opus 4.5 may be able to complete certain tasks that would take humans five hours, but that doesn’t mean it’s close to replacing a human worker.

METR was created to assess the risks posed by frontier AI systems. Although best known for its exponential trend plot, it has also worked with AI companies to evaluate their systems in more detail and has published several other independent research projects, including a widely covered July 2025 study suggesting that AI coding assistants may be slowing down software engineers.

But it is the exponential plot that has made METR’s reputation, and the organization seems to have a complicated relationship with the graph’s often eye-popping reception. In January, Thomas Kwa, one of the lead authors of the paper that introduced the idea, wrote a blog post to address some of the criticisms and clarify the plot’s limitations, and METR is currently working on a more comprehensive FAQ document. But Kwa is not optimistic that these efforts will meaningfully change the narrative. “I think the hype machine will basically, whatever we do, remove all the caveats,” he says.

Even so, the METR team believes that the plot has something useful to say about the trajectory of AI progress. “You shouldn’t tie your life to this chart at all,” says von Arx. “But I’m also betting that this trend will continue,” she adds.
