During its big GPT-5 livestream on Thursday, OpenAI showed off a few charts that made the model seem quite impressive, but if you look closely, some graphs were a little bit off.
In one, ironically showing how well GPT-5 does in “deception evals across models,” the scale is all over the place. For “coding deception,” for example, the chart shown onstage says GPT-5 with thinking apparently gets a 50.0 percent deception rate, but that’s compared to OpenAI’s smaller 47.4 percent o3 score, which somehow has a larger bar. OpenAI appears to have accurate numbers for this chart in its GPT-5 blog post, though, where GPT-5’s deception rate is labeled as 16.5 percent.
With this chart, OpenAI showed onstage that one of GPT-5’s scores is lower than o3’s but is shown with a bigger bar. In this same chart, o3 and GPT-4o’s scores are different but shown with equally sized bars. It was bad enough that CEO Sam Altman commented on it, calling it a “mega chart screwup,” though he noted that a correct version is in OpenAI’s blog post.
An OpenAI marketing staffer also apologized, saying, “We fixed the chart in the blog guys, apologies for the unintentional chart crime.”
OpenAI didn’t immediately reply to a request for comment. And while it’s unclear if OpenAI used GPT-5 to actually make the charts, it’s still not a great look for the company on its big launch day, especially when it’s touting the “significant advances in reducing hallucinations” with its new model.