The New York Times is bringing to light new test results that reveal an alarming rise in the number of mistakes made by newer, more advanced AI models compared with older variants.
According to the report, error rates have increased significantly in the advanced systems compared with their older counterparts. The astonishing part is that hardly anyone is talking about this, considering that newer AI tools are supposed to do the opposite. Instead, error rates are hitting a new high of 79% in OpenAI's most modern systems.
This has set alarm bells ringing for marketers who lean heavily on such tools for both content creation and customer service. The recent tests highlight a clear trend: more advanced AI systems producing greater inaccuracies than previous models.
The newer systems, such as o3, produce errors on nearly 33% of queries about individuals, roughly twice the error rate seen in previous systems. Shockingly, the o4-mini model did worse, with error rates reaching 48% on similar tests. On a broader test of general questions, o3 made errors at a rate of 51%, while o4-mini's error rate hit 79%.
It's not only AI giant OpenAI that is implicated. Similar findings emerged in tests of models from Google and DeepSeek. Speaking to the New York Times, one leading former Google executive described hallucination as a fundamental issue: no matter how advanced the systems become, it is always going to drag them down.
So what does that mean for the real world, or the business world? The problems are not just abstract; they hit real businesses. Imagine facing backlash for reporting incorrect information, with nothing better to offer than the excuse that you relied on AI.
It's already happening. Cursor, a leading programming tool, upset customers after its AI support bot falsely claimed that users couldn't run the software on more than one computer.
The mistake led to a wave of canceled accounts and public complaints. There is no such policy, and users are free to use the product on multiple machines, the CEO clarified after the backlash. Still, reliability is genuinely declining across several AI models, and the problem may lie in the way they are designed.
Most firms rely on reinforcement learning, which teaches AI through trial and error. The approach helps with coding and math but can hurt factual accuracy. Part of the issue is how the models are trained: as they focus on one task, they begin to forget what they learned on the previous one.
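To make the trial-and-error idea concrete, here is a minimal sketch, not any lab's actual training pipeline, of how a reward signal that favors confident-sounding answers can steer a system away from accurate ones. The candidate answers and the toy reward function are hypothetical.

```python
import random

# Toy reinforcement-learning loop: the "model" is just a preference
# score per candidate answer, nudged up or down by a reward signal.
# The reward only checks whether the answer *sounds* confident,
# a stand-in for graders that never verify the underlying facts.

CANDIDATES = {
    "confident but wrong": 0.0,   # fluent and detailed, factually false
    "correct but hedged": 0.0,    # accurate, but cautious and shorter
}

def reward(answer: str) -> float:
    # Hypothetical reward: favors confident-sounding output,
    # blind to factual accuracy (the failure mode in question).
    return 1.0 if "confident" in answer else 0.3

def pick(prefs: dict) -> str:
    # Sample an answer, biased toward higher preference scores.
    total = sum(2 ** s for s in prefs.values())
    r = random.uniform(0, total)
    for ans, s in prefs.items():
        r -= 2 ** s
        if r <= 0:
            return ans
    return ans

for step in range(1000):
    answer = pick(CANDIDATES)
    CANDIDATES[answer] += 0.01 * reward(answer)  # trial-and-error update

print(CANDIDATES)  # the confident-but-wrong answer ends up preferred
```

Run long enough, the loop rewards the fluent-but-false answer more often than the accurate one, which is the shape of the accuracy problem the trainers are wrestling with.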
Experts also note that newer AI models reason through things step by step before producing a response, and each extra step is another chance for an error to creep in. Marketers are uneasy about relying on AI for tasks like producing content, analyzing data, and handling customer service. In the end, inaccuracies can hurt views and damage search rankings, which no one wants.
Imagine spending extra time and effort working out which replies are real and which are not. Left unchecked, those errors degrade the value of the whole system. So what can professional users do to stay safe?
For starters, human review is crucial. Fact-checking is essential, as is using these systems for ideas rather than as a source of facts. Users should also consider AI tools that cite their sources, which gives them clear steps to follow when something questionable appears, as in the sketch below.
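As an illustration only, such a safeguard could be as simple as holding back any AI reply that lacks a verifiable citation until a human has reviewed it. The function name and citation pattern below are hypothetical, not part of any vendor's API.

```python
import re

# Hypothetical guardrail: hold AI-generated text for human review
# unless it cites at least one source the team can actually verify.

CITATION_PATTERN = re.compile(r"https?://\S+|\[\d+\]")  # URLs or [1]-style refs

def needs_human_review(ai_reply: str) -> bool:
    # Flag replies with no citations at all; they carry the highest
    # risk of uncheckable, hallucinated claims.
    return CITATION_PATTERN.search(ai_reply) is None

drafts = [
    "Q3 churn fell 12% [1]; details at https://example.com/report.",
    "Our churn has never been lower and competitors are failing.",
]

for draft in drafts:
    status = "HOLD FOR REVIEW" if needs_human_review(draft) else "ok to fact-check"
    print(f"{status}: {draft}")
```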
Image: DIW-Aigen
Read next: United States App Store Commissions Earn Apple Record Sum While Game and App Sales Continue Rising