A recent study has raised new questions about the effectiveness of AI-assisted coding for experienced software developers. Conducted by the non-profit group METR, the research tracked the performance of 16 long-time contributors to open-source projects as they completed a series of real-world programming tasks. The developers worked on repositories they were already familiar with, allowing the researchers to measure how AI tools influenced routine workflows.
Tasks were randomly assigned to one of two conditions. On some issues, developers were allowed to use advanced code editors powered by large language models; on others, they completed the work without automated assistance. The tools included Cursor and similar platforms that integrate conversational AI into the coding environment. The developers expected the AI systems to save them time, anticipating on average a 24 percent reduction in task duration. The results did not meet those expectations.
When AI was allowed, developers spent more time on each issue, with tasks taking an average of 19 percent longer to complete. Time that could have gone into hands-on coding was instead spent reviewing AI outputs, prompting the systems, waiting for completions, or sitting idle. Despite the slowdown, many developers still believed they had worked faster with the tools than without them.
The study also examined how familiar the participants were with the specific tools provided. While most had some exposure to large language models, not all had used Cursor before. METR arranged training before the tasks began to ensure a baseline level of tool understanding.
On tasks involving code they knew well, developers using AI slowed down even further. METR's researchers believe that hands-on knowledge sometimes reduced the value of AI suggestions: in those cases, relying on a known solution may have been faster than validating or correcting machine-generated output.
The generated code was not always accepted. On average, fewer than half of the AI-produced code snippets were incorporated into final submissions, and participants spent a portion of their time refining or rewriting what the tools provided.
The study does not suggest that AI tools are ineffective in all settings. It presents a point-in-time observation that may shift as the tools continue to evolve. METR emphasized that results could differ with a different developer profile or task type, and the researchers noted that newer versions of the tools are already improving at handling complex, multi-step coding problems.
While broader surveys have shown positive results for AI-assisted programming, METR’s findings add nuance to those claims. For now, the research signals that developers may need to evaluate the impact of AI tools on a case-by-case basis.
Notes: This post was edited/created using GenAI tools. Image: DIW-Aigen.
