Google’s Search Team Used Opted-Out Web Content to Train AI, Court Testimony Reveals

Inside a Washington courtroom, a detail emerged that surprised many publishers. Google’s search unit has been using web content to improve its AI-driven search features, even when that content came from publishers who had opted out of having their data used for AI training. According to Bloomberg, this exception exists because Google’s opt-out controls apply only to DeepMind, the company’s AI research division, and not to the search organization that runs the live products users interact with.

Eli Collins, a vice president at DeepMind, acknowledged this arrangement during questioning. Asked whether the search team could use content that DeepMind was barred from using, he confirmed that it could, as long as the content went toward search-related features.

The point became central to a broader case in which the Justice Department is challenging how Google has maintained its control of the search market. The government argues that Google has relied not only on exclusive deals but also on privileged access to web data to reinforce its position. AI features now shown at the top of search results often answer queries directly, which reduces the chance that people will click through to the original sites. Many of those same sites are unknowingly helping to train the models behind those summaries.

The federal trial stems from a ruling last year in which a judge found that Google had used illegal methods to preserve its dominance. The court is now weighing remedies that could include separating key business units or ending the deals that make Google the default search engine on browsers and devices.

The DOJ presented an internal Google document dated August 26. It showed that processing publishers’ opt-out signals had removed about half of the company’s collected content, roughly 80 billion snippets. The same document listed search session logs and YouTube videos as still available for training. Judge Amit Mehta asked whether that figure meant 50 percent of the original dataset had been excluded, and Collins said it did.

Government attorneys later showed another internal memo, this one prepared for DeepMind’s chief executive, which explored whether a new AI model might perform better if trained on search-specific data such as rankings and user behavior. Collins said that, to his knowledge, no such model had been built, but he acknowledged that the idea had been discussed.

Google’s legal team tried to shift the focus by noting that other AI companies often rely on private data deals. For live information such as sports scores, for example, these companies can skip web scraping entirely and use licensed feeds instead. Google offered this as evidence that the market remains open to innovation beyond its reach.

As the trial continues, the court will weigh how much of Google’s AI strength comes from data pipelines it built during years of search dominance. The outcome could reshape the rules around data access, AI development, and online competition in the years ahead.

Image: DIW-Aigen
