A detailed report by CNBC has brought to light the little-known practice of Google using YouTube videos to develop some of its most powerful artificial intelligence systems, raising questions among creators, legal observers and digital rights groups who had not been informed that their work might be feeding the company’s training pipelines. The investigation found that material uploaded to YouTube, a platform where more than 20 million new videos are added every day, has been used by Google to train systems like Gemini and Veo 3, including advanced capabilities in video and audio generation.
Although Google acknowledged that it draws on a portion of its YouTube video library to improve AI models, it declined to specify which videos were used or how creators were notified, stating only that it honors existing agreements. Many creators, including some with substantial audiences and reputational stake in their work, were unaware that their contributions might serve as training data for AI systems that could, over time, automate or replicate the very creative decisions that define their channels.
Among those concerned is Luke Arrigoni, chief executive of Loti, a firm that develops tools for protecting creators’ digital identities. Arrigoni has argued that by ingesting years of creative work into an algorithm, Google risks enabling a system that mimics the form but not the spirit of original material, leaving creators in a position where their ideas are transformed into synthetic outputs that benefit the platform without acknowledgment or control.
The concerns deepened with the launch of Veo 3, Google’s AI video generator unveiled in May, which demonstrated an ability to construct photorealistic scenes complete with dialogue, atmosphere and emotion, all generated synthetically from the model’s training data. According to CNBC, one example involved a scene of animals rendered in the style of popular animation, with no identifiable human input beyond the algorithm’s internal design, suggesting the model had absorbed not just technical patterns but creative cues as well.
Dan Neely, who leads Vermillio, a company that develops tools to detect AI-generated content overlap, said his team has recorded multiple cases where Veo 3’s outputs showed measurable similarity to human-produced videos. In one instance, a video originally posted by YouTube creator Brodie Moss appeared to closely match an output from the Veo model. Trace ID, a tool Vermillio developed to score how closely AI output resembles existing material, gave the pairing a 71 for overall resemblance, while the audio alone scored above 90, a level Neely considers significant.
The incident has renewed scrutiny over YouTube’s terms of service, which grant the platform broad licensing rights, including the ability to sublicense content for uses like machine learning. Yet the sheer scale of the platform, along with the speed at which generative systems like Veo 3 are advancing, has led many creators to reconsider what participation in such platforms truly entails. Few had considered that uploading content might result in training the tools that could eventually outpace or even replace their own creative output.
While YouTube allows users to prevent certain third-party companies — such as Apple, Amazon and Nvidia — from using their videos for model training, no such opt-out applies to Google’s internal efforts. This has further inflamed concerns among media organizations and creator-focused firms, which argue that consent, transparency and compensation have lagged behind technical innovation. As an example of growing friction, Disney and Universal recently filed a joint lawsuit targeting another generative platform, Midjourney, for unauthorized use of copyrighted imagery, a sign that the industry may be moving toward more forceful legal responses.
At the same time, Google has moved to preempt some of the criticism by offering indemnity to users of its AI tools, meaning the company itself will accept legal responsibility in the event of a copyright challenge involving generated content. YouTube has also partnered with the Creative Artists Agency to offer talent-facing support for identifying and managing likenesses that appear in AI-generated works, and has created a request-based takedown mechanism for creators who believe their identity has been misused. However, according to Arrigoni, the existing tools are not always reliable, and in practice, the process of appealing misuse remains opaque and slow.
Despite these tensions, a few creators expressed a cautious willingness to coexist with these tools, viewing them as inevitable companions in a changing creative environment. For others, though, the situation raises more difficult questions about who truly owns online content once it’s uploaded, and whether the rules that governed traditional content licensing are adequate for a world in which machines not only learn from human creativity but increasingly imitate and distribute their own interpretations of it.
The dilemma now facing creators and platforms alike is not simply about data, but about authorship, value and visibility in a system where the line between contributor and training resource is being quietly redrawn.
Image: Glenn Marczewski / Unsplash