Researchers Break Industry Watermarks, Undermining Key Deepfake Detection Methods

A team from the University of Waterloo has developed a method that removes digital watermarks used to mark AI-generated images. The system, named UnMarker, can erase these embedded signatures without needing access to the tools used to create them. It works across multiple watermarking systems, including those designed by major technology companies.

Digital watermarks have been introduced to help detect deepfakes. These watermarks are typically invisible and designed to survive common image edits such as cropping, compression, or filtering. They have been positioned as a safeguard for content authentication in response to growing concerns about synthetic media being used in political propaganda, disinformation, and image-based abuse.

UnMarker breaks both image-level and structure-level protection

The study revealed that current watermarking techniques rely on manipulating an image’s frequency patterns. These changes are intended to be imperceptible to viewers while still detectable by algorithms. By carefully altering how pixel intensities fluctuate across an image, UnMarker can undo these changes without visibly degrading image quality.

UnMarker operates in two distinct stages. For watermarking methods that modify the fine details of an image, the tool applies targeted adjustments to high-frequency components. These changes occur around visual features like edges and textures, where watermarking typically hides. For more robust schemes that alter the overall structure of an image, known as semantic watermarking, UnMarker introduces subtle, coordinated disruptions to broader pixel patterns.
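To make the first stage concrete, the sketch below perturbs only the high-frequency bins of an image's 2D Fourier spectrum, the region carrying edge and texture detail where image-level watermarks typically hide. This is a simplified illustration under assumed parameters (the function name, frequency cutoff, and multiplicative noise model are all hypothetical), not the actual UnMarker implementation.

```python
import numpy as np

def perturb_high_frequencies(image, cutoff=0.25, strength=0.05, seed=0):
    """Illustrative sketch: add small noise to the high-frequency bins
    of a grayscale image's spectrum, leaving low frequencies untouched.
    Hypothetical parameters; not the paper's method."""
    rng = np.random.default_rng(seed)
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    h, w = image.shape
    yy, xx = np.mgrid[0:h, 0:w]
    # Normalized distance from the spectrum centre (low frequencies).
    dist = np.hypot((yy - h / 2) / h, (xx - w / 2) / w)
    mask = dist > cutoff  # True for high-frequency bins only
    noise = 1 + strength * rng.standard_normal(spectrum.shape)
    spectrum = np.where(mask, spectrum * noise, spectrum)
    out = np.real(np.fft.ifft2(np.fft.ifftshift(spectrum)))
    return np.clip(out, 0, 255)
```

Because the DC component and low-frequency structure are left alone, the overall brightness and composition of the image are preserved while the fine detail that hosts the watermark is disturbed.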

To attack the structure-altering watermarks, the researchers designed custom filters that adapt to each image. These filters gradually shift the low-frequency characteristics of an image, such as texture and shading consistency, to break the watermark's detectability without compromising the visual output. Tests showed that this two-part strategy worked even when watermarking systems did not publicly reveal how their marks were embedded or how their detectors operated.
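The "gradual shift without compromising the visual output" idea can be sketched as an iterative loop that nudges the low-frequency band and stops before a quality floor is crossed. The PSNR floor, step count, and noise model here are assumptions for illustration; the paper's per-image filters are learned, which this sketch does not reproduce.

```python
import numpy as np

def psnr(a, b, peak=255.0):
    """Peak signal-to-noise ratio between two images, in dB."""
    mse = np.mean((a - b) ** 2)
    return np.inf if mse == 0 else 10 * np.log10(peak ** 2 / mse)

def adaptive_low_freq_attack(image, cutoff=0.05, strength=0.02,
                             steps=10, psnr_floor=40.0, seed=1):
    """Illustrative sketch: repeatedly perturb the low-frequency band
    (structural content targeted by semantic watermarks), accepting a
    step only while quality stays above a PSNR floor. Hypothetical."""
    rng = np.random.default_rng(seed)
    h, w = image.shape
    yy, xx = np.mgrid[0:h, 0:w]
    dist = np.hypot((yy - h / 2) / h, (xx - w / 2) / w)
    band = (dist > 0) & (dist <= cutoff)  # low frequencies, DC excluded
    out = image.astype(float)
    for _ in range(steps):
        spec = np.fft.fftshift(np.fft.fft2(out))
        noise = 1 + strength * rng.standard_normal(spec.shape)
        candidate = np.real(np.fft.ifft2(np.fft.ifftshift(
            np.where(band, spec * noise, spec))))
        candidate = np.clip(candidate, 0, 255)
        if psnr(image, candidate) < psnr_floor:
            break  # stop before the change becomes visible
        out = candidate
    return out
```

The quality check against the original image stands in for the perceptual constraint the researchers describe: structural content shifts just enough to confuse a detector, but never past the point of visible degradation.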

Major watermarking systems were all vulnerable

The team evaluated their approach against seven top-tier watermarking schemes, including both general-purpose and model-specific implementations. Google's SynthID, Meta's Stable Signature, and several academic systems were among those tested. Results showed that UnMarker removed watermarks with a high rate of success. In many cases, detection accuracy on the de-watermarked images dropped below the threshold at which these systems could still be considered reliable.

In technical terms, most watermarking systems lost what the researchers called their "security advantage," meaning they no longer performed better than a random guess when asked to identify AI-generated content. UnMarker achieved this as a black-box attack, with no access to training data, detector feedback, or internal details of the watermarking schemes.
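One common way to quantify that edge over random guessing (an assumed formulation here, not necessarily the paper's formal definition) is the gap between a detector's true-positive and false-positive rates, which is zero for a coin flip:

```python
def security_advantage(true_positive_rate, false_positive_rate):
    """Detector's edge over random guessing: TPR - FPR.
    A coin-flip detector has TPR == FPR, hence zero advantage.
    Illustrative definition with hypothetical numbers."""
    return true_positive_rate - false_positive_rate

# A detector flagging 51% of watermarked images but also 50% of clean
# ones is essentially guessing.
negligible = security_advantage(0.51, 0.50)
```

Under this reading, "losing the security advantage" means the attacked watermark pushes the detector's TPR down toward its FPR.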

Image quality remained intact after watermark removal

Unlike previous methods that introduced noticeable distortions or required access to powerful computing systems, UnMarker maintained image quality throughout the process. The team used both visual and statistical measures to confirm that the modified images remained consistent with the originals from a human viewer’s perspective. This performance was consistent even on larger images, which typically pose more challenges for subtle manipulations.

While earlier attacks on watermarking systems required knowledge of the underlying model or access to detector responses, UnMarker avoided both. It operated entirely without interacting with the detection tool and made no assumptions about the watermark’s format or design.

Study concludes watermarking may not be a reliable defense

The researchers suggest that current approaches to watermarking offer limited protection against misuse. As attackers can now remove these signals without specialized tools or insider knowledge, the effectiveness of watermarking as a safeguard is called into question. The study urges developers and policymakers to consider new strategies beyond watermarking if they want to reliably detect synthetic content.

The paper, accepted to the IEEE Symposium on Security and Privacy 2025, presents what may be the first universal and practical method for defeating both image-level and semantic watermarks. The findings point to deeper structural limitations in the design of watermark-based defenses and underline the need for more resilient systems in the face of evolving threats.


Notes: This post was edited/created using GenAI tools.
