A new Microsoft study shows that conversations with AI assistants such as ChatGPT, Gemini, and Claude may not be as private as users assume. The research uncovers a vulnerability named Whisper Leak that lets observers infer what people are talking about with these systems, even when the chats are protected by encryption. The problem does not come from a flaw in encryption itself but from the way data travels between the user and the language model.
Transport Layer Security, or TLS, keeps online communication hidden from view. It is the same protocol that protects banking transactions and passwords. What it cannot hide is the size and rhythm of the data packets moving back and forth. Microsoft’s researchers found that those patterns carry enough information for another AI model to guess the topic of a conversation with surprising accuracy.
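To see why, consider a minimal sketch of the principle (ours, not the researchers'). Using Python's widely available cryptography package, encrypting two messages of different lengths with AES-GCM, a cipher family used inside TLS, produces ciphertexts whose lengths differ in exactly the same way, plus only a fixed 16-byte tag:

```python
# Minimal illustration (not from the study): encryption hides content, not length.
# AES-GCM ciphertext is the plaintext length plus a fixed 16-byte authentication tag.
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

key = AESGCM.generate_key(bit_length=128)
aesgcm = AESGCM(key)

for message in [b"hi", b"how could someone launder money offshore?"]:
    nonce = os.urandom(12)
    ciphertext = aesgcm.encrypt(nonce, message, None)
    print(f"{len(message)} plaintext bytes -> {len(ciphertext)} ciphertext bytes")
```

An observer who cannot read either message can still tell which one is longer, and, for a streamed chat reply, roughly how much text it contains.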
The team tested twenty-eight language models from major providers. Each model was asked thousands of questions, including hundreds of variations on a single sensitive topic: money laundering. The researchers then mixed those conversations with thousands of unrelated queries drawn from public datasets. Although the messages were fully encrypted, the team recorded the timing and size of every response packet. With that metadata alone, they trained classifiers to spot which sessions involved the target topic.
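The paper does not publish its training code, but the shape of such a classifier is straightforward to sketch. The example below stands in synthetic placeholder data for the real packet captures and uses scikit-learn's gradient boosting as one plausible model choice; on genuine traces, the feature vectors would be the recorded packet sizes and inter-arrival gaps.

```python
# Rough sketch of the attack's learning step (synthetic placeholder data, not the
# study's captures): label sessions as "target topic" or "unrelated" and train a
# classifier on traffic metadata only.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_sessions, n_features = 2000, 200             # e.g. 100 packets x (size, timing gap)
X = rng.normal(size=(n_sessions, n_features))  # placeholder for real traffic features
y = rng.integers(0, 2, size=n_sessions)        # 1 = sensitive topic, 0 = unrelated

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
clf = GradientBoostingClassifier().fit(X_train, y_train)

# On random noise this hovers near 50 percent; the study reports that real size and
# timing patterns push comparable classifiers above 98 percent.
print("held-out accuracy:", clf.score(X_test, y_test))
```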
Results showed a consistent pattern across providers. Most models allowed the topic to be detected with accuracy rates above ninety-eight percent. In some trials, the attack identified sensitive subjects every time, even when such a conversation appeared in only one out of ten thousand sessions. That means someone monitoring network traffic could quietly flag users discussing specific issues without ever seeing the actual words.
The researchers call this kind of exposure a side-channel attack. It does not break cryptography; instead, it reads the unintentional traces left by how systems operate. In this case, the streaming behavior of large language models—sending partial responses token by token—creates identifiable timing and size signatures. A passive observer, such as an internet provider, a government monitor, or anyone watching local Wi-Fi traffic, could exploit those patterns to identify when users talk about sensitive or regulated topics.
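What a passive observer actually collects is easy to picture: for each streamed chunk, an arrival time and an encrypted record size, nothing more. The sketch below is a hypothetical rendering of that vantage point, with invented numbers.

```python
# Hypothetical observer's-eye view (names and numbers are illustrative): each streamed
# chunk arrives as an encrypted record whose size and timing are visible even though
# its content is not.
from dataclasses import dataclass

@dataclass
class ObservedRecord:
    arrival_time: float    # seconds since the session began
    size_bytes: int        # length of the encrypted record on the wire

def session_fingerprint(records: list[ObservedRecord]) -> list[float]:
    """Flatten a session into the (timing gap, size) sequence a classifier sees."""
    fingerprint, previous = [], 0.0
    for record in records:
        fingerprint.append(record.arrival_time - previous)  # inter-arrival gap
        fingerprint.append(float(record.size_bytes))
        previous = record.arrival_time
    return fingerprint

# Three streamed chunks of one (invented) reply:
session = [ObservedRecord(0.12, 43), ObservedRecord(0.19, 57), ObservedRecord(0.33, 49)]
print(session_fingerprint(session))
```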
While the study focused on a controlled scenario, the implications reach beyond academic interest. LLMs are now part of healthcare chatbots, legal research tools, and internal business assistants. Inferring conversation topics from encrypted traffic could expose corporate strategies, medical discussions, or private advice sessions. Microsoft’s report describes this as an industry-wide concern rather than an isolated bug.
The researchers tested three defenses: random padding, token batching, and packet injection. Random padding adds variable-length filler to disguise the real size of messages. Token batching groups several tokens before sending them, reducing the number of observable packets. Packet injection adds fake network events to distort timing. Each technique weakened the attack but none eliminated it. On average, protection methods dropped detection accuracy by only a few percentage points. Stronger padding or fixed-rate transmission would add heavy bandwidth costs and delay responses, which most AI providers try to avoid.
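Neither the paper nor the vendors publish their exact implementations, but the first two ideas are simple to sketch. The field names, batch size, and padding bounds below are assumptions for illustration, not any provider's actual code.

```python
# Illustrative only: token batching plus random padding applied before a streamed
# chunk is encrypted and sent. All names and sizes here are assumptions.
import json
import secrets
from typing import Iterator, List

def batch_and_pad(tokens: Iterator[str], batch_size: int = 4,
                  max_pad_bytes: int = 64) -> Iterator[bytes]:
    """Group tokens into batches (fewer observable packets) and append a
    random-length filler field so ciphertext size no longer tracks content."""
    batch: List[str] = []
    for token in tokens:
        batch.append(token)
        if len(batch) == batch_size:
            yield _wrap(batch, max_pad_bytes)
            batch = []
    if batch:
        yield _wrap(batch, max_pad_bytes)

def _wrap(batch: List[str], max_pad_bytes: int) -> bytes:
    filler = secrets.token_hex(secrets.randbelow(max_pad_bytes) + 1)
    return json.dumps({"content": "".join(batch), "pad": filler}).encode()

for chunk in batch_and_pad(iter(["Mon", "ey ", "laun", "dering ", "is ", "illegal."])):
    print(len(chunk), "bytes on the wire")
```

Even with batching and padding, the gaps between batches still carry timing information, which is consistent with the study's finding that these measures blunt the attack rather than eliminate it.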
After discovering the issue, Microsoft began coordinated responsible disclosure in mid-2025. Twenty-eight vendors received notifications, and several, including OpenAI, Mistral, and xAI, rolled out partial fixes by autumn. Microsoft and OpenAI introduced random padding in streamed responses. Other providers have yet to respond, or argue that the risk remains low under normal conditions. The study's data, however, shows that the leak persists to varying degrees even with the recent countermeasures.
The researchers emphasize that this is not a bug in TLS or a specific system. It is an architectural consequence of how modern chatbots deliver output. As models generate text word by word, the transmission process leaves measurable traces that encryption cannot erase. These traces form a fingerprint of the conversation’s flow, enough for another algorithm to learn patterns linked to certain subjects.
For now, the advice is limited. Users cannot control how their AI assistant’s traffic is shaped. Providers will need to adopt stronger privacy layers, perhaps redesigning how streaming works or adding controlled randomness to hide consistent timing. The broader lesson is that privacy depends not only on what is encrypted but also on what can be observed around that encryption.
The Whisper Leak findings mark another reminder that AI systems, even when secure on paper, can expose far more than intended once their behavior is studied closely.