Study Shows Known Jailbreaks Still Easily Bypass Safety Filters in Popular AI Chatbots

Despite all the talk about safety, today’s AI chatbots are still wide open to being tricked — and the consequences could be much worse than most people realize.

A research team from Ben-Gurion University of the Negev in Israel dug into how vulnerable language models really are, especially when it comes to so-called "jailbreaking" — a way of bending the rules and making the models respond in ways they’re not supposed to. What they found paints a troubling picture. Not only are mainstream models like ChatGPT still falling for old tricks, but even the biggest tech companies aren’t doing much to fix the problem.

The researchers didn’t need to invent anything new. They simply used a known jailbreak method that had been floating around online for months. When they tried it out on popular AI systems — the kind used by millions every day — the results were shocking. The filters designed to block unsafe or illegal content broke down easily. The models gave up answers on everything from committing fraud to making explosives, and in some cases they even offered extra details no one asked for.

That’s not just a slip-up — that’s a sign that something’s deeply broken under the hood.

To make matters worse, the team developed a broader jailbreak method that worked across most of the models they tested. This wasn’t a one-off fluke; it was a pattern. And when they reached out to the companies behind these models? Most didn’t respond. A few deflected responsibility, suggesting the issue was outside their scope. Meanwhile, the vulnerabilities stayed wide open.

The situation turns even darker with open-source AI models. Unlike corporate platforms that can be updated or patched, open-source versions can’t be pulled back once they’re out in the wild. If someone downloads a version of a chatbot with no restrictions, it’s out there for good. Shared, copied, archived — it can’t be un-leaked.

And right now, these so-called "dark LLMs" are multiplying. Some are openly advertised for their lack of ethics and willingness to help with hacking, scams, or worse. You don’t need a supercomputer to run one. In fact, anyone with a decent laptop can get access — and that includes kids.

Jailbreaking, once a niche hacker hobby, has turned into a booming underground trend. There are entire online communities dedicated to crafting prompts that fool chatbots into saying what they shouldn’t. One subreddit has over 140,000 members sharing tips on how to slip past safeguards like it’s a game. The truth is, these models don’t need much convincing — just the right kind of nudge, and they’ll spill the beans.

That’s where the real concern lies. If a 16-year-old can jailbreak a chatbot in under a minute, what can a cybercriminal or extremist group do?

Some defenses are being tested — AI firewalls, data filters, even techniques to “unlearn” specific information after a model has already been trained. But they’re not widely adopted, and so far, the progress has been patchy at best. A few companies have launched tools to detect harmful prompts or responses before they go through, but there’s no universal fix, and no guarantee these methods will keep up with the latest jailbreak tricks.
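To make the prompt-screening idea concrete, here is a minimal sketch in Python of the kind of check such tools perform before a request ever reaches a model. Everything here is illustrative: the BLOCKED_PATTERNS list, the is_flagged helper, and the forward_to_model stub are hypothetical stand-ins, and real guardrail products typically rely on trained classifiers rather than keyword matching.

    import re

    # Hypothetical, deliberately simplified screen. Real "AI firewalls" use
    # trained classifiers; this only illustrates checking a prompt before it
    # reaches the model.
    BLOCKED_PATTERNS = [
        r"\bhow to (make|build) (a )?(bomb|explosive)s?\b",
        r"\bignore (all )?(previous|prior) instructions\b",  # common jailbreak phrasing
    ]

    def is_flagged(prompt: str) -> bool:
        """Return True if the prompt matches any blocked pattern."""
        return any(re.search(p, prompt, re.IGNORECASE) for p in BLOCKED_PATTERNS)

    def answer(prompt: str) -> str:
        # Refuse flagged prompts; otherwise pass the request on to the model.
        if is_flagged(prompt):
            return "Sorry, I can't help with that request."
        return forward_to_model(prompt)

    def forward_to_model(prompt: str) -> str:
        # Placeholder: in practice this would call the underlying chat API.
        return f"(model response to: {prompt!r})"

The article’s own caution applies to a sketch like this, too: attackers simply iterate on phrasing until a screen stops matching, which is exactly why filters of this kind struggle to keep up with the latest jailbreak tricks.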

The problem isn't just technical — it’s also cultural. The AI industry moves fast, and safety often gets left in the rear-view mirror. Everyone wants to be first, but few seem willing to slow down and deal with what’s already slipping through the cracks.

The bottom line? These systems can be incredibly useful. They help with research, translate languages, write code, even assist in medicine. But when they start giving out step-by-step instructions for crimes or violence, that usefulness turns into something dangerous.

This isn’t science fiction. It’s happening right now, in plain sight. And if nothing changes, the same technology that’s meant to help us may just end up in the wrong hands — if it hasn’t already.


Image: DIW-Aigen
