A study by the University of Bristol has uncovered a host of serious safety risks linked to DeepSeek’s latest Large Language Models.
The ChatGPT rival is believed to comprise a family of LLMs that use chain-of-thought (CoT) reasoning, which improves problem-solving by having the model work through a task step by step instead of giving a direct reply.
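For readers unfamiliar with the technique, the short sketch below illustrates the difference between a direct prompt and a CoT-style prompt. It is purely illustrative: the respond() helper is a hypothetical placeholder, not DeepSeek’s actual API.

```python
# Minimal sketch of direct prompting vs. chain-of-thought (CoT) prompting.
# respond() is a placeholder; a real integration would send the prompt to a
# model endpoint and return its completion.

def respond(prompt: str) -> str:
    # Stand-in for an LLM call, so the example runs on its own.
    return f"[model reply to: {prompt!r}]"

# Direct prompting: the model is asked for the answer alone.
direct_prompt = "What is 17 * 24?"

# CoT prompting: the model is asked to lay out its reasoning step by step
# before the final answer, which is the transparency the study highlights.
cot_prompt = (
    "What is 17 * 24? "
    "Work through the problem step by step, then state the final answer."
)

print(respond(direct_prompt))
print(respond(cot_prompt))
```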
The analysis found that while CoT models may reject harmful prompts at higher rates, their reasoning process is highly transparent. When safeguards are bypassed, that transparency exposes dangerous details that most other LLMs would not reveal.
The study, authored by Zhiyuan et al., detailed the key safety issues of CoT reasoning models and emphasized the need for stronger safeguards. As the world of AI keeps evolving, responsible deployment and continual security checks are needed to make sure things stay on the right track.
The models are enticing to use because they mimic human thinking, which makes them well suited for the general public. However, if a model’s safety measures are bypassed, it can generate dangerous content, and combined with widespread public use, that creates major safety risks.
LLMs are trained on huge datasets that are filtered to remove harmful material, yet because of technical and resource limitations, dangerous content still persists in that data. Furthermore, LLMs can reconstruct harmful information even from fragments.
Despite the extensive training models undergo before launch, fine-tuning attacks have been shown to bypass security measures in classic LLMs. In this study, the authors found that CoT-enabled models not only produce dangerous content at higher rates but also generate more dangerous replies, because their reasoning process is more structured.
Fine-tuning these models often assigns them roles, such as that of a cybersecurity professional, when processing harmful requests. By leaning into such identities, they produce more sophisticated, and therefore riskier, replies.
According to a co-author of the study, fine-tuning attacks on these kinds of LLMs can be carried out on low-cost hardware that is well within an ordinary user’s budget. That makes the threat substantial: such attacks are commonly completed within a few hours using datasets widely available to the public.
Image: DIW-Aigen
Read next: DeepSeek’s Large Language Models Cost More Than $1.5 Billion To Produce