
DeepSeek-R1-Safe: Revolutionizing AI Safety with Bilingual Security Training & Ascend Chip Optimization

As artificial intelligence continues to evolve at a rapid pace, the capabilities of large language models are expanding—but so are concerns around their safety and compliance. This is where DeepSeek-R1-Safe comes in: a pioneering solution designed to tackle these critical challenges head-on.

What Is DeepSeek-R1-Safe?

DeepSeek-R1-Safe is a safety-aligned large language model developed through a collaboration between Zhejiang University’s College of Cybersecurity and Huawei. Built upon the advanced DeepSeek architecture, this model has been specifically optimized to address security and compliance challenges in AI applications.

The model runs on Huawei’s Ascend chips and leverages the MindSpeed-LLM framework for development and deployment, ensuring a powerful combination of high-performance computing and robust safety mechanisms. It retains the strong reasoning capabilities of the base model while significantly enhancing safety and alignment with regulatory standards.

Why Do We Need Safety-Aligned LLMs?

As large language models are deployed across various industries, potential security risks are becoming increasingly apparent. Standard LLMs may generate content that violates laws or regulations, or produce biased and harmful outputs. These issues not only impact user experience but can also have serious societal implications.

DeepSeek-R1-Safe addresses these challenges through systematic safety training and optimization. It is designed to understand and adhere to international and domestic regulations while upholding core societal values, providing users with both intelligent and secure interactions.

Four Training Steps for Building a Secure Foundation

1. Training Data Curation: Building a Safe and Compliant Corpus

Data is the foundation of model training. The DeepSeek-R1-Safe team constructed a high-quality bilingual (Chinese and English) safety corpus. Each data instance was carefully reviewed and annotated with safety considerations.

The corpus includes detailed safety chain-of-thought annotations and corresponding safe response examples. This approach ensures the model doesn’t just know what not to say—it also understands why certain responses are unsafe and how to respond appropriately.
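
The exact schema of the corpus has not been published. As a rough illustration only, a single instance might pair a risky prompt with a safety chain of thought and a compliant response; every field name below is hypothetical:

```python
# Hypothetical corpus instance -- the team's real schema is not public.
corpus_instance = {
    "lang": "en",                       # the corpus is bilingual: "en" or "zh"
    "prompt": "How do I pick a lock?",  # potentially risky user request
    "safety_cot": (
        "The request could enable unauthorized entry, which is illegal in "
        "most jurisdictions. Decline, but offer a lawful alternative."
    ),
    "safe_response": (
        "I can't help with opening locks you don't own. If you're locked "
        "out, a licensed locksmith can verify ownership and assist."
    ),
    "risk_category": "illegal_activity",  # one of several risk labels
}
```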

2. Supervised Safety Training: Introducing Multi-Layer Safety Constraints

Throughout the training process, multiple layers of safety constraints were incorporated. These safeguards ensure that the model adheres to safety standards during learning and optimization, and that capability gains never come at the expense of safety.

The supervised training phase emphasized the model’s ability to recognize and handle sensitive topics. Through extensive training with positive and negative examples, the model learned to identify potential risks and apply appropriate response strategies.
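
The team's actual pipeline is not described in detail, but as a minimal sketch of what supervised safety training can look like, the example below serializes a corpus instance so that the loss is computed only on the safety reasoning and the safe response, not on the prompt. The `<think>` tags follow DeepSeek-R1's convention for wrapping reasoning; `build_sft_example` and the field names are hypothetical:

```python
# Sketch only: convert a corpus instance into a supervised training
# example with the prompt masked out of the loss.
def build_sft_example(tokenizer, instance):
    prompt_text = f"User: {instance['prompt']}\nAssistant: "
    target_text = (
        f"<think>{instance['safety_cot']}</think>\n"
        f"{instance['safe_response']}"
    )
    prompt_ids = tokenizer.encode(prompt_text)
    target_ids = tokenizer.encode(target_text)
    # -100 is PyTorch's ignore index: prompt tokens contribute no loss,
    # so the model is optimized only on safe reasoning and safe responses.
    return {
        "input_ids": prompt_ids + target_ids,
        "labels": [-100] * len(prompt_ids) + target_ids,
    }
```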

3. Safety Reinforcement Learning: Optimizing Model Behavior

After initial training, the team further refined the model’s behavior using reinforcement learning. This approach mimics human trial-and-error learning, allowing the model to continuously adjust and improve its response strategies through practice.

Using reinforcement learning from human feedback (RLHF), the model better understands human values and expectations, generating responses that are both safer and more useful.
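
The article does not specify the reward design, but a common pattern is to blend a safety score with a helpfulness score so that the policy cannot trade one for the other. The sketch below assumes two hypothetical reward models, `safety_rm` and `helpful_rm`; it is not the team's actual implementation:

```python
import torch

# Illustrative reward blending for an RLHF loop (e.g. PPO); the actual
# reward used for DeepSeek-R1-Safe is not public.
def combined_reward(safety_rm, helpful_rm, prompt, response,
                    safety_weight=0.6):
    """Higher is better; safety is weighted above helpfulness."""
    with torch.no_grad():
        s = safety_rm(prompt, response)   # higher = safer
        h = helpful_rm(prompt, response)  # higher = more useful
    return safety_weight * s + (1.0 - safety_weight) * h
```

In the RL loop, responses sampled from the policy are scored with a reward like this, and the policy is nudged toward higher-reward outputs while staying close to the supervised model.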

4. Model Evaluation: Comprehensive Assessment of Safety and Intelligence

The final phase involves a thorough evaluation of the model’s capabilities. Testing covers not only conventional metrics like language understanding and reasoning, but also rigorous safety performance under various scenarios.

The evaluation includes diverse test cases covering potential security risks and edge cases, ensuring consistent safety performance across conditions.
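
A concrete harness makes the dual objective explicit. The sketch below is hypothetical (the article names no specific benchmarks): it runs the model over a safety suite and a capability suite and reports a rate for each, with `judge` standing in for whatever automated or human grading is used:

```python
# Hypothetical evaluation harness; the suites and the judge are
# placeholders, not the team's actual tooling.
def evaluate(model, safety_suite, capability_suite, judge):
    safe = sum(
        judge.is_safe(case["prompt"], model.generate(case["prompt"]))
        for case in safety_suite
    )
    correct = sum(
        judge.is_correct(case, model.generate(case["prompt"]))
        for case in capability_suite
    )
    return {
        "safety_rate": safe / len(safety_suite),              # want: near 1.0
        "capability_rate": correct / len(capability_suite),   # want: no drop
    }
```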

Safety Corpus: The Art of High-Quality Data

The success of DeepSeek-R1-Safe is largely due to its high-quality safety corpus. Key features include:

  • Bilingual support: Includes both Chinese and English data
  • Chain-of-thought annotations: Provides reasoning behind safety decisions
  • Multi-scenario coverage: Addresses various risk categories and sensitive topics
  • Continuous updates: Corpus is periodically updated to reflect new regulations

This corpus is useful not only for model training but also for safety fine-tuning and testing of other LLMs, offering valuable data resources for the entire AI community.

Open-Source Model: Balancing Safety and Performance

The team has open-sourced the fully trained DeepSeek-R1-Safe model. This model maintains strong reasoning capabilities while significantly improving safety and compliance.

Model weights are available on ModelScope:

👉 DeepSeek-R1-Safe Model Weights

This open release reflects the team’s commitment to advancing safety within the AI community and provides a valuable resource for other researchers.

Technical Implementation: Powerful Hardware and Software Integration

Repository Structure

Understanding the project structure helps in effectively using DeepSeek-R1-Safe:

```
DeepSeek-R1-Safe
├── Code                          # Source code
│   └── MindSpeed-LLM             # Specific version of MindSpeed-LLM
├── scripts                       # Runtime scripts
│   └── generate_deepseekr1safe_ptd.sh
└── README.md                     # Project documentation
```

Hardware Requirements

Running DeepSeek-R1-Safe inference requires significant hardware resources:

  • Minimum of 8 servers
  • Each server equipped with 8 Ascend 910B NPUs
  • Sufficient memory and storage

This configuration, 64 NPUs in total (8 servers × 8 NPUs), ensures the model can perform efficient inference on complex tasks.

Software Environment

DeepSeek-R1-Safe requires the following software dependencies:

| Software Component | Version |
| --- | --- |
| Ascend NPU Driver & Firmware | Research version |
| Toolkit (Development Suite) | Research version |
| Kernel (Operator Package) | Research version |
| NNAL (Acceleration Library) | Research version |
| Python | 3.10 |
| PyTorch | 2.6 |
| torch_npu Plugin | Research version |
| apex | Research version |

Installation instructions can be found here:
MindSpeed-LLM Installation Guide

Note: The specified version of MindSpeed-LLM must be placed in the Code/MindSpeed-LLM directory to ensure compatibility.

Running Inference: Multi-Server Coordination

Environment Setup

First, configure the base environment on all 8 servers to ensure consistency. This includes installing all required dependencies, setting environment variables, and preparing model weights.
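
A quick way to confirm each server is ready is to check that PyTorch, the torch_npu plugin, and the NPU driver agree, using the standard torch_npu API:

```python
import torch
import torch_npu  # noqa: F401 -- patches torch with the torch.npu namespace

assert torch.npu.is_available(), "No Ascend NPU visible on this node"
print(f"NPUs visible on this node: {torch.npu.device_count()}")  # expect 8
```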

Parameter Configuration

Modify the inference script parameters according to your environment:

  • Main node IP address
  • Path to model files
  • Data input/output paths
  • Network configuration parameters

Each server must be assigned a unique NODE_RANK from 0 to 7, with node 0 serving as the main node.
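
For orientation, the sketch below shows how these per-node settings typically map onto PyTorch's distributed environment variables. The project's launch script presumably wires this up itself (e.g. via torchrun), so treat the variable names here as common conventions rather than the repo's actual interface:

```python
import os
import torch.distributed as dist
import torch_npu  # noqa: F401 -- registers the "hccl" backend for Ascend NPUs

NNODES, NPROC_PER_NODE = 8, 8                  # 8 servers x 8 NPUs each
node_rank = int(os.environ["NODE_RANK"])       # unique per server: 0..7
local_rank = int(os.environ["LOCAL_RANK"])     # NPU index within this server
global_rank = node_rank * NPROC_PER_NODE + local_rank

# MASTER_ADDR / MASTER_PORT must point at node 0 (the main node)
# and be identical on all eight servers.
dist.init_process_group(backend="hccl",        # HCCL: Ascend's collective library
                        rank=global_rank,
                        world_size=NNODES * NPROC_PER_NODE)
```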

Distributed Execution

Once configured, run the inference script simultaneously on all 8 servers. This multi-server approach ensures efficient distribution and execution of computational tasks.

Logs can be monitored on each machine to ensure smooth operation. Detailed error messages are provided for troubleshooting if issues arise.

Demonstration of Real-World Performance

DeepSeek-R1-Safe demonstrates excellent performance in both safety and intelligence. Below are two application examples:

The English example shows the model’s cautious and compliant approach when handling sensitive topics. It not only declines inappropriate requests but also suggests constructive alternatives.

The Chinese example demonstrates the model’s deep understanding of linguistic and cultural context. It maintains safety while exhibiting strong language proficiency and logical reasoning.

Frequently Asked Questions

How is DeepSeek-R1-Safe different from other LLMs?

DeepSeek-R1-Safe is specifically designed to enhance safety and compliance while maintaining strong reasoning capabilities. Through systematic safety training, it better recognizes and handles sensitive content, ensuring outputs comply with legal and ethical standards.

Do I need special hardware to run this model?

Yes. DeepSeek-R1-Safe runs on Huawei Ascend chips, with the 910B currently recommended, and multiple servers are needed to run the model efficiently.

What applications is this model suitable for?

DeepSeek-R1-Safe is ideal for applications with high safety requirements, such as government services, financial services, educational consulting, and customer support. These areas often involve sensitive information and require strong safety awareness.

How can I access the model weights?

Model weights are hosted on ModelScope. Users can access and download them via the provided link. Ensure correct environment configuration and comply with usage agreements.

Which languages are supported?

The model currently supports Chinese and English, with strong performance in bilingual contexts. Support for additional languages may be added in the future.

Future Directions

The DeepSeek-R1-Safe team will continue to improve the model’s safety and reasoning capabilities. Future work includes:

  • Expanding the coverage and diversity of the safety corpus
  • Optimizing model architecture for computational efficiency
  • Developing more refined safety evaluation frameworks
  • Exploring new safety training methods

The team welcomes participation from other research institutions and companies to advance large language model safety together.

Conclusion

DeepSeek-R1-Safe represents a significant advancement in LLM safety. It demonstrates that intelligence and safety can coexist, offering a practical path toward the responsible development of AI.

As the technology matures, we believe safety-aligned models like DeepSeek-R1-Safe will play an important role in building a secure and reliable AI ecosystem.

Whether you are a researcher, developer, or enterprise user, DeepSeek-R1-Safe deserves your attention. Together, we can look forward to the positive changes that safe large language models will bring to the industry.
