0
1

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?

Full review of DeepSeek-R1 Model from DeepSeek Technical Report

Posted at

DeepSeek has been a hot topic of discussion for the past few months, and even if new AI models have been released during this period, such as QwQ-32B and Manus, DeepSeek's popularity still remains unabated.

When it comes to DeepSeek models, we have to mention DeepSeek-v3 and DeepSeek-R1, which are widely used nowadays. In this post, we will give you an all-round introduction to DeepSeek-R1, including what it is, what it can do, how to use and run DeepSeek-R1 locally.

Summary
Part 1. What is DeepSeek-R1?
Part 2. DeepSeek-R1: Development and Extension
Part 3. What is DeepSeek used for?
Part 4. Why choose DeepSeek-R1 and How DeepSeek-R1 differs from other model?
Part 5. How to run DeepSeek-R1 locally?
Part 6. FAQs
Part 7. Conclusion

Part 1. What is DeepSeek-R1?

DeepSeek-R1 is a China new AI model developed by DeepSeek, a 2023 startup focusing on artificial intelligence technology founded by Wenfeng Liang.

The DeepSeek-R1 model also powers DeepSeek's chatbot: DeepSeek Chat, which is said to have managed to displace ChatGPT by quickly reaching the number one spot in the Apple App Store after its release.

DeepSeek Company

According to the official DeepSeek-R1 Paper, DeepSeek-R1 is one of its first-generation reasoning models, the other being DeepSeek-R1-Zero. DeepSeek-R1 adds cold-start data prior to large-scale reinforcement learning (RL), which solves the problems of endless repetition, poor readability, and linguistic clutter that occurred with previous DeepSeek models, further improving the performance of DeepSeek-R1.

DeepSeek-R1's performance in math, code and reasoning tasks is on par with OpenAI-o1, and if DeepSeek-R1 is compared to a human being, it can be a professional software engineer, a literary scholar, a translator with proficiency in multiple languages, and so on. Moreover, they can be on call all day long, ready to help you solve the problems you encounter.

DeepSeek Model

Part 2. DeepSeek-R1: Development and Extension

When we look at the DeepSeek-R1 technical report and the DeepSeek Wikipedia in conjunction, it becomes very clear where the DeepSeek-R1 model came from and how it evolved.

DeepSeek-R1 and DeepSeek-R1-Zero are both initialized from DeepSeek-V3-Base and share the architecture. The DeepSeek-R1-Distill model, on the other hand, was initialized from other pre-trained open-weight models, including LLaMA and Qwen, and then fine-tuned to incorporate data from DeepSeek-R1.

Model AIME 2024 pass@1 AIME 2024 cons@64 MATH-500 pass@1 GPQA Diamond pass@1 LiveCodeBench pass@1 CodeForces rating
GPT-4o-0513 9.3 13.4 74.6 49.9 32.9 759
Claude-3.5-Sonnet-1022 16 26.7 78.3 65 38.9 717
o1-mini 63.6 80 90 60 53.8 1820
QwQ-32B-Preview 44 60 90.6 54.5 41.9 1316
DeepSeek-R1 79.8 / 97.3 71.5 65.9 2029
DeepSeek-V3 39.2 / 90.2 59.1 / 1134
DeepSeek-R1-Distill-Qwen-1.5B 28.9 52.7 83.9 33.8 16.9 954
DeepSeek-R1-Distill-Qwen-7B 55.5 83.3 92.8 49.1 37.6 1189
DeepSeek-R1-Distill-Qwen-14B 69.7 80 93.9 59.1 53.1 1481
DeepSeek-R1-Distill-Qwen-32B 72.6 83.3 94.3 62.1 57.2 1691
DeepSeek-R1-Distill-Llama-8B 50.4 80 89.1 49 39.6 1205
DeepSeek-R1-Distill-Llama-70B 70 86.7 94.5 65.2 57.5 1633

Part 3. What is DeepSeek used for?

As a general-purpose language model, DeepSeek-R1 powers DeepSeek Chat, which supports text generation, dialog systems, and Q&A systems.

DeepSeek-R1 is a reason model that excels when it comes to reasoning tasks as well as math and code, such as generating and debugging code, performing mathematical calculations, explaining complex scientific concepts, and more.

Application scenarios:

  1. Software development: DeepSeek-R1 can assist developers by generating code and debugging existing code.

  2. Math problem solving: DeepSeek-R1 excels at solving and explaining complex math problems and can be used to assist in education.

  3. Customer Service: DeepSeek-R1 can provide support for AI chatbots, talk to users and answer appropriate questions for saving in staff costs.

  4. Text generation: All industries can use DeepSeek-R1 to generate high-quality text, optimize the original content for expressions more creative.

......

Part 4. Why choose DeepSeek-R1 and How DeepSeek-R1 differs from other model?

1. Advanced MoE Architecture

DeepSeek-R1 is based on the base model of DeepSeek-V3 and employs a hybrid expert architecture to realize its ability to produce fast outputs, laying the foundation for multi-domain language understanding.

2. Reinforcement Learning and Cold-Start Data

On top of providing many powerful and interesting inference behaviors, DeepSeek-R1's performance is further enhanced by solving the previously occurring problems of outputs falling into endless repetition, poor readability, and confusing language.

3. Open source and free

DeepSeek-R1 allows you to use it for free, and many of the associated DeepSeek models are open source, so you can use the DeepSeek API and integrate them into your own projects.

4. Powerful Performance

DeepSeek-R1 performs well in various benchmarks and it is not an exaggeration to say that it is one of the best AI models. He also excels in verticals such as coding and math.

5. Low cost

With guaranteed high performance, DeepSeek-R1 is much cheaper to develop and run. This is more than enough reason for it to be in the international spotlight, and for some to start questioning the original high price tag that was put on it, sending the entire AI stock market into turmoil, and NVIDIA's share price in particular plummeting.

Note: DeepSeek is not yet available, so there is no DeepSeek stock or DeepSeek price on the market yet.

Part 5. How to run DeepSeek-R1 locally?

1. DeepSeek-R1 model
DeepSeek-R1's local operation is consistent with DeepSeek-V3, and supports a variety of hardware conditions and open-source community software, you can check the
DeepSeek Paper for more specific configuration code and steps.

DeepSeek-V3 and DeepSeek-R1 are trained with FP8, if you need BF16 weights for your experiments, you can convert FP8 weights to BF16 first. Example as:

cd inference
python fp8_cast_bf16.py --input-fp8-hf-path /path/to/fp8_weights --output-bf16-hf-path /path/to/bf16_weights

Note: Transformers for Hugging Face are not directly supported.

2. DeepSeek-R1-Distill model
The DeepSeek-R1-Distill model can be used in the same way as the Qwen or Llama models.

Using vLLM:
vllm serve deepseek-ai/DeepSeek-R1-Distill-Qwen-32B --tensor-parallel-size 2 --max-model-len 32768 --enforce-eager

Using SGLang:
python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-R1-Distill-Qwen-32B --trust-remote-code --tp 2

Some suggestions for local deployment of the DeepSeek-R1 model:

Set the temperature parameter to be between 0.5 and 0.7, so that the output is not an endless repetition or incoherent.
Avoid adding system prompts and include all instructions in the user prompts.
Test the model several times and set the average of the test results to evaluate the model's performance.
Have the model response with "think" at the beginning of each output.

Part 6. FAQs

Question 1. Is DeepSeek open source and free?

Yes, the DeepSeek model is currently open source and free to use. You can download the DeepSeek-R1 model from communities such as Hugging Face or Github.

Question 2. How to use DeepSeek-R1 within DeepSeek Chat?

First, go to the DeepSeek Chat, then enter your command in the input box and active the DeepSeek-R1 model (Deep Think R1), then press enter or click send to start a conversation with the DeepSeek chatbot.

Question 3. When was DeepSeek released?

The major DeepSeek model releases are listed below:

DeepSeek-V3 released on December 26, 2024, DeepSeek App released on January 15, 2025, and DeepSeek-R1 released on January 20, 2025.

Question 4. How many tokens does DeepSeek-R1 have?

DeepSeek-R1 has a total of 671B tokens . However, DeepSeek has also released six "lite" versions of R1 with parameter counts ranging from 1.5B to 70B:

DeepSeek-R1-Distill-Qwen-1.5B
DeepSeek-R1-Distill-Qwen-7B
DeepSeek-R1-Distill-Llama-8B
DeepSeek-R1-Distill-Qwen-14B
DeepSeek-R1-Distill-Qwen-32B
DeepSeek-R1-Distill-Llama-70B

Question 5. What are the hardware requirements to run DeepSeek-r1?

DeepSeek-R1 supports multiple languages and deployment options, including NVIDIA, AMD GPUs and Huawei Ascend NPUs.

Question 6. DeepSeek VS ChatGPT, is DeepSeek better?

DeepSeek's base model, R1, outperforms GPT-4o (with strong support from the free version of ChatGPT) in several industry benchmarks. It also has a fairly low running cost.

Part 7. Conclusion

This post gives you a detailed introduction to the meaning of DeepSeek-R1, how to configure them, the connection between each model and theirs evaluation results. As AI technology becomes more and more mature, we are gradually using AI in our daily life and work. Try the free DeepSeek Chat and experience the intelligent AI models to get cutting-edge AI solutions.

0
1
0

Register as a new user and use Qiita more conveniently

  1. You get articles that match your needs
  2. You can efficiently read back useful information
  3. You can use dark theme
What you can do with signing up
0
1

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?