アイレット株式会社 (iret, Inc.)

Running a DeepSeek-R1 distilled model on Colab

Posted at 2025-01-28, last updated at 2025-01-29

Introduction

DeepSeek-R1 is an open model, yet it is reported to match OpenAI-o1 on tasks such as reasoning, math, and coding.

Its model weights are published on Hugging Face.

The 671B-parameter DeepSeek-R1 did not look like it would fit in the memory of Google Colab's GPU, so I ran the distilled model DeepSeek-R1-Distill-Llama-8B instead.
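A quick estimate shows why the full model is out of reach. Weight memory is roughly the parameter count multiplied by the bytes per parameter. The sketch below makes several assumptions: 2 bytes per parameter (FP16/BF16), about 8.03B parameters for the Llama-based distilled model, and 16 GB of memory on a T4. It ignores the KV cache, activations, and CUDA overhead, so real usage is higher.

```python
# Back-of-the-envelope weight-memory estimate.
# Assumptions: 2 bytes per parameter (FP16/BF16); KV cache, activations and
# CUDA overhead are ignored, so actual memory use will be higher.

GIB = 1024**3


def weight_memory_gib(num_params: float, bytes_per_param: float = 2.0) -> float:
    """Approximate memory (GiB) needed just to store the weights."""
    return num_params * bytes_per_param / GIB


# Assumed figures: a T4 has 16 GB (decimal) of memory; parameter counts are approximate.
T4_MEMORY_GIB = 16e9 / GIB
MODELS = {
    "DeepSeek-R1 (671B)": 671e9,
    "DeepSeek-R1-Distill-Llama-8B (~8.03B)": 8.03e9,
}

print(f"T4 memory: ~{T4_MEMORY_GIB:.1f} GiB")
for name, params in MODELS.items():
    need = weight_memory_gib(params)
    print(f"{name}: ~{need:.1f} GiB of weights -> fits entirely on one T4: {need <= T4_MEMORY_GIB}")
```

Under these assumptions the 671B model needs well over a terabyte of weight memory. Even the 8B model's BF16 weights come out slightly above a T4's capacity. This is one reason `device_map="auto"` is convenient: it can place layers that do not fit on the GPU into CPU memory instead, at some cost in speed.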

What is distillation?

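Distillation means training a new (smaller) model on the outputs of an existing larger model. The result is a smaller model whose accuracy comes close to that of the larger one.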
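A classic way to learn from a larger model's outputs is the soft-target loss. Both models' output distributions are softened with a temperature T, and the training objective is the KL divergence from the teacher's distribution to the student's. The pure-Python sketch below computes that loss for a single prediction using made-up logits. It is a generic illustration only. It is not necessarily how DeepSeek built the distilled models: the DeepSeek-R1 report describes fine-tuning smaller open models on samples generated by DeepSeek-R1.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled softmax (max-subtracted for numerical stability)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: Sequence[float],
                      teacher_logits: Sequence[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) between softened distributions, scaled by T^2
    (a common convention that keeps gradient magnitudes comparable across T)."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student) if pt > 0)
    return kl * temperature ** 2


teacher = [4.0, 1.0, 0.5]        # hypothetical teacher logits over 3 tokens
close_student = [3.5, 1.2, 0.4]  # hypothetical student that roughly agrees
far_student = [0.5, 1.0, 4.0]    # hypothetical student that disagrees
print(distillation_loss(close_student, teacher))
print(distillation_loss(far_student, teacher))
```

The student whose softened distribution is closer to the teacher's gets the smaller loss. Training pushes the student in that direction.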

Running the model

In Google Colab, open the Runtime menu and change the hardware accelerator to "T4 GPU".
Then download the model and tokenizer.

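```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

# device_map="auto" places layers on the GPU (spilling to CPU if needed); torch_dtype="auto" keeps the checkpoint's dtype
model = AutoModelForCausalLM.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Llama-8B", device_map="auto", torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-R1-Distill-Llama-8B")
# Print tokens as they are generated, without echoing the prompt or special tokens
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
```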

Wait a few minutes for the download to complete.

Next, run inference.
DeepSeek gives the following usage recommendations:

Set the temperature in the range 0.5 to 0.7 (0.6 is recommended).
Do not add a system prompt; put all instructions in the user prompt.
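If your code already builds a system-plus-user message pair, you can follow the second recommendation by folding the system text into the first user turn before calling `apply_chat_template`. The helper below is a hypothetical sketch. Its name and the blank-line joining format are my own choices, and it only rearranges the message list.

```python
from typing import Dict, List

Message = Dict[str, str]


def fold_system_into_user(messages: List[Message]) -> List[Message]:
    """Hypothetical helper: move any system instructions into the first user turn.

    System contents are joined with blank lines and prepended to the first
    user message; other turns keep their order. The input list is not mutated.
    """
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    rest = [dict(m) for m in messages if m["role"] != "system"]  # shallow copies
    if not system_parts:
        return rest
    prefix = "\n\n".join(system_parts)
    for m in rest:
        if m["role"] == "user":
            m["content"] = f"{prefix}\n\n{m['content']}"
            break
    else:
        # No user turn to attach to: turn the instructions into one.
        rest.insert(0, {"role": "user", "content": prefix})
    return rest


msgs = [
    {"role": "system", "content": "Answer in Japanese."},
    {"role": "user", "content": "What is the capital of Japan?"},
]
print(fold_system_into_user(msgs))
```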

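```python
# All instructions go in the user turn; no system prompt (per the recommendations above)
messages = [
    # "Please answer in Japanese. What is the capital of Japan?"
    {"role": "user", "content": "日本語で回答してください。日本の首都は？"}
]

# Apply the chat template and append the assistant prefix so the model starts its reply
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
output_ids = model.generate(input_ids,
                            max_new_tokens=4096,
                            do_sample=True,   # temperature only takes effect when sampling
                            temperature=0.7,
                            streamer=streamer)
```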
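To see what the temperature setting does, here is a pure-Python sketch of temperature sampling over a toy next-token distribution. The logits are divided by T before the softmax, so T < 1 sharpens the distribution and T > 1 flattens it. The vocabulary and logits are made up for illustration. `model.generate` applies the same kind of scaling to the real logits when sampling is enabled.

```python
import math
import random
from collections import Counter


def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(vocab, logits, temperature, rng):
    """Draw one token according to the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]


vocab = ["Tokyo", "Kyoto", "Osaka"]  # toy vocabulary (made up)
logits = [3.0, 1.5, 1.0]             # toy next-token logits (made up)

for t in (0.6, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])

rng = random.Random(0)  # fixed seed so the run is repeatable
counts = Counter(sample_token(vocab, logits, 0.6, rng) for _ in range(1000))
print(counts)
```

At the recommended range of 0.5 to 0.7 the top token dominates, but lower-probability tokens still appear occasionally. This is the balance between repetition and incoherence that the recommendation is aiming for.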

Output

<think>
Okay, I need to figure out the capital of Japan. 
I remember that Japan is a country in Asia, but I'm not sure about the exact location of its capital. 
Let me think. 
I've heard of Tokyo before, so maybe that's the capital. 
Wait, I think I've heard it's a major city with lots of technology and stuff. 
But I'm not 100% sure. 
I should probably check some reliable sources to confirm. 
From what I recall, the government and the Emperor live in Tokyo, which makes it the political center. 
I think the answer is Tokyo, but I'll make sure to verify it just to be certain.
</think>

日本の首都は、東京都です。

The model prints its reasoning inside the <think> tags.
It then prints the final answer: 日本の首都は、東京都です。 ("The capital of Japan is Tokyo.")
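If you process the output in code, it helps to separate the reasoning from the final answer. The sketch below splits on the closing </think> tag. It assumes the model emits at most one reasoning block at the start of its reply. It also tolerates a missing opening tag, which can happen if the chat template already inserts <think> into the prompt. When the <think></think> block is empty, as in the English run below, it returns an empty reasoning string.

```python
import re
from typing import Tuple

_THINK_CLOSE = "</think>"


def split_reasoning(text: str) -> Tuple[str, str]:
    """Split raw model output into (reasoning, answer).

    Everything before the first </think> is treated as reasoning (a leading
    <think> tag, if present, is removed); the rest is the answer. If there is
    no </think>, the whole text is treated as the answer.
    """
    head, sep, tail = text.partition(_THINK_CLOSE)
    if not sep:
        return "", text.strip()
    reasoning = re.sub(r"^\s*<think>", "", head).strip()
    return reasoning, tail.strip()


sample = "<think>\nThe capital of Japan is Tokyo.\n</think>\n\n日本の首都は、東京都です。"
print(split_reasoning(sample))
```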

Another question

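I also asked "What is the difference between Gemini and DeepSeek?", and part of the answer was wrong. For example, both the Japanese and English answers say Gemini was developed by OpenAI, but Gemini is Google's model.
I wondered whether this is a limitation of the distilled model.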

Answer when asked in Japanese

<think>
まず、GeminiとDeepSeekの違いを明確する必要があります。Geminiは、OpenAIが開発した大型言語モデルで、通用性と創造性に優れ、多様なタスクに適応可能です。一方、DeepSeekは、深度求理（DeepSeek）が開発した通用的大型言語模型で、情報検索や質問回答、テキスト生成に特化しています。

Geminiは、さまざまな分野に応用可能な柔軟性があり、ユーザーがさまざまな質問やタスクに対応するために利用することが期待されます。DeepSeekは、特に情報検索やテキスト生成に重点があり、高速で精确な回答を提供することが特徴です。

両者は、基本的な言語モデルとしての共通点がありますが、Geminiは創造性や一般性に強く、DeepSeekは情報検索や高速性に重点があります。今後、どちらも進化し、さらなる特徴を発揮することが予想されます。
</think>

GeminiとDeepSeekの違いは、主に以下の点です：

1. **開発会社**：
   - Geminiは、OpenAIによって開発されました。
   - DeepSeekは、中国の深度求理（DeepSeek）が開発しています。

2. **用途**：
   - Geminiは、一般的な質問回答、テキスト生成、会話生成、etc.に適応性があり、多様なタスクに対応できます。
   - DeepSeekは、特に情報検索、テキスト生成、質問回答、etc.に特化していて、高速で精确な回答を提供することが期待されます。

3. **特徴**：
   - Geminiは、創造性や一般性に優れ、ユーザーがさまざまなタスクに適応することが期待されます。
   - DeepSeekは、高速性と精确性に特化しており、特に大規模なデータを処理する際に強みです。

4. **進化と未来**：
   - 両者は、進化を続けており、今後さらなる性能や 機能を向上させる予定です。

GeminiとDeepSeekは、共に大型言語モデルとしての基本的な機能を共有していますが、 respectiveの強みやすみやすみの特徴があります。

Answer when asked in English

<think>

</think>

Gemini and DeepSeek are both advanced AI systems developed by different companies, but they have distinct features and capabilities. Here's a comparison:

### 1. **Development Background**
   - **Gemini**: Developed by OpenAI, a renowned AI research laboratory.
   - **DeepSeek**: Developed by DeepSeek, a Chinese AI company focused on AGI research.

### 2. **Core Technology**
   - **Gemini**: Built on GPT-4, OpenAI's groundbreaking language model.
   - **DeepSeek**: Utilizes DeepSeek LLM, its own large language model with specialized algorithms for reasoning and optimization.

### 3. **Applications**
   - **Gemini**: General-purpose AI for various tasks like writing, math, coding, and data analysis.
   - **DeepSeek**: Designed for more specialized tasks such as mathematical problem-solving, programming, and high-level reasoning.

### 4. **Performance**
   - **Gemini**: Excels in creative writing, natural language processing, and everyday problem-solving.
   - **DeepSeek**: Stronger in technical domains like mathematics, coding, and logical reasoning.

### 5. **Access**
   - **Gemini**: Available via OpenAI's API for developers and users.
   - **DeepSeek**: Primarily available through DeepSeek's own platforms, with potential API access for certain use cases.

### 6. **Use Cases**
   - **Gemini**: Ideal for content creation, education, and general assistance.
   - **DeepSeek**: Suitable for technical roles, research, and specialized problem-solving.

### 7. **Development**
   - **Gemini**: Continuously updated by OpenAI with new features and improvements.
   - **DeepSeek**: Also under active development, with a focus on AGI and specialized AI models.

### 8. **Cost and Availability**
   - **Gemini**: Generally accessible via OpenAI's pricing model.
   - **DeepSeek**: Pricing may vary depending on usage and access level, with potential limitations on commercial use.

### 9. **Community and Support**
   - **Gemini**: Backed by OpenAI's extensive support and community.
   - **DeepSeek**: Supported by its own community and development team.

### 10. **Future Prospects**
   - Both Gemini and DeepSeek are expected to evolve rapidly, with OpenAI and DeepSeek pushing the boundaries of AI technology.

### Conclusion:
- **Gemini** is a versatile AI suitable for a wide range of tasks.
- **DeepSeek** is tailored for specialized domains, particularly technical and logical reasoning.

Choose the one that best fits your needs, whether it's for creative writing, technical problem-solving, or everyday assistance.

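Summary

I ran a distilled DeepSeek model.
It answered a simple question correctly, but got parts of a slightly more complex question wrong.
I think this is because the distilled model is less accurate than the full model.
Still, the prospect of running a model with OpenAI-o1-level accuracy locally is exciting.
I'd like to try DeepSeek-R1 itself on a GPU large enough to run it.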
