Nemotron 3 Super: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

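Notes from reading https://www.alphaxiv.org/overview/2604.12374.

Bibliographic information

Research institution: NVIDIA

  • Nemotron 3 Super is trained in low precision using NVFP4.
  • It uses a hybrid architecture that combines Mamba2, LatentMoE, and Attention layers.
  • LatentMoE reduces compute by projecting the input into a latent space before the MoE layer.
  • Multi-token prediction (MTP) natively supports speculative decoding.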
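
To get a feel for why projecting into a latent space before the MoE layer reduces compute, here is a rough back-of-the-envelope sketch. It only counts matrix-multiply FLOPs per token for a two-matmul expert FFN and ignores the router. The function names and every dimension (`d_model`, `d_latent`, `d_ff`, `top_k`) are hypothetical values chosen for illustration, not the actual Nemotron 3 Super configuration.

```python
def moe_flops_per_token(d_model: int, d_ff: int, top_k: int) -> int:
    """Approximate FLOPs per token for a plain MoE layer (router ignored).

    Each active expert is assumed to be a two-matmul FFN:
    d_model -> d_ff -> d_model, at 2 FLOPs per multiply-add.
    """
    flops_per_expert = 2 * (2 * d_model * d_ff)
    return top_k * flops_per_expert


def latent_moe_flops_per_token(d_model: int, d_latent: int, d_ff: int, top_k: int) -> int:
    """Approximate FLOPs per token when experts run in a smaller latent space.

    Assumed structure (an illustration, not the paper's exact design):
    one shared down-projection d_model -> d_latent, top_k expert FFNs
    operating at width d_latent, and one shared up-projection back to d_model.
    """
    projections = 2 * (2 * d_model * d_latent)   # down-projection + up-projection
    flops_per_expert = 2 * (2 * d_latent * d_ff)  # expert FFN at latent width
    return projections + top_k * flops_per_expert


# Hypothetical dimensions, for illustration only.
D_MODEL, D_LATENT, D_FF, TOP_K = 4096, 1024, 2048, 4

plain = moe_flops_per_token(D_MODEL, D_FF, TOP_K)
latent = latent_moe_flops_per_token(D_MODEL, D_LATENT, D_FF, TOP_K)
print(f"plain MoE : {plain:,} FLOPs/token")
print(f"latent MoE: {latent:,} FLOPs/token")
print(f"ratio     : {latent / plain:.3f}")
```

Under these assumed dimensions, the expert cost scales with `d_latent` instead of `d_model`, and the two shared projections are paid once per token rather than once per active expert, so the saving grows with `top_k`.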