
Quantization Papers at NeurIPS 2024 (3)


Overview

This article introduces quantization papers from NeurIPS 2024. [1]

Part 1 Part 2

KV Cache is 1 Bit Per Channel: Efficient Large Language Model Inference with Coupled Quantization

  • Summary: A method for quantizing the KV cache that, instead of quantizing each channel independently (per-channel quantization), couples multiple channels and quantizes them jointly, improving compression efficiency (Fig. 3).
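As a rough illustration of the coupling idea (not the paper's exact algorithm; all names below are my own), instead of giving each channel its own low-bit code, several channels are grouped and each group is quantized jointly against a shared codebook, i.e. vector quantization. With pairs of channels and a 4-entry codebook, each pair costs 2 bits, which is 1 bit per channel:

```python
import numpy as np

def coupled_quantize(x, group=2, codebook_bits=2, iters=10):
    """Vector-quantize rows of x in coupled groups of `group` channels,
    using a tiny k-means (Lloyd's algorithm) codebook."""
    n, d = x.shape
    assert d % group == 0
    vecs = x.reshape(-1, group)          # each row: one coupled channel group
    k = 2 ** codebook_bits
    rng = np.random.default_rng(0)
    codebook = vecs[rng.choice(len(vecs), k, replace=False)].copy()
    for _ in range(iters):
        # assign every group to its nearest codeword, then recenter
        dists = ((vecs[:, None, :] - codebook[None]) ** 2).sum(-1)
        codes = dists.argmin(1)
        for j in range(k):
            mask = codes == j
            if mask.any():
                codebook[j] = vecs[mask].mean(0)
    dequant = codebook[codes].reshape(n, d)
    return codes, codebook, dequant

x = np.random.default_rng(1).normal(size=(64, 8)).astype(np.float32)
codes, cb, xq = coupled_quantize(x)
bits_per_channel = np.log2(len(cb)) / 2   # 2 bits per 2-channel group -> 1
```

The actual paper's coupling and codebook construction for the KV cache are more involved; this only shows why joint codes over coupled channels can reach a lower per-channel bit rate than independent per-channel codes.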

Towards Next-Level Post-Training Quantization of Hyper-Scale Transformers

  • Summary: aespa. When minimizing quantization error in post-training quantization (PTQ), instead of optimizing all the weights simultaneously, Q, K, and V are optimized separately, which improves computational efficiency.
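A minimal sketch of the separate-optimization idea, assuming simple round-to-nearest quantization (aespa's actual objective and solver are more sophisticated, and all function names here are illustrative): each of the Q, K, and V projection weights is quantized on its own, so the output reconstruction error of each projection can be measured and reduced independently rather than jointly over the whole attention block.

```python
import numpy as np

def rtn_quantize(w, bits=4):
    """Round-to-nearest symmetric quantization with a per-column scale."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max(axis=0, keepdims=True) / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale

def quantize_attention_separately(x, w_q, w_k, w_v, bits=4):
    """Quantize the Q, K, V projections independently and report each
    one's output reconstruction error on calibration inputs x."""
    results = {}
    for name, w in (("Q", w_q), ("K", w_k), ("V", w_v)):
        wq = rtn_quantize(w, bits)
        err = float(np.mean((x @ w - x @ wq) ** 2))
        results[name] = (wq, err)
    return results

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 16))                  # calibration activations
w_q, w_k, w_v = (rng.normal(size=(16, 16)) for _ in range(3))
results = quantize_attention_separately(x, w_q, w_k, w_v)
```

Splitting the problem this way means each optimization touches only one weight matrix at a time, which is where the computational saving over a joint block-wise objective comes from.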

BiDM: Pushing the Limit of Quantization for Diffusion Models

  • Summary: BiDM. 1-bit quantization tailored to diffusion models.
  [1] Images and equations are quoted from the respective papers.
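For context on what "1-bit quantization" means here, the sketch below shows only the generic binary-weight baseline such work builds on (BiDM's diffusion-specific techniques are not reproduced; the function name is mine): each weight column is replaced by a scale times the sign pattern, where the per-column scale alpha equals the mean absolute value, the closed-form minimizer of the Frobenius error to the binarized weights.

```python
import numpy as np

def binarize(w):
    """1-bit weights: per-output-channel scale times sign.
    alpha = mean(|w|) minimizes ||w - alpha * sign(w)||_F per column."""
    alpha = np.abs(w).mean(axis=0, keepdims=True)
    return alpha * np.sign(w)

w = np.random.default_rng(0).normal(size=(8, 4))
wb = binarize(w)   # each column holds only the two values +/- alpha_j
```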
