Quantization Papers at ICLR 2025 (2)

Posted at 2025-07-13

Overview

This article introduces quantization papers from ICLR 2025.

QERA: an Analytical Framework for Quantization Error Reconstruction

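Summary: A method that minimizes quantization error through low-rank approximation, targeting both the weight quantization error and the activation quantization error.
Key point: Theorem 1, and its approximate solution, Theorem 2.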
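
To make the idea of low-rank error reconstruction concrete, here is a minimal NumPy sketch. It is not the solution from Theorem 1 or Theorem 2, which this article does not reproduce. It only shows the simpler weight-error case: quantize a weight matrix, then approximate the residual `W - W_q` with a truncated SVD. The quantizer, the matrix size, and the rank are all assumptions chosen for illustration.

```python
import numpy as np

def quantize_symmetric(w, bits=4):
    """Round-to-nearest symmetric quantizer with a single per-tensor scale."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.clip(np.round(w / scale), -qmax, qmax) * scale

def low_rank_correction(w, w_q, rank):
    """Best rank-`rank` approximation of the residual W - W_q (truncated SVD)."""
    u, s, vt = np.linalg.svd(w - w_q, full_matrices=False)
    a = u[:, :rank] * s[:rank]   # shape (rows, rank)
    b = vt[:rank, :]             # shape (rank, cols)
    return a, b

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))            # illustrative weight matrix
w_q = quantize_symmetric(w, bits=4)
a, b = low_rank_correction(w, w_q, rank=8)

# Frobenius-norm weight error without and with the low-rank term.
err_plain = np.linalg.norm(w - w_q)
err_corrected = np.linalg.norm(w - (w_q + a @ b))
print(err_plain, err_corrected)
```

The truncated SVD is optimal only for the Frobenius norm of the weight error. Accounting for the activation side would change the objective, and that part is left to the paper.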

Effective Interplay between Sparsity and Quantization: From Theory to Practice

Summary: Shows mathematically that sparsification and quantization are not orthogonal; each affects the other.

OstQuant: Refining Large Language Model Quantization with Orthogonal and Scaling Transformations for Better Distribution Fitting

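Summary: Improves PTQ performance through transformations based on orthogonal matrices and scaling.
Key point: Orthogonal and scaling transformations are inserted at the positions shown in Fig. 5. Their parameters are learned.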
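
The reason such transformations can be inserted without changing the model is that a transform applied to the activations can be cancelled by its inverse folded into the weights. The sketch below checks this for a single linear layer, assuming a transform of the form `T = Q @ diag(s)`. The random `Q` and `s` stand in for the learned parameters, and the layer shapes are made up; where the paper actually places the transforms is shown in its Fig. 5 and is not modeled here.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 16, 8
x = rng.normal(size=(4, d_in))       # illustrative activations
w = rng.normal(size=(d_in, d_out))   # illustrative weights

# Random orthogonal matrix from a QR decomposition (stand-in for a learned one).
q, _ = np.linalg.qr(rng.normal(size=(d_in, d_in)))
# Positive per-channel scales (stand-in for learned scaling parameters).
s = rng.uniform(0.5, 2.0, size=d_in)

t = q * s            # T = Q @ diag(s)
t_inv = (q / s).T    # T^{-1} = diag(1/s) @ Q^T

x_t = x @ t          # transformed activations (what would be quantized)
w_t = t_inv @ w      # inverse transform folded into the weights

y_ref = x @ w
y_new = x_t @ w_t
print(np.abs(y_ref - y_new).max())
```

In full precision the two outputs agree up to floating-point error. The transform only matters once `x_t` and `w_t` are quantized, because it changes their distributions.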

MambaQuant: Quantizing the Mamba Family with Variance Aligned Rotation Methods

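Summary: The first PTQ method for Mamba. Addresses outliers in matrix multiplications.
Key points: (1) A transformation that combines the Hadamard transform with the KL (Karhunen-Loève) transform equalizes the variance across channels, making them easier to quantize. (2) A smoothing-based transformation.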
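
Point (1) can be checked numerically. After a KL transform the channels are decorrelated, and a normalized Hadamard matrix then mixes them with equal weights, so every output channel ends up with the mean of the eigenvalues as its variance. The following is a minimal sketch on synthetic data; the calibration set, the channel count, and the choice of a Sylvester-type Hadamard matrix are assumptions, not details taken from the paper.

```python
import numpy as np

def hadamard(n):
    """Normalized Hadamard matrix via the Sylvester construction (n a power of 2)."""
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h / np.sqrt(n)

rng = np.random.default_rng(0)
n = 8
# Synthetic calibration activations with very different channel variances.
x = rng.normal(size=(1024, n)) * np.array([0.1, 0.2, 0.5, 1, 2, 5, 10, 50])
x = x - x.mean(axis=0)

cov = x.T @ x / x.shape[0]
_, k = np.linalg.eigh(cov)       # KLT basis: eigenvectors of the covariance
rotation = k @ hadamard(n)       # orthogonal, since both factors are orthogonal

var_before = x.var(axis=0)
var_after = (x @ rotation).var(axis=0)
print(var_before)
print(var_after)
```

On the calibration data itself the per-channel variances after the rotation are identical up to rounding. On unseen data they would only be approximately equal.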

Quamba: A Post-Training Quantization Recipe for Selective State Space Models

Summary: Quantization of State Space Models (SSMs).
Key point: Handles the problem of large outliers with a Hadamard transform.
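
A small experiment shows why a Hadamard transform helps with outliers. A single large channel forces a coarse per-tensor quantization step, while the rotated vector has that energy spread over all channels and can use a finer step. This sketch assumes a simple per-tensor int8 fake quantizer and a synthetic activation vector; it is not the recipe from the paper.

```python
import numpy as np

def hadamard(n):
    """Normalized Hadamard matrix via the Sylvester construction (n a power of 2)."""
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h / np.sqrt(n)

def fake_quant_int8(v):
    """Symmetric per-tensor int8 quantization followed by dequantization."""
    scale = np.abs(v).max() / 127.0
    return np.round(v / scale) * scale

rng = np.random.default_rng(0)
x = rng.normal(size=64)
x[3] = 100.0                     # one large outlier channel

h = hadamard(64)
err_plain = np.linalg.norm(x - fake_quant_int8(x))
# Quantize in the rotated basis, then rotate back with H^T (H is orthogonal).
err_rotated = np.linalg.norm(x - fake_quant_int8(x @ h) @ h.T)
print(err_plain, err_rotated)
```

Because `H` is orthogonal, rotating back does not change the norm of the quantization error, so the comparison between the two errors is fair.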

QP-SNNs: Quantized And Pruned Spiking Neural Networks

Summary: Quantization of Spiking Neural Networks (SNNs).
Key point: Rescaling the weights as in Eq. (9) makes effective use of the quantization range, as shown in Fig. 2.

SpinQuant: LLM quantization with learned rotations

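Summary: Suppresses activation outliers with rotation matrices.
Key point: Rotation matrices are placed at the positions shown in Fig. 1 and are determined by optimization.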

Efficient Low-Bit Quantization with Adaptive Scales for Multi-Task Co-Training

Summary: A method for applying QAT to co-training (training a multi-purpose model that processes one task per input).
Key point: Activation quantization is switched per task (Fig. 4).
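
One way to picture per-task switching is an activation quantizer that keeps a separate step size for each task and selects one at call time. The class below is a hypothetical helper written for this illustration. The task names, the fixed step sizes, and the 4-bit setting are assumptions; in QAT the step sizes would normally be trained, and the actual design is the one in the paper's Fig. 4.

```python
import numpy as np

class TaskSwitchedActQuantizer:
    """Fake-quantizes activations with a separate step size per task (hypothetical)."""

    def __init__(self, task_scales, bits=4):
        self.task_scales = dict(task_scales)   # task name -> quantization step
        self.qmax = 2 ** (bits - 1) - 1

    def __call__(self, x, task):
        scale = self.task_scales[task]
        # Signed integer grid: [-qmax - 1, qmax].
        q = np.clip(np.round(x / scale), -self.qmax - 1, self.qmax)
        return q * scale

# Illustrative task names and step sizes, not values from the paper.
quantizer = TaskSwitchedActQuantizer({"detection": 0.5, "segmentation": 0.1})
x = np.array([0.26, -0.74, 1.3])
out_det = quantizer(x, task="detection")
out_seg = quantizer(x, task="segmentation")
print(out_det, out_seg)
```

The same input lands on different grids depending on the task: the coarser step covers a wider range, while the finer step resolves small values better but clips the largest one.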
