Weekly Transformer (for Image Recognition), No. 2

Posted at 2021-06-17

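Transformer papers (plus a few others) are multiplying at a tremendous pace, so for now I skim them and leave a short comment on each. No guarantee that the content is correct.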

The Modern Mathematics of Deep Learning
A collection of questions that tries to organize the fuzzy parts of deep learning into a systematic mathematical framework.

S2-MLP: Spatial-Shift MLP Architecture for Vision
Baidu paper. An improvement on MLP-Mixer. It uses only channel mixing, which makes it very simple.
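
As the paper title suggests, spatial information is mixed by a parameter-free spatial shift rather than by a token-mixing MLP. A minimal sketch of such a shift follows, assuming the channels are split into four equal groups and each group moves one step in a different direction. The function name `spatial_shift`, the nested-list layout (H x W x C), and the group-to-direction assignment are illustrative assumptions, not the authors' code.

```python
import copy

def spatial_shift(x):
    """Shift four channel groups of an H x W x C nested list by one step each."""
    h, w, c = len(x), len(x[0]), len(x[0][0])
    g = c // 4  # assumption: C is a positive multiple of 4
    out = copy.deepcopy(x)  # border positions keep their original values
    for i in range(h):
        for j in range(w):
            for k in range(c):
                group = k // g
                if group == 0 and j >= 1:          # take value from the left neighbour
                    out[i][j][k] = x[i][j - 1][k]
                elif group == 1 and j + 1 < w:     # take value from the right neighbour
                    out[i][j][k] = x[i][j + 1][k]
                elif group == 2 and i >= 1:        # take value from the neighbour above
                    out[i][j][k] = x[i - 1][j][k]
                elif group == 3 and i + 1 < h:     # take value from the neighbour below
                    out[i][j][k] = x[i + 1][j][k]
    return out

# 2 x 2 feature map with 4 channels; every channel at (i, j) holds 10*i + j
x = [[[10 * i + j] * 4 for j in range(2)] for i in range(2)]
shifted = spatial_shift(x)
print(shifted[1][1])  # [10, 11, 1, 11]
```

After the shift, each position holds channels that came from its neighbours, so a plain per-position channel-mixing MLP applied afterwards can combine information across space.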

Batch Normalization Orthogonalizes Representations in Deep Random Networks
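Does BN work well because it orthogonalizes representations? The degree of orthogonality affects the accuracy of SGD. If the representations are orthogonalized at initialization, SGD can be made to converge quickly even without BN.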

Rethinking “Batch” in BatchNorm
FAIR paper. A careful investigation of the distribution of the "batch" used in BN.

CoAtNet: Marrying Convolution and Attention for All Data Sizes
Google paper. Builds CoAtNet (pronounced "coat net"), a hybrid network of convolution and attention. On ImageNet it reaches 86.0% top-1 with no extra data, and 89.8% with additional data, which is SOTA.

Does Knowledge Distillation Really Work?
Google and NYU paper. Distillation turns out to work worse than you would expect.

Knowledge distillation: A good teacher is patient and consistent
Google paper. A detailed study of distillation. ResNet-50 gets even better.

Loss landscapes and optimization in over-parameterized non-linear systems and neural networks
Over-parameterized models such as DNNs have a solution that SGD can reach, and converge to it quickly, as long as they satisfy the PL condition (which wide models do).
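
For reference, one common way to write the PL (Polyak-Lojasiewicz) condition for a non-negative loss, and the linear convergence it gives, is sketched below. The symbols are assumptions introduced here: $L$ is the loss, $\mu > 0$ the PL constant, $\beta$ the smoothness constant, and the bound is stated for plain gradient descent with step size $1/\beta$ rather than SGD.

```latex
% PL condition (for a loss L >= 0 whose minimum value is 0)
\frac{1}{2}\,\lVert \nabla L(w) \rVert^{2} \;\ge\; \mu\, L(w)

% One gradient step w_{t+1} = w_t - \tfrac{1}{\beta}\nabla L(w_t) with \beta-smooth L:
L(w_{t+1}) \;\le\; L(w_t) - \frac{1}{2\beta}\,\lVert \nabla L(w_t) \rVert^{2}
          \;\le\; \Bigl(1 - \frac{\mu}{\beta}\Bigr) L(w_t)

% Hence geometric (linear-rate) convergence:
L(w_t) \;\le\; \Bigl(1 - \frac{\mu}{\beta}\Bigr)^{t} L(w_0)
```

The first inequality of the second line is the standard descent lemma for smooth functions; the second simply substitutes the PL condition.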

DETReg: Unsupervised Pretraining with Region Priors for Object Detection
self-supervised pretraining + DETR

Universal Adder Neural Networks
Replaces every operation in a CNN with adders to simplify the computation.
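
To illustrate what "replacing with adders" can look like, here is a minimal sketch assuming the adder-network formulation in which a filter response is the negative L1 distance between the input patch and the filter, so only additions, subtractions, and absolute values are needed. The function names and the toy data are hypothetical, and this is not the paper's implementation.

```python
def adder_response(patch, kernel):
    """Negative L1 distance between a patch and a filter (no multiplications)."""
    return -sum(
        abs(p - k)
        for row_p, row_k in zip(patch, kernel)
        for p, k in zip(row_p, row_k)
    )

def adder_conv2d(image, kernel):
    """Slide the filter over a 2D image (stride 1, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            patch = [r[j:j + kw] for r in image[i:i + kh]]
            row.append(adder_response(patch, kernel))
        out.append(row)
    return out

image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 2],
          [4, 5]]
result = adder_conv2d(image, kernel)
print(result)  # [[0, -4], [-12, -16]]
```

The response is largest (zero) where the patch matches the filter exactly, which plays the role that a high correlation plays in an ordinary convolution.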

You Only Look at One Sequence: Rethinking Transformer in Vision through Object Detection
YOLO + ViT. Similar to DETR but slightly different. With small models it does better than DETR.

BEIT: BERT Pre-Training of Image Transformers
Microsoft paper. A self-supervised method that improves ViT.

Vision Transformers with Patch Diversification
Fixes the instability of ViT training by changing the loss function while leaving the model unchanged.

Chasing Sparsity in Vision Transformers: An End-to-End Exploration
Compression of ViT using sparsity.

CAT: Cross Attention in Vision Transformer
Compression of ViT using locality and a pyramid structure.

Scaling Vision with Sparse Mixture of Experts
Google paper. Uses sparsity in ViT to compress the model by skipping computation for tokens that do not need it.
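
One common form of sparsity in mixture-of-experts layers is top-k routing: a router scores every expert for a token, and only the k best-scoring experts are actually evaluated. The sketch below illustrates that idea only. The router logits, the toy experts, and the function names are hypothetical and are not taken from the paper.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(router_logits, k=1):
    """Return [(expert_index, gate_weight), ...] for the k highest-gated experts."""
    gates = softmax(router_logits)
    ranked = sorted(range(len(gates)), key=lambda e: gates[e], reverse=True)
    return [(e, gates[e]) for e in ranked[:k]]

def sparse_moe(token, router_logits, experts, k=1):
    """Gate-weighted sum of the outputs of the selected experts only."""
    out = [0.0] * len(token)
    for e, gate in top_k_route(router_logits, k):
        y = experts[e](token)  # the other experts are never called
        out = [o + gate * v for o, v in zip(out, y)]
    return out

# Three toy "experts" acting on a 2-dimensional token (hypothetical)
experts = [
    lambda t: [2.0 * v for v in t],
    lambda t: [v + 1.0 for v in t],
    lambda t: [-v for v in t],
]
logits = [0.0, 5.0, 1.0]          # hypothetical router scores for one token
route = top_k_route(logits, k=1)  # selects expert 1
output = sparse_moe([1.0, 2.0], logits, experts, k=1)
```

With k much smaller than the number of experts, the parameter count grows with the number of experts while the per-token compute stays close to that of a single expert.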

Fully Transformer Networks for Semantic Image Segmentation
Performs semantic segmentation with Transformers only (no conv layers mixed in). An encoder-decoder architecture with a pyramid structure.

Rethinking Channel Dimensions for Efficient Model Design
Proposes a new NAS for compressing networks along the channel dimension.

Revisiting Contrastive Methods for Unsupervised Learning of Visual Representations
About contrastive self-supervised models when the data is biased.

A Survey of Transformers
A survey of Transformers. Classifies the X-former variants aimed at compressing Transformers.

Beyond BatchNorm: Towards a General Understanding of Normalization in Deep Learning
Compares various normalization methods.
