[Semantic segmentation] SegNet: Saving memory with pooling indices

Posted at 2020-11-13

SegNet is a well-known network for semantic segmentation. The architecture looks simple at first glance, so I summarized what sets it apart from other approaches.

I plan to put the code on GitHub after a few more experiments:
https://github.com/yokosyun/SegNet

Network

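SegNet is a simple network consisting only of an encoder and a decoder. The design idea is that, even without deconvolution, reusing the pooling indices at unpooling time gives reasonably good accuracy while being easy on memory.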

Max-pooling indices vs. sum

![image.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/482094/f3a1dfa1-1ad9-7c5f-5eda-344c1bfabdd1.png)

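The left side shows the unpooling with max-pooling indices proposed in SegNet. By recording the position of each maximum value as an index during max pooling, the features can be put back at their original positions during unpooling.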
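To make the index bookkeeping concrete, here is a minimal pure-Python sketch of 2x2 max pooling that records the argmax positions, followed by the matching unpooling. The function names and the 4x4 input are made up for illustration; this is not the code from the repository above.

```python
def max_pool_2x2_with_indices(x):
    """2x2 max pooling (stride 2) over a 2D list; also returns argmax positions."""
    h, w = len(x), len(x[0])
    pooled, indices = [], []
    for i in range(0, h, 2):
        pooled_row, index_row = [], []
        for j in range(0, w, 2):
            # Candidate (row, col) positions inside the current 2x2 window.
            window = [(i + di, j + dj) for di in (0, 1) for dj in (0, 1)]
            best = max(window, key=lambda p: x[p[0]][p[1]])
            pooled_row.append(x[best[0]][best[1]])
            index_row.append(best)
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices


def max_unpool_2x2(pooled, indices, out_h, out_w):
    """Put each pooled value back at its recorded position; all other cells are 0."""
    out = [[0] * out_w for _ in range(out_h)]
    for pooled_row, index_row in zip(pooled, indices):
        for value, (r, c) in zip(pooled_row, index_row):
            out[r][c] = value
    return out


# Hypothetical 4x4 feature map (single channel).
feature_map = [
    [1, 2, 0, 1],
    [3, 4, 5, 0],
    [0, 1, 2, 2],
    [9, 0, 1, 3],
]
pooled, indices = max_pool_2x2_with_indices(feature_map)
restored = max_unpool_2x2(pooled, indices, 4, 4)
```

The restored map is sparse: only the four maxima survive, each at its original location. In SegNet the decoder then applies trainable convolutions to turn this sparse map into a dense one. In PyTorch the same pair of operations should be available as `nn.MaxPool2d(2, return_indices=True)` and `nn.MaxUnpool2d(2)`.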

The right side shows FCN, which adds the 2x upsampled result to the encoder features. This lets it capture the overall picture while still having access to fine details.
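A rough sketch of the FCN-style "upsample, then add" step follows. FCN upsamples with deconvolution; to keep the example dependency-free, nearest-neighbour repetition is used here as a stand-in, so only the data flow is shown. All names and values are hypothetical.

```python
def upsample_2x_nearest(x):
    """Double height and width by repeating every value (stand-in for deconvolution)."""
    out = []
    for row in x:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def add_maps(a, b):
    """Element-wise sum of two equally sized 2D lists."""
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]


# Hypothetical coarse decoder map and the stored encoder feature map.
coarse = [[1, 2],
          [3, 4]]
encoder_features = [[0, 1, 0, 0],
                    [0, 0, 2, 0],
                    [1, 0, 0, 0],
                    [0, 0, 0, 3]]
fused = add_maps(upsample_2x_nearest(coarse), encoder_features)
```

Note that `encoder_features` has to be kept around in full until the decoder needs it, which is where the extra memory cost of this approach comes from.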

Comparison

![image.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/482094/56579c35-58b8-74b6-7b36-a2c34ca0ed72.png) I'll go through the comparisons one by one.

SegNet-Basic vs SegNet-Basic-EncoderAddition

Adding the encoder output to the decoder improves accuracy, but memory usage also goes up.
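A back-of-the-envelope estimate shows why the two options differ so much in memory. The SegNet paper notes that a max-pooling index needs only 2 bits per 2x2 window, whereas a stored feature map needs one float per value. The sketch below assumes 32-bit floats and an illustrative 360x480 map with 64 channels; the helper names are made up.

```python
def encoder_map_bits(height, width, channels, bits_per_value=32):
    """Bits needed to keep a full encoder feature map (assumes 32-bit floats)."""
    return height * width * channels * bits_per_value


def pooling_index_bits(height, width, channels, bits_per_window=2):
    """Bits needed for the indices: one per 2x2 window per channel.

    Two bits are enough to tell apart the 4 positions in a window.
    """
    return (height // 2) * (width // 2) * channels * bits_per_window


# Illustrative sizes, assumed for this example only.
h, w, c = 360, 480, 64
full_bits = encoder_map_bits(h, w, c)
index_bits = pooling_index_bits(h, w, c)
ratio = full_bits // index_bits
```

Under these assumptions the indices take 1/64 of the space of the full feature map for that layer.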

SegNet-Basic vs SegNet-Basic-SingleChannelDecoder

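Making the decoder filters single-channel (each filter convolves only its own upsampled feature map) reduces the number of parameters and speeds up computation, but accuracy drops.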
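The size of the parameter saving can be estimated by counting weights in one decoder layer. This sketch assumes a 7x7 kernel and 64 channels as illustrative values and ignores biases; the helper names are hypothetical.

```python
def conv_weights(kernel, in_channels, out_channels):
    """Weights in a standard convolution: every filter sees every input channel."""
    return kernel * kernel * in_channels * out_channels


def single_channel_conv_weights(kernel, channels):
    """Weights when each filter convolves only its own feature map."""
    return kernel * kernel * channels


# Illustrative layer size, assumed for this example only.
kernel, channels = 7, 64
full = conv_weights(kernel, channels, channels)
single = single_channel_conv_weights(kernel, channels)
```

With these numbers the single-channel layer has 64 times fewer weights, at the cost of no longer mixing information across channels.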

FCN-Basic vs FCN-Basic-NoAddition

Not adding the encoder output to the decoder lowers accuracy, but reduces memory usage.

FCN-Basic vs FCN-Basic-NoDimReduction

Keeping the number of channels unreduced and using a larger network improves accuracy, but the parameter count, memory usage, and computation time all increase.

FCN-Basic-NoDimReduction vs FCN-Basic-NoAddition-NoDimReduction

Not adding the encoder output to the decoder lowers accuracy, but reduces memory usage.

Conclusion

SegNet-Basic keeps memory usage low and still reaches roughly the same accuracy as FCN-Basic, so pooling indices can be considered a reasonably efficient approach.

Of course, if you want higher accuracy you can also add the encoder output to the decoder, but memory usage will go up. That is the takeaway of the paper.

References

SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation https://arxiv.org/pdf/1511.00561.pdf

Fully Convolutional Networks for Semantic Segmentation https://arxiv.org/pdf/1605.06211.pdf
