
[Semantic Segmentation] PSPNet: Pyramid Pooling Might Just Be the Best

Posted at 2020-11-16, last updated at 2020-11-16

Overview

![image.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/482094/176d3ce5-8e3a-cd40-efd4-526f1702f343.png)

PSPNet is a very simple network, yet it achieves quite good accuracy.

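- (a) Feed in the input image.
- (b) Extract a feature map with ResNet.
- (c) Use Pyramid Pooling to learn features at several scales, upsample each one back to the feature-map size, and concatenate them with the original feature map.
- (d) Apply a 1x1 convolution so that the number of output channels equals the number of classes, and output the semantic map.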
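Steps (c) and (d) can be sketched in NumPy roughly as follows. This is a minimal, hypothetical sketch, not the code in the repository: it assumes a single (C, H, W) feature map from step (b) and uses random weights in place of learned ones, and all function names are my own. The bin sizes (1, 2, 3, 6), the reduction of each pooled branch to 1/4 of the channels, and the concatenation with the original feature map follow the PSPNet paper; batch normalization and ReLU are omitted, and nearest-neighbour upsampling stands in for the bilinear upsampling the paper uses.

```python
import numpy as np


def adaptive_avg_pool2d(x, bins):
    """Average-pool a (C, H, W) map to (C, bins, bins).

    Bin edges use floor for the start and ceil for the end, so every pixel
    is covered even when H or W is not divisible by `bins`.
    """
    c, h, w = x.shape
    out = np.empty((c, bins, bins), dtype=np.float64)
    for i in range(bins):
        h0, h1 = (i * h) // bins, -(-((i + 1) * h) // bins)
        for j in range(bins):
            w0, w1 = (j * w) // bins, -(-((j + 1) * w) // bins)
            out[:, i, j] = x[:, h0:h1, w0:w1].mean(axis=(1, 2))
    return out


def upsample_nearest(x, size):
    """Nearest-neighbour upsampling of a (C, h, w) map to (C, out_h, out_w)."""
    _, h, w = x.shape
    out_h, out_w = size
    rows = (np.arange(out_h) * h) // out_h
    cols = (np.arange(out_w) * w) // out_w
    return x[:, rows][:, :, cols]


def conv1x1(x, weight):
    """A 1x1 convolution is a per-pixel linear map: (C_out, C_in) applied to (C_in, H, W)."""
    return np.einsum("oc,chw->ohw", weight, x)


def pyramid_pooling(features, reduce_weights, bins=(1, 2, 3, 6)):
    """Step (c): pool at several scales, reduce channels, upsample, concatenate."""
    _, h, w = features.shape
    branches = [features]  # the original feature map is kept as one branch
    for b, weight in zip(bins, reduce_weights):
        pooled = adaptive_avg_pool2d(features, b)   # (C, b, b)
        reduced = conv1x1(pooled, weight)           # (C // len(bins), b, b)
        branches.append(upsample_nearest(reduced, (h, w)))
    return np.concatenate(branches, axis=0)


# Toy example with random weights in place of learned ones.
rng = np.random.default_rng(0)
C, H, W, num_classes = 8, 12, 12, 5
feats = rng.standard_normal((C, H, W))        # stand-in for the ResNet output of step (b)
reduce_ws = [rng.standard_normal((C // 4, C)) for _ in range(4)]
fused = pyramid_pooling(feats, reduce_ws)     # (8 + 4 * 2, 12, 12) = (16, 12, 12)
cls_w = rng.standard_normal((num_classes, fused.shape[0]))
logits = conv1x1(fused, cls_w)                # step (d): (num_classes, H, W)
pred = logits.argmax(axis=0)                  # per-pixel class map, shape (H, W)
```

The 1x1 bin is a global average, so that branch is constant over the whole map and injects image-level context, while the 6x6 bin keeps more local detail. In a PyTorch implementation these helpers would typically correspond to `nn.AdaptiveAvgPool2d`, `nn.Conv2d` with `kernel_size=1`, and `F.interpolate` with `mode="bilinear"`.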

I plan to upload the code to GitHub:
https://github.com/yokosyun/SegNet

![image.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/482094/c6acb564-1e97-7e37-e528-b9c62bf7e735.png)

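(b) Classification is also performed on an intermediate layer of the ResNet that produces the feature map, and a loss is computed there in the same way as for (d). This auxiliary loss seems likely to help reduce vanishing gradients when using a deep network.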
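A minimal NumPy sketch of how the two losses might be combined is shown below. It assumes both heads already output logits at label resolution (in practice the auxiliary logits come from a lower-resolution layer and would be upsampled first), and the function names are hypothetical. The default auxiliary weight of 0.4 is, as far as I recall, the value the PSPNet paper reports; treat it as an assumption. Because the auxiliary term depends only on the layers up to the intermediate ResNet stage, its gradient reaches those layers through a shorter path. The paper drops the auxiliary branch at test time, so it only affects training.

```python
import numpy as np


def pixelwise_cross_entropy(logits, target):
    """Mean softmax cross-entropy over all pixels.

    logits: (num_classes, H, W) raw scores; target: (H, W) integer class labels.
    """
    shifted = logits - logits.max(axis=0, keepdims=True)  # for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=0, keepdims=True))
    rows, cols = np.indices(target.shape)
    # Pick the log-probability of the correct class at every pixel.
    return -log_probs[target, rows, cols].mean()


def pspnet_loss(main_logits, aux_logits, target, aux_weight=0.4):
    """Loss of the final head (d) plus a down-weighted loss of the auxiliary head on (b).

    aux_weight=0.4 is an assumed default; treat it as a tunable hyperparameter.
    """
    main_loss = pixelwise_cross_entropy(main_logits, target)
    aux_loss = pixelwise_cross_entropy(aux_logits, target)
    return main_loss + aux_weight * aux_loss


# Toy example: both heads are assumed to already output logits at label resolution.
rng = np.random.default_rng(0)
num_classes, H, W = 5, 12, 12
target = rng.integers(0, num_classes, size=(H, W))
main_logits = rng.standard_normal((num_classes, H, W))
aux_logits = rng.standard_normal((num_classes, H, W))
loss = pspnet_loss(main_logits, aux_logits, target)
```

Keeping the auxiliary weight below 1 lets the intermediate head help optimization without dominating the main objective.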

Input ![input_tensor.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/482094/ac909d9d-6bd4-54df-d716-07e726c27601.png)

ground truth

Output

The results look good even though I trained for only 23 epochs with a batch size of 8!
That is probably because I used pretrained ResNet50 weights.

As expected, Pyramid Pooling does improve accuracy.

Pyramid Scene Parsing Network https://arxiv.org/pdf/1612.01105.pdf