[StereoDepth] GC-Net: End-to-End Stereo Matching

Last updated at 2020-10-25, posted at 2020-10-16

GC-Net is the foundational model for end-to-end stereo depth estimation.
If you want to learn deep learning for stereo depth, this is a good first paper to read.

・Feature Extraction
・Cost Volume
・Learning Context
・Soft ArgMin

Feature Extraction

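2D convolutions extract a feature map from each of the left and right images. Sharing the weights between the two branches means the same features are captured in both images, which helps when computing the similarity between left and right.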
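To make the shared-weight idea concrete, here is a minimal NumPy sketch. It is not the GC-Net feature extractor (which stacks many learned layers); it applies a single random 3x3 convolution to both views. The names `conv2d_same`, `extract_features` and the toy shapes are assumptions made for illustration, and arrays are laid out as [Height, Width, Channel].

```python
import numpy as np

def conv2d_same(image, kernels):
    """Naive zero-padded 2D convolution (cross-correlation, odd kernel size).

    image:   [H, W, C_in]
    kernels: [k, k, C_in, C_out]
    returns: [H, W, C_out]
    """
    k = kernels.shape[0]
    pad = k // 2
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)))
    # windows has shape [H, W, C_in, k, k]
    windows = np.lib.stride_tricks.sliding_window_view(padded, (k, k), axis=(0, 1))
    return np.einsum("hwcij,ijco->hwo", windows, kernels)

def extract_features(left, right, kernels):
    # Shared weights: the very same kernels are applied to both views.
    return conv2d_same(left, kernels), conv2d_same(right, kernels)

rng = np.random.default_rng(0)
kernels = rng.standard_normal((3, 3, 3, 8))   # hypothetical filters: 3 -> 8 channels
left = rng.standard_normal((6, 10, 3))        # toy "left image"
right = np.roll(left, -2, axis=1)             # toy "right image": disparity of 2 px
left_feat, right_feat = extract_features(left, right, kernels)
```

Because the weights are shared, a patch that appears 2 pixels further left in the right image produces exactly the same feature vector there (away from the image border), which is what makes the later left/right comparison meaningful.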

Cost Volume

Input => Output: [Width,Height,Channel] => [Width,Height,Disparity+1,Channel]

The cost volume is built simply by shifting the input feature map one pixel at a time, from 0 up to MaxDisparity (a value you choose).
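The shifting step can be sketched in NumPy as follows. This sketch follows the GC-Net paper in concatenating the left feature with the shifted right feature at each disparity level, so its last axis is 2 x Channel. The function name `build_cost_volume`, the [Height, Width, ...] axis order, and the zero-filling of columns with no valid match are assumptions of this sketch.

```python
import numpy as np

def build_cost_volume(left_feat, right_feat, max_disp):
    """left_feat, right_feat: [H, W, C] -> volume: [H, W, max_disp + 1, 2 * C]."""
    H, W, C = left_feat.shape
    volume = np.zeros((H, W, max_disp + 1, 2 * C), dtype=left_feat.dtype)
    for d in range(max_disp + 1):
        # Left pixel x is paired with right pixel x - d.
        # Columns x < d have no partner and stay zero.
        volume[:, d:, d, :C] = left_feat[:, d:]
        volume[:, d:, d, C:] = right_feat[:, :W - d]
    return volume

rng = np.random.default_rng(0)
demo_left = rng.standard_normal((4, 8, 3))
demo_right = np.roll(demo_left, -2, axis=1)   # toy pair with a true disparity of 2
demo_volume = build_cost_volume(demo_left, demo_right, max_disp=3)
print(demo_volume.shape)                      # (4, 8, 4, 6)
```

In this toy pair the two halves of the channel axis coincide at disparity level 2, which is the kind of agreement the later layers learn to detect.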

Learning Context

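Once the cost volume is built, a disparity could already be output by finding where the left and right features are closest, but we want better accuracy! 3D convolutions are applied so that the network learns not only local matching context but also the context of the whole image (global).

In short, this is a network that performs refinement.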
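As a rough illustration of what a 3D convolution over the cost volume does, the NumPy sketch below filters jointly over height, width and disparity and then collapses the channel axis to one cost per disparity. It is a tiny two-layer stand-in with random weights, not the paper's learned multi-layer 3D network; `conv3d_same`, `regularize` and all shapes are hypothetical.

```python
import numpy as np

def conv3d_same(volume, kernels):
    """Naive zero-padded 3D convolution (cross-correlation, odd kernel size).

    volume:  [H, W, D, C_in]
    kernels: [k, k, k, C_in, C_out]
    returns: [H, W, D, C_out]
    """
    k = kernels.shape[0]
    p = k // 2
    padded = np.pad(volume, ((p, p), (p, p), (p, p), (0, 0)))
    # windows has shape [H, W, D, C_in, k, k, k]
    windows = np.lib.stride_tricks.sliding_window_view(
        padded, (k, k, k), axis=(0, 1, 2))
    return np.einsum("hwdcijl,ijlco->hwdo", windows, kernels)

def regularize(volume, w1, w2):
    hidden = np.maximum(conv3d_same(volume, w1), 0.0)  # 3D conv + ReLU
    return conv3d_same(hidden, w2)[..., 0]             # one cost per disparity

rng = np.random.default_rng(0)
toy_volume = rng.standard_normal((4, 6, 5, 4))         # [H, W, D, C]
w1 = rng.standard_normal((3, 3, 3, 4, 8)) * 0.1        # hypothetical weights
w2 = rng.standard_normal((3, 3, 3, 8, 1)) * 0.1
toy_costs = regularize(toy_volume, w1, w2)
print(toy_costs.shape)                                 # (4, 6, 5)
```

Each output value depends on a neighbourhood in all three of height, width and disparity, which is how context beyond a single pixel's match enters the cost.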

Soft ArgMin

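To get a [Width,Height] disparity map from the resulting [Width,Height,Disparity] volume, the network does not simply output the disparity layer with the best (lowest) matching cost. Instead it weights each disparity by the softmax of its negative cost and outputs the weighted sum, which yields the disparity with sub-pixel accuracy.
![image.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/482094/6dc9b1ed-1bbf-9f8f-79d3-85912e9e5b1c.png)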
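A minimal NumPy sketch of the soft argmin: take a softmax over the negated costs along the disparity axis, then the expected value of the disparity index. The function name `soft_argmin` and the toy cost values are assumptions for illustration.

```python
import numpy as np

def soft_argmin(costs):
    """costs: [H, W, D] matching costs (lower is better) -> [H, W] disparity."""
    logits = -costs
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    prob = np.exp(logits)
    prob /= prob.sum(axis=-1, keepdims=True)              # softmax over disparity
    disparities = np.arange(costs.shape[-1])
    return (prob * disparities).sum(axis=-1)              # expected disparity

# One pixel whose cost is equally low at disparities 2 and 3.
pixel_costs = np.array([[[50.0, 50.0, 0.0, 0.0, 50.0]]])
soft = soft_argmin(pixel_costs)                # about 2.5 (sub-pixel)
hard = pixel_costs.argmin(axis=-1)             # 2 (integer only)
```

A hard argmin can only return integer disparities and is not differentiable, whereas the weighted sum lands between levels and lets gradients flow back through the whole network.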

Conclusion

The architecture is very simple and easy to follow. Next I'd like to look at models that improve on this structure!

References

End-to-End Learning of Geometry and Context for Deep Stereo Regression https://arxiv.org/pdf/1703.04309.pdf
