[Stereo Depth] GC-Net: the first network to use concatenation in the cost volume?!

Last updated at 2020-12-09, posted at 2020-12-08

End-to-End Learning of Geometry and Context for Deep Stereo Regression

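![image.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/482094/48ee9371-08f4-bccf-d9a9-d7bc92eb9271.png)

Models that build the cost volume by concatenation are now mainstream, and I believe this paper was probably the first to propose the idea. *Before this, the common approach was correlation, usually computed as an inner product of the left and right features. In this post I want to look at what makes concatenation better than those earlier methods.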

Novelty

Cost Volume

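Take the feature maps (Channel, Height, Width) extracted by the 2D convolutions in the figure, shift them one pixel at a time from 0 [pixel] to maxdisparity [pixel], and concatenate the left and right features at each shift. The result is a cost volume of shape (Disparity, Channel, Height, Width), where the channel dimension holds the concatenated left and right features.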

This plays the same role as the patch-sliding step used in methods such as SGM (Semi-Global Matching).
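The shift-and-concatenate step described above can be sketched in NumPy as follows. This is only a minimal illustration, not the authors' implementation: the function names `build_concat_cost_volume` and `build_correlation_volume` are hypothetical, and filling positions that have no valid match with zeros is an assumption. The inner-product version is included only to contrast the output shapes.

```python
import numpy as np

def build_concat_cost_volume(left_feat, right_feat, max_disp):
    """Concatenation cost volume of shape (max_disp + 1, 2 * C, H, W).

    left_feat, right_feat: feature maps of shape (C, H, W).
    Assumption: positions with no valid right-image match stay zero.
    """
    c, h, w = left_feat.shape
    if max_disp >= w:
        raise ValueError("max_disp must be smaller than the feature width")
    volume = np.zeros((max_disp + 1, 2 * c, h, w), dtype=left_feat.dtype)
    for d in range(max_disp + 1):
        # Left pixel x is paired with right pixel x - d, valid only for x >= d.
        volume[d, :c, :, d:] = left_feat[:, :, d:]
        volume[d, c:, :, d:] = right_feat[:, :, : w - d]
    return volume

def build_correlation_volume(left_feat, right_feat, max_disp):
    """Inner-product (correlation) volume of shape (max_disp + 1, H, W)."""
    c, h, w = left_feat.shape
    if max_disp >= w:
        raise ValueError("max_disp must be smaller than the feature width")
    volume = np.zeros((max_disp + 1, h, w), dtype=left_feat.dtype)
    for d in range(max_disp + 1):
        # The channel axis is collapsed into a single similarity score.
        volume[d, :, d:] = (left_feat[:, :, d:] * right_feat[:, :, : w - d]).sum(axis=0)
    return volume

# Toy feature maps: C=8, H=4, W=16.
rng = np.random.default_rng(0)
left = rng.standard_normal((8, 4, 16)).astype(np.float32)
right = rng.standard_normal((8, 4, 16)).astype(np.float32)

vol = build_concat_cost_volume(left, right, max_disp=5)
corr = build_correlation_volume(left, right, max_disp=5)
print(vol.shape)   # (6, 16, 4, 16)
print(corr.shape)  # (6, 4, 16)
```

With concatenation the channel axis is kept (and doubled), so the later layers can learn their own similarity measure, whereas correlation fixes it to an inner product before any learning on the volume happens.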

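Applying a 3D CNN to the resulting cost volume also convolves information along the disparity axis, so the network can take properties such as disparity smoothness into account and output a matching score of shape (Disparity, Height, Width).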

Soft ArgMax

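Instead of taking the argmax along the disparity axis as the output, apply a softmax along that axis so the values sum to 1, then use those values as weights on the disparity values and sum them to get the final output. This gives sub-pixel accuracy.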

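However, I think there are cases where this is actually less accurate than taking the maximum, for example when the distribution over disparities has more than one peak and the weighted average lands between them.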
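The difference between the two read-outs can be seen on a single pixel. The sketch below is a minimal pure-Python illustration with made-up score vectors; the function names are hypothetical. GC-Net itself phrases the operation as a soft argmin over costs, while this sketch follows the score convention used in this article (higher is better).

```python
import math

def soft_argmax_disparity(scores):
    """Softmax-weighted average disparity; scores[d] is the matching score at disparity d."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return sum(d * e / total for d, e in enumerate(exps))

def hard_argmax_disparity(scores):
    """Integer disparity with the highest score (first one on ties)."""
    return max(range(len(scores)), key=lambda d: scores[d])

# Made-up scores for one pixel: a single peak shared by disparities 2 and 3.
unimodal = [0.0, 0.0, 10.0, 10.0, 0.0]
# Made-up scores with two separated peaks at disparities 1 and 5.
bimodal = [0.0, 10.0, 0.0, 0.0, 0.0, 10.0, 0.0]

soft_uni = soft_argmax_disparity(unimodal)  # close to 2.5, a sub-pixel value
hard_uni = hard_argmax_disparity(unimodal)  # 2
soft_bi = soft_argmax_disparity(bimodal)    # close to 3, where the score is low
hard_bi = hard_argmax_disparity(bimodal)    # 1

print(soft_uni, hard_uni)
print(soft_bi, hard_bi)
```

In the single-peak case the weighted average gives a value between two integer disparities, which the hard argmax cannot express. In the two-peak case it returns a disparity that neither peak supports.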

Results

- Newly proposed building the cost volume by concatenation
- A simple, easy-to-understand, and good model

References

End-to-End Learning of Geometry and Context for Deep Stereo Regression https://arxiv.org/pdf/1703.04309.pdf
