[Stereo Depth] DispNet: Stereo Matching by Concatenation

Last updated at 2020-12-09, posted at 2020-12-08

A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation

![image.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/482094/700c188c-752d-15f8-3419-78b0fa1edb80.png)

DispNet computes depth differently from MC-CNN and SGM (Semi-Global Matching), which I found interesting, so I am writing it up here.
*The paper also covers optical flow, but I ignore it in this article

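Novelty

Concatenating the left and right images as input

![image.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/482094/a387c92d-e353-c0f0-f36e-853639815eea.png)

The figure shows that the input already has 6 channels, i.e. the RGB channels of the right and left images are fed in together. MC-CNN instead takes patches (9x9 crops of the image) as input and runs them through a CNN, but it has to repeat the same processing for every candidate disparity, which makes it quite expensive to compute.

I also think this approach has the advantage that the network can refer to the whole image rather than a small patch.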
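As a minimal sketch of what "concatenating the left and right images" means in practice, the two RGB images can be stacked along the channel axis to form one 6-channel array. The function name `make_dispnet_input`, the (H, W, 3) layout, and the random test images are assumptions made for illustration; they are not taken from the paper's code.

```python
import numpy as np

def make_dispnet_input(left, right):
    """Stack two RGB images of shape (H, W, 3) into one (H, W, 6) array."""
    if left.shape != right.shape or left.shape[-1] != 3:
        raise ValueError("expected two RGB images of identical shape (H, W, 3)")
    # Channels 0-2 hold the left image, channels 3-5 hold the right image.
    return np.concatenate([left, right], axis=-1)

# Hypothetical tiny images, only to show the resulting shape.
rng = np.random.default_rng(0)
left = rng.random((4, 8, 3), dtype=np.float32)
right = rng.random((4, 8, 3), dtype=np.float32)

x = make_dispnet_input(left, right)
print(x.shape)  # (4, 8, 6)
```

The network then sees both views in a single forward pass, instead of evaluating a patch pair once per candidate disparity.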

Correlation Layer

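The results table below lists both DispNet and DispNetCorr1D. Correlation here means building a cost volume by shifting the right and left feature maps against each other for each possible disparity.
*This is similar to how classical stereo matching computes a matching cost for every possible disparity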
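The shifting idea can be sketched in a few lines of NumPy. This is an illustration under assumptions, not the paper's implementation: the function name `correlation_1d`, the (C, H, W) layout, the channel-wise dot product as the similarity measure, and the toy feature maps are all chosen here for the example.

```python
import numpy as np

def correlation_1d(feat_left, feat_right, max_disp):
    """Build a cost volume of shape (max_disp + 1, H, W) from (C, H, W) features.

    For each candidate disparity d, left pixel x is compared with right
    pixel x - d using a dot product over the channel axis.
    """
    c, h, w = feat_left.shape
    if feat_right.shape != feat_left.shape:
        raise ValueError("feature maps must have the same shape")
    if not 0 <= max_disp < w:
        raise ValueError("max_disp must be in [0, W)")
    volume = np.zeros((max_disp + 1, h, w), dtype=feat_left.dtype)
    for d in range(max_disp + 1):
        # Columns x < d have no partner in the right image and stay 0.
        volume[d, :, d:] = (feat_left[:, :, d:] * feat_right[:, :, :w - d]).sum(axis=0)
    return volume

# Toy check: build a right feature map that is the left one shifted by 3 pixels.
rng = np.random.default_rng(0)
feat_left = rng.standard_normal((16, 4, 12))
feat_left /= np.linalg.norm(feat_left, axis=0, keepdims=True)  # unit-length features
true_disp = 3
feat_right = np.roll(feat_left, -true_disp, axis=2)

volume = correlation_1d(feat_left, feat_right, max_disp=5)
pred = volume.argmax(axis=0)  # per-pixel disparity with the highest correlation
print(volume.shape)  # (6, 4, 12)
```

In this toy case `pred` equals 3 for every column from index 3 onward, because a unit-length feature vector correlates most strongly with itself. In a real network the cost volume is not used through an argmax like this; it is passed on to further layers.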

Results

![image.png](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/482094/9d28ef6b-ffdd-3700-cb16-7f92d18591f0.png)

The approach of concatenating the right and left images was faster than both classical SGM and the CNN-based MC-CNN.

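Conclusion

・As of 2020, feeding the concatenated right and left images as input is not widely used, but I think the approach is simply overlooked and is in fact efficient.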

References

A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation https://arxiv.org/pdf/1512.02134.pdf
