@hidemotoNakada(HIDEMOTO NAKADA)

Video Representation Learning by Dense Predictive Coding

Neural Networks

Posted at 2020-05-11

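Learns spatio-temporal embeddings from video by self-supervised learning, aimed at human action recognition.
Proposes a method called Dense Predictive Coding (DPC).
Recurrently predicts future representations.
Uses curriculum training.
Pre-trained on Kinetics-400 and evaluated on action recognition: 35.7% accuracy on HMDB51.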

Network

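The model takes 2.5 seconds of video as input and predicts the embeddings for the following 1.5 seconds.
An encoder f maps each input x_t to a latent variable z_t; an aggregation function g then summarizes the latents so far into a context c_t.
z_t = f(x_t), c_t = g(z_1, z_2, ..., z_t)
A prediction function Φ predicts ẑ_{t+1} from c_t.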
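
The f / g / Φ pipeline above can be sketched in a few lines of plain Python. This is only a toy illustration: the bodies of `f`, `g`, and `phi` are hypothetical stand-ins (scaling, averaging, and an offset), not the networks used in the paper, and feeding each prediction back into `g` is my assumption about what "recurrently predicting" means here.

```python
from typing import List, Sequence

Vector = List[float]


def f(x_t: Sequence[float]) -> Vector:
    """Hypothetical encoder: a fixed elementwise scaling (stand-in for a CNN)."""
    return [2.0 * v for v in x_t]


def g(zs: Sequence[Vector]) -> Vector:
    """Hypothetical aggregator: mean of all latents so far (stand-in for a recurrent net)."""
    dim = len(zs[0])
    return [sum(z[i] for z in zs) / len(zs) for i in range(dim)]


def phi(c_t: Vector) -> Vector:
    """Hypothetical predictor: context plus a small constant offset."""
    return [v + 0.1 for v in c_t]


def predict_future(xs: Sequence[Sequence[float]], n_future: int) -> List[Vector]:
    """Encode the observed inputs, then roll the prediction forward n_future steps."""
    zs = [f(x) for x in xs]          # z_t = f(x_t)
    predictions: List[Vector] = []
    for _ in range(n_future):
        c = g(zs)                    # c_t = g(z_1, ..., z_t)
        z_hat = phi(c)               # z_hat_{t+1} = phi(c_t)
        predictions.append(z_hat)
        zs.append(z_hat)             # assumption: the prediction is fed back as the next latent
    return predictions
```

For example, `predict_future(xs, 3)` with five observed inputs gives a 5-to-3 split, which matches the 2.5 s to 1.5 s ratio above; the step counts themselves are illustrative.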

Contrastive loss

NCE (Noise Contrastive Estimation)
The predicted ẑ and the actual z are compared by dot product. (Is this cosine distance?)
Three kinds of negative pairs:

Easy Negative - taken from a different video
Spatial Negative - same video, different spatial location
Temporal Negative - same video, same spatial location, different time step
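
As a rough sketch of how the three negative types and the dot-product comparison fit together, the following computes a softmax cross-entropy over dot-product scores, with the single positive as the target class. The `(video_id, spatial_index, time_index)` key layout and the function names are hypothetical, chosen for illustration. On the cosine question: a plain dot product equals cosine similarity only when both vectors are unit-normalized, which this sketch does not do.

```python
import math
from typing import Sequence, Tuple

Vector = Sequence[float]
# Hypothetical identifier for a feature: (video_id, spatial_index, time_index).
Key = Tuple[int, int, int]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def pair_type(pred_key: Key, cand_key: Key) -> str:
    """Classify a candidate relative to the prediction target."""
    pred_video, pred_space, pred_time = pred_key
    cand_video, cand_space, cand_time = cand_key
    if pred_video != cand_video:
        return "easy"        # different video
    if pred_space != cand_space:
        return "spatial"     # same video, different location
    if pred_time != cand_time:
        return "temporal"    # same video and location, different time
    return "positive"


def nce_loss(z_hat: Vector, pred_key: Key,
             candidates: Sequence[Tuple[Key, Vector]]) -> float:
    """Negative log-softmax of the positive's dot-product score over all candidates."""
    scores = [dot(z_hat, z) for _, z in candidates]
    top = max(scores)  # subtract the max for numerical stability
    log_denominator = top + math.log(sum(math.exp(s - top) for s in scores))
    positives = [s for (key, _), s in zip(candidates, scores)
                 if pair_type(pred_key, key) == "positive"]
    if len(positives) != 1:
        raise ValueError("expected exactly one positive candidate")
    return log_denominator - positives[0]
```

The loss shrinks as the positive's score grows relative to the negatives, so the predictor is pushed to tell the true future feature apart from all three kinds of distractors.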

Curriculum training

The number of frames to predict is increased gradually, so that the model learns to predict further into the future.
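
One simple way to express such a schedule is a step function of the training epoch. The sketch below is an assumption for illustration only: the function name, the starting and maximum number of prediction steps, and the stage length are all hypothetical and not taken from the paper.

```python
def prediction_steps(epoch: int, start_steps: int = 1, max_steps: int = 3,
                     epochs_per_stage: int = 10) -> int:
    """Return how many future steps to predict at a given epoch.

    Starts at start_steps and adds one step every epochs_per_stage epochs,
    capped at max_steps. All default values are hypothetical.
    """
    if epoch < 0:
        raise ValueError("epoch must be non-negative")
    return min(max_steps, start_steps + epoch // epochs_per_stage)
```

With these defaults, the model would predict one step for the first ten epochs, two for the next ten, and three from then on.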
