Peeking into the Future: Predicting Future Person Activities and Locations in Videos

Posted at 2020-05-26

Third paper in my survey series.
The paper is linked below.
https://arxiv.org/pdf/1902.03748v3.pdf
The authors are from CMU, Stanford, and Google AI, which looks like a strong lineup.

key

joint modeling of paths and activities benefits future path prediction

study the future path prediction jointly with the intention

our techniques focus on predicting a few seconds into the future, and should not be useful for non-routine activities

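Overview

Conventional action recognition (activity prediction) does not reflect a person's intention.
Most approaches extract features from images and estimate from those. However, a person's behavior is determined by their intention.
This paper models the intention behind a person's movement as observed in the video.

The paper proposes Next, a multi-task learning model that predicts the future path and future activities simultaneously.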
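The idea of training one model on both the future path and the future activity can be sketched as a weighted sum of per-task losses. This is only a minimal illustration in plain Python: the function names, the squared-error path loss, the single-label cross-entropy, and the `activity_weight` parameter are all assumptions made for this sketch, not the objective actually used in the paper.

```python
import math


def l2_trajectory_loss(pred_path, true_path):
    """Mean squared Euclidean error over the predicted (x, y) points."""
    assert len(pred_path) == len(true_path) and len(true_path) > 0
    total = 0.0
    for (px, py), (tx, ty) in zip(pred_path, true_path):
        total += (px - tx) ** 2 + (py - ty) ** 2
    return total / len(true_path)


def cross_entropy(logits, true_index):
    """Softmax cross-entropy for one activity label (numerically stable)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[true_index]


def joint_loss(pred_path, true_path, activity_logits, true_activity,
               activity_weight=1.0):
    """Hypothetical joint objective: path loss + weighted activity loss."""
    path_term = l2_trajectory_loss(pred_path, true_path)
    activity_term = cross_entropy(activity_logits, true_activity)
    return path_term + activity_weight * activity_term
```

Because both terms are computed from the same model, gradients from the activity term also shape the features used for path prediction. Any auxiliary task would simply add another weighted term to the same sum.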

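Differences from prior methods
・Features of a person are obtained not from a single point but from interactions with surrounding objects and people, body movement, and so on.
・Auxiliary tasks (activity location prediction, etc.) are introduced to facilitate training.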

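Network (called Next)

The network used is the one shown above.
Prior methods simplified each person to a point in space when extracting features. To extract richer information about each person, this method instead uses two modules that encode person information:
the Person Behavior Module and the Person Interaction Module.
These extract information such as a person's behavioral habits and person-person and person-object interactions.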

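From the extracted information, the model then implements a Trajectory Generator, which uses an LSTM to estimate the person's trajectory, and Activity Prediction, which predicts activity labels.
Activity Prediction also performs Location prediction: the image is divided into grids of various sizes (Manhattan Grid), and the location where the activity will occur is estimated.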
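To make the grid-based location prediction more concrete, the sketch below turns one (x, y) location into training targets on grids of several sizes. As I understand it, each grid yields a cell index (a classification target) and an offset from that cell's center (a regression target). That class/offset split, the function names, and the grid sizes in the usage example are assumptions for illustration, not taken from the paper's code.

```python
def grid_target(x, y, image_w, image_h, cols, rows):
    """Map a point to (cell_index, (dx, dy)) on a cols x rows grid.

    cell_index is row-major; (dx, dy) is the offset from the cell center.
    """
    cell_w = image_w / cols
    cell_h = image_h / rows
    # Clamp so points on the image border still fall into a valid cell.
    col = max(0, min(int(x // cell_w), cols - 1))
    row = max(0, min(int(y // cell_h), rows - 1))
    center_x = (col + 0.5) * cell_w
    center_y = (row + 0.5) * cell_h
    return row * cols + col, (x - center_x, y - center_y)


def multi_scale_targets(x, y, image_w, image_h, scales):
    """Compute one target per grid size; scales is a list of (cols, rows)."""
    return [grid_target(x, y, image_w, image_h, c, r) for (c, r) in scales]


# Usage: one location on a 400x200 image, with two hypothetical grid sizes.
targets = multi_scale_targets(150.0, 50.0, 400, 200, [(4, 2), (8, 4)])
```

Using several grid sizes gives the model a coarse target that is easy to classify and a finer one that localizes more precisely.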