15 Cutting-Edge Artificial Intelligence (A.I.) Papers I Am Currently Following: Still-Image, Audio, and Video (Action Analysis & Prediction) Analysis, Robot Control, and Autonomous Algorithm Learning


To organize my thoughts, this post lists the papers I am currently following while thinking about the possibilities of AI-powered still-image, audio, and video (action analysis & prediction) analysis and robotics control.

( The 15 Papers )


1. (Robotics control) Genetic Fuzzy Tree (GFT)

2. (Autonomous learning & updating of DNN architectures) NeuroEvolution: Evolutionary Artificial Neural Network (EANN)

3. (Autonomous learning of neural-network structure for engineering/mathematical models) GMDH: Group Method of Data Handling

4. (Robotics control) Deep Kalman Filter

5. (Robotics control) Self-Organizing Incremental Neural Network (SOINN)

6. (Still images, clustering) Topological Data Analysis (TDA)

7. (Still images) DCGAN

8. (Audio) WaveNet

9. (Video) Video GAN

10. (Fast parallel DNN gradient computation) Pixel Recurrent Neural Networks

11. (Fast parallel DNN gradient computation) Decoupled Neural Interfaces using Synthetic Gradients

12. (Multidimensional grid DNN) Grid Long Short-Term Memory

13. (Semi-supervised deep learning) Ladder Network

14. (Automatic algorithm learning) Neural Turing Machines

15. (Automatic algorithm learning) Neural Random Access Machines


( The Individual Algorithms )

1. Genetic Fuzzy Tree (GFT)


2. NeuroEvolution: Evolutionary Artificial Neural Network (EANN)


3. GMDH: Group Method of Data Handling


4. Deep Kalman Filter


5. Self-Organizing Incremental Neural Network (SOINN)


6. Topological Data Analysis (TDA)


7. (Still images) DCGAN


8. (Audio) WaveNet


9. (Video) Video GAN

Abstract

We capitalize on large amounts of unlabeled video in order to learn a model of scene dynamics for both video recognition tasks (e.g. action classification) and video generation tasks (e.g. future prediction).

We propose a generative adversarial network for video with a spatio-temporal convolutional architecture that untangles the scene's foreground from the background.

Experiments suggest this model can generate tiny videos up to a second at full frame rate better than simple baselines, and we show its utility at predicting plausible futures of static images.

Moreover, experiments and visualizations show the model internally learns useful features for recognizing actions with minimal supervision, suggesting scene dynamics are a promising signal for representation learning.

We believe generative video models can impact many applications in video understanding and simulation.
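Since the abstract describes the architecture only in words, here is a minimal PyTorch sketch of the core two-stream idea: a 3D-deconvolution foreground stream that also emits a soft mask, a 2D-deconvolution stream that produces a single static background frame, and a mask-weighted composite. All layer sizes, module names, and the output resolution are my own illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch (not the paper's exact architecture) of a two-stream
# video GAN generator: a 3D-conv foreground stream with a soft mask,
# plus a 2D-conv static background stream, composited via the mask.
import torch
import torch.nn as nn

class TwoStreamGenerator(nn.Module):
    def __init__(self, z_dim=100):
        super().__init__()
        # Foreground stream: latent -> feature volume over (time, H, W)
        self.fg = nn.Sequential(
            nn.ConvTranspose3d(z_dim, 128, kernel_size=(2, 4, 4)),
            nn.ReLU(),
            nn.ConvTranspose3d(128, 64, 4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose3d(64, 32, 4, stride=2, padding=1),
            nn.ReLU(),
        )
        self.fg_rgb = nn.ConvTranspose3d(32, 3, 4, stride=2, padding=1)
        self.fg_mask = nn.ConvTranspose3d(32, 1, 4, stride=2, padding=1)
        # Background stream: latent -> one static (3, 32, 32) image
        self.bg = nn.Sequential(
            nn.ConvTranspose2d(z_dim, 128, kernel_size=4),
            nn.ReLU(),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),
            nn.Tanh(),
        )

    def forward(self, z):
        h = self.fg(z.view(z.size(0), -1, 1, 1, 1))
        fg = torch.tanh(self.fg_rgb(h))      # moving foreground video
        m = torch.sigmoid(self.fg_mask(h))   # soft per-pixel mask
        bg = self.bg(z.view(z.size(0), -1))  # static background image
        bg = bg.unsqueeze(2).expand_as(fg)   # replicate bg over time
        return m * fg + (1 - m) * bg         # masked composite

g = TwoStreamGenerator()
video = g(torch.randn(4, 100))  # (4, 3, 16, 32, 32) clips
```

Sampling z from a standard normal and calling the generator yields a short clip; the mask is what lets the model untangle moving foreground from the static background.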


10. (Fast parallel DNN gradient computation) Pixel Recurrent Neural Networks


11. (Fast parallel DNN gradient computation) Decoupled Neural Interfaces using Synthetic Gradients

Training directed neural networks typically requires forward-propagating data through a computation graph, followed by backpropagating the error signal, to produce weight updates.
All layers, or more generally, modules, of the network are therefore locked, in the sense that they must wait for the remainder of the network to execute forwards and propagate error backwards before they can be updated.

In this work we break this constraint by decoupling modules by introducing a model of the future computation of the network graph.

These models predict what the result of the modelled subgraph will produce using only local information.

In particular we focus on modelling error gradients: by using the modelled synthetic gradient in place of true backpropagated error gradients we decouple subgraphs, and can update them independently and asynchronously i.e. we realise decoupled neural interfaces.

We show results for feed-forward models, where every layer is trained asynchronously; recurrent neural networks (RNNs), where predicting one's future gradient extends the time over which the RNN can effectively model; and a hierarchical RNN system with ticking at different timescales.

Finally, we demonstrate that in addition to predicting gradients, the same framework can be used to predict inputs, resulting in models which are decoupled in both the forward and backwards pass -- amounting to independent networks which co-learn such that they can be composed into a single functioning corporation.
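To make "synthetic gradients" concrete, here is a minimal PyTorch sketch under my own simplifying assumptions: a single linear gradient model M, trained by plain L2 regression against the true gradient whenever one arrives. The paper also conditions the gradient model on labels (cDNI) and uses bootstrapped targets for RNNs, both omitted here.

```python
# Minimal sketch of a decoupled neural interface (synthetic gradients):
# a layer updates immediately from a *predicted* gradient of the loss
# w.r.t. its output, instead of waiting for true backpropagation.
import torch
import torch.nn as nn

layer = nn.Linear(64, 64)   # the decoupled module
M = nn.Linear(64, 64)       # predicts dL/dh from h
opt_layer = torch.optim.SGD(layer.parameters(), lr=0.01)
opt_M = torch.optim.SGD(M.parameters(), lr=0.01)

def decoupled_forward(x):
    """Forward pass plus an immediate local update: the layer does not
    wait for the rest of the network to backpropagate."""
    h = layer(x)
    synthetic_grad = M(h.detach())        # local gradient prediction
    opt_layer.zero_grad()
    h.backward(synthetic_grad.detach())   # inject predicted dL/dh
    opt_layer.step()
    return h.detach()                     # downstream sees a detached value

def update_gradient_model(h, true_grad):
    """When the true backpropagated gradient eventually arrives,
    regress M's prediction onto it with an L2 loss."""
    opt_M.zero_grad()
    loss = ((M(h) - true_grad) ** 2).mean()
    loss.backward()
    opt_M.step()

x = torch.randn(8, 64)
h = decoupled_forward(x)  # layer already updated, no global backprop needed
```

The point of the detached return value is exactly the "unlocking" the abstract describes: the layer's update depends only on local information, so subgraphs can train asynchronously.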


12. Grid Long Short-Term Memory (Multidimensional grid deep neural network architecture)

Abstract

This paper introduces Grid Long Short-Term Memory, a network of LSTM cells arranged in a multidimensional grid that can be applied to vectors, sequences or higher dimensional data such as images.

The network differs from existing deep LSTM architectures in that the cells are connected between network layers as well as along the spatiotemporal dimensions of the data.

The network provides a unified way of using LSTM for both deep and sequential computation.

We apply the model to algorithmic tasks such as 15-digit integer addition and sequence memorization, where it is able to significantly outperform the standard LSTM.

We then give results for two empirical tasks.

We find that 2D Grid LSTM achieves 1.47 bits per character on the Wikipedia character prediction benchmark, which is state-of-the-art among neural approaches.

In addition, we use the Grid LSTM to define a novel two-dimensional translation model, the Reencoder, and show that it outperforms a phrase-based reference system on a Chinese-to-English translation task.
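The key mechanism, N separate LSTM transforms sharing one concatenated hidden vector, fits in a few lines. Below is a minimal PyTorch sketch of a single 2D grid cell (names and sizes are my own choices; the paper additionally describes priority dimensions and non-LSTM dimensions, omitted here).

```python
# Minimal sketch of one 2D Grid LSTM cell: it receives an (h, c) pair
# along each of the two dimensions (e.g. depth and time), concatenates
# the incoming hidden states, and applies a separate LSTM transform per
# dimension, emitting a new (h, c) pair along each dimension.
import torch
import torch.nn as nn

class GridLSTM2DCell(nn.Module):
    def __init__(self, hidden_size):
        super().__init__()
        # One standard LSTM transform per grid dimension; each sees the
        # concatenated hidden vector H = [h1; h2] as its "input".
        self.dim1 = nn.LSTMCell(2 * hidden_size, hidden_size)
        self.dim2 = nn.LSTMCell(2 * hidden_size, hidden_size)

    def forward(self, h1, c1, h2, c2):
        H = torch.cat([h1, h2], dim=1)           # shared view of all incoming h's
        h1_new, c1_new = self.dim1(H, (h1, c1))  # update along dimension 1
        h2_new, c2_new = self.dim2(H, (h2, c2))  # update along dimension 2
        return h1_new, c1_new, h2_new, c2_new

cell = GridLSTM2DCell(32)
h1 = c1 = h2 = c2 = torch.zeros(4, 32)           # batch of 4
h1, c1, h2, c2 = cell(h1, c1, h2, c2)
```

Tiling this cell over, say, (depth x time) gives the unified deep-and-sequential computation the abstract describes: depth is just another grid dimension with its own LSTM gating.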


13. Ladder Network (Semi-supervised deep learning)


14. (Automatic algorithm learning) Neural Turing Machines

HirofumiYashima Qiita article (2016/09/20): "Neural Turing Machines and Neural Random Access Machines"


15. (Automatic algorithm learning) Neural Random Access Machines

HirofumiYashima Qiita article (2016/09/20): "Neural Turing Machines and Neural Random Access Machines"