RL in ICML2017 http://sotetsuk.hatenablog.com/entry/icml2017rl
model based deep reinforcement learning
DRL tutorial
prediction: the predictron: end-to-end learning and planning
prediction and control with temporal segment models
end-to-end differentiable adversarial imitation learning
combining model-based and model-free updates for trajectory-centric RL
model-free deep reinforcement learning: DQN, A3C
Softmax optimisation
RL with deep energy-based policies
an alternative softmax operator for RL
Bridging the gap between value and policy based RL
equivalence between policy gradients and soft q-learning
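As a sketch of the operator studied in "an alternative softmax operator for RL" (mellowmax), assuming the usual log-mean-exp form; the function name and the default omega are illustrative, not from any library:

```python
import numpy as np

def mellowmax(values, omega=5.0):
    """Mellowmax: a non-expansive softmax alternative for value backups.
    mm_omega(x) = log(mean(exp(omega * x))) / omega
    It interpolates between the mean (omega -> 0) and the max (omega -> inf).
    """
    x = np.asarray(values, dtype=float)
    m = x.max()  # subtract the max for numerical stability
    return m + np.log(np.mean(np.exp(omega * (x - m)))) / omega
```

Unlike the Boltzmann softmax, mellowmax is a non-expansion, so value iteration with it still converges.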
a laplacian framework for option discovery in RL
Feudal networks for hierarchical RL
Unifying Task specification in RL
curiosity-driven exploration by self-supervised prediction
count-based exploration with neural density models
end-to-end learning
end-to-end differentiable adversarial imitation learning
the predictron: end-to-end learning and planning
feudal networks for hierarchical RL
zero-shot task generalization with multi-task DRL
Transfer/Zero-shot learning
robust adversarial RL
DARLA: improving zero-shot Transfer in RL
zero-shot task generalization with multi-task DRL
practical exploration
fairness in RL
constrained policy optimization
Heuristic Teaching
interactive learning from policy-dependent human feedback
modular multitask RL with policy sketches
zero-shot task generalization with multitask DRL
Bias and variance of off-policy learning
Data-efficient policy evaluation through behavior policy search
stochastic variance reduction methods for policy evaluation
optimal and adaptive off-policy evaluation in contextual bandits
consistent on-line off-policy evaluation
Categorical DQN
a distributional perspective on reinforcement learning
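The distributional paper above introduces Categorical DQN (C51), whose core step projects the shifted return distribution r + gamma*z back onto a fixed support of atoms. A minimal NumPy sketch of that projection (all names, defaults, and the loop-based form are illustrative):

```python
import numpy as np

def categorical_projection(rewards, dones, next_probs,
                           v_min=-10.0, v_max=10.0, n_atoms=51, gamma=0.99):
    """Project the Bellman-updated distribution onto the fixed support
    {v_min, ..., v_max} (the C51 projection step).
    next_probs: (batch, n_atoms) next-state return distributions."""
    z = np.linspace(v_min, v_max, n_atoms)        # fixed support atoms
    delta = (v_max - v_min) / (n_atoms - 1)
    # shifted atoms, clipped to the support (terminal states keep only r)
    tz = np.clip(rewards[:, None] + gamma * (1 - dones[:, None]) * z,
                 v_min, v_max)
    b = (tz - v_min) / delta                      # fractional atom index
    lo = np.floor(b).astype(int)
    hi = np.ceil(b).astype(int)
    out = np.zeros_like(next_probs)
    for i in range(next_probs.shape[0]):
        for j in range(n_atoms):
            if lo[i, j] == hi[i, j]:              # lands exactly on an atom
                out[i, lo[i, j]] += next_probs[i, j]
            else:                                 # split mass between neighbors
                out[i, lo[i, j]] += next_probs[i, j] * (hi[i, j] - b[i, j])
                out[i, hi[i, j]] += next_probs[i, j] * (b[i, j] - lo[i, j])
    return out
```

The projected distribution is then used as the cross-entropy target for the online network.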
beta policy
improving stochastic policy gradients in continuous control with DRL using the beta distribution
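The key idea of the beta-policy paper above is to parameterize a bounded-action policy with a Beta distribution instead of a clipped Gaussian. A minimal sampling sketch (function name and RNG seeding are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_beta_action(alpha, beta, low, high):
    """Sample an action from a Beta(alpha, beta) policy and rescale it
    from (0, 1) to the bounded action range [low, high].
    Because the Beta has bounded support, no clipping is needed at the
    action limits, avoiding the boundary bias of a clipped Gaussian."""
    x = rng.beta(alpha, beta)       # in (0, 1)
    return low + (high - low) * x
```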
finding options using proto-value functions (PVFs)

DDPG (deep deterministic policy gradients)
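One mechanism DDPG relies on is the soft (Polyak-averaged) target-network update, theta_target <- tau * theta_online + (1 - tau) * theta_target. A minimal sketch over plain arrays (the function name and default tau are illustrative):

```python
import numpy as np

def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging of target-network parameters, as used by DDPG
    to keep the bootstrap targets slowly moving and stabilize training."""
    return [tau * w + (1.0 - tau) * wt
            for wt, w in zip(target_params, online_params)]
```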
efficient exploration https://arxiv.org/pdf/1507.00814v3.pdf , https://arxiv.org/pdf/1606.01868v1.pdf
life-long learning: hierarchical architectures for the options framework (an option = a temporally extended sequence of actions with a clear initiation set and termination condition)
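The option definition above maps directly onto a small data structure: an initiation set, an intra-option policy, and a termination probability. A minimal sketch (all field and method names are illustrative, not from any library):

```python
from dataclasses import dataclass
from typing import Callable, Set

@dataclass
class Option:
    """An option in the options framework: a temporally extended action."""
    initiation_set: Set[int]             # states where the option may start
    policy: Callable[[int], int]         # intra-option policy: state -> action
    termination: Callable[[int], float]  # beta(s): probability of terminating in s

    def can_start(self, state: int) -> bool:
        return state in self.initiation_set
```

An agent picks an option whose `can_start` holds, follows its `policy` each step, and stops with probability `termination(state)`.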
bayesian RL: http://www.cse.chalmers.se/~chrdimi/downloads/book.pdf
inverse RL
predictive state RL
markov decision process
planning by dynamic programming
model-free prediction
model free control
value function approximation
policy gradient methods
safe RL http://www.jmlr.org/papers/volume16/garcia15a/garcia15a.pdf
Topics of Active Research in Reinforcement Learning Relevant to Spoken Dialogue Systems https://cs.uwaterloo.ca/~ppoupart/publications/presentations/aaai-dialogue-workshop-rl.pdf
OpenAI blog: https://blog.openai.com/
OpenAI Baselines GitHub repo: https://github.com/openai/baselines
RL basic theory: http://www.wildml.com/2016/10/learning-reinforcement-learning/
RL training: https://github.com/dennybritz/reinforcement-learning
RL research papers: https://medium.com/@joshdotai/deep-reinforcement-learning-papers-a2167c136fc7
DRL research papers: https://github.com/junhyukoh/deep-reinforcement-learning-papers
deep mind publications: https://deepmind.com/research/publications/
