Deep Multi-Agent Reinforcement Learning with Hybrid Action Spaces based on Maximum Entropy, Hongzhi Hua, Kaigui Wu and Guixuan Wen

Last updated at 2022-06-13. Posted at 2022-06-13.

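"I tried explaining a deep learning paper" (深層学習の論文について解説してみた)

This article is an entry for the event named above. As a first step, I make a provisional translation of the abstract, build a word list, and go through the reference list.

By the end of the event period this should grow into a proper commentary. Please bear with me for a while.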

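How did I choose the paper? On
https://arxiv.org
I searched for "Deep Learning", went through the results in order, and from the first n papers picked the one closest to my own strongest field.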
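The search itself was done by hand on the arXiv website. For readers who want to repeat the "first n papers" step from a script, the sketch below builds a query URL for arXiv's public API endpoint (`http://export.arxiv.org/api/query`) using its `search_query`, `start`, and `max_results` parameters. The helper name `arxiv_query_url` and the value n = 10 are assumptions for illustration only, and the API may not return results in the same order as the website search. Fetching and parsing the returned Atom feed is left out.

```python
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"


def arxiv_query_url(phrase, n=10, start=0):
    """Build an arXiv API URL asking for up to n results that match the phrase in any field."""
    params = {
        "search_query": f'all:"{phrase}"',  # "all:" searches every field
        "start": start,                     # index of the first result
        "max_results": n,                   # the "first n papers"
    }
    # urlencode percent-encodes the colon and quotes and turns spaces into "+".
    return f"{ARXIV_API}?{urlencode(params)}"


if __name__ == "__main__":
    # n=10 is a hypothetical choice; the article does not say which n was used.
    print(arxiv_query_url("Deep Learning", n=10))
```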
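I did not think about the reader at all. Or rather, I am not enough of a Deep Learning specialist to be able to think about the reader. How far from a specialist am I? Far enough that unless the words "Deep Learning" appear in a paper, I cannot tell whether it is a Deep Learning paper.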

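As recorded in the self-reference list at the end of this article, I have helped out at deep learning study groups with software installation, handling command errors, fixing Python errors, and the like.

My specialty is continuous and discrete control, and continuous and discrete communication. I once served as an organizer of the Entropy Society (エントロピー学会). Do not expect too much from that. Please think of me as a specialist in the fields deep learning is applied to, not in deep learning itself.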

Deep Multi-Agent Reinforcement Learning with Hybrid Action Spaces based on Maximum Entropy, Hongzhi Hua, Kaigui Wu and Guixuan Wen

Abstract

Multi-agent deep reinforcement learning has been applied to address a variety of complex problems with either discrete or continuous action spaces and achieved great success. However, most real-world environments cannot be described by only discrete action spaces or only continuous action spaces. And there are few works having ever utilized deep reinforcement learning (drl) to multi-agent problems with hybrid action spaces. Therefore, we propose a novel algorithm: Deep Multi-Agent Hybrid Soft Actor-Critic (MAHSAC) to fill this gap. This algorithm follows the centralized training but decentralized execution (CTDE) paradigm, and extend the Soft Actor-Critic algorithm (SAC) to handle hybrid action space problems in MultiAgent environments based on maximum entropy. Our experiences are running on an easy multi-agent particle world with a continuous observation and discrete action space, along with some basic simulated physics. The experimental results show that MAHSAC has good performance in training speed, stability, and anti-interference ability. At the same time, it outperforms existing independent deep hybrid learning method in cooperative scenarios and competitive scenarios.

Keywords: multi-agent, deep reinforcement learning, hybrid action spaces, maximum entropy

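Provisional translation
Multi-agent (複数代理人) deep reinforcement learning (深層強化学習) has been applied to a variety of complex problems with either discrete or continuous action spaces (行動空間) and has achieved great success.
However, most real-world environments cannot be described by discrete action spaces alone or by continuous action spaces alone. Few works have applied deep reinforcement learning (DRL) to multi-agent problems with hybrid action spaces (混合行動空間). To fill this gap, we propose a new algorithm (算法): Deep Multi-Agent Hybrid Soft Actor-Critic (MAHSAC).
This algorithm follows the centralized training but decentralized execution (CTDE) paradigm (規範) and extends the Soft Actor-Critic algorithm (SAC) to handle hybrid action space problems in multi-agent environments based on maximum entropy. Our experiments run in a simple multi-agent particle world with continuous observations and a discrete action space, along with some basic simulated physics. The experimental results show that MAHSAC performs well in training speed, stability, and anti-interference ability. At the same time, it outperforms the existing independent deep hybrid learning method in cooperative scenarios (協調台本) and competitive scenarios (競合台本).
Keywords (鍵語): multi-agent, deep reinforcement learning, hybrid action spaces, maximum entropy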
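
To make "hybrid action space" and "maximum entropy" a little more concrete, here is a minimal sketch in plain Python. It is not the MAHSAC implementation from the paper. It assumes that a hybrid action is a discrete choice paired with one continuous parameter, that the discrete part is drawn from a categorical distribution, and that the continuous part is drawn from a Gaussian attached to the chosen discrete action. The names `sample_hybrid_action`, `categorical_entropy`, and `gaussian_entropy` are hypothetical. The two entropy functions compute the kind of quantity that a maximum-entropy method such as SAC adds to the reward so that the policy keeps exploring.

```python
import math
import random


def categorical_entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def gaussian_entropy(std):
    """Differential entropy (in nats) of a 1-D Gaussian with standard deviation std."""
    return 0.5 * math.log(2.0 * math.pi * math.e * std ** 2)


def sample_hybrid_action(probs, means, stds, rng=random):
    """Sample a (discrete index, continuous parameter) pair.

    Assumption for illustration: each discrete choice k has its own
    Gaussian N(means[k], stds[k]^2) for the continuous parameter.
    """
    k = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    x = rng.gauss(means[k], stds[k])
    return k, x


if __name__ == "__main__":
    probs = [0.25, 0.25, 0.25, 0.25]  # e.g. four movement directions (hypothetical)
    means = [0.0, 0.0, 0.0, 0.0]      # e.g. mean step size per direction (hypothetical)
    stds = [1.0, 1.0, 1.0, 1.0]
    print(sample_hybrid_action(probs, means, stds))
    print(categorical_entropy(probs))  # equals log(4) for a uniform 4-way choice
    print(gaussian_entropy(1.0))
```

A uniform discrete policy has the largest possible categorical entropy, and a wider Gaussian has larger differential entropy, so rewarding entropy pushes both parts of the action toward more varied behavior. How the paper actually combines the two parts is not stated in the abstract.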

References

[1] Kai Arulkumaran, Marc Peter Deisenroth, Miles Brundage, Anil Anthony Bharath. (2017). Deep Reinforcement Learning: A Brief Survey. IEEE Signal Process. Mag. 34(6): 26-38. https://doi.org/10.1109/MSP.2017.2743240
[2] Seyed Sajad Mousavi, Michael Schukat, Enda Howley. (2016). Deep Reinforcement Learning: An Overview. IntelliSys (2): 426-440. https://doi.org/10.1007/978-3-319-56991-8_32
[3] Thanh Thi Nguyen, Ngoc Duy Nguyen, Saeid Nahavandi. (2018). Deep Reinforcement Learning for Multi-Agent Systems: A Review of Challenges, Solutions and Applications. CoRR abs/1812.11794. https://doi.org/10.1109/TCYB.2020.2977374
[4] Ardi Tampuu, Tambet Matiisen, Dorian Kodelja, Ilya Kuzovkin, Kristjan Korjus, Juhan Aru, Jaan Aru, Raul Vicente. (2015). Multiagent Cooperation and Competition with Deep Reinforcement Learning. CoRR abs/1511.08779. https://doi.org/10.48550/arXiv.1511.08779
[5] Ryan Lowe, Yi Wu, Aviv Tamar, Jean Harb, Pieter Abbeel, Igor Mordatch. (2017). Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments. NIPS: 6379-6390. https://doi.org/10.48550/arXiv.1706.02275
[6] Tabish Rashid, Mikayel Samvelyan, Christian Schröder de Witt, Gregory Farquhar, Jakob N. Foerster, Shimon Whiteson. (2018). QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning. ICML: 4292-4301. https://doi.org/10.48550/arXiv.1803.11485
[7] Craig J. Bester, Steven D. James, George Dimitri Konidaris. (2019). Multi-Pass Q-Networks for Deep Reinforcement Learning with Parameterised Action Spaces. CoRR abs/1905.04388. https://doi.org/10.48550/arXiv.1905.04388
[8] Matthew J. Hausknecht, Peter Stone. (2016). Deep Reinforcement Learning in Parameterized Action Space. ICLR (Poster). https://doi.org/10.48550/arXiv.1511.04143
[9] Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, Sergey Levine. (2018). Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. ICML: 1856-1865. https://doi.org/10.48550/arXiv.1801.01290
[10] Tuomas Haarnoja, Aurick Zhou, Kristian Hartikainen, George Tucker, Sehoon Ha, Jie Tan, Vikash Kumar, Henry Zhu, Abhishek Gupta, Pieter Abbeel, Sergey Levine. (2018). Soft Actor-Critic Algorithms and Applications. CoRR abs/1812.05905. https://doi.org/10.48550/arXiv.1812.05905
[11] Olivier Delalleau, Maxim Peter, Eloi Alonso, Adrien Logut. (2019). Discrete and Continuous Action Representation for Practical RL in Video Games. CoRR abs/1912.11077. https://doi.org/10.48550/arXiv.1912.11077
[12] Petros Christodoulou. (2019). Soft Actor-Critic for Discrete Action Settings. CoRR abs/1910.07207. https://doi.org/10.48550/arXiv.1910.07207

Word list

Frequency	English	Japanese
215	the	その
99	and	および
93	to	に
88	of	の
80	a	一つの
76	agent	代理人
64	action	行動
61	in	中に
61	is	です
48	multi	複数の
44	with	と
37	learning	学ぶ
36	deep	深い
35	which	どの
34	hybrid	混成物
34	s	s
32	reinforcement	強化
31	discrete	離散
31	i	私
31	on	上に
30	as	なので
29	by	に
29	spaces	空間
28	at	で
28	continuous	連続
27	agents	代理人
25	can	できる
25	st	st
24	we	私たち
22	hsac	hsac
22	q	q
21	are	である
21	network	通信網
20	actions	行動
20	this	これ
19	based	基づく
19	for	為に
18	critic	評論家
18	each	各
18	mahsac	mahsac
18	that	それ
17	actor	俳優
17	be	なれ
16	training	訓練
16	value	価値
15	algorithm	算法
15	environment	環境
15	prey	獲物
15	space	空間
14	decentralized	分散型
13	latex	latex
13	nature	自然
13	soft	柔らかい
13	springer	springer
13	template	雛形
12	doi	土井
12	entropy	entropy
12	has	持つ
12	https	https
12	org	organization
12	problems	問題
12	set	設定
11	ad	広告
11	cooperative	協同組合
11	from	から
11	other	他の
11	target	目標
11	two	2
10	arxiv	arxiv
10	h	h
10	it	それ
10	maximum	最大
10	performance	実績
10	scenario	台本
10	there	そこの
9	ac	交流
9	algorithms	算法
9	an	一つの
9	d	d
9	policy	政策
9	reward	褒美
9	t	t
9	where	どこ
8	different	違う
8	environments	環境
8	fig	図
8	number	番号
8	only	それだけ
8	or	また
8	our	私たちの
8	sac	sac
8	state	状態
8	will	意思
7	both	両方
7	figure	図
7	its	これは
7	j	j
7	local	地域の
7	many	たくさんの
7	maxim	公理
7	method	方法
7	more	もっと
7	parameters	引数
7	predator	捕食者
7	predators	捕食者
7	strategy	戦略
7	therefore	したがって
7	time	時間
7	used	使った
6	abs	腹筋
6	all	全て
6	average	平均
6	centralized	一元化
6	chongqing	重慶
6	complex	繁雑
6	components	部品
6	corr	corr
6	distribution	分布
6	en	en
6	execution	実行
6	extend	拡張する
6	input	入力
6	maddpg	maddpg
6	paradigm	規範
6	particle	粒子
6	random	無作為
6	represents	を表す
6	rt	rt
6	take	取った
6	then	それから
6	when	いつ
6	world	世界
6	y	y
5	applied	適用
5	architecture	建築
5	competitive	競争力
5	global	広域の
5	good	良い
5	here	ここ
5	independent	独立
5	minimizing	最小化
5	oi	oi
5	output	出力
5	per	あたり
5	results	結果
5	same	同じ
5	shown	示す
5	shows	示す
5	standard	標準
5	these	これらの
5	they	それらの
5	while	間
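
A word list like the one above can be produced with a short script. The following is a minimal sketch, not the script actually used for this article. It assumes the paper's text is already available as a plain string, lowercases it, splits on anything that is not an ASCII letter, and counts the resulting tokens. The function name `word_frequencies` and the sample sentence are hypothetical. Splitting on non-letters would explain why fragments such as s, st, doi, and org show up as "words" in the table.

```python
import re
from collections import Counter


def word_frequencies(text, min_count=1):
    """Return (word, count) pairs sorted by descending count, then alphabetically."""
    # Lowercase and keep runs of ASCII letters only, so "Multi-agent" yields
    # "multi" and "agent", and "agent's" yields "agent" and "s".
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    pairs = [(word, n) for word, n in counts.items() if n >= min_count]
    return sorted(pairs, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    # Hypothetical sample text; replace with the extracted text of the paper.
    sample = "Multi-agent deep reinforcement learning: the agent learns, the agents act."
    for word, count in word_frequencies(sample):
        print(f"{count}\t{word}")
```

Passing `min_count=5` would keep only words that appear at least five times, which is where the table above stops.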

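Reference articles

Notes on DeepLearning papers I have read recently

[Paper reading] Structured Training for Large-Vocabulary Chord Recognition

Seven must-read DeepLearning papers on object detection and a summary of their key points [The road to EfficientDet]

[Paper summary] About DropBlock

DeepLearning research: summary of 2016

[Paper notes] Super-resolution methods using DeepLearning / Summary of the DeepSR paper

[Paper notes] Super-resolution methods using DeepLearning / Summary of the RVSR paper

[Paper notes] Summary of a survey paper on video super-resolution using DeepLearning

[Paper introduction] FaceNet: Vectorizing images and defining "closeness"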

Self-references

Machine learning with docker (3) with anaconda (3): "Intuitive Deep Learning" (直感Deep Learning) by Antonio Gulli and Sujit Pal

Machine learning with docker (2) with anaconda (2): "Deep Learning from Scratch 2: Natural Language Processing" (ゼロから作るDeep Learning 2 自然言語処理編) by 斎藤康毅

"Deep Learning from Scratch 2: Natural Language Processing" by 斎藤康毅, chapter 8: work report

"Intuitive Deep Learning", taking on docker (1): pull, as described in the book

"Deep Learning from Scratch", 斎藤康毅

Reference list for "Deep Learning from Scratch"

Machine learning with docker (1) with anaconda (1): "Deep Learning from Scratch: Theory and Implementation of Deep Learning Learned with Python" by 斎藤康毅

Materials and programs worth reading before joining the "Deep Learning from Scratch 2: Natural Language Processing" reading group

report: chapter 7 on from scratch deep learning 2 natural language

Why do machine learning with python/R on docker: list of books and sources, in progress (target: 100). docker (18)

How to run a reading group for "Deep Learning from Scratch 2: Natural Language Processing" (example)

Deep learning and machine learning: patent applications

Machine learning with docker (12) with anaconda (12): "Deep Learning" (深層学習) by Ian Goodfellow, Yoshua Bengio, Aaron Courville

Machine learning with docker (15) with anaconda (15): "Deep Learning Cookbook: Practical Recipes to Get Started Quickly" by Douwe Osinga

Machine learning with docker (18) with anaconda (18): "Deep Learning with Keras" by Antonio Gulli, Sujit Pal

Machine learning with docker (16) with anaconda (16): "Deep Learning Essentials" by Wei Di, Anurag Bhardwaj, Jianing Wei

Machine learning with docker (19) with anaconda (19): "Deep Learning Quick Reference" by Mike Bernico

Machine learning with docker (22) with anaconda (22): "Deep Learning for Computer Vision" by Rajalingappaa Shanmugamani

Machine learning with docker (23) with anaconda (23): "Deep Learning with PyTorch" by Vishnu Subramanian

Machine learning with docker (29) with anaconda (29): "Python Deep Learning" by Valentino Zocca, Gianmario Spacagna, Daniel Slater, Peter Roelants

Machine learning with docker (30) with anaconda (30): "Advanced Deep Learning with Keras" by Philippe Remy

Machine learning with docker (31) with anaconda (31): "Fundamentals of Deep Learning" by Nikhil Buduma

Machine learning with docker (32) with anaconda (32): "Hands-On Deep Learning with TensorFlow" by Dan Van Boxel

Machine learning with docker (33) with anaconda (33): "Deep Learning with Theano" by Christopher Bourez

Machine learning with docker (34) with anaconda (34): "Python Deep Learning Cookbook" by Indra Bakker

Machine learning with docker (42): "Programming PyTorch for Deep Learning" by Ian Pointer

"R Deep Learning Cookbook" by Philippe Remy. Machine learning with docker (75) with R (5)

Machine learning with docker (86) with Spark (1): "Machine Learning with Spark" by Rajdeep Dua, Manpreet Singh Ghotra, Nick Pentreath

Machine learning with docker (93) with Hadoop (1): "Deep Learning with Hadoop" by Dipayan Dev

Setting up a Python, R, Machine Learning/Deep Learning environment for data scientists. docker (20)

What machine learning and deep learning can and cannot do: 10 hypotheses. Hypotheses (36)

Deep Learning for ENGINEER
