
I Wrote a Book That Takes You from Inpainting to Deep Learning and the Latest in GANs

Posted at 2020-03-02

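I spent three to four months writing a not-so-thin "thin book" (self-published): A4, 195 pages. The title is 『モザイク除去から学ぶ　最先端のディープラーニング』 ("Learning Cutting-Edge Deep Learning Through Mosaic Removal"). All of the code targets TensorFlow 2.0.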

What is inpainting?

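Inpainting is the task of blanking out part of an image and then restoring it plausibly. The goal is to generate only the blanked-out (white) region, not the whole image.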
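The "generate only the masked region" idea can be sketched as a simple compositing step. This is a minimal NumPy illustration, not code from the book: the function name `composite` and the toy arrays are assumptions, and `generated` stands in for the output of some inpainting model.

```python
import numpy as np

def composite(original, generated, mask):
    """Keep known pixels from `original`; take hole pixels (mask == 1) from `generated`."""
    # (H, W) -> (H, W, 1) so the mask broadcasts over the colour channels.
    mask = mask[..., None].astype(original.dtype)
    return mask * generated + (1.0 - mask) * original

# Toy data: a 4x4 RGB image with a 2x2 hole in the centre.
original = np.full((4, 4, 3), 0.2, dtype=np.float32)
generated = np.full((4, 4, 3), 0.9, dtype=np.float32)  # stand-in for a model's output
mask = np.zeros((4, 4), dtype=np.float32)
mask[1:3, 1:3] = 1.0

result = composite(original, generated, mask)
```

Pixels outside the hole are copied from the input unchanged, so only the masked region is actually produced by the model.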

Image: from https://github.com/JiahuiYu/generative_inpainting

Applications of inpainting

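If the figure above was enough for you to see where this is going, you have good instincts. Inpainting is in fact expected to find use in mosaic removal. The best-known example is probably DeepCreamPy (a whopping 9k stars on GitHub! For reference, chainer has 5.3k stars and PyTorch has 36.5k).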

Image: from the DeepCreamPy repository

As shown, what inpainting does is try to restore the region masked in green. DeepCreamPy provides a model specialized for lewd 2D (anime-style) images.

How does DeepCreamPy remove a mosaic? It treats the mosaicked region as a mask and solves an inpainting problem. As an example, let's use the abalone illustration from Irasutoya (いらすとや).
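To make "treat the mosaicked region as a mask" concrete, here is a small NumPy sketch that pixelates a rectangle and returns the matching binary mask. It is an illustration under assumptions: the function name `apply_mosaic`, the block-averaging style of mosaic, and the random test image are hypothetical and do not come from DeepCreamPy.

```python
import numpy as np

def apply_mosaic(image, top, left, height, width, block=4):
    """Pixelate a rectangle by replacing each block x block tile with its mean colour.

    Returns the mosaicked image and a binary mask (1 = mosaicked pixel).
    """
    out = image.astype(np.float32).copy()
    mask = np.zeros(image.shape[:2], dtype=np.float32)
    for y in range(top, top + height, block):
        for x in range(left, left + width, block):
            y2 = min(y + block, top + height)
            x2 = min(x + block, left + width)
            tile = out[y:y2, x:x2]
            out[y:y2, x:x2] = tile.mean(axis=(0, 1), keepdims=True)
    mask[top:top + height, left:left + width] = 1.0
    return out, mask

rng = np.random.default_rng(0)
image = rng.random((16, 16, 3)).astype(np.float32)
mosaicked, mask = apply_mosaic(image, top=4, left=4, height=8, width=8, block=4)
```

An inpainting model would then receive `mosaicked` together with `mask` and be asked to fill in the pixels where `mask == 1`.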

And out comes a glossy abalone. This is the kind of mosaic removal DeepCreamPy can do.

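Sharp-eyed readers may have thought, "That isn't the same abalone as in the original image." This is an important point, and the book digs into it using the concept of the "Perception Distortion Tradeoff".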
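One way to get a feel for this tension is to score two hypothetical restorations with a pixel-wise distortion metric such as PSNR. The toy setup below is an assumption made for illustration and is not taken from the book: a flat gray "average" answer scores better than a sharp texture that is merely shifted by one pixel, even though the latter arguably looks more like the original.

```python
import numpy as np

def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB; higher means lower distortion."""
    diff = reference.astype(np.float64) - estimate.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_value ** 2 / mse)

# Ground truth: a sharp one-pixel checkerboard.
yy, xx = np.indices((8, 8))
truth = ((yy + xx) % 2) * 255.0

blurry = np.full_like(truth, 127.5)        # textureless "average" answer
sharp_shifted = np.roll(truth, 1, axis=1)  # same texture, shifted by one pixel

psnr_blurry = psnr(truth, blurry)
psnr_sharp = psnr(truth, sharp_shifted)
```

Here `psnr_blurry` comes out higher than `psnr_sharp`, which is why a low-distortion result is not automatically the most convincing one to look at.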

The latest in GANs

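A large share of recent inpainting research is built on GAN-based models. The reason is that GAN-based models tend to produce clearly cleaner results than non-GAN ones. Some non-GAN models, such as DFNet, also produce quite clean results, but overall they are in the minority. Think of the GAN-based models as extensions of pix2pix and CycleGAN.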

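To follow the inpainting research that can be applied to mosaic removal, you need to know fairly recent developments in GANs. In particular, Spectral Normalization, which was a breakthrough for GAN training stability, is widely used. You will also come across evolved forms of the attention structure used in Self-Attention GAN. In inpainting specifically, a structure called Contextual Attention is a major milestone. Loosely speaking, it copies and pastes parts of an image with a neural network; more precisely, it takes PatchMatch, which searches for patches that fit a given region, and turns it into a deep learning operation. "How do you reproduce image copy-and-paste with a neural network?" The book explains this carefully.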
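The "copy and paste by patch matching" intuition can be sketched in a few lines of NumPy. This is a heavily simplified illustration, not the Contextual Attention layer itself: the published layer works on feature maps with convolution and transposed convolution, whereas here the helper names `extract_patches` and `attend`, the softmax scale of 10, and the random feature map are all assumptions.

```python
import numpy as np

def extract_patches(feature, size):
    """Slide a size x size window over a (H, W, C) array; return (N, size*size*C) rows."""
    h, w, _ = feature.shape
    rows = [feature[y:y + size, x:x + size].ravel()
            for y in range(h - size + 1)
            for x in range(w - size + 1)]
    return np.stack(rows)

def attend(query_patch, background_patches, scale=10.0):
    """Softmax over cosine similarities, then blend background patches by those weights."""
    eps = 1e-8
    q = query_patch / (np.linalg.norm(query_patch) + eps)
    norms = np.linalg.norm(background_patches, axis=1, keepdims=True)
    b = background_patches / (norms + eps)
    scores = scale * (b @ q)
    weights = np.exp(scores - scores.max())  # subtract max for numerical stability
    weights /= weights.sum()
    return weights @ background_patches, weights

rng = np.random.default_rng(0)
background = rng.random((6, 6, 2))
patches = extract_patches(background, size=3)  # 16 candidate patches
# Query: a slightly noisy copy of background patch 5.
query = patches[5] + 0.01 * rng.standard_normal(patches.shape[1])
filled, weights = attend(query, patches)
```

The background patch most similar to the query receives the largest weight, so the filled-in content is dominated by a "pasted" copy of that patch.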

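The latest in GANs is covered in Chapter 7 of the book (I'll paste the table of contents further down), which introduces DCGAN, Hinge Loss, Spectral Normalization, pix2pix, and CycleGAN as basic examples. Many general textbooks stop there, but this book starts from there. Implementations are included, of course, and every chapter has exercises provided as Google Colaboratory notebooks. For Chapter 7, the exercises are pix2pix and CycleGAN.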
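As a rough idea of what Spectral Normalization computes, the sketch below estimates the largest singular value of a weight matrix by power iteration and divides the weight by it. It is a standalone NumPy illustration, not the book's implementation: the function name `spectral_normalize` and the iteration count are assumptions, and in GAN training the method typically runs a single power-iteration step per update while reusing the vector `u`, rather than iterating to convergence as done here.

```python
import numpy as np

def spectral_normalize(weight, n_iter=100, seed=0):
    """Estimate the largest singular value of a 2-D weight by power iteration.

    Returns the weight divided by that estimate, together with the estimate.
    """
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(weight.shape[0])
    for _ in range(n_iter):
        v = weight.T @ u
        v /= np.linalg.norm(v) + 1e-12
        u = weight @ v
        u /= np.linalg.norm(u) + 1e-12
    sigma = u @ weight @ v
    return weight / sigma, sigma

rng = np.random.default_rng(1)
w = rng.standard_normal((8, 5))
w_sn, sigma = spectral_normalize(w)
```

After normalization the matrix has spectral norm close to 1, which is how the Lipschitz constant of a linear layer is kept under control.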

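The aim of the book is to "learn about the latest GANs and deep learning, using deep-learning-based mosaic removal as the running example." I can't cover the details here, so please refer to the book (at 195 A4 pages, there is no way to fit it all on Qiita...).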

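Prologue: Playing with DeepCreamPy
Chapter 1: Machine Learning, Deep Learning, and Convolutional Neural Networks
- Why deep learning is so celebrated compared with conventional machine learning
- How convolution in a convolutional neural network differs from convolution in image processing, and how to express the latter as a neural network function
- Image-processing convolution corresponds to Depthwise Conv among neural network functions
- How to use TPUs with TensorFlow 2.0
Chapter 2: Super-Resolution-Based Mosaic Removal
- The simplest approach to mosaic removal is super-resolution-based
- The relationship between mosaics and super-resolution
- The PSNR evaluation metric
Chapter 3: Mosaic Removal with U-Net
- Mosaic removal via an image-to-image translation approach
- CNNs that take global features into account, using Squeeze-and-Excitation and SENet derivatives
Chapter 4: The Science of Mosaics
- Frequency characteristics of images via Gaussian filters, Laplacian filters, and the Fourier transform
- What the frequency characteristics of a mosaic are
- The idea of the Perception Distortion Tradeoff
Chapter 5: The OPY Dataset
- Machine learning is often discussed as if the data were already there, but how do you build a dataset from scratch?
- What are the benefits of building your own dataset?
- How to build data while paying attention to data cleansing and leakage
Chapter 6: Mosaic Removal with Partial Convolutions (ECCV 2018)
- What P-Conv actually does
- The relationship between the Gram matrix and the correlation matrix
Chapter 7: Generative Adversarial Networks
- The ideas behind DCGAN, Hinge Loss, and Spectral Normalization
- An intuitive understanding of how Spectral Normalization contributes to GAN stability, and what controlling the Lipschitz constant means
- Implementing pix2pix and CycleGAN
Chapter 8: Mosaic Removal with Gated Conv (ICCV 2019)
- The idea behind Contextual Attention
- How to reproduce image copy-and-paste with a neural network
- How to split an image into patches
- The idea behind the Gated Conv layer
Chapter 9: Mosaic Removal with PEPSI (CVPR 2019)
- Improvements to Contextual Attention
- The ideas behind RED and Diet PEPSI
Chapter 10: Mosaic Removal with EdgeConnect (ICCV 2019)
- Bringing a drawing-style approach into a GAN model
- Introducing Self-Attention
- What makes EdgeConnect easy to use
Chapter 11: Papers That Could Not Be Covered, and Closing Remarks
- A brief review of the latest research, mainly published in 2019
- Mostly inpainting, but the newly emerging outpainting is also introduced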

Outpainting is interesting, so I plan to write about it in a separate article. It can produce panoramic images like this one.

Image: https://github.com/z-x-yang/NS-Outpainting

Reference list

This is the list of papers cited in the book. Use it to get a rough picture of what is covered.

Chapter 1

A. L. Maas, A. Y. Hannun, A. Y. Ng. Rectifier Nonlinearities Improve Neural Network Acoustic Models. International Conference on Machine Learning (ICML) (2013).
D. Clevert, T. Unterthiner, S. Hochreiter. Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs). ICLR 2016.
S. Ioffe, C. Szegedy. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. ICML 2015.
C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A. Rabinovich. Going Deeper with Convolutions. CVPR 2015.
K. He, X. Zhang, S. Ren, J. Sun. Deep Residual Learning for Image Recognition. CVPR 2016.
K. He, X. Zhang, S. Ren, J. Sun. Identity Mappings in Deep Residual Networks. ECCV 2016.

Chapter 2

C. Ma, C. Y. Yang, X. Yang, M. H. Yang. Learning a no-reference quality metric for single-image super-resolution. Computer Vision and Image Understanding (2017). Volume 158, page 1-16.
W. Shi, J. Caballero, F. Huszár, J. Totz, A. P. Aitken, R. Bishop, D. Rueckert, Z. Wang. Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network. CVPR 2016.

Chapter 3

S. Kaji, S. Kida. Overview of image-to-image translation by use of deep neural networks: denoising, super-resolution, modality conversion, and reconstruction in medical imaging. Radiological physics and technology (2019) Volume 12. pages 235-248.
O. Ronneberger, P. Fischer, T. Brox. U-Net: Convolutional Networks for Biomedical Image Segmentation. MICCAI 2015.
S. Iizuka, E. S.-Serra, H. Ishikawa. Let there be Color!: Joint End-to-end Learning of Global and Local Image Priors for Automatic Image Colorization with Simultaneous Classification. SIGGRAPH 2016.
L. Zhang, Y. Ji, X. Lin. Style Transfer for Anime Sketches with Enhanced Residual U-net and Auxiliary Classifier GAN. ACPR 2017.
J. Hu, L. Shen, S. Albanie, G. Sun, E. Wu. Squeeze-and-Excitation Networks. CVPR 2018.
D. Hendrycks, K. Zhao, S. Basart, J. Steinhardt, D. Song. Natural Adversarial Examples. arXiv:1907.07174.
Y. Zhang, K. Li, K. Li, L. Wang, B. Zhong, Y. Fu. Image Super-Resolution Using Very Deep Residual Channel Attention Networks. ECCV 2018.
T. Dai, J. Cai, Y. Zhang, S.-T. Xia, L. Zhang. Second-Order Attention Network for Single Image Super-Resolution. CVPR 2019.
A. Guha Roy, N. Navab, C. Wachinger. Concurrent Spatial and Channel Squeeze & Excitation in Fully Convolutional Networks. MICCAI 2018.

Chapter 4

T. Karras, T. Aila, S. Laine, J. Lehtinen. Progressive Growing of GANs for Improved Quality, Stability, and Variation. ICLR 2018.
G. Ghiasi, C. C. Fowlkes. Laplacian Pyramid Reconstruction and Refinement for Semantic Segmentation. ECCV 2016.
T. Zhao, Z. Yin. Pyramid-Based Fully Convolutional Networks for Cell Segmentation. MICCAI 2018.
Y. Chen, H. Fan, B. Xu, Z. Yan, Y. Kalantidis, M. Rohrbach, S. Yan, J. Feng. Drop an Octave: Reducing Spatial Redundancy in Convolutional Neural Networks with Octave Convolution. ICCV 2019.
R. Durall, F.-J. Pfreundt, J. Keuper. Stabilizing GANs with Octave Convolutions. arXiv:1905.12534.
Z. Fan, J. Mo, B. Qiu, W. Li, G. Zhu, C. Li, J. Hu, Y. Rong, X. Chen. Accurate Retinal Vessel Segmentation via Octave Convolution Neural Network. arXiv:1906.12193.
J. Rownicka, P. Bell, S. Renals. Multi-scale Octave Convolutions for Robust Speech Recognition. arXiv:1910.14443.
C. Ledig, L. Theis, F. Huszar, J. Caballero, A. Cunningham, A. Acosta, A. Aitken, A. Tejani, J. Totz, Z. Wang, W. Shi. Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network. CVPR 2017.
M. S. Sarfraz, C. Seibold, H. Khalid, R. Stiefelhagen. Content and Colour Distillation for Learning Image Translations with the Spatial Profile Loss. BMVC 2019.
L. Gatys, A. Ecker, M. Bethge. A Neural Algorithm of Artistic Style. Journal of Vision 2016. Vol 16, No 12.

Chapter 6

G. Liu, F. A. Reda, K. J. Shih, T.-C. Wang, A. Tao, B. Catanzaro. Image Inpainting for Irregular Holes Using Partial Convolutions. ECCV 2018.
K. Simonyan, A. Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv:1409.1556.
L. A. Gatys, A. S. Ecker, M. Bethge. A Neural Algorithm of Artistic Style. arXiv:1508.06576.
Y. Ma, X. Liu, S. Bai, L. Wang, A. Liu, D. Tao, E. Hancock. Region-wise Generative Adversarial Image Inpainting for Large Missing Areas. arXiv:1909.12507.
T. Yu, Z. Guo, X. Jin, S. Wu, Z. Chen, W. Li, Z. Zhang, S. Liu. Region Normalization for Image Inpainting. AAAI 2020.

Chapter 7

I. J. Goodfellow, J. P.-Abadie, M. Mirza, B. Xu, D. W.-Farley, S. Ozair, A. Courville, Y. Bengio. Generative Adversarial Networks. NIPS 2014.
T. Karras, S. Laine, M. Aittala, J. Hellsten, J. Lehtinen, T. Aila. Analyzing and Improving the Image Quality of StyleGAN. arXiv:1912.04958.
A. Radford, L. Metz, S. Chintala. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. ICLR 2016.
J. H. Lim, J. C. Ye. Geometric GAN. arXiv:1705.02894.
D. Tran, R. Ranganath, D. M. Blei. Hierarchical Implicit Models and Likelihood-Free Variational Inference. NIPS 2017.
A. Brock, J. Donahue, K. Simonyan. Large Scale GAN Training for High Fidelity Natural Image Synthesis. ICLR 2019.
M. Arjovsky, S. Chintala, L. Bottou. Wasserstein GAN. arXiv:1701.07875.
T. Miyato, T. Kataoka, M. Koyama, Y. Yoshida. Spectral Normalization for Generative Adversarial Networks. ICLR 2018.
I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, A. Courville. Improved Training of Wasserstein GANs. NIPS 2017.
C. Chu, K. Minami, K. Fukumizu. Smoothness and Stability in GANs. ICLR 2020.
H. Zhang, I. Goodfellow, D. Metaxas, A. Odena. Self-Attention Generative Adversarial Networks. arXiv:1805.08318.
P. Isola, J.-Y. Zhu, T. Zhou, A. A. Efros. Image-to-Image Translation with Conditional Adversarial Networks. CVPR 2017.
J.-Y. Zhu, T. Park, P. Isola, A. A. Efros. Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks. ICCV 2017.

Chapter 8

C. Barnes, E. Shechtman, A. Finkelstein, D. B. Goldman. PatchMatch: A Randomized Correspondence Algorithm for Structural Image Editing. ACM Transactions on Graphics (Proc. SIGGRAPH) 2009.
J. Yu, Z. Lin, J. Yang, X. Shen, X. Lu, T. S. Huang. Generative Image Inpainting with Contextual Attention. CVPR 2018.
J. Yu, Z. Lin, J. Yang, X. Shen, X. Lu, T. Huang. Free-Form Image Inpainting with Gated Convolution. ICCV 2019.

Chapter 9

M. Sagong, Y. Shin, S. Kim, S. Park, S. Ko. PEPSI: Fast Image Inpainting With Parallel Decoding Network. CVPR 2019.
M. Sagong, Y. Shin, S. Kim, S. Park, S. Ko. PEPSI++: Fast and Lightweight Network for Image Inpainting. arXiv:1905.09010.

Chapter 10

K. Nazeri, E. Ng, T. Joseph, F. Z. Qureshi, M. Ebrahimi. EdgeConnect: Generative Image Inpainting with Adversarial Edge Learning. ICCV 2019.
T. Zhang, H. Fu, Y. Zhao, J. Cheng, M. Guo, Z. Gu, B. Yang, Y. Xiao, S. Gao, J. Liu. SkrGAN: Sketching-rendering Unconditional Generative Adversarial Networks for Medical Image Synthesis. MICCAI 2019.
J. Ostrofsky, A. Kozbelt, A. Seidel. Perceptual Constancies and Visual Selection as Predictors of Realistic Drawing Skill. Psychology of Aesthetics, Creativity, and the Arts 6(2), 124–136 (2012).
H. Zhang, I. Goodfellow, D. Metaxas, A. Odena. Self-Attention Generative Adversarial Networks. arXiv:1805.08318.
X. Wang, R. Girshick, A. Gupta, K. He. Non-local Neural Networks. CVPR 2018.
S. Xie, Z. Tu. Holistically-Nested Edge Detection. ICCV 2015.

Chapter 11

S. Iizuka, E. S.-Serra, H. Ishikawa. Globally and Locally Consistent Image Completion. SIGGRAPH 2017.
C. Zheng, T.-J. Cham, J. Cai. Pluralistic Image Completion. CVPR 2019.
W. Cai, Z. Wei. Diversity-Generated Image Inpainting with Style Extraction. arXiv:1912.01834.
W. Xiong, J. Yu, Z. Lin, J. Yang, X. Lu, C. Barnes, J. Luo. Foreground-aware Image Inpainting. CVPR 2019.
Y. Ren, X. Yu, R. Zhang, T. H. Li, S. Liu, G. Li. StructureFlow: Image Inpainting via Structure-aware Appearance Flow. ICCV 2019.
T. R. Shaham, T. Dekel, T. Michaeli. SinGAN: Learning a Generative Model from a Single Natural Image. ICCV 2019.
B. V. Hoorick. Image Outpainting and Harmonization using Generative Adversarial Networks. arXiv:1912.10960.
Z. Yang, J. Dong, P. Liu, Y. Yang, S. Yan. Very Long Natural Scenery Image Prediction by Outpainting. ICCV 2019.
Y. Jo, J. Park. SC-FEGAN: Face Editing Generative Adversarial Network with User’s Sketch and Color. ICCV 2019.
X. Hong, P. Xiong, R. Ji, H. Fan. Deep Fusion Network for Image Completion. ACM-MM 2019.
H. Liu, B. Jiang, W. Huang, C. Yang. One-Stage Inpainting with Bilateral Attention and Pyramid Filling Block. arXiv:1912.08642.
Z. Guo, Z. Chen, T. Yu, J. Chen, S. Liu. Progressive Image Inpainting with Full-Resolution Residual Network. ACM-MM 2019.

You can read a sample

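The book is available at Toranoana and Booth. On Booth you can read the prologue through Chapter 1 as a free sample. I put three to four months of serious effort into writing it, so please at least give the sample a read.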

Techbookfest 8 (技術書典8) new release: 『モザイク除去から学ぶ　最先端のディープラーニング』

Booth (physical and digital editions): https://koshian2.booth.pm/items/1835219
Toranoana (physical edition; up to 19% back in rewards through 3/31): https://ec.toranoana.shop/tora/ec/item/040030818462/
Toranoana digital-edition giveaway: https://note.com/koshian2/n/nfe3961acb389
Information roundup and questions: https://github.com/koshian2/MosaicDeeplearningBook
