@kaizen_nagoya(Dr. Kiyoshi Ogawa)

References of the references in 「ゼロから作るDeepLearning2 自然言語処理編」 (Deep Learning from Scratch 2: Natural Language Processing) (work in progress)

Last updated at 2024-07-03, posted at 2018-08-04

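When I read academic books, I try to read as many of the cited references as I can. In my own field, I also try to trace the references listed in those references, and then the references of those in turn, and to summarize findings such as which documents are cited more than once.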

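Here I compile the references of the references in 「ゼロから作るDeepLearning2 自然言語処理編」.
The references of the references of the references will themselves be treated as a target for natural language processing, and I plan to compile them by the end of the reading group.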
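Finding which documents are cited by more than one reference list can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the `normalize` and `find_shared` helpers and the `reference_lists` structure are hypothetical names introduced for illustration, the sample titles are taken from the reference lists of items 6 to 9 below, and matching is done on a crudely normalized title string, which will miss entries whose titles are abbreviated or garbled.

```python
import re
from collections import defaultdict


def normalize(title):
    """Lowercase a title and collapse punctuation/whitespace runs to one space.

    Hypothetical helper: a crude way to make 'ImageNet ...' and
    'Imagenet ...' compare as equal.
    """
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def find_shared(reference_lists):
    """Return {normalized title: sorted list of citing items} for titles
    that appear in at least two of the given reference lists."""
    cited_by = defaultdict(set)
    for source, titles in reference_lists.items():
        for title in titles:
            cited_by[normalize(title)].add(source)
    return {t: sorted(s) for t, s in cited_by.items() if len(s) > 1}


# Illustrative subset of titles, keyed by the item number used in this article.
reference_lists = {
    "6": [
        "ImageNet classification with deep convolutional neural networks",
        "Large scale distributed deep networks",
        "Dadiannao: A machine-learning supercomputer",
    ],
    "7": [
        "Imagenet classification with deep convolutional neural networks",
        "Dadiannao: A machine-learning supercomputer",
        "The tail at scale",
    ],
    "8": [
        "Imagenet classification with deep convolutional neural networks",
        "Large scale distributed deep networks",
    ],
    "9": [
        "Imagenet classification with deep convolutional neural networks",
        "Maxout networks",
    ],
}

shared = find_shared(reference_lists)
for title, sources in sorted(shared.items()):
    print(f"{title}: cited by items {', '.join(sources)}")
```

Exact string matching after normalization is the simplest possible choice; fuzzy matching or matching on DOI/arXiv identifiers would likely be more robust for real reference lists.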

<This article is a work in progress. I will add to it over time.>

References

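Note that the entries below use the latest URLs and so on, so they already differ from what is printed in the original book. For the information as given in the original, see
"Building reading-group materials for 「ゼロから作るDeepLearning2 自然言語処理編」 from scratch. Currently checking the references."
https://qiita.com/kaizen_nagoya/items/33fb2c66175a25e39559
This Qiita document may also contain typos. I would appreciate it if you could point them out in the comments.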

Python-related

1.Broadcasting
https://docs.scipy.org/doc/numpy-1.15.0/reference/ufuncs.html#broadcasting
No references found.

2. 100 numpy exercises
https://github.com/rougier/numpy-100
No references found.

3.CuPy web page
https://cupy.chainer.org/
No references found. (Reference 4 is contained within reference 3.)

4.CuPy install page
http://docs-cupy.chainer.org/en/stable/install.html
No references found.

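##Deep learning fundamentals
5.Koki Saitoh (斎藤康毅), Deep Learning from Scratch: Theory and Implementation of Deep Learning with Python (ゼロから作るDeep Learning ― Pythonで学ぶディープラーニングの理論と実装), O'Reilly Japan, 2016, ISBN978-4-87311-758-4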
https://www.oreilly.co.jp/books/9784873117584/

List of references for 「ゼロから作るDeep Learning」
https://qiita.com/kaizen_nagoya/items/82975f7b63b6ea2f33ff
https://researchmap.jp/joxn1ul6v-2078500/#_2078500

6.Gupta, Suyog, et al.: "Deep learning with limited numerical precision." Proceedings of the 32nd International Conference on Machine Learning (ICML-15), 2015
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.1001.5463&rep=rep1&type=pdf

References
6.1.An, G. The effects of adding noise during backpropagation training on a generalization performance. Neural Computation, 8(3):643–674, 1996.
6.2.Audhkhasi, K., Osoba, O., and Kosko, B. Noise benefits in backpropagation and deep bidirectional pre-training. In Neural Networks (IJCNN), The 2013 International Joint Conference on, pp. 1–8. IEEE, 2013.
6.3.Baboulin, M., Buttari, A., Dongarra, J., Kurzak, J., Langou, J., Langou, J., Luszczek, P., and Tomov, S. Accelerating scientific computations with mixed precision algorithms. Computer Physics Communications, 180(12):2526–2533, 2009.
6.4.Bishop, C. M. Training with noise is equivalent to Tikhonov regularization. Neural computation, 7(1):108–116, 1995.
6.5.Bottou, L. and Bousquet, O. The tradeoffs of large scale learning. In NIPS, volume 4, pp. 2, 2007.
6.6.Chen, Y., Luo, T., Liu, S., Zhang, S., He, L., Wang, J., Li, L., Chen, T., Xu, Z., Sun, N., et al. Dadiannao: A machine-learning supercomputer. In Microarchitecture (MICRO), 2014 47th Annual IEEE/ACM International Symposium on, pp. 609–622. IEEE, 2014.
6.7.Chilimbi, T., Suzue, Y., Apacible, J., and Kalyanaraman, K. Project Adam: Building an efficient and scalable deep learning training system. In 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14), pp. 571–582, Broomfield, CO, October 2014.
6.8.Ciresan, D. C., Meier, U., Gambardella, L. M., and Schmidhuber, J. Deep, big, simple neural nets for handwritten digit recognition. Neural computation, 22(12):3207–3220, 2010.
6.9.Coates, A., Huval, B., Wang, T., Wu, D., Catanzaro, B., and Andrew, N. Deep learning with COTS HPC systems. In Proceedings of The 30th International Conference on Machine Learning, pp. 1337–1345, 2013.
6.10.Courbariaux, M., Bengio, Y., and David, J.-P. Low precision arithmetic for deep learning. arXiv preprint arXiv:1412.7024, 2014.
6.11.Dean, J., Corrado, G., Monga, R., Chen, K., Devin, M., Mao, M., Senior, A., Tucker, P., Yang, K., Le, Q. V., et al. Large scale distributed deep networks. In Advances in Neural Information Processing Systems, pp. 1223–1231, 2012.
6.12.Farabet, C., Martini, B., Corda, B., Akselrod, P., Culurciello, E., and LeCun, Y. Neuflow: A runtime reconfigurable dataflow processor for vision. In Computer Vision and Pattern Recognition Workshops (CVPRW), 2011 IEEE Computer Society Conference on, pp. 109–116. IEEE, 2011.
6.13.Gokhale, V., Jin, J., Dundar, A., Martini, B., and Culurciello, E. A 240 G-ops/s mobile coprocessor for deep neural networks. In Computer Vision and Pattern Recognition Workshops (CVPRW), 2014 IEEE Conference on, pp. 696–701. IEEE, 2014.
6.14.Hammerstrom, D. A VLSI architecture for high-performance, low-cost, on-chip learning. In Neural Networks, 1990., 1990 IJCNN International Joint Conference on, pp. 537–544. IEEE, 1990.
6.15.Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I., and Salakhutdinov, R. R. Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580, 2012.
6.16.Höhfeld, M. and Fahlman, S. E. Probabilistic rounding in neural network learning with limited precision. Neurocomputing, 4(6):291–299, 1992.
6.17.Holt, J. and Hwang, J.-N. Finite precision error analysis of neural network hardware implementations. Computers, IEEE Transactions on, 42(3):281–290, 1993.
6.18.Iwata, A., Yoshida, Y., Matsuda, S., Sato, Y., and Suzumura, N. An artificial neural network accelerator using general purpose 24 bit floating point digital signal processors. In Neural Networks, 1989. IJCNN., International Joint Conference on, pp. 171–175. IEEE, 1989.
6.19.Kim, J., Hwang, K., and Sung, W. X1000 real-time phoneme recognition VLSI using feed-forward deep neural networks. In Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on, pp. 7510–7514. IEEE, 2014.
6.20.Krizhevsky, A., Sutskever, I., and Hinton, G. E. ImageNet classification with deep convolutional neural networks. In Advances in neural information processing systems, pp. 1097–1105, 2012.
6.21.Kung, H. Why systolic architectures? Computer, 15(1): 37–46, Jan 1982. doi: 10.1109/MC.1982.1653825.
6.22.Lecun, Y. and Cortes, C. The MNIST database of handwritten digits. URL http://yann.lecun.com/exdb/mnist/.
6.23.LeCun, Y., Bottou, L., Bengio, Y., and Haffner, P. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
6.24.Merolla, P. A., Arthur, J. V., Alvarez-Icaza, R., Cassidy, A. S., Sawada, J., Akopyan, F., Jackson, B. L., Imam, N., Guo, C., Nakamura, Y., et al. A million spiking-neuron integrated circuit with a scalable communication network and interface. Science, 345(6197):668–673, 2014.
6.25.Murray, A. F. and Edwards, P. J. Enhanced MLP performance and fault tolerance resulting from synaptic weight noise during training. Neural Networks, IEEE Transactions on, 5(5):792–802, 1994.
6.26.Recht, B., Re, C., Wright, S., and Niu, F. Hogwild: A lock-free approach to parallelizing stochastic gradient descent. In Advances in Neural Information Processing Systems, pp. 693–701, 2011.
6.27.Vanhoucke, V., Senior, A., and Mao, M. Z. Improving the speed of neural networks on CPUs. In Proc. Deep Learning and Unsupervised Feature Learning NIPS Workshop, 2011.
6.28.Wu, R., Yan, S., Shan, Y., Dang, Q., and Sun, G. Deep image: Scaling up image recognition. arXiv preprint arXiv:1501.02876, 2015.

7.Jouppi, Norman P., et al.: "In-datacenter performance analysis of a tensor processing unit." Proceedings of the 44th Annual International Symposium on Computer Architecture, ACM, 2017
https://arxiv.org/abs/1704.04760

References
Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G.S., Davis, A., Dean, J., Devin, M. and Ghemawat, S., 2016. Tensorflow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467.
Albericio, J., Judd, P., Hetherington, T., Aamodt, T., Jerger, N.E. and Moshovos, A., 2016 Cnvlutin: Ineffectual-Neuron-Free Deep Neural Network Computing. Proceedings of the 43rd International Symposium on Computer Architecture.
Adolf, R., Rama, S., Reagen, B., Wei, G.Y. and Brooks, D., 2016, September. Fathom: reference workloads for modern deep learning methods. IEEE International Symposium on Workload Characterization (IISWC).
Asanović, K. 2002. Programmable Neurocomputing, in The Handbook of Brain Theory and Neural Networks: Second Edition, M. A. Arbib (Ed.), MIT Press, ISBN 0-262-01197-2, November 2002. https://people.eecs.berkeley.edu/~krste/papers/neurocomputing.pdf
Asanović, K. 1998. Asanović, K., Beck, Johnson, J., Wawrzynek, J., Kingsbury, B. and Morgan, N., November 1998. Training Neural Networks with Spert-II. Chapter 11 in Parallel Architectures for Artificial Networks: Paradigms and Implementations, N. Sundararajan and P. Saratchandran (Eds.), IEEE Computer Society Press, ISBN 0-8186-8399-6. https://people.eecs.berkeley.edu/~krste/papers/annbook.pdf
Barroso, L.A. and Hölzle, U., 2007. The case for energy-proportional computing. IEEE Computer, vol. 40.
Barr, J. September 29, 2016, New P2 Instance Type for Amazon EC2 – Up to 16 GPUs. https://aws.amazon.com/blogs/aws/new-p2-instance-type-for-amazon-ec2-up-to-16-gpus/
Brooks, D. November 4, 2016. Private communication.
Caulfield, A.M., Chung, E.S., Putnam, A., Haselman, H.A.J.F.M., Humphrey, S.H.M., Daniel, P.K.J.Y.K., Ovtcharov, L.T.M.K., Lanka, M.P.L.W.S. and Burger, D.C.D., 2016. A Cloud-Scale Acceleration Architecture. MICRO conference.
Cavigelli, L., Gschwend, D., Mayer, C., Willi, S., Muheim, B. and Benini, L., 2015, May. Origami: A convolutional network accelerator. Proceedings of the 25th edition on Great Lakes Symposium on VLSI.
Chakradhar, S., Sankaradas, M., Jakkula, V. and Cadambi, S., 2010, June. A dynamically configurable coprocessor for convolutional neural networks. Proceedings of the 37th International Symposium on Computer Architecture.
Chen, T., Du, Z., Sun, N., Wang, J., Wu, C., Chen, Y. and Temam, O., 2014. Diannao: A small-footprint high-throughput accelerator for ubiquitous machine-learning. Proceedings of ASPLOS.
Chen, Y., Luo, T., Liu, S., Zhang, S., He, L., Wang, J., Li, L., Chen, T., Xu, Z., Sun, N. and Temam, O., 2014, December. Dadiannao: A machine-learning supercomputer. Proceedings of the 47th Annual International Symposium on Microarchitecture.
Chen, Y.H., Emer, J. and Sze, V., 2016. Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks. Proceedings of the 43rd International Symposium on Computer Architecture.
Chen, Y., Chen, T., Xu, Z., Sun, N., and Teman, O., 2016. DianNao Family: Energy-Efficient Hardware Accelerators for Machine Learning, Research Highlight, Communications of the ACM, 59(11).
Chi, P., Li, S., Qi, Z., Gu, P., Xu, C., Zhang, T., Zhao, J., Liu, Y., Wang, Y. and Xie, Y., 2016. PRIME: A Novel Processing-In-Memory Architecture for Neural Network Computation in ReRAM-based Main Memory. Proceedings of the 43rd International Symposium on Computer Architecture.
Clark, J. October 26, 2015, Google Turning Its Lucrative Web Search Over to AI Machines. Bloomberg Technology, www.bloomberg.com.
Dally, W. February 9, 2016. High Performance Hardware for Machine Learning, Cadence ENN Summit.
Dean, J. and Barroso, L.A., 2013. The tail at scale. Communications of the ACM, 56(2).
Dean, J. July 7, 2016 Large-Scale Deep Learning with TensorFlow for Building Intelligent Systems, ACM Webinar.
Du, Z., Fasthuber, R., Chen, T., Ienne, P., Li, L., Luo, T., Feng, X., Chen, Y. and Temam, O., 2015, June. ShiDianNao: shifting vision processing closer to the sensor. Proceedings of the 42nd International Symposium on Computer Architecture.
Farabet, C., Poulet, C., Han, J.Y. and LeCun, Y., 2009, August. Cnp: An FPGA-based processor for convolutional networks. 2009 International Conference on Field Programmable Logic and Applications.
Farabet, C., Martini, B., Corda, B., Akselrod, P., Culurciello, E. and LeCun, Y., 2011, June. Neuflow: A runtime reconfigurable dataflow processor for vision. In CVPR 2011 Workshops.
Gupta, S., Agrawal, A., Gopalakrishnan, K. and Narayanan, P., 2015, July. Deep Learning with Limited Numerical Precision. ICML.
Hammerstrom, D., 1990, June. A VLSI architecture for high-performance, low-cost, on-chip learning. 1990 IJCNN International Joint Conference on Neural Networks.
Han, S.; Pool, J.; Tran, J.; and Dally, W., 2015. Learning both weights and connections for efficient neural networks. In Advances in Neural Information Processing Systems.
Han, S., Liu, X., Mao, H., Pu, J., Pedram, A., Horowitz, M.A. and Dally, W.J., 2016. EIE: efficient inference engine on compressed deep neural network. Proceedings of the 43rd International Symposium on Computer Architecture.
He, K., Zhang, X., Ren, S. and Sun, J., 2016. Identity mappings in deep residual networks. Also in arXiv preprint arXiv:1603.05027.
Hennessy, J.L. and Patterson, D.A., 2018. Computer architecture: a quantitative approach, 6th edition, Elsevier.
Hölzle, U. and Barroso, L., 2009. The datacenter as a computer. Morgan and Claypool.
Ienne, P., Cornu, T. and Kuhn, G., 1996. Special-purpose digital hardware for neural networks: An architectural survey. Journal of VLSI signal processing systems for signal, image and video technology, 13(1).
Intel, 2016, Intel® Xeon® Processor E5-4669 v3, http://ark.intel.com/products/85766/Intel-Xeon-Processor-E5-4669-v3-45M-Cache-2_10-GHz
Jouppi, N. May 18, 2016. Google supercharges machine learning tasks with TPU custom chip. https://cloudplatform.googleblog.com
Keutzer, K., 2016. If I could only design one circuit...: technical perspective. Communications of the ACM, 59(11).
Kim, D., Kung, J.H., Chai, S., Yalamanchili, S. and Mukhopadhyay, S., 2016. Neurocube: A Programmable Digital Neuromorphic Architecture with High-Density 3D Memory. Proceedings of the 43rd International Symposium on Computer Architecture.
Krizhevsky, A., Sutskever, I. and Hinton, G., 2012. Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems.
Kung, H.T. and Leiserson, C.E., 1980. Algorithms for VLSI processor arrays. Introduction to VLSI systems.
Lange, K.D., 2009. Identifying shades of green: The SPECpower benchmarks. IEEE Computer, 42(3).
Larabel, M. March 10, 2016, Google Looks To Open Up StreamExecutor To Make GPGPU Programming Easier, Phoronix, https://www.phoronix.com/scan.php?page=news_item&px=Google-StreamExec-Parallel.
LiKamWa, R., Hou, Y., Gao, J., Polansky, M. and Zhong, L., 2016. RedEye: Analog ConvNet Image Sensor Architecture for Continuous Mobile Vision. Proceedings of the 43rd International Symposium on Computer Architecture.
Liu, D., Chen, T., Liu, S., Zhou, J., Zhou, S., Teman, O., Feng, X., Zhou, X. and Chen, Y., 2015, March. Pudiannao: A polyvalent machine learning accelerator. Proceedings of the 42nd International Symposium on Computer Architecture.
Liu, S., Du, Z.D., Tao, J.H., Han, D., Luo, T., Xie, Y., Chen, Y. and Chen, T., 2016. Cambricon: An instruction set architecture for neural networks. Proceedings of the 43rd International Symposium on Computer Architecture.
Metz, C. September 26, 2016, Microsoft Bets Its Future On A Reprogrammable Computer Chip, Wired Magazine, https://www.wired.com/2016/09/microsoft-bets-future-chip-reprogram-fly/
Nvidia, January 2015. Tesla K80 GPU Accelerator. Board Specification https://images.nvidia.com/content/pdf/kepler/Tesla-K80-BoardSpec-07317-001-v05.pdf.
Nvidia, 2016. Tesla GPU Accelerators For Servers. http://www.nvidia.com/object/tesla-servers.html.
Ovtcharov, K., Ruwase, O., Kim, J.Y., Fowers, J., Strauss, K. and Chung, E.S., February 2, 2015. Accelerating deep convolutional neural networks using specialized hardware. Microsoft Research Whitepaper. https://www.microsoft.com/en-us/research/publication/accelerating-deep-convolutional-neural-networks-using-specialized-hardware/
Ovtcharov, K., Ruwase, O., Kim, J.Y., Fowers, J., Strauss, K. and Chung, E.S., 2015, August. Toward accelerating deep learning at scale using specialized hardware in the datacenter. 2015 IEEE Hot Chips 27 Symposium.
Patterson, D.A., 2004. Latency lags bandwith. Communications of the ACM, 47(10).
Peemen, M., Setio, A.A., Mesman, B. and Corporaal, H., 2013, October. Memory-centric accelerator design for convolutional neural networks. In 2013 IEEE 31st International Conference on Computer Design (ICCD).
Putnam, A., Caulfield, A.M., Chung, E.S., Chiou, D., Constantinides, K., Demme, J., Esmaeilzadeh, H., Fowers, J., Gopal, G.P., Gray, J., Haselman, M., Hauck, S., Heil, S., Hormati, A., Kim, J-Y., Lanka, S., Larus, J., Peterson, E., Pope, S., Smith, A., Thong, J., Xiao, P.Y., Burger, D. 2014, June. A reconfigurable fabric for accelerating large-scale datacenter services. 41st International Symposium on Computer Architecture.
Putnam, A., Caulfield, A.M., Chung, E.S., Chiou, D., Constantinides, K., Demme, J., Esmaeilzadeh, H., Fowers, J., Gopal, G.P., Gray, J., Haselman, M., Hauck, S., Heil, S., Hormati, A., Kim, J-Y., Lanka, S., Larus, J., Peterson, E., Pope, S., Smith, A., Thong, J., Xiao, P.Y., Burger, D. 2015. A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services. IEEE Micro, 35(3).
Putnam, A., Caulfield, A.M., Chung, E.S., Chiou, D., Constantinides, K., Demme, J., Esmaeilzadeh, H., Fowers, J., Gopal, G.P., Gray, J., Haselman, M., Hauck, S., Heil, S., Hormati, A., Kim, J-Y., Lanka, S., Larus, J., Peterson, E., Pope, S., Smith, A., Thong, J., Xiao, P.Y., Burger, D. 2016. A Reconfigurable Fabric for Accelerating Large-Scale Datacenter Services. Communications of the ACM.
Qadeer, W., Hameed, R., Shacham, O., Venkatesan, P., Kozyrakis, C. and Horowitz, M.A., 2013, June. Convolution engine: balancing efficiency & flexibility in specialized computing. Proceedings of the 40th International Symposium on Computer Architecture.
Ramacher, U., Beichter, J., Raab, W., Anlauf, J., Bruels, N., Hachmann, U. and Wesseling, M., 1991. Design of a 1st Generation Neurocomputer. In VLSI design of Neural Networks. Springer US.
Reagen, B., Whatmough, P., Adolf, R., Rama, S., Lee, H., Lee, S.K., Hernández-Lobato, J.M., Wei, G.Y. and Brooks, D., 2016. Minerva: Enabling low-power, highly-accurate deep neural network accelerators. Proceedings of the 43rd International Symposium on Computer Architecture.
Ross, J., Jouppi, N., Phelps, A., Young, C., Norrie, T., Thorson, G., Luu, D., 2015. Neural Network Processor, Patent Application No. 62/164,931.
Ross, J., Phelps, A., 2015. Computing Convolutions Using a Neural Network Processor, Patent Application No. 62/164,902.
Ross, J., 2015. Prefetching Weights for a Neural Network Processor, Patent Application No. 62/164,981.
Ross, J., Thorson, G., 2015. Rotating Data for Neural Network Computations, Patent Application No. 62/164,908.
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M. and Berg, A.C., 2015. Imagenet large scale visual recognition challenge. International Journal of Computer Vision, 115(3).
Schurman, E. and Brutlag, J., 2009, June. The user and business impact of server delays, additional bytes, and HTTP chunking in web search. In Velocity Web Performance and Operations Conference.
Shafiee, A., Nag, A., Muralimanohar, N., Balasubramonian, R., Strachan, J.P., Hu, M., Williams, R.S. and Srikumar, V., 2016. ISAAC: A Convolutional Neural Network Accelerator with In-Situ Analog Arithmetic in Crossbars. Proceedings of the 43rd International Symposium on Computer Architecture.
Silver, D., Huang, A., Maddison, C.J., Guez, A., Sifre, L., Van Den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M. and Dieleman, S., 2016. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587).
Smith, J.E., 1982, April. Decoupled access/execute computer architectures. Proceedings of the 11th International Symposium on Computer Architecture.
Steinberg, D., 2015. Full-Chip Simulations, Keys to Success. Proceedings of the Synopsys Users Group (SNUG) Silicon Valley 2015.
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V. and Rabinovich, A., 2015. Going deeper with convolutions. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
Thorson, G., Clark, C., Luu, D., 2015. Vector Computation Unit in a Neural Network Processor, Patent Application No. 62/165,022.
Williams, S., Waterman, A. and Patterson, D., 2009. Roofline: an insightful visual performance model for multicore architectures. Communications of the ACM.
Wu, Y., Schuster, M., Chen, Z., Le, Q., Norouzi, M., Macherey, W., Krikun, M., Cao, Y., Gao, Q., Macherey, K., Klingner, J., Shah, A., Johnson, M., Liu, X., Kaiser, Ł., Gouws, S., Kato, Y., Kudo, T., Kazawa, H., Stevens, K., Kurian, G., Patil, N., Wang, W., Young, C., Smith, J., Riesa, J., Rudnick, A., Vinyals, O., Corrado, G., Hughes, M., and Dean, J. September 26, 2016, Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation, http://arxiv.org/abs/1609.08144.
Young, C., 2015. Batch Processing in a Neural Network Processor, Patent Application No. 62/165,020.
Zhang, C., Li, P., Sun, G., Guan, Y., Xiao, B. and Cong, J., 2015, February. Optimizing FPGA-based accelerator design for deep convolutional neural networks. Proceedings of the 2015 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays.

8.Ba, Jimmy Lei, Jamie Ryan Kiros, and Geoffrey E. Hinton: "Layer normalization." arXiv preprint arXiv:1607.06450, 2016
https://arxiv.org/abs/1607.06450

References
Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In NIPS, 2012.
Geoffrey Hinton, Li Deng, Dong Yu, George E Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Tara N Sainath, et al. Deep neural networks for acoustic modeling in speech recognition: The shared views of four research groups. IEEE, 2012.
Jeffrey Dean, Greg Corrado, Rajat Monga, Kai Chen, Matthieu Devin, Mark Mao, Andrew Senior, Paul Tucker, Ke Yang, Quoc V Le, et al. Large scale distributed deep networks. In NIPS, 2012.
Sergey Ioffe and Christian Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. ICML, 2015.
Ilya Sutskever, Oriol Vinyals, and Quoc V Le. Sequence to sequence learning with neural networks. In Advances in neural information processing systems, pages 3104–3112, 2014.
César Laurent, Gabriel Pereyra, Philémon Brakel, Ying Zhang, and Yoshua Bengio. Batch normalized recurrent neural networks. arXiv preprint arXiv:1510.01378, 2015.
Dario Amodei, Rishita Anubhai, Eric Battenberg, Carl Case, Jared Casper, Bryan Catanzaro, Jingdong Chen, Mike Chrzanowski, Adam Coates, Greg Diamos, et al. Deep speech 2: End-to-end speech recognition in english and mandarin. arXiv preprint arXiv:1512.02595, 2015.
Tim Cooijmans, Nicolas Ballas, César Laurent, and Aaron Courville. Recurrent batch normalization. arXiv preprint arXiv:1603.09025, 2016.
Tim Salimans and Diederik P Kingma. Weight normalization: A simple reparameterization to accelerate training of deep neural networks. arXiv preprint arXiv:1602.07868, 2016.
Behnam Neyshabur, Ruslan R Salakhutdinov, and Nati Srebro. Path-sgd: Path-normalized optimization in deep neural networks. In Advances in Neural Information Processing Systems, pages 2413–2421, 2015.
Shun-Ichi Amari. Natural gradient works efficiently in learning. Neural computation, 1998.
Ivan Vendrov, Ryan Kiros, Sanja Fidler, and Raquel Urtasun. Order-embeddings of images and language. ICLR, 2016.
The Theano Development Team, Rami Al-Rfou, Guillaume Alain, Amjad Almahairi, Christof Angermueller, Dzmitry Bahdanau, Nicolas Ballas, Frédéric Bastien, Justin Bayer, Anatoly Belikov, et al. Theano: A python framework for fast computation of mathematical expressions. arXiv preprint arXiv:1605.02688, 2016.
Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. ECCV, 2014.
Kyunghyun Cho, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. Learning phrase representations using rnn encoder-decoder for statistical machine translation. EMNLP, 2014.
Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. ICLR, 2015.
Ryan Kiros, Ruslan Salakhutdinov, and Richard S Zemel. Unifying visual-semantic embeddings with multimodal neural language models. arXiv preprint arXiv:1411.2539, 2014.
D. Kingma and J. L. Ba. Adam: a method for stochastic optimization. ICLR, 2014. arXiv:1412.6980.
Liwei Wang, Yin Li, and Svetlana Lazebnik. Learning deep structure-preserving image-text embeddings. CVPR, 2016.
Karl Moritz Hermann, Tomas Kocisky, Edward Grefenstette, Lasse Espeholt, Will Kay, Mustafa Suleyman, and Phil Blunsom. Teaching machines to read and comprehend. In NIPS, 2015.
Ryan Kiros, Yukun Zhu, Ruslan R Salakhutdinov, Richard Zemel, Raquel Urtasun, Antonio Torralba, and Sanja Fidler. Skip-thought vectors. In NIPS, 2015.
Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781, 2013.
Yukun Zhu, Ryan Kiros, Rich Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba, and Sanja Fidler. Aligning books and movies: Towards story-like visual explanations by watching movies and reading books. In ICCV, 2015.
Marco Marelli, Luisa Bentivogli, Marco Baroni, Raffaella Bernardi, Stefano Menini, and Roberto Zamparelli. Semeval-2014 task 1: Evaluation of compositional distributional semantic models on full sentences through semantic relatedness and textual entailment. SemEval-2014, 2014.
Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In ACL, pages 115–124, 2005.
Minqing Hu and Bing Liu. Mining and summarizing customer reviews. In Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining, 2004.
Bo Pang and Lillian Lee. A sentimental education: Sentiment analysis using subjectivity summarization based on minimum cuts. In ACL, 2004.
Janyce Wiebe, Theresa Wilson, and Claire Cardie. Annotating expressions of opinions and emotions in language. Language resources and evaluation, 2005.
K. Gregor, I. Danihelka, A. Graves, and D. Wierstra. DRAW: a recurrent neural network for image generation. arXiv:1502.04623, 2015.
Hugo Larochelle and Iain Murray. The neural autoregressive distribution estimator. In AISTATS, volume 6, page 622, 2011.
Marcus Liwicki and Horst Bunke. Iam-ondb-an on-line english sentence database acquired from handwritten text on a whiteboard. In ICDAR, 2005.
Alex Graves. Generating sequences with recurrent neural networks. arXiv preprint arXiv:1308.0850, 2013.

9.Nitish Srivastava, et al.: "Dropout: A Simple Way to Prevent Neural Networks from Overfitting." Journal of Machine Learning Research 15, 2014, p.1929-1958
http://www.cs.toronto.edu/%7Ersalakhu/papers/srivastava14a.pdf

References
M. Chen, Z. Xu, K. Weinberger, and F. Sha. Marginalized denoising autoencoders for domain adaptation. In Proceedings of the 29th International Conference on Machine Learning, pages 767–774. ACM, 2012.
G. E. Dahl, M. Ranzato, A. Mohamed, and G. E. Hinton. Phone recognition with the mean-covariance restricted Boltzmann machine. In Advances in Neural Information Processing Systems 23, pages 469–477, 2010.
O. Dekel, O. Shamir, and L. Xiao. Learning to classify with missing and corrupted features. Machine Learning, 81(2):149–178, 2010.
A. Globerson and S. Roweis. Nightmare at test time: robust learning by feature deletion. In Proceedings of the 23rd International Conference on Machine Learning, pages 353–360. ACM, 2006.
I. J. Goodfellow, D. Warde-Farley, M. Mirza, A. Courville, and Y. Bengio. Maxout networks. In Proceedings of the 30th International Conference on Machine Learning, pages 1319–1327. ACM, 2013.
G. Hinton and R. Salakhutdinov. Reducing the dimensionality of data with neural networks. Science, 313(5786):504–507, 2006.
G. E. Hinton, S. Osindero, and Y. Teh. A fast learning algorithm for deep belief nets. Neural Computation, 18:1527–1554, 2006.
K. Jarrett, K. Kavukcuoglu, M. Ranzato, and Y. LeCun. What is the best multi-stage architecture for object recognition? In Proceedings of the International Conference on Computer Vision (ICCV’09). IEEE, 2009.
A. Krizhevsky. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009.
A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25, pages 1106–1114, 2012.
Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, and L. D. Jackel. Backpropagation applied to handwritten zip code recognition. Neural Computation, 1(4):541–551, 1989.
Y. Lin, F. Lv, S. Zhu, M. Yang, T. Cour, K. Yu, L. Cao, Z. Li, M.-H. Tsai, X. Zhou, T. Huang, and T. Zhang. Imagenet classification: fast descriptor coding and large-scale svm training. Large scale visual recognition challenge, 2010.
A. Livnat, C. Papadimitriou, N. Pippenger, and M. W. Feldman. Sex, mixability, and modularity. Proceedings of the National Academy of Sciences, 107(4):1452–1457, 2010.
V. Mnih. CUDAMat: a CUDA-based matrix class for Python. Technical Report UTML TR 2009-004, Department of Computer Science, University of Toronto, November 2009.
A. Mohamed, G. E. Dahl, and G. E. Hinton. Acoustic modeling using deep belief networks. IEEE Transactions on Audio, Speech, and Language Processing, 2010.
R. M. Neal. Bayesian Learning for Neural Networks. Springer-Verlag New York, Inc., 1996.
Y. Netzer, T. Wang, A. Coates, A. Bissacco, B. Wu, and A. Y. Ng. Reading digits in natural images with unsupervised feature learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning 2011, 2011.
S. J. Nowlan and G. E. Hinton. Simplifying neural networks by soft weight-sharing. Neural Computation, 4(4), 1992.
D. Povey, A. Ghoshal, G. Boulianne, L. Burget, O. Glembek, N. Goel, M. Hannemann, P. Motlicek, Y. Qian, P. Schwarz, J. Silovsky, G. Stemmer, and K. Vesely. The Kaldi Speech Recognition Toolkit. In IEEE 2011 Workshop on Automatic Speech Recognition and Understanding. IEEE Signal Processing Society, 2011.
R. Salakhutdinov and G. Hinton. Deep Boltzmann machines. In Proceedings of the International Conference on Artificial Intelligence and Statistics, volume 5, pages 448–455, 2009.
R. Salakhutdinov and A. Mnih. Bayesian probabilistic matrix factorization using Markov chain Monte Carlo. In Proceedings of the 25th International Conference on Machine Learning. ACM, 2008.
J. Sanchez and F. Perronnin. High-dimensional signature compression for large-scale image classification. In Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition, pages 1665–1672, 2011.
P. Sermanet, S. Chintala, and Y. LeCun. Convolutional neural networks applied to house numbers digit classification. In International Conference on Pattern Recognition (ICPR 2012), 2012.
P. Simard, D. Steinkraus, and J. Platt. Best practices for convolutional neural networks applied to visual document analysis. In Proceedings of the Seventh International Conference on Document Analysis and Recognition, volume 2, pages 958–962, 2003.
J. Snoek, H. Larochelle, and R. Adams. Practical Bayesian optimization of machine learning algorithms. In Advances in Neural Information Processing Systems 25, pages 2960–2968, 2012.
N. Srebro and A. Shraibman. Rank, trace-norm and max-norm. In Proceedings of the 18th annual conference on Learning Theory, COLT’05, pages 545–560. Springer-Verlag, 2005.
N. Srivastava. Improving Neural Networks with Dropout. Master’s thesis, University of Toronto, January 2013.
R. Tibshirani. Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society. Series B. Methodological, 58(1):267–288, 1996.
A. N. Tikhonov. On the stability of inverse problems. Doklady Akademii Nauk SSSR, 39(5): 195–198, 1943.
L. van der Maaten, M. Chen, S. Tyree, and K. Q. Weinberger. Learning with marginalized corrupted features. In Proceedings of the 30th International Conference on Machine Learning, pages 410–418. ACM, 2013.
P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol. Extracting and composing robust features with denoising autoencoders. In Proceedings of the 25th International Conference on Machine Learning, pages 1096–1103. ACM, 2008.
P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. In Proceedings of the 27th International Conference on Machine Learning, pages 3371–3408. ACM, 2010.
S. Wager, S. Wang, and P. Liang. Dropout training as adaptive regularization. In Advances in Neural Information Processing Systems 26, pages 351–359, 2013.
S. Wang and C. D. Manning. Fast dropout training. In Proceedings of the 30th International Conference on Machine Learning, pages 118–126. ACM, 2013.
H. Y. Xiong, Y. Barash, and B. J. Frey. Bayesian prediction of tissue-regulated splicing using RNA sequence and cellular context. Bioinformatics, 27(18):2554–2562, 2011.
M. D. Zeiler and R. Fergus. Stochastic pooling for regularization of deep convolutional neural networks. CoRR, abs/1301.3557, 2013.

##Natural language processing with deep learning
10.Stanford University CS224d: Deep Learning for Natural Language Processing
http://cs224d.stanford.edu

Intro to NLP and Deep Learning
Suggested Readings:
[Linear Algebra Review]
[Probability Review]
[Convex Optimization Review]
[More Optimization (SGD) Review]
[From Frequency to Meaning: Vector Space Models of Semantics]

Simple Word Vector representations: word2vec, GloVe
Suggested Readings:
[Distributed Representations of Words and Phrases and their Compositionality]
[Efficient Estimation of Word Representations in Vector Space]

Advanced word vector representations: language models, softmax, single layer networks
Suggested Readings:
[GloVe: Global Vectors for Word Representation]
[Improving Word Representations via Global Context and Multiple Word Prototypes]

(To be continued)

11.Oxford Deep NLP 2017 course
http://github.com/oxford-cs-deepnlp-2017/lectures
Lecture 2a- Word Level Semantics [Ed Grefenstette]

Words are the core meaning bearing units in language. Representing and learning the meanings of words is a fundamental task in NLP and in this lecture the concept of a word embedding is introduced as a practical and scalable solution.

[slides] [video]

Reading

Embeddings Basics

Firth, John R. "A synopsis of linguistic theory, 1930-1955." (1957): 1-32.
Curran, James Richard. "From distributional to semantic similarity." (2004).
Collobert, Ronan, et al. "Natural language processing (almost) from scratch." Journal of Machine Learning Research 12. Aug (2011): 2493-2537.
Mikolov, Tomas, et al. "Distributed representations of words and phrases and their compositionality." Advances in neural information processing systems. 2013.

Datasets and Visualisation

Finkelstein, Lev, et al. "Placing search in context: The concept revisited." Proceedings of the 10th international conference on World Wide Web. ACM, 2001.
Hill, Felix, Roi Reichart, and Anna Korhonen. "Simlex-999: Evaluating semantic models with (genuine) similarity estimation." Computational Linguistics (2016).
Maaten, Laurens van der, and Geoffrey Hinton. "Visualizing data using t-SNE." Journal of Machine Learning Research 9.Nov (2008): 2579-2605.

Blog posts

Deep Learning, NLP, and Representations, Christopher Olah.
Visualizing Top Tweeps with t-SNE, in Javascript, Andrej Karpathy.

Further Reading

Hermann, Karl Moritz, and Phil Blunsom. "Multilingual models for compositional distributed semantics." arXiv preprint arXiv:1404.4641 (2014).
Levy, Omer, and Yoav Goldberg. "Neural word embedding as implicit matrix factorization." Advances in neural information processing systems. 2014.
Levy, Omer, Yoav Goldberg, and Ido Dagan. "Improving distributional similarity with lessons learned from word embeddings." Transactions of the Association for Computational Linguistics 3 (2015): 211-225.
Ling, Wang, et al. "Two/Too Simple Adaptations of Word2Vec for Syntax Problems." HLT-NAACL. 2015.

12.Young, T., D. Hazarika, S. Poria, and E. Cambria: "Recent trends in deep learning based natural language processing." arXiv preprint arXiv:1708.02709, 2017
https://arxiv.org/abs/1708.02709
13.Yuta Tsuboi, Yuya Unno, Jun Suzuki: 「深層学習による自然言語処理」 (Natural Language Processing with Deep Learning, Machine Learning Professional Series), Kodansha, 2017, ISBN 978-4-06-152924-3
https://www.kspub.co.jp/book/detail/1529243.html

Natural language processing before deep learning

14.Steven Bird, Ewan Klein, Edward Loper: 「入門 自然言語処理」 (Japanese edition of Natural Language Processing with Python), O'Reilly Japan, 2010, ISBN978-4-87311-470-5
https://www.oreilly.co.jp/books/9784873114705/
15.Jeffrey E.F. Friedl: 「詳説 正規表現 第3版」 (Japanese edition of Mastering Regular Expressions, 3rd Edition), O'Reilly Japan, 2008, ISBN978-4-87311-359-3
https://www.oreilly.co.jp/books/9784873113593/
16.Christopher D. Manning, Hinrich Schütze: 「統計的自然言語処理の基礎」 (Japanese edition of Foundations of Statistical Natural Language Processing), Kyoritsu Shuppan, 2017, ISBN 978-4-320-12421-9
http://www.kyoritsu-pub.co.jp/bookdetail/9784320124219
17.Miller, George A:"WordNet: a lexical database for English.", Communications of the ACM 38.11, 1995, p.39-41
http://l2r.cs.uiuc.edu/~danr/Teaching/CS598-05/Papers/miller95.pdf
18.WordNet Interface
http://www.nltk.org/howto/wordnet.html

Word vectors from count-based methods

19.Church, Kenneth Ward, and Patrick Hanks: "Word association norms, mutual information, and lexicography.", Computational linguistics 16.1, 1990, p.22-29
http://www.aclweb.org/anthology/J90-1003
20. Deerwester, Scott, et al:"Indexing by latent semantic analysis.", Journal of the american society for information science 41.6, 1990, p.391-407
http://www.cs.bham.ac.uk/~pxt/IDA/lsa_ind.pdf
The first printing omits the end page number.
21. TruncatedSVD
http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.TruncatedSVD.html

word2vec-related

22.Mikolov, Tomas, et al:"Efficient estimation of word representations in vector space." arXiv preprint arXiv:1301.3781, 2013
https://arxiv.org/abs/1301.3781
23.Mikolov, Tomas, et al:"Distributed representations of words and phrases and their compositionality.", Advances in neural information processing systems, 2013
https://arxiv.org/pdf/1310.4546.pdf
24.Baroni, Marco, Georgiana Dinu, and Germán Kruszewski:"Don't count, predict! A systematic comparison of context-counting vs. context-predicting semantic vectors.", ACL(1), 2014,
The venue is not clearly stated. ACL stands for Association for Computational Linguistics; the proceedings title is Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, p.238–247.
http://www.aclweb.org/anthology/P14-1023
25.Levy, Omer, Yoav Goldberg, and Ido Dagan: "Improving distributional similarity with lessons learned from word embeddings." Transactions of the Association for Computational Linguistics 3, 2015, p.211-225.
https://levyomer.files.wordpress.com/2015/03/improving-distributional-similarity-tacl-2015.pdf
26.Levy, Omer, Yoav Goldberg:"Neural word embedding as implicit matrix factorization." Advances in neural information processing systems, 2014
https://papers.nips.cc/paper/5477-neural-word-embedding-as-implicit-matrix-factorization.pdf
27. Pennington, Jeffrey, Richard Socher, and Christopher D. Manning:"Glove: Global Vectors for Word Representation.", EMNLP. Vol. 14. 2014
https://nlp.stanford.edu/pubs/glove.pdf
The title should read "GloVe:", not "Glove:".
28.Bengio, Yoshua, et al:"A neural probabilistic language model.", Journal of machine learning research 3. Feb, 2003, p.1137-1155.
http://www.iro.umontreal.ca/~vincentp/Publications/lm_jmlr.pdf

RNN-related

29.Talathi, Sachin S., and Aniket Vartak:"Improving performance of recurrent neural network with ReLU nonlinearity.", arXiv preprint arXiv:1511.03771, 2015
https://arxiv.org/abs/1511.03771
30.Pascanu, Razvan, Tomas Mikolov, and Yoshua Bengio:"On the difficulty of training recurrent neural networks.", International Conference on Machine Learning, 2013
http://proceedings.mlr.press/v28/pascanu13.pdf
31.colah's blog:"Understanding LSTM Networks",2015,
http://colah.github.io/posts/2015-08-Understanding-LSTMs/
32.Chung, Junyoung, et al:"Empirical evaluation of gated recurrent neural networks on sequence modeling." arXiv preprint arXiv:1412.3555, 2014
https://arxiv.org/abs/1412.3555
33. Jozefowicz, Rafal, Wojciech Zaremba, and Ilya Sutskever:"An empirical exploration of recurrent network architectures." International Conference on Machine Learning, 2015
http://proceedings.mlr.press/v37/jozefowicz15.pdf
Misprint: Jozefowicz, Rafal, Wojciech Zaremba, and Ilya Sutskever
Original: Rafal Jozefowicz, Wojciech Zaremba, Ilya Sutskever

Language models with RNNs

34.Merity, Stephen, Nitish Shirish Keskar, and Richard Socher:"Regularizing and optimizing LSTM language models." arXiv preprint arXiv:1708.02182, 2017
Misprint: Merity, Stephen, Nitish Shirish Keskar, and Richard Socher
Original: Stephen Merity, Nitish Shirish Keskar, Richard Socher
https://arxiv.org/abs/1708.02182
35.Zaremba, Wojciech, IIya Sutskever, and Oriol Vinyals:"Recurrent neural network regularization." arXiv preprint arXiv:1409.2329, 2014
Misprint: Zaremba, Wojciech, IIya Sutskever, and Oriol Vinyals
Original: Wojciech Zaremba, Ilya Sutskever, Oriol Vinyals
36.Gal, Yarin, and Zoubin Ghahramani:"A theoretically grounded application of dropout in recurrent neural networks.", Advances in neural information processing systems, 2016
https://arxiv.org/abs/1512.05287
Misprint: Gal, Yarin, and Zoubin Ghahramani
Original: Yarin Gal, Zoubin Ghahramani
37.Press, Ofir, and Lior Wolf:"Using the output embedding to improve language models." arXiv preprint arXiv:1608.05859, 2016
https://arxiv.org/abs/1608.05859
Misprint: Press, Ofir, and Lior Wolf
Original: Ofir Press, Lior Wolf
38.Inan, Hakan, Khashayar Khosravi, and Richard Socher:"Tying Word Vectors and Word Classifiers: A Loss Framework for Language Modeling." arXiv preprint arXiv:1611.01462, 2016
https://arxiv.org/abs/1611.01462
「Misprint: Inan, Hakan, Khashayar Khosravi, and Richard Socher
Original: Hakan Inan, Khashayar Khosravi, Richard Socher」
39. PyTorch Examples, "Word-level language modeling RNN"
http://github.com/pytorch/examples/tree/0.3/word_language_model
「Version 0.4 has been released. The master branch is at the URL below, and the 0.3 branch reports "This branch is 15 commits behind master."
https://github.com/pytorch/examples/tree/master/word_language_model」

seq2seq-related

40.Keras examples, "Implementation of sequence to sequence learning for performing addition of two numbers (as string)"
https://github.com/keras-team/keras/blob/2.0.0/examples/addtion_rnn.py
「The 2.0.0 URL returns Not found.
https://github.com/keras-team/keras/blob/master/examples/addition_rnn.py」
41.Sutskever, Ilya, Oriol Vinyals, and Quoc V. Le:"Sequence to sequence learning with neural networks.", Advances in neural information processing systems. 2014.
「Misprint: Sutskever, Ilya, Oriol Vinyals, and Quoc V. Le
Original: Ilya Sutskever, Oriol Vinyals, Quoc V. Le」
42.Cho, Kyunghyun, et al:"Learning phrase representations using RNN encoder-decoder for statistical machine translation.", arXiv preprint arXiv:1406.1078
https://arxiv.org/abs/1406.1078
「Misprint: Cho, Kyunghyun,
Original: Kyunghyun Cho,」
43. Vinyals, Oriol, and Quoc Le:"A neural conversational model.", arXiv preprint arXiv:1506.05869, 2015
https://arxiv.org/abs/1506.05869
「Misprint: Vinyals, Oriol, and Quoc Le
Original: Oriol Vinyals, Quoc Le」
44.Zaremba, Wojciech, and Ilya Sutskever:"Learning to execute.", arXiv preprint arXiv:1410.4615, 2014
「Misprint: Zaremba, Wojciech, and Ilya Sutskever
Original: Wojciech Zaremba, Ilya Sutskever」
45. Vinyl, Oriol, et al:"Show and tell: A neural image caption generator.", Computer Vision and Pattern Recognition(CVPR), 2015 IEEE Conference on. IEEE, 2015
https://arxiv.org/pdf/1411.4555.pdf
「Misprint: Vinyl, Oriol,
Original: Oriol Vinyals」
46. Karpathy, Andrej and Li Fei-Fei:"Deep visual-semantic alignments for generating image descriptions." Proceedings of the IEEE conference on computer vision and pattern recognition. 2015
https://cs.stanford.edu/people/karpathy/cvpr2015.pdf
https://arxiv.org/abs/1412.2306
「Misprint: Karpathy, Andrej and Li Fei-Fei
Original: Andrej Karpathy, Li Fei-Fei」
47. Show and Tell: A neural Image Caption Generator
https://github.com/tensorflow/models/tree/master/research/im2txt

Attention-related

48.Bahdanau, Dzmitry, Kyunghyun Cho, and Yoshua Bengio:"Neural machine translation by jointly learning to align and translate.", arXiv preprint arXiv:1409.0473, 2014
https://arxiv.org/abs/1409.0473
「Misprint: Bahdanau, Dzmitry, Kyunghyun Cho, and Yoshua Bengio
Original: Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio」
49.Luong, Minh-Thang, Hieu Pham, and Christopher D. Manning:"Effective approaches to attention-based neural machine translation.", arXiv prelprint arXiv:1508.04025, 2016
「Misprint: prelprint
Correct: preprint
Misprint: Luong, Minh-Thang, Hieu Pham, and Christopher D. Manning
Original: Minh-Thang Luong, Hieu Pham, Christopher D. Manning」
50.Wu, Yonghui, et al:"Google's neural machine translation system: Bridging the gap between human and machine translation.", arXiv preprint arXiv:1609.08144, 2016
「Misprint: Wu, Yonghui
Original: Yonghui Wu」
51.Google Research Blog.
https://research.googleblog.com/2016/09/a-nurral-network-for-machine.html
「Misprint: https://research.googleblog.com/2016/09/a-nurral-network-for-machine.html
Original: https://ai.googleblog.com/2016/09/a-neural-network-for-machine.html」
Had the entry been written as 「Google Research Blog. Neural Network for Machine Translation, at Production Scale, https://research.googleblog.com/2016/09/」, the link would probably not have broken. Too late now.
52.Vaswani, Ashish, et al:"Attention Is All You Need.", arXiv preprint arXiv:1706.03762, 2017
https://arxiv.org/abs/1706.03762
「Misprint: Vaswani, Ashish
Original: Ashish Vaswani」
53.Google Research Blog.
https://research.googleblog.com/2017/08/transformer-novel-neural-network.html
「Misprint: https://research.googleblog.com/2017/08/transformer-novel-neural-network.html
Original: https://ai.googleblog.com/2017/08/transformer-novel-neural-network.html」
Had the entry been written as 「Transformer: A Novel Neural Network Architecture for Language Understanding, Thursday, August 31, 2017, https://research.googleblog.com/2017/08/」, no change might have been needed.
54.Gehring, Jones, et al:"Convolutional Sequence to Sequence Learning.", arXiv preprint arXiv:1705.03122, 2017
https://arxiv.org/abs/1705.03122
「Misprint: Gehring, Jones,
Original: Jonas Gehring」
##RNNs with external memory
55.Graves, Alex, Greg Wayne, and Ivo Danihelka:"Neural Turing Machines.", arXiv preprint arXiv:1410.5401, 2014
https://arxiv.org/abs/1410.5401
「Misprint: Graves, Alex, Greg Wayne, and Ivo Danihelka
Original: Alex Graves, Greg Wayne, Ivo Danihelka」
56.Graves, Alex, et al:"Hybrid computing using a neural network with dynamic external memory.", Nature 538.7626, 2016, p471
abstract: https://www.nature.com/articles/nature20101
57.DeepMind Blog:"Differentiable neural computers",
https://deepmind.com/blog/differentiable-neural-computers/

Thank you very much for reading to the last sentence.