
LLM memo 2024100802 AI(26)

Posted at 2024-10-08

Matsuo Lab LLM Community "Paper & Hacks Vol.20"
https://matsuolab-community.connpass.com/event/332695/

Why are scaling laws scalable?
余振軒, Graduate School of Artificial Intelligence and Science, Rikkyo University
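
As background for the talk title: a "scaling law" here is the empirical power-law fit of training loss against model size N and data size D. A minimal sketch (my own illustration, not material from the talk), using the loss parameterization and the fitted constants reported by Hoffmann et al. (reference [7] under [1] below):

```python
# Illustrative sketch only: the Chinchilla-style loss fit
#   L(N, D) = E + A / N**alpha + B / D**beta
# used in compute-optimal scaling-law work. The constants are the values
# reported by Hoffmann et al. (2022); treat them as an example, not as a
# claim about the talk's content.

def chinchilla_loss(N: float, D: float,
                    E: float = 1.69, A: float = 406.4, B: float = 410.7,
                    alpha: float = 0.34, beta: float = 0.28) -> float:
    """Predicted loss for a model with N parameters trained on D tokens."""
    return E + A / N ** alpha + B / D ** beta

# Example: a 1B-parameter model trained on 20B tokens.
print(chinchilla_loss(1e9, 20e9))
```

The reason such fits matter is that the same functional form keeps holding as N and D grow by orders of magnitude, which is what makes them usable for extrapolation.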

References


[1]

Elliot Paquette+. 4+3 Phases of Compute-Optimal Neural Scaling Laws. 2024. In arXiv:2405.15074v1
 https://arxiv.org/abs/2405.15074v1

https://arxiv.org/pdf/2405.15074v1

References on [1]

[1] K. B. Athreya and P. E. Ney. Branching processes. Reprint of the 1972 original [Springer, New York; MR0373040]. Dover Publications, Inc., Mineola, NY, 2004.
[2] Y. Bahri et al. “Explaining neural scaling laws”. In: arXiv preprint arXiv:2102.06701 (2021). https://arxiv.org/pdf/2102.06701
[3] Tamay Besiroglu et al. “Chinchilla Scaling: A replication attempt”. In: arXiv preprint arXiv:2404.10102 (2024). https://arxiv.org/pdf/2404.10102
[4] B. Bordelon, A. Atanasov, and C. Pehlevan. “A Dynamical Model of Neural Scaling Laws”. In: arXiv preprint arXiv:2402.01092 (2024). https://arxiv.org/pdf/2402.01092
[5] D. Cruz-Uribe and C. J. Neugebauer. “An Elementary Proof of Error Estimates for the Trapezoidal Rule”. In: Math. Mag. 76.4 (2003), pp. 303–306. issn: 0025-570X, 1930-0980. url: http://www.jstor.org/stable/3219088?origin=pubexport.
[6] G. Gripenberg. “On the resolvents of nonconvolution Volterra kernels”. In: Funkcial. Ekvac. 23.1 (1980), pp. 83–95.
[7] J. Hoffmann et al. “An empirical analysis of compute-optimal large language model training”. In: Advances in Neural Information Processing Systems. Vol. 35. 2022. url: https://proceedings.neurips.cc/paper_files/paper/2022/file/c1e2faff6f588870935f114ebe04a3e5-Paper-Conference.pdf.
[8] J. Kaplan et al. “Scaling laws for neural language models”. In: arXiv preprint arXiv:2001.08361 (2020). https://arxiv.org/pdf/2001.08361
[9] A. Maloney, D. Roberts, and J. Sully. “A Solvable Model of Neural Scaling Laws”. In: arXiv preprint arXiv:2210.16859 (2024). https://arxiv.org/pdf/2210.16859
[10] C. Paquette et al. “SGD in the Large: Average-case Analysis, Asymptotics, and Stepsize Criticality”. In: Proceedings of Thirty Fourth Conference on Learning Theory (COLT). Vol. 134. 2021, pp. 3548–3626.
[11] E. Paquette et al. “Homogenization of SGD in high-dimensions: exact dynamics and generalization properties”. In: arXiv preprint arXiv:2205.07069 (2022). https://arxiv.org/pdf/2205.07069
[12] U. Sharma and J. Kaplan. “A neural scaling law from the dimension of the data manifold”. In: arXiv preprint arXiv:2004.10802 (2020). https://arxiv.org/pdf/2004.10802
[13] J. B. Simon et al. “More is better in modern machine learning: when infinite overparameterization is optimal and overfitting is obligatory”. In: arXiv preprint arXiv:2311.14646 (2023). https://arxiv.org/pdf/2311.14646

[2]

Tatsunori Hashimoto+, CS336: Language Modeling from Scratch, Lecture 9,

https://stanford-cs336.github.io/spring2024/
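
Lecture material of this kind typically derives how to split a fixed compute budget between parameters and tokens. A minimal sketch under two common assumptions (mine, not necessarily the lecture's exact treatment): training FLOPs C ≈ 6·N·D, and a Chinchilla-style ratio of roughly 20 tokens per parameter.

```python
# Hypothetical helper: allocate a compute budget C (FLOPs) between model size N
# and token count D, assuming C = 6 * N * D and D = tokens_per_param * N.
# These are standard rules of thumb from the scaling-law literature, used here
# only as an illustration.

def compute_optimal_allocation(C: float, tokens_per_param: float = 20.0):
    """Return (N, D) such that C = 6 * N * D and D = tokens_per_param * N."""
    N = (C / (6.0 * tokens_per_param)) ** 0.5
    D = tokens_per_param * N
    return N, D

N, D = compute_optimal_allocation(1e23)  # e.g. a 1e23-FLOP training budget
print(f"N ≈ {N:.2e} parameters, D ≈ {D:.2e} tokens")
```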

[3]

Albert Gu+. Mamba: Linear-Time Sequence Modeling with Selective State Spaces. 2024. In arXiv:2312.00752v2
 
https://arxiv.org/abs/2312.00752v2

https://arxiv.org/pdf/2312.00752v2
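
For orientation, the core of Mamba is a selective state-space layer: a linear recurrence whose step size Δ_t and projections B_t, C_t depend on the current input, so the state can choose what to keep. A toy per-channel sketch in NumPy (my simplification, not the paper's implementation, which uses a different discretization for B and a hardware-aware parallel scan):

```python
import numpy as np

# Toy sketch of a selective SSM recurrence for one channel with diagonal A:
#   h_t = Abar_t * h_{t-1} + Bbar_t * x_t,   y_t = C_t . h_t
# where delta_t, B_t, C_t are input-dependent ("selection"). Simplified:
# zero-order hold for A, Euler step for B.

def selective_ssm_step(h, x_t, A, B_t, C_t, delta_t):
    A_bar = np.exp(delta_t * A)      # discretized (diagonal) state matrix
    B_bar = delta_t * B_t            # simplified discretization of B
    h = A_bar * h + B_bar * x_t      # elementwise state update
    y_t = float(np.dot(C_t, h))      # readout
    return h, y_t

rng = np.random.default_rng(0)
state_dim = 4
A = -np.abs(rng.standard_normal(state_dim))   # stable: negative real diagonal
h = np.zeros(state_dim)
for x_t in rng.standard_normal(8):            # a short toy input sequence
    B_t = rng.standard_normal(state_dim)      # stands in for a projection of x_t
    C_t = rng.standard_normal(state_dim)
    delta_t = 0.1 + 0.9 * rng.random()        # stands in for softplus(proj(x_t))
    h, y_t = selective_ssm_step(h, x_t, A, B_t, C_t, delta_t)
```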

References on [3]

[1] Martin Arjovsky, Amar Shah, and Yoshua Bengio. “Unitary Evolution Recurrent Neural Networks”. In: The International Conference on Machine Learning (ICML). 2016, pp. 1120–1128.

[2] Žiga Avsec, Vikram Agarwal, Daniel Visentin, Joseph R Ledsam, Agnieszka Grabska-Barwinska, Kyle R Taylor, Yannis Assael, John Jumper, Pushmeet Kohli, and David R Kelley. “Effective Gene Expression Prediction from Sequence by Integrating Long-range Interactions”. In: Nature Methods 18.10 (2021), pp. 1196–1203.
[3] Jimmy Ba, Geoffrey E Hinton, Volodymyr Mnih, Joel Z Leibo, and Catalin Ionescu. “Using Fast Weights to Attend to the Recent Past”. In: Advances in Neural Information Processing Systems (NeurIPS) 29 (2016).
[4] Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E Hinton. “Layer Normalization”. In: arXiv preprint arXiv:1607.06450 (2016). https://arxiv.org/pdf/1607.06450
[5] Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. “Neural Machine Translation by Jointly Learning to Align and Translate”. In: The International Conference on Learning Representations (ICLR). 2015.
[6] David Balduzzi and Muhammad Ghifary. “Strongly-typed Recurrent Neural Networks”. In: International Conference on Machine Learning. PMLR. 2016, pp. 1292–1300.
[7] Stella Biderman, Hailey Schoelkopf, Quentin Gregory Anthony, Herbie Bradley, Kyle O’Brien, Eric Hallahan, Mohammad Aflah Khan, Shivanshu Purohit, USVSN Sai Prashanth, Edward Raff, et al. “Pythia: A Suite for Analyzing Large Language Models across Training and Scaling”. In: The International Conference on Machine Learning (ICML). PMLR. 2023, pp. 2397–2430.
[8] Yonatan Bisk, Rowan Zellers, Jianfeng Gao, Yejin Choi, et al. “PIQA: Reasoning about Physical Commonsense in Natural Language”. In: Proceedings of the AAAI conference on Artificial Intelligence. Vol. 34. 2020.
[9] Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, et al. “GPT-NeoX-20B: An Open-source Autoregressive Language Model”. In: arXiv preprint arXiv:2204.06745 (2022). https://arxiv.org/pdf/2204.06745
[10] Guy E Blelloch. “Prefix Sums and Their Applications”. In: (1990).
[11] James Bradbury, Stephen Merity, Caiming Xiong, and Richard Socher. “Quasi-recurrent Neural Networks”. In: arXiv preprint arXiv:1611.01576 (2016).
[12] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. “Language Models are Few-shot Learners”. In: Advances in Neural Information Processing Systems (NeurIPS) 33 (2020), pp. 1877–1901.
[13] Aydar Bulatov, Yuri Kuratov, and Mikhail S Burtsev. “Scaling Transformer to 1M tokens and Beyond with RMT”. In: arXiv preprint arXiv:2304.11062 (2023). https://arxiv.org/pdf/2304.11062
[14] Rewon Child, Scott Gray, Alec Radford, and Ilya Sutskever. “Generating Long Sequences with Sparse Transformers”. In: arXiv preprint arXiv:1904.10509 (2019). https://arxiv.org/pdf/1904.10509
[15] Krzysztof Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, Tamas Sarlos, Peter Hawkins, Jared Davis, Afroz Mohiuddin, Lukasz Kaiser, et al. “Rethinking Attention with Performers”. In: The International Conference on Learning Representations (ICLR). 2021.
[16] Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, et al. “PaLM: Scaling Language Modeling with Pathways”. In: Journal of Machine Learning Research 24.240 (2023), pp. 1–113. url: http://jmlr.org/papers/v24/22-1144.html.
[17] Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. “Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling”. In: arXiv preprint arXiv:1412.3555 (2014). https://arxiv.org/pdf/1412.3555
[18] Peter Clark, Isaac Cowhey, Oren Etzioni, Tushar Khot, Ashish Sabharwal, Carissa Schoenick, and Oyvind Tafjord. “Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge”. In: arXiv preprint arXiv:1803.05457 (2018). https://arxiv.org/pdf/1803.05457
[19] Tri Dao. “FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning”. In: The International Conference on Learning Representations (ICLR). 2024.
[20] Tri Dao, Daniel Y Fu, Stefano Ermon, Atri Rudra, and Christopher Ré. “FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness”. In: Advances in Neural Information Processing Systems (NeurIPS). 2022.
[21] Tri Dao, Daniel Y Fu, Khaled K Saab, Armin W Thomas, Atri Rudra, and Christopher Ré. “Hungry Hungry Hippos: Towards Language Modeling with State Space Models”. In: The International Conference on Learning Representations (ICLR). 2023.
[22] Yann N Dauphin, Angela Fan, Michael Auli, and David Grangier. “Language Modeling with Gated Convolutional Networks”. In: The International Conference on Machine Learning (ICML). PMLR. 2017, pp. 933–941.
[23] DeepSound. SampleRNN. https://github.com/deepsound-project/samplernn-pytorch. 2017.
[24] Jiayu Ding, Shuming Ma, Li Dong, Xingxing Zhang, Shaohan Huang, Wenhui Wang, and Furu Wei. “LongNet: Scaling Transformers to 1,000,000,000 Tokens”. In: arXiv preprint arXiv:2307.02486 (2023). https://arxiv.org/pdf/2307.02486

[25] Chris Donahue, Julian McAuley, and Miller Puckette. “Adversarial Audio Synthesis”. In: The International Conference on Learning Representations (ICLR). 2019.
[26] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. “An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale”. In: The International Conference on Learning Representations (ICLR). 2020.
[27] Nelson Elhage, Neel Nanda, Catherine Olsson, Tom Henighan, Nicholas Joseph, Ben Mann, Amanda Askell, Yuntao Bai, Anna Chen, Tom Conerly, Nova DasSarma, Dawn Drain, Deep Ganguli, Zac Hatfield-Dodds, Danny Hernandez, Andy Jones, Jackson Kernion, Liane Lovitt, Kamal Ndousse, Dario Amodei, Tom Brown, Jack Clark, Jared Kaplan, Sam McCandlish, and Chris Olah. “A Mathematical Framework for Transformer Circuits”. In: Transformer Circuits Thread (2021). https://transformer-circuits.pub/2021/framework/index.html.
[28] Mahan Fathi, Jonathan Pilault, Pierre-Luc Bacon, Christopher Pal, Orhan Firat, and Ross Goroshin. “Block-State Transformer”. In: arXiv preprint arXiv:2306.09539 (2023). https://arxiv.org/pdf/2306.09539
[29] Yassir Fathullah, Chunyang Wu, Yuan Shangguan, Junteng Jia, Wenhan Xiong, Jay Mahadeokar, Chunxi Liu, Yangyang Shi, Ozlem Kalinli, Mike Seltzer, and Mark J. F. Gales. “Multi-Head State Space Model for Speech Recognition”. In: Proc. INTERSPEECH 2023. 2023, pp. 241–245. doi: 10.21437/Interspeech.2023-1036.
[30] Karl J Friston, Lee Harrison, and Will Penny. “Dynamic Causal Modelling”. In: Neuroimage 19.4 (2003), pp. 1273– 1302.
[31] Daniel Y Fu, Elliot L Epstein, Eric Nguyen, Armin W Thomas, Michael Zhang, Tri Dao, Atri Rudra, and Christopher Ré. “Simple Hardware-efficient Long Convolutions for Sequence Modeling”. In: The International Conference on Machine Learning (ICML) (2023).
[32] Ken-ichi Funahashi and Yuichi Nakamura. “Approximation of Dynamical Systems by Continuous Time Recurrent Neural Networks”. In: Neural Networks 6.6 (1993), pp. 801–806.
[33] Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, Charles Foster, Jason Phang, Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, and Connor Leahy. “The Pile: An 800GB Dataset of Diverse Text for Language Modeling”. In: arXiv preprint arXiv:2101.00027 (2020). https://arxiv.org/pdf/2101.00027
[34] Leo Gao, Jonathan Tow, Stella Biderman, Sid Black, Anthony DiPofi, Charles Foster, Laurence Golding, Jeffrey Hsu, Kyle McDonell, Niklas Muennighoff, Jason Phang, Laria Reynolds, Eric Tang, Anish Thite, Ben Wang, Kevin Wang, and Andy Zou. A Framework for Few-shot Language Model Evaluation. Version v0.0.1. Sept. 2021. doi: 10.5281/zenodo.5371628. url: https://doi.org/10.5281/zenodo.5371628.
[35] Karan Goel, Albert Gu, Chris Donahue, and Christopher Ré. “It’s Raw! Audio Generation with State-Space Models”. In: The International Conference on Machine Learning (ICML). 2022.
[36] Albert Gu, Tri Dao, Stefano Ermon, Atri Rudra, and Christopher Ré. “HIPPO: Recurrent Memory with Optimal Polynomial Projections”. In: Advances in Neural Information Processing Systems (NeurIPS). 2020.
[37] Albert Gu, Karan Goel, and Christopher Ré. “Efficiently Modeling Long Sequences with Structured State Spaces”. In: The International Conference on Learning Representations (ICLR). 2022.
[38] Albert Gu, Caglar Gulcehre, Tom Le Paine, Matt Hoffman, and Razvan Pascanu. “Improving the Gating Mechanism of Recurrent Neural Networks”. In: The International Conference on Machine Learning (ICML). 2020.
[39] Albert Gu, Ankit Gupta, Karan Goel, and Christopher Ré. “On the Parameterization and Initialization of Diagonal State Space Models”. In: Advances in Neural Information Processing Systems (NeurIPS). 2022.
[40] Albert Gu, Isys Johnson, Karan Goel, Khaled Saab, Tri Dao, Atri Rudra, and Christopher Ré. “Combining Recurrent, Convolutional, and Continuous-time Models with the Linear State Space Layer”. In: Advances in Neural Information Processing Systems (NeurIPS). 2021.
[41] Albert Gu, Isys Johnson, Aman Timalsina, Atri Rudra, and Christopher Ré. “How to Train Your HIPPO: State Space Models with Generalized Basis Projections”. In: The International Conference on Learning Representations (ICLR). 2023.
[42] Ankit Gupta, Albert Gu, and Jonathan Berant. “Diagonal State Spaces are as Effective as Structured State Spaces”. In: Advances in Neural Information Processing Systems 35 (2022), pp. 22982–22994.
[43] Ankit Gupta, Harsh Mehta, and Jonathan Berant. “Simplifying and Understanding State Space Models with Diagonal Linear RNNs”. In: arXiv preprint arXiv:2212.00768 (2022).
[44] David Ha, Andrew Dai, and Quoc V. Le. “HyperNetworks”. In: The International Conference on Learning Representations (ICLR). 2017.
[45] Danijar Hafner, Timothy Lillicrap, Jimmy Ba, and Mohammad Norouzi. “Dream to Control: Learning Behaviors by Latent Imagination”. In: The International Conference on Learning Representations (ICLR). 2020.

[46] Ramin Hasani, Mathias Lechner, Tsun-Hsuan Wang, Makram Chahine, Alexander Amini, and Daniela Rus. “Liquid Structural State-Space Models”. In: The International Conference on Learning Representations (ICLR). 2023.
[47] Mikael Henaff, Arthur Szlam, and Yann LeCun. “Recurrent Orthogonal Networks and Long-Memory Tasks”. In: The International Conference on Machine Learning (ICML). 2016.
[48] Dan Hendrycks and Kevin Gimpel. “Gaussian Error Linear Units (GELUs)”. In: arXiv preprint arXiv:1606.08415 (2016). https://arxiv.org/pdf/1606.08415
[49] Sepp Hochreiter. “Untersuchungen zu dynamischen neuronalen Netzen”. In: Diploma, Technische Universität München 91.1 (1991), p. 31.
[50] Sepp Hochreiter, Yoshua Bengio, Paolo Frasconi, Jürgen Schmidhuber, et al. Gradient Flow in Recurrent Nets: The Difficulty of Learning Long-term Dependencies. 2001.
[51] Sepp Hochreiter and Jürgen Schmidhuber. “Long Short-Term Memory”. In: Neural Computation 9.8 (1997), pp. 1735– 1780.
[52] Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, et al. “An Empirical Analysis of Compute-Optimal Large Language Model Training”. In: Advances in Neural Information Processing Systems (NeurIPS) 35 (2022), pp. 30016–30030.
[53] Weizhe Hua, Zihang Dai, Hanxiao Liu, and Quoc Le. “Transformer Quality in Linear Time”. In: The International Conference on Machine Learning (ICML). PMLR. 2022, pp. 9099–9117.
[54] Hassan Ismail Fawaz, Germain Forestier, Jonathan Weber, Lhassane Idoumghar, and Pierre-Alain Muller. “Deep Learning for Time Series Classification: A Review”. In: Data Mining and Knowledge Discovery 33.4 (2019), pp. 917– 963.
[55] Andrei Ivanov, Nikoli Dryden, Tal Ben-Nun, Shigang Li, and Torsten Hoefler. “Data Movement is All You Need: A Case Study on Optimizing Transformers”. In: Proceedings of Machine Learning and Systems 3 (2021), pp. 711–732.
[56] Li Jing, Caglar Gulcehre, John Peurifoy, Yichen Shen, Max Tegmark, Marin Soljacic, and Yoshua Bengio. “Gated Orthogonal Recurrent Units: On Learning to Forget”. In: Neural Computation 31.4 (2019), pp. 765–783.
[57] Rudolph Emil Kalman. “A New Approach to Linear Filtering and Prediction Problems”. In: (1960).
[58] Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. “Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention”. In: International Conference on Machine Learning. PMLR. 2020, pp. 5156–5165.
[59] Shiva Kaul. “Linear Dynamical Systems as a Core Computational Primitive”. In: Advances in Neural Information Processing Systems 33 (2020), pp. 16808–16820.
[60] Zhifeng Kong, Wei Ping, Jiaji Huang, Kexin Zhao, and Bryan Catanzaro. “DiffWave: A Versatile Diffusion Model for Audio Synthesis”. In: International Conference on Learning Representations. 2021.
[61] Chrysoula Kosma, Giannis Nikolentzos, and Michalis Vazirgiannis. “Time-Parameterized Convolutional Neural Networks for Irregularly Sampled Time Series”. In: arXiv preprint arXiv:2308.03210 (2023).
[62] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. “ImageNet Classification with Deep Convolutional Neural Networks”. In: Advances in Neural Information Processing Systems (NeurIPS) 25 (2012).
[63] Tao Lei. “When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute”. In: Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. 2021, pp. 7633–7648.
[64] Tao Lei, Yu Zhang, Sida I Wang, Hui Dai, and Yoav Artzi. “Simple Recurrent Units for Highly Parallelizable Recurrence”. In: arXiv preprint arXiv:1709.02755 (2017).
[65] Mario Lezcano-Casado and David Martínez-Rubio. “Cheap Orthogonal Constraints in Neural Networks: A Simple Parametrization of the Orthogonal and Unitary Group”. In: The International Conference on Machine Learning (ICML). 2019.
[66] Yuhong Li, Tianle Cai, Yi Zhang, Deming Chen, and Debadeepta Dey. “What Makes Convolutional Models Great on Long Sequence Modeling?” In: The International Conference on Learning Representations (ICLR). 2023.
[67] Vasileios Lioutas and Yuhong Guo. “Time-aware Large Kernel Convolutions”. In: The International Conference on Machine Learning (ICML). PMLR. 2020, pp. 6172–6183.
[68] Chris Lu, Yannick Schroecker, Albert Gu, Emilio Parisotto, Jakob Foerster, Satinder Singh, and Feryal Behbahani. “Structured State Space Models for In-Context Reinforcement Learning”. In: Advances in Neural Information Processing Systems (NeurIPS). 2023.
[69] Shahar Lutati, Itamar Zimerman, and Lior Wolf. “Focus Your Attention (with Adaptive IIR Filters)”. In: arXiv preprint arXiv:2305.14952 (2023).

[70] Xuezhe Ma, Chunting Zhou, Xiang Kong, Junxian He, Liangke Gui, Graham Neubig, Jonathan May, and Luke Zettlemoyer. “Mega: Moving Average Equipped Gated Attention”. In: The International Conference on Learning Representations (ICLR). 2023.
[71] Eric Martin and Chris Cundy. “Parallelizing Linear Recurrent Neural Nets Over Sequence Length”. In: The International Conference on Learning Representations (ICLR). 2018.
[72] Soroush Mehri, Kundan Kumar, Ishaan Gulrajani, Rithesh Kumar, Shubham Jain, Jose Sotelo, Aaron Courville, and Yoshua Bengio. “SampleRNN: An Unconditional End-to-End Neural Audio Generation Model”. In: The International Conference on Learning Representations (ICLR). 2017.
[73] Harsh Mehta, Ankit Gupta, Ashok Cutkosky, and Behnam Neyshabur. “Long Range Language Modeling via Gated State Spaces”. In: The International Conference on Learning Representations (ICLR). 2023.
[74] Zakaria Mhammedi, Andrew Hellicar, Ashfaqur Rahman, and James Bailey. “Efficient Orthogonal Parametrisation of Recurrent Neural Networks using Householder Reflections”. In: International Conference on Machine Learning. PMLR. 2017, pp. 2401–2409.
[75] Eric Nguyen, Karan Goel, Albert Gu, Gordon Downs, Preey Shah, Tri Dao, Stephen Baccus, and Christopher Ré. “S4ND: Modeling Images and Videos as Multidimensional Signals with State Spaces”. In: Advances in Neural Information Processing Systems (NeurIPS). 2022.
[76] Eric Nguyen, Michael Poli, Marjan Faizi, Armin Thomas, Callum Birch-Sykes, Michael Wornow, Aman Patel, Clayton Rabideau, Stefano Massaroli, Yoshua Bengio, et al. “HyenaDNA: Long-range Genomic Sequence Modeling at Single Nucleotide Resolution”. In: Advances in Neural Information Processing Systems (NeurIPS). 2023.
[77] Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova DasSarma, Tom Henighan, Ben Mann, Amanda Askell, Yuntao Bai, Anna Chen, Tom Conerly, Dawn Drain, Deep Ganguli, Zac Hatfield-Dodds, Danny Hernandez, Scott Johnston, Andy Jones, Jackson Kernion, Liane Lovitt, Kamal Ndousse, Dario Amodei, Tom Brown, Jack Clark, Jared Kaplan, Sam McCandlish, and Chris Olah. “In-context Learning and Induction Heads”. In: Transformer Circuits Thread (2022). https://transformer-circuits.pub/2022/in-context-learning-and-induction-heads/index.html.
[78] Aaron van den Oord, Sander Dieleman, Heiga Zen, Karen Simonyan, Oriol Vinyals, Alex Graves, Nal Kalchbrenner, Andrew Senior, and Koray Kavukcuoglu. “WaveNet: A Generative Model for Raw Audio”. In: arXiv preprint arXiv:1609.03499 (2016). https://arxiv.org/pdf/1609.03499
[79] Antonio Orvieto, Samuel L Smith, Albert Gu, Anushan Fernando, Caglar Gulcehre, Razvan Pascanu, and Soham De. “Resurrecting Recurrent Neural Networks for Long Sequences”. In: The International Conference on Machine Learning (ICML). 2023.
[80] Denis Paperno, Germán Kruszewski, Angeliki Lazaridou, Ngoc-Quan Pham, Raffaella Bernardi, Sandro Pezzelle, Marco Baroni, Gemma Boleda, and Raquel Fernández. “The LAMBADA Dataset: Word Prediction Requiring a Broad Discourse Context”. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics. 2016, pp. 1525–1534.
[81] Razvan Pascanu, Tomas Mikolov, and Yoshua Bengio. “On the Difficulty of Training Recurrent Neural Networks”. In: International Conference on Machine Learning. 2013, pp. 1310–1318.
[82] Bo Peng, Eric Alcaide, Quentin Anthony, Alon Albalak, Samuel Arcadinho, Huanqi Cao, Xin Cheng, Michael Chung, Matteo Grella, Kranthi Kiran GV, et al. “RWKV: Reinventing RNNs for the Transformer Era”. In: arXiv preprint arXiv:2305.13048 (2023).
[83] Hao Peng, Nikolaos Pappas, Dani Yogatama, Roy Schwartz, Noah A Smith, and Lingpeng Kong. “Random Feature Attention”. In: The International Conference on Learning Representations (ICLR). 2021.
[84] Michael Poli, Stefano Massaroli, Eric Nguyen, Daniel Y Fu, Tri Dao, Stephen Baccus, Yoshua Bengio, Stefano Ermon, and Christopher Ré. “Hyena Hierarchy: Towards Larger Convolutional Language Models”. In: The International Conference on Machine Learning (ICML). 2023.
[85] Zhen Qin, Xiaodong Han, Weixuan Sun, Bowen He, Dong Li, Dongxu Li, Yuchao Dai, Lingpeng Kong, and Yiran Zhong. “Toeplitz Neural Network for Sequence Modeling”. In: The International Conference on Learning Representations (ICLR). 2023.
[86] Zhen Qin, Xiaodong Han, Weixuan Sun, Dongxu Li, Lingpeng Kong, Nick Barnes, and Yiran Zhong. “The devil in linear transformer”. In: arXiv preprint arXiv:2210.10340 (2022).
[87] Zhen Qin, Weixuan Sun, Hui Deng, Dongxu Li, Yunshen Wei, Baohong Lv, Junjie Yan, Lingpeng Kong, and Yiran Zhong. “CosFormer: Rethinking Softmax in Attention”. In: The International Conference on Learning Representations (ICLR). 2022.
[88] Ali Rahimi and Benjamin Recht. “Random Features for Large-Scale Kernel Machines”. In: Advances in Neural Information Processing Systems (NeurIPS) 20 (2007).

[89] Prajit Ramachandran, Barret Zoph, and Quoc V Le. “Swish: A Self-gated Activation Function”. In: arXiv preprint arXiv:1710.05941 7.1 (2017), p. 5.
[90] David W Romero, Anna Kuzina, Erik J Bekkers, Jakub M Tomczak, and Mark Hoogendoorn. “CKConv: Continuous Kernel Convolution For Sequential Data”. In: arXiv preprint arXiv:2102.02611 (2021). https://arxiv.org/pdf/2102.02611
[91] Keisuke Sakaguchi, Ronan Le Bras, Chandra Bhagavatula, and Yejin Choi. “Winogrande: An Adversarial Winograd Schema Challenge at Scale”. In: Communications of the ACM 64.9 (2021), pp. 99–106.
[92] George Saon, Ankit Gupta, and Xiaodong Cui. “Diagonal State Space Augmented Transformers for Speech Recognition”. In: ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE. 2023, pp. 1–5.
[93] Imanol Schlag, Kazuki Irie, and Jürgen Schmidhuber. “Linear Transformers are Secretly Fast Weight Programmers”. In: The International Conference on Machine Learning (ICML). PMLR. 2021, pp. 9355–9366.
[94] Jürgen Schmidhuber. “Learning to control fast-weight memories: An alternative to dynamic recurrent networks”. In: Neural Computation 4.1 (1992), pp. 131–139.
[95] Noam Shazeer. “GLU Variants Improve Transformer”. In: arXiv preprint arXiv:2002.05202 (2020). https://arxiv.org/pdf/2002.05202
[96] Freda Shi, Xinyun Chen, Kanishka Misra, Nathan Scales, David Dohan, Ed H Chi, Nathanael Schärli, and Denny Zhou. “Large Language Models can be Easily Distracted by Irrelevant Context”. In: The International Conference on Machine Learning (ICML). PMLR. 2023, pp. 31210–31227.
[97] Jiaxin Shi, Ke Alexander Wang, and Emily Fox. “Sequence Modeling with Multiresolution Convolutional Memory”. In: The International Conference on Machine Learning (ICML). PMLR. 2023, pp. 31312–31327.
[98] Jimmy TH Smith, Andrew Warrington, and Scott W Linderman. “Simplified State Space Layers for Sequence Modeling”. In: The International Conference on Learning Representations (ICLR). 2023.
[99] Jianlin Su, Yu Lu, Shengfeng Pan, Ahmed Murtadha, Bo Wen, and Yunfeng Liu. “Roformer: Enhanced Transformer with Rotary Position Embedding”. In: arXiv preprint arXiv:2104.09864 (2021).
[100] Yutao Sun, Li Dong, Shaohan Huang, Shuming Ma, Yuqing Xia, Jilong Xue, Jianyong Wang, and Furu Wei. “Retentive network: A successor to transformer for large language models”. In: arXiv preprint arXiv:2307.08621 (2023).
[101] Ilya Sutskever, Oriol Vinyals, and Quoc V Le. “Sequence to Sequence Learning with Neural Networks”. In: Advances in Neural Information Processing Systems (NeurIPS) 27 (2014).
[102] Corentin Tallec and Yann Ollivier. “Can Recurrent Neural Networks Warp Time?” In: The International Conference on Learning Representations (ICLR). 2018.
[103] Yi Tay, Mostafa Dehghani, Samira Abnar, Yikang Shen, Dara Bahri, Philip Pham, Jinfeng Rao, Liu Yang, Sebastian Ruder, and Donald Metzler. “Long Range Arena: A Benchmark for Efficient Transformers”. In: International Conference on Learning Representations (ICLR). 2021.
[104] Yi Tay, Mostafa Dehghani, Dara Bahri, and Donald Metzler. “Efficient Transformers: A Survey”. In: ACM Computing Surveys 55.6 (2022), pp. 1–28.
[105] Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, et al. “Llama: Open and Efficient Foundation Language Models”. In: arXiv preprint arXiv:2302.13971 (2023). https://arxiv.org/pdf/2302.13971
[106] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. “Attention Is All You Need”. In: Advances in Neural Information Processing Systems (NeurIPS). 2017.
[107] Eugene Vorontsov, Chiheb Trabelsi, Samuel Kadoury, and Chris Pal. “On Orthogonality and Learning Recurrent Networks with Long Term Dependencies”. In: International Conference on Machine Learning. PMLR. 2017, pp. 3570–3578.
[108] Jue Wang, Wentao Zhu, Pichao Wang, Xiang Yu, Linda Liu, Mohamed Omar, and Raffay Hamid. “Selective Structured State-Spaces for Long-form Video Understanding”. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2023, pp. 6387–6397.
[109] Pete Warden. “Speech Commands: A Dataset for Limited-Vocabulary Speech Recognition”. In: ArXiv abs/1804.03209 (2018). https://arxiv.org/pdf/1804.03209
[110] Samuel Williams, Andrew Waterman, and David Patterson. “Roofline: An Insightful Visual Performance Model for Multicore Architectures”. In: Communications of the ACM 52.4 (2009), pp. 65–76.
[111] Brandon Yang, Gabriel Bender, Quoc V Le, and Jiquan Ngiam. “CondConv: Conditionally Parameterized Convolutions for Efficient Inference”. In: Advances in Neural Information Processing Systems (NeurIPS) 32 (2019).
[112] Rowan Zellers, Ari Holtzman, Yonatan Bisk, Ali Farhadi, and Yejin Choi. “HellaSwag: Can a Machine Really Finish Your Sentence?” In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019.
[113] Shuangfei Zhai, Walter Talbott, Nitish Srivastava, Chen Huang, Hanlin Goh, Ruixiang Zhang, and Josh Susskind. “An Attention Free Transformer”. In: arXiv preprint arXiv:2105.14103 (2021). https://arxiv.org/pdf/2105.14103
[114] Michael Zhang, Khaled K Saab, Michael Poli, Tri Dao, Karan Goel, and Christopher Ré. “Effectively Modeling Time Series with Simple Discrete State Spaces”. In: The International Conference on Learning Representations (ICLR). 2023.
[115] Lin Zheng, Chong Wang, and Lingpeng Kong. “Linear complexity randomized self-attention mechanism”. In: International Conference on Machine Learning. PMLR. 2022, pp. 27011–27041.
[116] Simiao Zuo, Xiaodong Liu, Jian Jiao, Denis Charles, Eren Manavoglu, Tuo Zhao, and Jianfeng Gao. “Efficient Long Sequence Modeling via State Space Augmented Transformer”. In: arXiv preprint arXiv:2212.08136 (2022). https://arxiv.org/pdf/2212.08136

[4]

Samy Jelassi+. Repeat After Me: Transformers are Better than State Space Models at Copying. 2024. In arXiv:2402.01032
 
https://arxiv.org/abs/2402.01032

https://arxiv.org/pdf/2402.01032
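
The paper's headline experiment is a copying task: the model sees a random string and must reproduce it verbatim, which attention can handle by lookup while a fixed-size recurrent state has to compress the whole string. A hypothetical data generator for such a task (my own construction mirroring the setup the paper describes, not its released code):

```python
import random
import string

def make_copy_example(length: int = 16, seed: int = 0) -> tuple[str, str]:
    """Return (prompt, target) for a simple verbatim-copy task."""
    rng = random.Random(seed)
    s = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
    prompt = f"Copy the following string exactly: {s}\nCopy: "
    return prompt, s

prompt, target = make_copy_example()
print(prompt + target)
```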

References on [4]

Akyürek, E., Wang, B., Kim, Y., and Andreas, J. In-context language learning: Architectures and algorithms. arXiv preprint arXiv:2401.12973, 2024.

Anil, C., Wu, Y., Andreassen, A., Lewkowycz, A., Misra, V., Ramasesh, V., Slone, A., Gur-Ari, G., Dyer, E., and Neyshabur, B. Exploring length generalization in large language models. Advances in Neural Information Processing Systems, 35:38546–38556, 2022.
Biderman, S., Schoelkopf, H., Anthony, Q. G., Bradley, H., O’Brien, K., Hallahan, E., Khan, M. A., Purohit, S., Prashanth, U. S., Raff, E., et al. Pythia: A suite for analyzing large language models across training and scaling. In International Conference on Machine Learning, pp. 2397–2430. PMLR, 2023.
Bradbury, J., Merity, S., Xiong, C., and Socher, R. Quasi-recurrent neural networks. arXiv preprint arXiv:1611.01576, 2016.
Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., et al. Language models are few-shot learners. Advances in neural information processing systems, 33: 1877–1901, 2020.
Carlini, N., Ippolito, D., Jagielski, M., Lee, K., Tramer, F., and Zhang, C. Quantifying memorization across neu- ral language models. arXiv preprint arXiv:2202.07646, 2022.
Chiang, D., Cholak, P., and Pillay, A. Tighter bounds on the expressivity of transformer encoders. arXiv preprint arXiv:2301.10743, 2023.
Choromanski, K., Likhosherstov, V., Dohan, D., Song, X., Gane, A., Sarlos, T., Hawkins, P., Davis, J., Mohiuddin, A., Kaiser, L., et al. Rethinking attention with performers. arXiv preprint arXiv:2009.14794, 2020.
Dao, T., Fu, D., Ermon, S., Rudra, A., and Ré, C. Flashattention: Fast and memory-efficient exact attention with io-awareness. Advances in Neural Information Processing Systems, 35:16344–16359, 2022.
Delétang, G., Ruoss, A., Grau-Moya, J., Genewein, T., Wenliang, L. K., Catt, E., Hutter, M., Legg, S., and Ortega, P. A. Neural networks and the chomsky hierarchy. arXiv preprint arXiv:2207.02098, 2022.
Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. Bert: Pre-training of deep bidirectional transformers for lan- guage understanding. arXiv preprint arXiv:1810.04805, 2018.
Edelman, B. L., Goel, S., Kakade, S., and Zhang, C. Inductive biases and variable creation in self-attention mechanisms. In International Conference on Machine Learning, pp. 5793–5831. PMLR, 2022.
Gao, L., Biderman, S., Black, S., Golding, L., Hoppe, T., Foster, C., Phang, J., He, H., Thite, A., Nabeshima, N., et al. The pile: An 800gb dataset of diverse text for language modeling. arXiv preprint arXiv:2101.00027, 2020.
Grazzi, R., Siems, J., Schrodi, S., Brox, T., and Hutter, F. Is mamba capable of in-context learning? arXiv preprint arXiv:2402.03170, 2024.
Gu, A. and Dao, T. Mamba: Linear-time sequence modeling with selective state spaces. arXiv preprint arXiv:2312.00752, 2023.
Gu, A., Goel, K., and Ré, C. Efficiently modeling long sequences with structured state spaces. arXiv preprint arXiv:2111.00396, 2021.
Hochreiter, S. and Schmidhuber, J. Long short-term memory. Neural computation, 9(8):1735–1780, 1997.
Jelassi, S., d’Ascoli, S., Domingo-Enrich, C., Wu, Y., Li, Y., and Charton, F. Length generalization in arithmetic transformers. arXiv preprint arXiv:2306.15400, 2023.
Merrill, W., Weiss, G., Goldberg, Y., Schwartz, R., Smith, N. A., and Yahav, E. A formal hierarchy of rnn architectures. arXiv preprint arXiv:2004.08500, 2020.
Merrill, W., Sabharwal, A., and Smith, N. A. Saturated Transformers are Constant-Depth Threshold Circuits. Transactions of the Association for Computational Linguistics, 10:843–856, 08 2022. ISSN 2307-387X. doi: 10.1162/tacl_a_00493. URL https://doi.org/10.1162/tacl_a_00493.
Miller, G. A. The magic number seven plus or minus two: Some limits on our capacity for processing information. Psychological review, 63:91–97, 1956.
Nguyen, E., Poli, M., Faizi, M., Thomas, A., Birch-Sykes, C., Wornow, M., Patel, A., Rabideau, C., Massaroli, S., Bengio, Y., et al. Hyenadna: Long-range genomic sequence modeling at single nucleotide resolution. arXiv preprint arXiv:2306.15794, 2023.
Olsson, C., Elhage, N., Nanda, N., Joseph, N., DasSarma, N., Henighan, T., Mann, B., Askell, A., Bai, Y., Chen, A., et al. In-context learning and induction heads. arXiv preprint arXiv:2209.11895, 2022.
Park, J., Park, J., Xiong, Z., Lee, N., Cho, J., Oymak, S., Lee, K., and Papailiopoulos, D. Can mamba learn how to learn? a comparative study on in-context learning tasks. arXiv preprint arXiv:2402.04248, 2024.
Pascanu, R., Mikolov, T., and Bengio, Y. On the difficulty of training recurrent neural networks. In International conference on machine learning, pp. 1310–1318. Pmlr, 2013.
Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., et al. Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems, 32, 2019.
Peng, B., Alcaide, E., Anthony, Q., Albalak, A., Arcadinho, S., Cao, H., Cheng, X., Chung, M., Grella, M., GV, K. K., et al. Rwkv: Reinventing rnns for the transformer era. arXiv preprint arXiv:2305.13048, 2023.
Petroni, F., Lewis, P., Piktus, A., Rocktäschel, T., Wu, Y., Miller, A. H., and Riedel, S. How context affects language models’ factual predictions. arXiv preprint arXiv:2005.04611, 2020.
Press, O., Smith, N. A., and Lewis, M. Train short, test long: Attention with linear biases enables input length extrapolation. arXiv preprint arXiv:2108.12409, 2021.
Kamradt, G. LLMTest NeedleInAHaystack. https://github.com/gkamradt/LLMTest_NeedleInAHaystack, 2023.
Katharopoulos, A., Vyas, A., Pappas, N., and Fleuret, F. Transformers are rnns: Fast autoregressive transformers with linear attention. In International conference on machine learning, pp. 5156–5165. PMLR, 2020.
Kazemnejad, A., Padhi, I., Ramamurthy, K. N., Das, P., and Reddy, S. The impact of positional encoding on length generalization in transformers. arXiv preprint arXiv:2305.19466, 2023.
Liu, B., Ash, J. T., Goel, S., Krishnamurthy, A., and Zhang, C. Exposing attention glitches with flip-flop language modeling. arXiv preprint arXiv:2306.00946, 2023a.
Liu, N. F., Lin, K., Hewitt, J., Paranjape, A., Bevilacqua, M., Petroni, F., and Liang, P. Lost in the middle: How language models use long contexts. arXiv preprint arXiv:2307.03172, 2023b.
Loshchilov, I. and Hutter, F. Decoupled weight decay regu- larization. arXiv preprint arXiv:1711.05101, 2017.
McCoy, R. T., Smolensky, P., Linzen, T., Gao, J., and Celikyilmaz, A. How much do language models copy from their training data? evaluating linguistic novelty in text generation using raven. Transactions of the Association for Computational Linguistics, 11:652–670, 2023.
Merrill, W. Sequential neural networks as automata. arXiv preprint arXiv:1906.01615, 2019.

Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., and Sutskever, I. Language models are unsupervised multitask learners, 2019. URL https://api.semanticscholar.org/CorpusID:160025533.
Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., Zhou, Y., Li, W., and Liu, P. J. Exploring the limits of transfer learning with a unified text-to-text transformer. The Journal of Machine Learning Research, 21(1):5485–5551, 2020.
Rajpurkar, P., Jia, R., and Liang, P. Know what you don’t know: Unanswerable questions for squad. arXiv preprint arXiv:1806.03822, 2018.
Ruoss, A., Delétang, G., Genewein, T., Grau-Moya, J., Csordás, R., Bennani, M., Legg, S., and Veness, J. Randomized positional encodings boost length generalization of transformers. arXiv preprint arXiv:2305.16843, 2023.
Sanford, C., Hsu, D., and Telgarsky, M. Representational strengths and limitations of transformers. arXiv preprint arXiv:2306.02896, 2023.
Shelton, T. The Ingenious Gentleman Don Quixote of La Mancha. 1612. Written by Miguel de Cervantes, translated by Thomas Shelton.
Shen, R., Bubeck, S., Eldan, R., Lee, Y. T., Li, Y., and Zhang, Y. Positional description matters for transformers arithmetic. arXiv preprint arXiv:2311.14737, 2023.
Strobl, L., Merrill, W., Weiss, G., Chiang, D., and Angluin, D. Transformers as recognizers of formal languages: A survey on expressivity. arXiv preprint arXiv:2311.00208, 2023.
Su, J., Ahmed, M., Lu, Y., Pan, S., Bo, W., and Liu, Y. Roformer: Enhanced transformer with rotary position embedding. Neurocomputing, pp. 127063, 2023.
Sun, Y., Dong, L., Huang, S., Ma, S., Xia, Y., Xue, J., Wang, J., and Wei, F. Retentive network: A successor to transformer for large language models. arXiv preprint arXiv:2307.08621, 2023.
Tikochinski, R., Goldstein, A., Meiri, Y., Hasson, U., and Reichart, R. An incremental large language model for long text processing in the brain. 2024.
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., and Polosukhin, I. Attention is all you need. Advances in neural information processing systems, 30, 2017.
Wei, C., Chen, Y., and Ma, T. Statistically meaningful approximation: a case study on approximating turing machines with transformers. Advances in Neural Information Processing Systems, 35:12071–12083, 2022.
Weiss, G., Goldberg, Y., and Yahav, E. Thinking like transformers. In International Conference on Machine Learning, pp. 11080–11090. PMLR, 2021.
Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C., Moi, A., Cistac, P., Rault, T., Louf, R., Funtowicz, M., et al. Huggingface’s transformers: State-of-the-art natural language processing. arXiv preprint arXiv:1910.03771, 2019.
Zhou, H., Bradley, A., Littwin, E., Razin, N., Saremi, O., Susskind, J., Bengio, S., and Nakkiran, P. What algorithms can transformers learn? a study in length generalization. arXiv preprint arXiv:2310.16028, 2023.

<This article is a work in progress. I will add more content over time.>

Qiita Calendar 2024

Calendars I joined or organized in 2024, and a list of the posted articles. Qiita(248)
https://qiita.com/kaizen_nagoya/items/d80b8fbac2496df7827f

Analysis of the Calendars I organized in 2024. Qiita(254)
https://qiita.com/kaizen_nagoya/items/15807336d583076f70bc

I am hosting the Doctoral Dissertation Calendar 2024.
https://qiita.com/kaizen_nagoya/items/51601357efbcaf1057d0

Doctoral dissertations (0): list of related articles
https://qiita.com/kaizen_nagoya/items/8f223a760e607b705e78

Lists of my own articles

Qiita seems to have stopped displaying backlinks. They occasionally still show up on a smartphone, so they do not appear to have been removed completely.

Since April I have been steadily building link lists and collecting statistics in order to explain them in terms of probability.
My target is the end of February 2025.

A list of lists (the directory of directories of mine). Qiita(100)
https://qiita.com/kaizen_nagoya/items/7eb0e006543886138f39

Hypotheses (0): list (target 100, currently 40)
https://qiita.com/kaizen_nagoya/items/f000506fe1837b3590df

Qiita (0): list of my own Qiita-related articles
https://qiita.com/kaizen_nagoya/items/58db5fbf036b28e9dfa6

Errors: list. error(0)
https://qiita.com/kaizen_nagoya/items/48b6cbc8d68eae2c42b8

C++ Support(0) 
https://qiita.com/kaizen_nagoya/items/8720d26f762369a80514

Coding(0) Rules, C, Secure, MISRA and so on
https://qiita.com/kaizen_nagoya/items/400725644a8a0e90fbb0

Ethernet: article list. Ethernet(0)
https://qiita.com/kaizen_nagoya/items/88d35e99f74aefc98794

Wireshark: list. wireshark(0), Ethernet(48)
https://qiita.com/kaizen_nagoya/items/fbed841f61875c4731d0

Wireless networks (Wi-Fi) and antennas (0): article list (118 toward a target of 300)
https://qiita.com/kaizen_nagoya/items/5e5464ac2b24bd4cd001

Why do machine learning with Docker? Book and source list in progress (target: 100)
https://qiita.com/kaizen_nagoya/items/ddd12477544bf5ba85e2

Small program modifications (0): list, 4 items
https://qiita.com/kaizen_nagoya/items/296d87ef4bfd516bc394

100 Language Processing Knocks with Docker. Ideal for learning Python. : 10+12
https://qiita.com/kaizen_nagoya/items/7e7eb7c543e0c18438c4

Python (0): I want to organize my Python articles.
https://qiita.com/kaizen_nagoya/items/088c57d70ab6904ebb53

Safety (0): toward the Safety Engineering Symposium: 21
https://qiita.com/kaizen_nagoya/items/c5d78f3def8195cb2409

Statistics (0) and probability programming by programmers, for programmers, and what follows
https://qiita.com/kaizen_nagoya/items/6e9897eb641268766909

Job change (0): list
https://qiita.com/kaizen_nagoya/items/f77520d378d33451d6fe

Professional Engineer (0): list
https://qiita.com/kaizen_nagoya/items/ce4ccf4eb9c5600b89ea

Researchmap (0): list
https://qiita.com/kaizen_nagoya/items/506c79e562f406c4257e

Physics articles: top 100
https://qiita.com/kaizen_nagoya/items/66e90fe31fbe3facc6ff

Quantum (0): computing, quantum mechanics
https://qiita.com/kaizen_nagoya/items/1cd954cb0eed92879fd4

Mathematics-related articles: 100
https://qiita.com/kaizen_nagoya/items/d8dadb49a6397e854c6d

coq (0): list
https://qiita.com/kaizen_nagoya/items/d22f9995cf2173bc3b13

Statistics (0): list
https://qiita.com/kaizen_nagoya/items/80d3b221807e53e88aba

Diagrams (0): state, sequence and timing. UML and sketching
https://qiita.com/kaizen_nagoya/items/60440a882146aeee9e8f

Color (0): angles for writing 100 articles
https://qiita.com/kaizen_nagoya/items/22331c0335ed34326b9b

Quality: list
https://qiita.com/kaizen_nagoya/items/2b99b8e9db6d94b2e971

Language and literature articles: 100
https://qiita.com/kaizen_nagoya/items/42d58d5ef7fb53c407d6

Medical-engineering collaboration: list of related articles
https://qiita.com/kaizen_nagoya/items/6ab51c12ba51bc260a82

Water reference collection (0): policy and results
https://qiita.com/kaizen_nagoya/items/f5dbb30087ea732b52aa

Automobile articles: 100
https://qiita.com/kaizen_nagoya/items/f7f0b9ab36569ad409c5

Communications articles: 100
https://qiita.com/kaizen_nagoya/items/1d67de5e1cd207b05ef7

Japanese language (0): list
https://qiita.com/kaizen_nagoya/items/7498dcfa3a9ba7fd1e68

English (0): list
https://qiita.com/kaizen_nagoya/items/680e3f5cbf9430486c7d

Music: list (0)
https://qiita.com/kaizen_nagoya/items/b6e5f42bbfe3bbe40f5d

Checklist of @kazuo_reve's "Useful information I often share with newcomers"
https://qiita.com/kaizen_nagoya/items/b9380888d1e5a042646b

Railways (0): railway enthusiasts help with studies of railway systems
https://qiita.com/kaizen_nagoya/items/faa4ea03d91d901a618a

Basics of OSEK OS design. OSEK(100)
https://qiita.com/kaizen_nagoya/items/7528a22a14242d2d58a3

coding (101): I have started building a list. Bonus: five things Qiita has recently stopped displaying
https://qiita.com/kaizen_nagoya/items/20667f09f19598aedb68

Issues with the systems of government agencies, schools, and public organizations (including NPOs). Government (0)
https://qiita.com/kaizen_nagoya/items/04ee6eaf7ec13d3af4c3

The "Getting Started" series, Vector Japan
https://qiita.com/kaizen_nagoya/items/2e41634f6e21a3cf74eb

AUTOSAR (0): list of Qiita articles, OSEK(75)
https://qiita.com/kaizen_nagoya/items/89c07961b59a8754c869

"Public order and morals" that programmers should know
https://qiita.com/kaizen_nagoya/items/9fe7c0dfac2fbd77a945

LaTeX (0): list
https://qiita.com/kaizen_nagoya/items/e3f7dafacab58c499792

Automatic control and control engineering: list (0)
https://qiita.com/kaizen_nagoya/items/7767a4e19a6ae1479e6b

Rust (0): list
https://qiita.com/kaizen_nagoya/items/5e8bb080ba6ca0281927

Related materials

@kazuo_reve "The Ogawa method", whose effectiveness I have confirmed
https://qiita.com/kazuo_reve/items/a3ea1d9171deeccc04da

@kazuo_reve Useful information I often share with newcomers
https://qiita.com/kazuo_reve/items/d1a3f0ee48e24bba38f1

@kazuo_reve Things I realized I had misunderstood about the V-model
https://qiita.com/kazuo_reve/items/46fddb094563bd9b2e1e

Must-read articles before Engineering Festa 2024

The essence of a program is a plan. A program is a design.
https://qiita.com/kaizen_nagoya/items/c8545a769c246a458c27

Post-presentation version: Color use (JIS safety colors). Qiita Engineer Festa 2023 "Lightning talks on niche technology that benefits only me", slide edition 0.15
https://qiita.com/kaizen_nagoya/items/f0d3070d839f4f735b2b

"Public order and morals" that programmers should know
https://qiita.com/kaizen_nagoya/items/9fe7c0dfac2fbd77a945

The converse is also true: things working adults should check first. OSEK(69), Ethernet(59)
https://qiita.com/kaizen_nagoya/items/39afe4a728a31b903ddc

Lies told with statistics. Hypothesis (127)
https://qiita.com/kaizen_nagoya/items/63b48ecf258a3471c51b

If developing an argument only in your own words makes you a genius, developing one only from quotations makes you a scholar. Hypothesis (136)
https://qiita.com/kaizen_nagoya/items/97cf07b9e24f860624dd

Reference-driven writing (references driven writing): DENSO CREATE edition
https://qiita.com/kaizen_nagoya/items/b27b3f58b8bf265a5cd1

"Who" rather than "what": people worth learning from now, for ten years from now
https://qiita.com/kaizen_nagoya/items/8045978b16eb49d572b2

How to reach Qiita articles in three or five steps
https://qiita.com/kaizen_nagoya/items/6e9298296852325adc5e

Don't call it an output. This is a state.
https://qiita.com/kaizen_nagoya/items/80b8b5913b2748867840

coding (101): I have started building a list. Bonus: five things Qiita has recently stopped displaying
https://qiita.com/kaizen_nagoya/items/20667f09f19598aedb68

How many of the "misconceptions" in a misconception roundup can you show to be misconceptions themselves? Types of human error (human error (125)) and countermeasures
https://qiita.com/kaizen_nagoya/items/ae391b77fffb098b8fb4

A programmer's belief that "I can write programs" is a strength: three reasons. Hypothesis (168), statistics and probability (17), OSEK(79)
https://qiita.com/kaizen_nagoya/items/bc5dd86e414de402ec29

Don't call it an output. This is a state.
https://qiita.com/kaizen_nagoya/items/80b8b5913b2748867840

Let us think about how information should be communicated from now on: flaming and piggybacking.
https://qiita.com/kaizen_nagoya/items/71a09077ac195214f0db

ISO/IEC JTC1 SC7 Software and System Engineering
https://qiita.com/kaizen_nagoya/items/48b43f0f6976a078d907

Let's share our accessibility know-how! (again)
https://qiita.com/kaizen_nagoya/items/03457eb9ee74105ee618

Reading circle on statistics and probability theory (again)
https://qiita.com/kaizen_nagoya/items/590874ccfca988e85ea3

Seven spells that draw readers' hearts in
https://qiita.com/kaizen_nagoya/items/b1b5e89bd5c0a211d862

Checklist of @kazuo_reve's "Useful information I often share with newcomers"
https://qiita.com/kaizen_nagoya/items/b9380888d1e5a042646b

Let's debate with source code and stop debating in Japanese (a report on a discussion of a programming technique)
https://qiita.com/kaizen_nagoya/items/8b9811c80f3338c6c0b0

Three dangers of the in-brain compiler
https://qiita.com/kaizen_nagoya/items/7025cf2d7bd9f276e382

Isn't writing a compiler better than reading psychology books? Hypothesis (34)
https://qiita.com/kaizen_nagoya/items/fa715732cc148e48880e

Read this if you intend to surpass NASA.
https://qiita.com/kaizen_nagoya/items/e81669f9cb53109157f6

A data scientist's realization: "People who study but don't put it to use at work. I hate them!!" made me think, "Could that be me?"!!!
https://qiita.com/kaizen_nagoya/items/d85830d58d8dd7f71d07

"My favorite teacher", "Do what others won't": until I became a programmer. Hypothesis (37)
https://qiita.com/kaizen_nagoya/items/53e4bded9fe5f724b3c4

Why I quit economics and became a computer engineer (before entering, while in, and after leaving the economics faculty). Job change (1)
https://qiita.com/kaizen_nagoya/items/06335a1d24c099733f64

The XYZ of programming-language education. Hypothesis (52)
https://qiita.com/kaizen_nagoya/items/1950c5810fb5c0b07be4

[For 2024 graduates] Aiming for an annual income of 10 million yen nine months from now: two gates and three paths.
https://qiita.com/kaizen_nagoya/items/fb5bff147193f726ad25

Recommended preparation for "[For 2025 graduates] Qiita Career Meetup for STUDENT"
https://qiita.com/kaizen_nagoya/items/00eadb8a6e738cb6336f

Even if you fail a university entrance exam, you can enter and graduate from a university with no written exam. Even without graduating, you can become a doctor.
https://qiita.com/kaizen_nagoya/items/74adec99f396d64b5fd5

Children around the world who do not attend school: let's write "doctoral dissertations". World Children's Doctoral Dissertation Remote Practice Center. Safety (99)
https://qiita.com/kaizen_nagoya/items/912d69032c012bcc84f2

The Ogawa method: notes (work in progress)
https://qiita.com/kaizen_nagoya/items/3593d72eca551742df68

What is DoCAP?
https://qiita.com/kaizen_nagoya/items/47e0e6509ab792c43327

My articles with more than 20,000 views
https://qiita.com/kaizen_nagoya/items/58e8bd6450957cdecd81

Articles with more than 10,000 views, and those approaching 10,000: the 213 articles that recently received likes
https://qiita.com/kaizen_nagoya/items/d2b805717a92459ce853

Until I became Amazon's No. 1 Hall of Fame reviewer. Hypothesis (102)
https://qiita.com/kaizen_nagoya/items/83259d18921ce75a91f4

16 selected articles that received more than 100 likes
https://qiita.com/kaizen_nagoya/items/f8d958d9084ffbd15d2a

Kiyoshi Ogawa's final lecture and plan for a final lecture (again). Ethernet(100), English(100), Safety(100)
https://qiita.com/kaizen_nagoya/items/e2df642e3951e35e6a53

<This article is a personal impression based on my own experience. It has nothing to do with the organization or business to which I currently belong.>

Document history

ver. 0.01 first draft 20241022

Thank you very much for reading to the last sentence.

Please press the like icon 💚 and follow me.
