Are LLMs Aware of Their Hallucinations? Matsuo Lab LLM Community "Paper & Hacks Vol.24" AI(6)

Posted at 2024-11-10

LLM (Large Language Model) Advent Calendar 2024
https://qiita.com/advent-calendar/2024/llm
This article is scheduled for the December 21 slot of the calendar.

Matsuo Lab LLM Community "Paper & Hacks Vol.24"
https://matsuolab-community.connpass.com/event/335384/

Presenter: 下村 晃生, Matsuo Lab LLM Community member (Department of Space Systems Engineering, School of Engineering, Kyushu Institute of Technology / in charge of Session 9 of the Matsuo Lab Large Language Model course)
Theme: Are LLMs aware of their hallucinations?
Paper link: Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems
https://arxiv.org/abs/2408.16293

Natural language processing with Docker (186)
https://qiita.com/kaizen_nagoya/items/e29cbaed8370e7913487

This article lists the references of the paper's references and builds a vocabulary list (work in progress).
The vocabulary list below includes only words that appear at least twice; words that appear only once are kept on Docker.

<This article is a work in progress. I will keep adding words and sentences to it.>

References

[1] Kwangjun Ahn, Xiang Cheng, Hadi Daneshmand, and Suvrit Sra. Transformers learn to implement preconditioned gradient descent for in-context learning. Advances in Neural Information Processing Systems, 36, 2024.
https://arxiv.org/abs/2306.00297
[2] Zeyuan Allen-Zhu and Yuanzhi Li. Physics of Language Models: Part 1, Learning Hierarchical Language Structures. ArXiv e-prints, abs/2305.13673, May 2023. Full version available at http://arxiv.org/abs/2305.13673.
[3] Zeyuan Allen-Zhu and Yuanzhi Li. Physics of Language Models: Part 3.2, Knowledge Manipulation. ArXiv e-prints, abs/2309.14402, September 2023. Full version available at http://arxiv.org/abs/2309.14402.
[4] Zeyuan Allen-Zhu and Yuanzhi Li. Physics of Language Models: Part 3.1, Knowledge Storage and Extraction. In ICML, 2024. Full version available at http://arxiv.org/abs/2309.14316.
[5] Zeyuan Allen-Zhu and Yuanzhi Li. Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws. ArXiv e-prints, abs/2404.05405, April 2024. Full version available at http://arxiv.org/abs/2404.05405.
[6] Cem Anil, Yuhuai Wu, Anders Andreassen, Aitor Lewkowycz, Vedant Misra, Vinay Ramasesh, Ambrose Slone, Guy Gur-Ari, Ethan Dyer, and Behnam Neyshabur. Exploring length generalization in large language models. Advances in Neural Information Processing Systems, 35:38546–38556, 2022.
[7] Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, and Samuel Weinbach. GPT-NeoX-20B: An open-source autoregressive language model. In Proceedings of the ACL Workshop on Challenges & Perspectives in Creating Large Language Models, 2022. URL https://arxiv.org/abs/2204.06745.
[8] Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, et al. Sparks of artificial general intelligence: Early experiments with gpt-4. arXiv preprint arXiv:2303.12712, 2023.
[9] Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, et al. Training verifiers to solve math word problems. arXiv preprint arXiv:2110.14168, 2021.
[10] Edward J Hu, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. Lora: Low-rank adaptation of large language models. In ICLR, 2021.
[11] Jie Huang, Xinyun Chen, Swaroop Mishra, Huaixiu Steven Zheng, Adams Wei Yu, Xinying Song, and Denny Zhou. Large language models cannot self-correct reasoning yet. arXiv preprint arXiv:2310.01798, 2023.
[12] Yifei Li, Zeqi Lin, Shizhuo Zhang, Qiang Fu, Bei Chen, Jian-Guang Lou, and Weizhu Chen. Making language models better reasoners with step-aware verifier. In Anna Rogers, Jordan Boyd-Graber, and Naoaki Okazaki, editors, Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 5315–5333, Toronto, Canada, July 2023. Association for Computational Linguistics. doi: 10.18653/v1/2023.acl-long.291. URL https://aclanthology.org/2023.acl-long.291.
[13] Yifei Li, Zeqi Lin, Shizhuo Zhang, Qiang Fu, Bei Chen, Jian-Guang Lou, and Weizhu Chen. Making language models better reasoners with step-aware verifier. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 5315–5333, 2023.
[14] Bingbin Liu, Sebastien Bubeck, Ronen Eldan, Janardhan Kulkarni, Yuanzhi Li, Anh Nguyen, Rachel Ward, and Yi Zhang. TinyGSM: achieving > 80% on GSM8k with small language models. arXiv preprint arXiv:2312.09241, 2023.
[15] Aman Madaan, Niket Tandon, Prakhar Gupta, Skyler Hallinan, Luyu Gao, Sarah Wiegreffe, Uri Alon, Nouha Dziri, Shrimai Prabhumoye, Yiming Yang, et al. Self-refine: Iterative refinement with self-feedback. Advances in Neural Information Processing Systems, 36, 2024.
[16] Marah Abdin et al. Phi-3 technical report: A highly capable language model locally on your phone. arXiv preprint arXiv:2404.14219, 2024.
[17] John Miller, Karl Krauth, Benjamin Recht, and Ludwig Schmidt. The effect of natural distribution shift on question answering models. In International conference on machine learning, pages 6905–6916. PMLR, 2020.
[18] Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova DasSarma, Tom Henighan, Ben Mann, Amanda Askell, Yuntao Bai, Anna Chen, et al. In-context learning and induction heads. arXiv preprint arXiv:2209.11895, 2022.
[19] Liangming Pan, Michael Saxon, Wenda Xu, Deepak Nathani, Xinyi Wang, and William Yang Wang. Automatically correcting large language models: Surveying the landscape of diverse self-correction strategies. arXiv preprint arXiv:2308.03188, 2023.
[20] Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. Language models are unsupervised multitask learners. OpenAI blog, 1(8):9, 2019.
[21] Maithra Raghu, Thomas Unterthiner, Simon Kornblith, Chiyuan Zhang, and Alexey Dosovitskiy. Do vision transformers see like convolutional neural networks? Advances in neural information processing systems, 34:12116–12128, 2021.
[22] Yiheng Shu and Zhiwei Yu. Distribution shifts are bottlenecks: Extensive evaluation for grounding language models to knowledge bases. In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop, pages 71–88, 2024.
[23] Marta Skreta, Naruki Yoshikawa, Sebastian Arellano-Rubach, Zhi Ji, Lasse Bjørn Kristensen, Kourosh Darvish, Alán Aspuru-Guzik, Florian Shkurti, and Animesh Garg. Errors are useful prompts: Instruction guided task programming with verifier-assisted iterative prompting. arXiv preprint arXiv:2303.14100, 2023.
[24] Jianlin Su, Yu Lu, Shengfeng Pan, Bo Wen, and Yunfeng Liu. Roformer: Enhanced transformer with rotary position embedding, 2021.
[25] Llama Team. The llama 3 herd of models. arXiv preprint arXiv:2407.21783, 2024.
[26] Fei Wang, Chao Shang, Sarthak Jain, Shuai Wang, Qiang Ning, Bonan Min, Vittorio Castelli, Yassine Benajiba, and Dan Roth. From instructions to constraints: Language model alignment with automatic constraint verification. arXiv preprint arXiv:2403.06326, 2024.
[27] Yixuan Weng, Minjun Zhu, Fei Xia, Bin Li, Shizhu He, Shengping Liu, Bin Sun, Kang Liu, and Jun Zhao. Large language models are better reasoners with self-verification. arXiv preprint arXiv:2212.09561, 2022.
[28] Kaiyu Yang, Jia Deng, and Danqi Chen. Generating natural language proofs with verifier-guided search. arXiv preprint arXiv:2205.12443, 2022.
[29] Tian Ye, Zicheng Xu, Yuanzhi Li, and Zeyuan Allen-Zhu. Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process. arXiv e-prints, abs/2407.20311, 2024. Full version available at http://arxiv.org/abs/2407.20311.
[30] Yunxiang Zhang, Muhammad Khalifa, Lajanugen Logeswaran, Jaekyeom Kim, Moontae Lee, Honglak Lee, and Lu Wang. Small language models need strong verifiers to self-correct reasoning. arXiv preprint arXiv:2404.17140, 2024.
[31] Hattie Zhou, Arwen Bradley, Etai Littwin, Noam Razin, Omid Saremi, Josh Susskind, Samy Bengio, and Preetum Nakkiran. What algorithms can transformers learn? a study in length generalization. arXiv preprint arXiv:2310.16028, 2023.

References of the References

1 Kwangjun Ahn

Kwangjun Ahn, Xiang Cheng, Minhak Song, Chulhee Yun, Ali Jadbabaie, and Suvrit Sra. Linear attention is (maybe) all you need (to understand transformer optimization). arXiv preprint arXiv:2310.01082, 2023.
Ekin Akyürek, Dale Schuurmans, Jacob Andreas, Tengyu Ma, and Denny Zhou. What learning algorithm is in-context learning? investigations with linear models. International Conference on Learning Representations, 2022.
Zeyuan Allen-Zhu and Yuanzhi Li. Physics of language models: Part 1, context-free grammar. arXiv preprint arXiv:2305.13673, 2023.
Noga Alon and Joel H Spencer. The probabilistic method. John Wiley & Sons, 2016.
Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, et al. Gpt-neox-20b: An open-source autoregressive language model. Proceedings of BigScience – Workshop on Challenges & Perspectives in Creating Large Language Models, 2022.
Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. Neural Information Processing Systems, 2020.
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1, 2019.
John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of machine learning research, 12(7), 2011.
Benjamin L Edelman, Surbhi Goel, Sham Kakade, and Cyril Zhang. Inductive biases and variable creation in self-attention mechanisms. In International Conference on Machine Learning (ICML), 2022.
Nelson Elhage, Neel Nanda, Catherine Olsson, Tom Henighan, Nicholas Joseph, Ben Mann, Amanda Askell, Yuntao Bai, Anna Chen, Tom Conerly, Nova DasSarma, Dawn Drain, Deep Ganguli, Zac Hatfield-Dodds, Danny Hernandez, Andy Jones, Jackson Kernion, Liane Lovitt, Kamal Ndousse, Dario Amodei, Tom Brown, Jack Clark, Jared Kaplan, Sam McCandlish, and Chris Olah. A mathematical framework for transformer circuits. Transformer Circuits Thread, 2021. https://transformer-circuits.pub/2021/framework/index.html.
Murat A Erdogdu, Lee H Dicker, and Mohsen Bayati. Scaled least squares estimator for glms in large-scale problems. Advances in Neural Information Processing Systems, 29, 2016.
Shivam Garg, Dimitris Tsipras, Percy S Liang, and Gregory Valiant. What can transformers learn in-context? a case study of simple function classes. Advances in Neural Information Processing Systems, 35:30583–30598, 2022.
Angeliki Giannou, Shashank Rajput, Jy-yong Sohn, Kangwook Lee, Jason D Lee, and Dimitris Papailiopoulos. Looped transformers as programmable computers. arXiv preprint arXiv:2301.13196, 2023.
Alex Graves, Greg Wayne, and Ivo Danihelka. Neural turing machines. arXiv preprint arXiv:1410.5401, 2014.
Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural computation, 1997.
Stanisław Jastrzebski, Devansh Arpit, Nicolas Ballas, Vikas Verma, Tong Che, and Yoshua Bengio. Residual connections encourage iterative inference. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=SJa9iHgAZ.
Ke Li and Jitendra Malik. Learning to optimize. In International Conference on Learning Representations, 2017.
Opher Lieber, Or Sharir, Barak Lenz, and Yoav Shoham. Jurassic-1: Technical details and evaluation. White Paper. AI21 Labs, 2021.
Arvind Mahankali, Tatsunori B Hashimoto, and Tengyu Ma. One step of gradient descent is provably the optimal in-context learner with one layer of linear self-attention. arXiv preprint arXiv:2307.03576, 2023.
Sewon Min, Mike Lewis, Luke Zettlemoyer, and Hannaneh Hajishirzi. Metaicl: Learning to learn in context. Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021.
Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova DasSarma, Tom Henighan, Ben Mann, Amanda Askell, Yuntao Bai, Anna Chen, Tom Conerly, Dawn Drain, Deep Ganguli, Zac Hatfield-Dodds, Danny Hernandez, Scott Johnston, Andy Jones, Jackson Kernion, Liane Lovitt, Kamal Ndousse, Dario Amodei, Tom Brown, Jack Clark, Jared Kaplan, Sam McCandlish, and Chris Olah. In-context learning and induction heads. Transformer Circuits Thread, 2022.
Jorge Pérez, Pablo Barceló, and Javier Marinkovic. Attention is turing complete. The Journal of Machine Learning Research, 2021.
Jack W Rae, Sebastian Borgeaud, Trevor Cai, Katie Millican, Jordan Hoffmann, Francis Song, John Aslanides, Sarah Henderson, Roman Ring, Susannah Young, et al. Scaling language models: Methods, analysis & insights from training gopher. arXiv preprint arXiv:2112.11446, 2021.
Imanol Schlag, Kazuki Irie, and Jürgen Schmidhuber. Linear transformers are secretly fast weight programmers. In International Conference on Machine Learning, pages 9355–9366. PMLR, 2021.
Hava T Siegelmann and Eduardo D Sontag. On the computational power of neural nets. In Proceedings of Workshop on Computational learning theory, 1992.
Vladimir Vapnik. The nature of statistical learning theory. Springer science & business media, 1999.
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in neural information processing systems, 2017.
Johannes von Oswald, Eyvind Niklasson, Ettore Randazzo, João Sacramento, Alexander Mordvintsev, Andrey Zhmoginov, and Max Vladymyrov. Transformers learn in-context by gradient descent. In International Conference on Machine Learning, pages 35151–35174. PMLR, 2023.
Colin Wei, Yining Chen, and Tengyu Ma. Statistically meaningful approximation: a case study on approximating turing machines with transformers. Advances in Neural Information Processing Systems, 35:12071–12083, 2022.
Mitchell Wortsman, Jaehoon Lee, Justin Gilmer, and Simon Kornblith. Replacing softmax with relu in vision transformers. arXiv preprint arXiv:2309.08586, 2023.
Sang Michael Xie, Aditi Raghunathan, Percy Liang, and Tengyu Ma. An explanation of in-context learning as implicit bayesian inference. International Conference on Learning Representations, 2021.
Ruiqi Zhang, Spencer Frei, and Peter L Bartlett. Trained transformers learn linear models in-context. arXiv preprint arXiv:2306.09927, 2023.
Haoyu Zhao, Abhishek Panigrahi, Rong Ge, and Sanjeev Arora. Do transformers parse while predicting the masked word? arXiv preprint arXiv:2303.08117, 2023.

2 Zeyuan Allen-Zhu

[1] Zeyuan Allen-Zhu and Yuanzhi Li. Physics of Language Models: Part 3.1, Knowledge Storage and Extraction. ArXiv e-prints, abs/2309.14316, September 2023. Full version available at http://arxiv.org/abs/2309.14316.
[2] Zeyuan Allen-Zhu and Yuanzhi Li. Physics of Language Models: Part 3.2, Knowledge Manipulation. ArXiv e-prints, abs/2309.14402, September 2023. Full version available at http://arxiv.org/abs/2309.14402.
[3] Zeyuan Allen-Zhu and Yuanzhi Li. Backward feature correction: How deep learning performs deep learning. In COLT, 2023. Full version available at http://arxiv.org/abs/2001.04413.
[4] Zeyuan Allen-Zhu and Yuanzhi Li. Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws. ArXiv e-prints, abs/2404.05405, April 2024. Full version available at http://arxiv.org/abs/2404.05405.
[5] Zeyuan Allen-Zhu, Yuanzhi Li, and Zhao Song. A convergence theory for deep learning via over-parameterization. In ICML, 2019. Full version available at http://arxiv.org/abs/1811.03962.
[6] Sanjeev Arora and Yi Zhang. Do gans actually learn the distribution? an empirical study. arXiv preprint arXiv:1706.08224, 2017.
[7] David Arps, Younes Samih, Laura Kallmeyer, and Hassan Sajjad. Probing for constituency structure in neural language models. arXiv preprint arXiv:2204.06201, 2022.
[8] James K Baker. Trainable grammars for speech recognition. The Journal of the Acoustical Society of America, 65(S1):S132–S132, 1979.
[9] Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, and Samuel Weinbach. GPT-NeoX-20B: An open-source autoregressive language model. In Proceedings of the ACL Workshop on Challenges & Perspectives in Creating Large Language Models, 2022. URL https://arxiv.org/abs/2204.06745.
[10] Gregoire Deletang, Anian Ruoss, Jordi Grau-Moya, Tim Genewein, Li Kevin Wenliang, Elliot Catt, Chris Cundy, Marcus Hutter, Shane Legg, Joel Veness, et al. Neural networks and the chomsky hierarchy. In ICLR, 2023.
[11] Brian DuSell and David Chiang. Learning hierarchical structures with differentiable nondeterministic stacks. In ICLR, 2022.
[12] Nelson Elhage, Neel Nanda, Catherine Olsson, Tom Henighan, Nicholas Joseph, Ben Mann, Amanda Askell, Yuntao Bai, Anna Chen, Tom Conerly, et al. A mathematical framework for transformer circuits. Transformer Circuits Thread, 1, 2021.
[13] Pengcheng He, Xiaodong Liu, Jianfeng Gao, and Weizhu Chen. Deberta: Decoding-enhanced bert with disentangled attention. arXiv preprint arXiv:2006.03654, 2020.
[14] John Hewitt and Christopher D. Manning. A structural probe for finding syntax in word representations. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4129–4138, Minneapolis, Minnesota, June 2019. Association for Computational Linguistics. doi: 10.18653/v1/N19-1419. URL https://aclanthology.org/N19-1419.
[15] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of NAACL-HLT, pages 4171–4186, 2019.
[16] Christopher D Manning, Kevin Clark, John Hewitt, Urvashi Khandelwal, and Omer Levy. Emergent linguistic structure in artificial neural networks trained by self-supervision. Proceedings of the National Academy of Sciences, 117(48):30046–30054, 2020.
[17] Mitchell P. Marcus, Beatrice Santorini, and Mary Ann Marcinkiewicz. Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2):313–330, 1993. URL https://aclanthology.org/J93-2004.
[18] Rowan Hall Maudslay and Ryan Cotterell. Do syntactic probes probe syntax? experiments with jabberwocky probing. arXiv preprint arXiv:2106.02559, 2021.
[19] Milad Moradi and Matthias Samwald. Evaluating the robustness of neural language models to input perturbations. arXiv preprint arXiv:2108.12237, 2021.
[20] Shikhar Murty, Pratyusha Sharma, Jacob Andreas, and Christopher D Manning. Characterizing intrinsic compositionality in transformers with tree projections. In ICLR, 2023.
[21] Neel Nanda, Lawrence Chan, Tom Lieberum, Jess Smith, and Jacob Steinhardt. Progress measures for grokking via mechanistic interpretability. arXiv preprint arXiv:2301.05217, 2023.
[22] Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova DasSarma, Tom Henighan, Ben Mann, Amanda Askell, Yuntao Bai, Anna Chen, et al. In-context learning and induction heads. arXiv preprint arXiv:2209.11895, 2022.
[23] OpenAI. Gpt-4 technical report, 2023.
[24] Matt Post and Shane Bergsma. Explicit and implicit syntactic features for text classification. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 866–872, 2013.
[25] Alec Radford, Jeff Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. Language models are unsupervised multitask learners. 2019.
[26] Itiroo Sakai. Syntax in universal translation. In Proceedings of the International Conference on Machine Translation and Applied Language Analysis, 1961.
[27] Hui Shi, Sicun Gao, Yuandong Tian, Xinyun Chen, and Jishen Zhao. Learning bounded context-free-grammar via lstm and the transformer: Difference and the explanations. In Proceedings of the AAAI Conference on Artificial Intelligence, 2022.
[28] Michael Sipser. Introduction to the Theory of Computation. Cengage Learning, 2012.
[29] Jianlin Su, Yu Lu, Shengfeng Pan, Bo Wen, and Yunfeng Liu. Roformer: Enhanced transformer with rotary position embedding, 2021.
[30] Lifu Tu, Garima Lalwani, Spandana Gella, and He He. An empirical study on robustness to spurious correlations using pre-trained language models. Transactions of the Association for Computational Linguistics, 8:621–633, 2020.
[31] David Vilares, Michalina Strzyz, Anders Søgaard, and Carlos Gómez-Rodríguez. Parsing as pretraining. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 9114–9121, 2020.
[32] Kevin Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, and Jacob Steinhardt. Interpretability in the wild: a circuit for indirect object identification in gpt-2 small. arXiv preprint arXiv:2211.00593, 2022.
[33] Zhiyong Wu, Yun Chen, Ben Kao, and Qun Liu. Perturbed masking: Parameter-free probing for analyzing and interpreting bert. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 4166–4176, 2020.
[34] Tian Ye, Zicheng Xu, Yuanzhi Li, and Zeyuan Allen-Zhu. Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process. arXiv preprint, 2024. to appear.
[35] Tian Ye, Zicheng Xu, Yuanzhi Li, and Zeyuan Allen-Zhu. Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems. arXiv preprint, 2024. to appear.
[36] Shizhuo Dylan Zhang, Curt Tigges, Stella Biderman, Maxim Raginsky, and Talia Ringer. Can transformers learn to solve problems recursively? arXiv preprint arXiv:2305.14699, 2023.
[37] Haoyu Zhao, Abhishek Panigrahi, Rong Ge, and Sanjeev Arora. Do transformers parse while predicting the masked word? arXiv preprint arXiv:2303.08117, 2023.

3 Zeyuan Allen-Zhu

[1] Zeyuan Allen-Zhu and Yuanzhi Li. Physics of Language Models: Part 1, Learning Hierarchical Language Structures. ArXiv e-prints, abs/2305.13673, May 2023. Full version available at http://arxiv.org/abs/2305.13673.
[2] Zeyuan Allen-Zhu and Yuanzhi Li. Physics of Language Models: Part 3.1, Knowledge Storage and Extraction. In ICML, 2024. Full version available at http://arxiv.org/abs/2309.14316.
[3] Zeyuan Allen-Zhu and Yuanzhi Li. Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws. ArXiv e-prints, abs/2404.05405, April 2024. Full version available at http://arxiv.org/abs/2404.05405.
[4] Lukas Berglund, Meg Tong, Max Kaufmann, Mikita Balesni, Asa Cooper Stickland, Tomasz Korbak, and Owain Evans. The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A". arXiv preprint arXiv:2309.12288, September 2023.
[5] Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, and Samuel Weinbach. GPT-NeoX-20B: An open-source autoregressive language model. In Proceedings of the ACL Workshop on Challenges & Perspectives in Creating Large Language Models, 2022. URL https://arxiv.org/abs/2204.06745.
[6] Deng Cai, Yan Wang, Lemao Liu, and Shuming Shi. Recent advances in retrieval-augmented text generation. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 3417–3419, 2022.
[7] Mor Geva, Daniel Khashabi, Elad Segal, Tushar Khot, Dan Roth, and Jonathan Berant. Did aristotle use a laptop? a question answering benchmark with implicit reasoning strategies. Transactions of the Association for Computational Linguistics, 9:346–361, 2021.
[8] Fabian Gloeckle, Badr Youbi Idrissi, Baptiste Roziere, David Lopez-Paz, and Gabriel Synnaeve. Better & faster large language models via multi-token prediction. arXiv preprint arXiv:2404.19737, 2024.
[9] Olga Golovneva, Zeyuan Allen-Zhu, Jason Weston, and Sainbayar Sukhbaatar. Reverse training to nurse the reversal curse. arXiv preprint arXiv:2403.13799, 2024.
[10] Qingyan Guo, Rui Wang, Junliang Guo, Xu Tan, Jiang Bian, and Yujiu Yang. Mitigating reversal curse via semantic-aware permutation training. arXiv preprint arXiv:2403.00758, 2024.
[11] Evan Hernandez, Belinda Z Li, and Jacob Andreas. Measuring and manipulating knowledge representations in language models. arXiv preprint arXiv:2304.00740, 2023.
[12] Edward J Hu, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. LoRA: Low-Rank Adaptation of Large Language Models. In ICLR, 2021.
[13] Albert Q Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lucile Saulnier, et al. Mistral 7b. arXiv preprint arXiv:2310.06825, 2023.
[14] Zhengbao Jiang, Frank F Xu, Luyu Gao, Zhiqing Sun, Qian Liu, Jane Dwivedi-Yu, Yiming Yang, Jamie Callan, and Graham Neubig. Active retrieval augmented generation. arXiv preprint arXiv:2305.06983, 2023.
[15] Mojtaba Komeili, Kurt Shuster, and Jason Weston. Internet-augmented dialogue generation. arXiv preprint arXiv:2107.07566, 2021.
[16] Rémi Lebret, David Grangier, and Michael Auli. Generating text from structured data with application to the biography domain. CoRR, abs/1603.07771, 2016. URL http://arxiv.org/abs/1603.07771.
[17] Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, and Douwe Kiela. Retrieval-augmented generation for knowledge-intensive nlp tasks. In H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, pages 9459–9474. Curran Associates, Inc., 2020. URL https://proceedings.neurips.cc/paper_files/paper/2020/file/6b493230205f780e1bc26945df7481e5-Paper.pdf.
[18] Shangqing Liu, Yu Chen, Xiaofei Xie, Jingkai Siow, and Yang Liu. Retrieval-augmented generation for code summarization via hybrid gnn. arXiv preprint arXiv:2006.05405, 2020.
[19] Yuning Mao, Pengcheng He, Xiaodong Liu, Yelong Shen, Jianfeng Gao, Jiawei Han, and Weizhu Chen. Generation-augmented retrieval for open-domain question answering. arXiv preprint arXiv:2009.08553, 2020.
[20] Tahira Naseem, Srinivas Ravishankar, Nandana Mihindukulasooriya, Ibrahim Abdelaziz, Young-Suk Lee, Pavan Kapanipathi, Salim Roukos, Alfio Gliozzo, and Alexander Gray. A semantics-aware transformer model of relation linking for knowledge base question answering. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 256–262, Online, August 2021. Association for Computational Linguistics.
[21] Anh Nguyen, Nikos Karampatziakis, and Weizhu Chen. Meet in the middle: A new pre-training paradigm. Advances in Neural Information Processing Systems, 36, 2024.
[22] Reham Omar, Omij Mangukiya, Panos Kalnis, and Essam Mansour. Chatgpt versus traditional question answering for knowledge graphs: Current status and future directions towards knowledge graph chatbots. arXiv preprint arXiv:2302.06466, 2023.
[23] OpenAI. Gpt-4 technical report, 2023.
[24] Md Rizwan Parvez, Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, and Kai-Wei Chang. Retrieval augmented code generation and summarization. arXiv preprint arXiv:2108.11601, 2021.
[25] Hao Peng, Xiaozhi Wang, Shengding Hu, Hailong Jin, Lei Hou, Juanzi Li, Zhiyuan Liu, and Qun Liu. Copen: Probing conceptual knowledge in pre-trained language models. arXiv preprint arXiv:2211.04079, 2022.
[26] Fabio Petroni, Tim Rocktäschel, Patrick Lewis, Anton Bakhtin, Yuxiang Wu, Alexander H Miller, and Sebastian Riedel. Language models as knowledge bases? arXiv preprint arXiv:1909.01066, 2019.
[27] Jacob Pfau, William Merrill, and Samuel R Bowman. Let's think dot by dot: Hidden computation in transformer language models. arXiv preprint arXiv:2404.15758, 2024.
[28] Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. Language models are unsupervised multitask learners. OpenAI blog, 1(8):9, 2019.
[29] Ori Ram, Yoav Levine, Itay Dalmedigos, Dor Muhlgay, Amnon Shashua, Kevin Leyton-Brown, and Yoav Shoham. In-context retrieval-augmented language models. arXiv preprint arXiv:2302.00083, 2023.
[30] Kyle Richardson and Ashish Sabharwal. What does my QA model know? devising controlled probes using expert knowledge. Transactions of the Association for Computational Linguistics, 8:572–588, 2020. doi: 10.1162/tacl_a_00331. URL https://aclanthology.org/2020.tacl-1.37.
[31] Karan Singhal, Shekoofeh Azizi, Tao Tu, S Sara Mahdavi, Jason Wei, Hyung Won Chung, Nathan Scales, Ajay Tanwani, Heather Cole-Lewis, Stephen Pfohl, et al. Large language models encode clinical knowledge. arXiv preprint arXiv:2212.13138, 2022.
[32] Shamane Siriwardhana, Rivindu Weerasekera, Elliott Wen, Tharindu Kaluarachchi, Rajib Rana, and Suranga Nanayakkara. Improving the domain adaptation of retrieval augmented generation (rag) models for open domain question answering. Transactions of the Association for Computational Linguistics, 11:1–17, 2023.
[33] Jianlin Su, Yu Lu, Shengfeng Pan, Bo Wen, and Yunfeng Liu. Roformer: Enhanced transformer with rotary position embedding, 2021.
[34] Kai Sun, Yifan Ethan Xu, Hanwen Zha, Yue Liu, and Xin Luna Dong. Head-to-tail: How knowledgeable are large language models (llm)? aka will llms replace knowledge graphs? arXiv preprint arXiv:2308.10168, 2023.
[35] Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Roziere, Naman Goyal, Eric Hambro, Faisal Azhar, et al. Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971, 2023.
[36] Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. Chain-of-thought prompting elicits reasoning in large language models. Advances in neural information processing systems, 35:24824–24837, 2022.
[37] Tian Ye, Zicheng Xu, Yuanzhi Li, and Zeyuan Allen-Zhu. Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process. arXiv preprint arXiv:xxxx.xxxxx, 2024. to appear.
[38] Tian Ye, Zicheng Xu, Yuanzhi Li, and Zeyuan Allen-Zhu. Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems. arXiv preprint arXiv:xxxx.xxxxx, 2024. to appear.
[39] Chunting Zhou, Pengfei Liu, Puxin Xu, Srini Iyer, Jiao Sun, Yuning Mao, Xuezhe Ma, Avia Efrat, Ping Yu, Lili Yu, et al. Lima: Less is more for alignment. arXiv preprint arXiv:2305.11206, 2023.

Vocabulary list

The original purpose of making vocabulary lists was to confirm, before reading an English paper, that there were no unknown words among its most frequent terms.
When I was translating international standards, I prepared planned translations and used the list as a reference for using the same translation for the same term in the same context.
After I joined the development of machine translation software, I used the lists to check the software's accuracy: I prepared expected translations and verified how each term was actually translated in context.

I have not translated many LLM papers, so this time I have not assigned Japanese translations yet. Once I have built vocabulary lists for a few more papers, I plan to assign translations to roughly the top 100 to 1,000 words.

I save the paper as a PDF and convert it directly to a TXT file, so some words that are not part of the paper body are mixed in. Sorry about that. In my experience they stay under 1%, and many of them are boilerplate strings common to papers, so please treat them as noise.
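
As a rough sketch of that conversion step (the article does not name the tool used; pdftotext from poppler-utils and the file names below are my assumptions), the PDF-to-TXT step might look like this:

```bash
# Download the paper PDF and dump it to plain text.
# pdftotext comes from poppler-utils; file names are placeholders.
wget -O paper.pdf https://arxiv.org/pdf/2408.16293
pdftotext -layout paper.pdf paper.txt
```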

Natural language processing with Docker (186)
https://qiita.com/kaizen_nagoya/items/e29cbaed8370e7913487

From the next article onward, I will use:

```bash
docker run -v /tmp/llm:/tmp/llm -it kaizenjapan/llm /bin/bash
```
rank term count
1 the 540
2 retry 329
3 to 252
4 of 249
5 a 237
6 s 221
7 with 210
8 op 202
9 and 199
10 in 182
11 we 165
12 data 160
13 retryrate 160
14 is 141
15 for 140
16 model 131
17 each 123
18 as 119
19 on 117
20 this 116
21 can 103
22 it 99
23 error 90
24 e 88
25 lora 82
26 that 81
27 finetune 80
28 figure 76
29 box 70
30 define 70
31 unnecessary 69
32 models 67
33 language 63
34 pretrain 60
35 so 60
36 number 59
37 qv 57
38 igsm 52
39 mask 52
40 not 52
41 be 49
42 correct 49
43 from 49
44 pretrained 49
45 rate 48
46 accuracy 47
47 b 47
48 mistakes 47
49 such 47
50 using 45
51 cheese 44
52 market 44
53 or 42
54 param 41
55 solution 41
56 beam 40
57 math 40
58 studio 39
59 arxiv 38
60 maskretryrate 38
61 training 38
62 its 37
63 are 36
64 free 35
65 pretraining 35
66 parameters 34
67 reasoning 34
68 equals 33
69 has 33
70 if 33
71 steps 33
72 also 32
73 errors 32
74 problems 32
75 retry_weak 32
76 solutions 32
77 even 31
78 an 30
79 film 30
80 high 30
81 upon 29
82 use 29
83 regret 28
84 school 28
85 g 27
86 gpt 27
87 have 27
88 x 27
89 r 26
90 tokens 26
91 when 26
92 by 25
93 jim 25
94 probing 25
95 section 25
96 w 25
97 finetuning 24
98 next 24
99 only 24
100 process 24
101 see 24
102 correction 23
103 generation 23
104 learning 23
105 original 23
106 result 23
107 same 23
108 version 23
109 after 22
110 c 22
111 dist 22
112 i 22
113 international 22
114 parmesan 22
115 than 22
116 weight 22
117 at 21
118 but 21
119 cream 21
120 full 21
121 jungle 21
122 mistake 21
123 our 21
124 reask 21
125 backpack 20
126 district 20
127 ice 20
128 inside 20
129 ood 20
130 p 20
131 parameter 19
132 problem 19
133 rank 19
134 results 19
135 sentence 19
136 step 19
137 their 19
138 time 19
139 used 19
140 back 18
141 distout 18
142 many 18
143 more 18
144 one 18
145 own 18
146 central 17
147 example 17
148 paper 17
149 still 17
150 v 17
151 which 17
152 all 16
153 during 16
154 grape 16
155 make 16
156 no 16
157 pounds 16
158 u 16
159 al 15
160 d 15
161 detection 15
162 how 15
163 improve 15
164 k 15
165 masking 15
166 retry_miss 15
167 they 15
168 context 14
169 continued 14
170 datasets 14
171 decay 14
172 directly 14
173 does 14
174 hard 14
175 l 14
176 length 14
177 part 14
178 preprint 14
179 q 14
180 small 14
181 weak 14
182 while 14
183 appendix 13
184 corrections 13
185 fine 13
186 fresh 13
187 generate 13
188 learn 13
189 li 13
190 search 13
191 size 13
192 synthetic 13
193 trained 13
194 wrong 13
195 y 13
196 best 12
197 daypack 12
198 double 12
199 label 12
200 made 12
201 med 12
202 pineapple 12
203 self 12
204 there 12
205 very 12
206 accuracies 11
207 batch 11
208 better 11
209 cannot 11
210 details 11
211 et 11
212 experiments 11
213 higher 11
214 however 11
215 o 11
216 over 11
217 perfect 11
218 test 11
219 FALSE 11
220 abs 10
221 chen 10
222 college 10
223 compared 10
224 different 10
225 distribution 10
226 embedding 10
227 experiment 10
228 goat 10
229 inference 10
230 joe 10
231 may 10
232 miss 10
233 need 10
234 residential 10
235 retries 10
236 skill 10
237 t 10
238 uses 10
239 vs 10
240 weights 10
241 where 10
242 without 10
243 yuanzhi 10
244 among 9
245 approach 9
246 call 9
247 dance 9
248 do 9
249 generated 9
250 j 9
251 large 9
252 layer 9
253 llms 9
254 med_pqigsm 9
255 stage 9
256 sum 9
257 versionp 9
258 what 9
259 would 9
260 zhu 9
261 TRUE 9
262 additional 8
263 allen 8
264 average 8
265 because 8
266 comparing 8
267 comparison 8
268 dataset 8
269 erroneous 8
270 evaluation 8
271 f 8
272 fake 8
273 followed 8
274 following 8
275 future 8
276 goal 8
277 including 8
278 instance 8
279 internal 8
280 knowledge 8
281 line 8
282 med_qpigsm 8
283 messenger 8
284 much 8
285 product 8
286 prompting 8
287 simply 8
288 times 8
289 total 8
290 tuning 8
291 two 8
292 verify 8
293 wang 8
294 already 7
295 answer 7
296 arithmetic 7
297 compare 7
298 detect 7
299 detector 7
300 easily 7
301 first 7
302 generating 7
303 level 7
304 like 7
305 meta 7
306 needed 7
307 new 7
308 org 7
309 other 7
310 output 7
311 perform 7
312 physics 7
313 question 7
314 similar 7
315 thus 7
316 trader 7
317 trainable 7
318 via 7
319 zeyuan 7
320 adding 6
321 additionally 6
322 available 6
323 before 6
324 compute 6
325 dosample 6
326 especially 6
327 except 6
328 few 6
329 harder 6
330 here 6
331 immediately 6
332 important 6
333 knows 6
334 linear 6
335 low 6
336 making 6
337 might 6
338 most 6
339 n 6
340 note 6
341 operations 6
342 out 6
343 pear 6
344 perfectly 6
345 presented 6
346 qp 6
347 random 6
348 re 6
349 riverview 6
350 samples 6
351 seeds 6
352 significantly 6
353 states 6
354 sufficiently 6
355 tasks 6
356 teach 6
357 those 6
358 twice 6
359 update 6
360 value 6
361 why 6
362 work 6
363 ye 6
364 accurate 5
365 achieve 5
366 achieved 5
367 amount 5
368 arts 5
369 autoregressive 5
370 based 5
371 been 5
372 both 5
373 campus 5
374 cell 5
375 ch 5
376 choices 5
377 complexity 5
378 computations 5
379 computed 5
380 contains 5
381 controlled 5
382 correctness 5
383 counts 5
384 difficulties 5
385 discover 5
386 end 5
387 explore 5
388 fair 5
389 find 5
390 focus 5
391 grade 5
392 h 5
393 http 5
394 immediate 5
395 include 5
396 information 5
397 ingredient 5
398 instead 5
399 into 5
400 long 5
401 matrices 5
402 multi 5
403 necessary 5
404 needs 5
405 neural 5
406 observation 5
407 per 5
408 please 5
409 prepare 5
410 present 5
411 probability 5
412 provide 5
413 query 5
414 randomly 5
415 regretful 5
416 seasons 5
417 setting 5
418 shortest 5
419 should 5
420 show 5
421 shows 5
422 solve 5
423 some 5
424 statistics 5
425 these 5
426 through 5
427 token 5
428 type 5
429 up 5
430 us 5
431 verifier 5
432 were 5
433 works 5
434 yet 5
435 zhang 5
436 actually 4
437 advances 4
438 almost 4
439 another 4
440 any 4
441 association 4
442 attention 4
443 carefully 4
444 cases 4
445 changes 4
446 check 4
447 common 4
448 computation 4
449 computational 4
450 conclusion 4
451 consider 4
452 contrast 4
453 controllable 4
454 could 4
455 difference 4
456 due 4
457 easy 4
458 effective 4
459 exhibit 4
460 fewer 4
461 final 4
462 found 4
463 fully 4
464 given 4
465 gpu 4
466 graph 4
467 hidden 4
468 improvement 4
469 included 4
470 introduce 4
471 just 4
472 larger 4
473 lee 4
474 linguistics 4
475 liu 4
476 llama 4
477 longer 4
478 makes 4
479 matrix 4
480 medop 4
481 method 4
482 multiple 4
483 natural 4
484 observations 4
485 obtain 4
486 oh 4
487 originalunnecessary 4
488 pages 4
489 possible 4
490 pq 4
491 prints 4
492 proceedings 4
493 processing 4
494 range 4
495 ready 4
496 real 4
497 regenerate 4
498 requires 4
499 round 4
500 select 4
501 sense 4
502 shown 4
503 since 4
504 skip 4
505 statement 4
506 strong 4
507 systems 4
508 tells 4
509 them 4
510 then 4
511 though 4
512 towards 4
513 transformers 4
514 trivial 4
515 understanding 4
516 verification 4
517 vocational 4
518 weighs 4
519 whether 4
520 within 4
521 your 4
522 z 4
523 about 3
524 accurately 3
525 acl 3
526 adamw 3
527 add 3
528 again 3
529 alignment 3
530 allow 3
531 although 3
532 auto 3
533 automatically 3
534 banana 3
535 below 3
536 between 3
537 case 3
538 chance 3
539 change 3
540 choose 3
541 com 3
542 commercial 3
543 complements 3
544 correcting 3
545 cosine 3
546 count 3
547 creating 3
548 determine 3
549 did 3
550 differs 3
551 discard 3
552 down 3
553 eese 3
554 effectively 3
555 efficient 3
556 encourages 3
557 entire 3
558 evaluated 3
559 extreme 3
560 failure 3
561 finetuned 3
562 format 3
563 gain 3
564 general 3
565 go 3
566 half 3
567 hardest 3
568 hardpq 3
569 head 3
570 highly 3
571 https 3
572 ideally 3
573 identical 3
574 illustration 3
575 implies 3
576 improves 3
577 improving 3
578 indicating 3
579 intelligence 3
580 interestingly 3
581 intuitively 3
582 keep 3
583 least 3
584 let 3
585 life 3
586 lin 3
587 lu 3
588 maximum 3
589 methods 3
590 mlp 3
591 nearly 3
592 non 3
593 notably 3
594 observe 3
595 once 3
596 overall 3
597 pattern 3
598 peft 3
599 performance 3
600 performs 3
601 position 3
602 practice 3
603 prediction 3
604 previous 3
605 produce 3
606 prompts 3
607 qiang 3
608 randomness 3
609 rather 3
610 realize 3
611 reasoners 3
612 right 3
613 rows 3
614 sample 3
615 sampling 3
616 save 3
617 scheduling 3
618 sentences 3
619 set 3
620 shift 3
621 significant 3
622 simplify 3
623 sinternation 3
624 solving 3
625 specifically 3
626 st 3
627 stating 3
628 strikethrough 3
629 success 3
630 sufficient 3
631 supermarket 3
632 table 3
633 task 3
634 temperature 3
635 too 3
636 top 3
637 tried 3
638 truly 3
639 try 3
640 tunes 3
641 understand 3
642 unless 3
643 updates 3
644 versions 3
645 want 3
646 was 3
647 weizhu 3
648 will 3
649 writing 3
650 xu 3
651 yang 3
652 yu 3
653 zhou 3
654 above 2
655 abstract 2
656 achieving 2
657 acquire 2
658 acquired 2
659 actual 2
660 adaptation 2
661 adapted 2
662 against 2
663 algorithm 2
664 allowing 2
665 alters 2
666 anna 2
667 annual 2
668 appeared 2
669 appears 2
670 applied 2
671 apply 2
672 approaches 2
673 architecture 2
674 article 2
675 aspect 2
676 aspects 2
677 augment 2
678 auxiliary 2
679 avoid 2
680 aware 2
681 backpacks 2
682 becomes 2
683 begin 2
684 bei 2
685 ben 2
686 benefit 2
687 betas 2
688 bin 2
689 broken 2
690 bubeck 2
691 calculate 2
692 capability 2
693 capacity 2
694 checking 2
695 cmu 2
696 comes 2
697 comparable 2
698 complex 2
699 conclusions 2
700 conference 2
701 configurations 2
702 conjecture 2
703 consists 2
704 constructed 2
705 contained 2
706 corpus 2
707 correctly 2
708 corrects 2
709 correspond 2
710 cot 2
711 crucial 2
712 current 2
713 daypacks 2
714 decoder 2
715 decoding 2
716 decreased 2
717 deferred 2
718 definethefresh 2
719 demonstrate 2
720 denote 2
721 dependencyunused 2
722 depends 2
723 description 2
724 desirable 2
725 detected 2
726 detecting 2
727 difficulty 2
728 doesn 2
729 easier 2
730 efforts 2
731 either 2
732 eldan 2
733 eliminating 2
734 encourage 2
735 english 2
736 ensuring 2
737 eric 2
738 essentially 2
739 evidence 2
740 exact 2
741 examples 2
742 exploring 2
743 extremely 2
744 fei 2
745 finally 2
746 focuses 2
747 follow 2
748 framework 2
749 fu 2
750 gao 2
751 generalization 2
752 generates 2
753 generative 2
754 give 2
755 grammar 2
756 ground 2
757 gsm 2
758 guang 2
759 guide 2
760 guided 2
761 he 2
762 help 2
763 hierarchical 2
764 hig 2
765 hu 2
766 icecreamasp 2
767 idea 2
768 implement 2
769 implemented 2
770 incentivized 2
771 increase 2
772 increases 2
773 indicates 2
774 initial 2
775 insert 2
776 integers 2
777 interested 2
778 introduced 2
779 introduction 2
780 issue 2
781 iterative 2
782 jian 2
783 job 2
784 jun 2
785 karl 2
786 last 2
787 later 2
788 layers 2
789 leading 2
790 leads 2
791 learned 2
792 learns 2
793 less 2
794 lets 2
795 limit 2
796 llm 2
797 location 2
798 logic 2
799 lou 2
800 lower 2
801 lr 2
802 mann 2
803 manner 2
804 maskigsm 2
805 masks 2
806 mbzuai 2
807 meaning 2
808 means 2
809 medpq 2
810 meeting 2
811 michael 2
812 miller 2
813 multinomial 2
814 name 2
815 now 2
816 occurs 2
817 often 2
818 open 2
819 otherwise 2
820 outperform 2
821 pan 2
822 papers 2
823 paramterabstract 2
824 particularly 2
825 performed 2
826 perhaps 2
827 positional 2
828 potentially 2
829 practical 2
830 predict 2
831 predicting 2
832 prepared 2
833 promising 2
834 purposes 2
835 ramp 2
836 rarely 2
837 rates 2
838 realistic 2
839 reason 2
840 reasonable 2
841 recall 2
842 recently 2
843 regressive 2
844 reliably 2
845 remains 2
846 remove 2
847 require 2
848 research 2
849 resp 2
850 rewrite 2
851 rict 2
852 ronen 2
853 rotary 2
854 rounds 2
855 rumored 2
856 safe 2
857 say 2
858 sbananaask 2
859 sees 2
860 selecting 2
861 shifts 2
862 shizhuo 2
863 short 2
864 shot 2
865 showcase 2
866 simple 2
867 simulate 2
868 single 2
869 skills 2
870 slightly 2
871 smaller 2
872 solely 2
873 sometimes 2
874 sop 2
875 source 2
876 soy 2
877 special 2
878 stands 2
879 starting 2
880 strongly 2
881 structures 2
882 studies 2
883 study 2
884 summarize 2
885 support 2
886 sure 2
887 teaches 2
888 team 2
889 technique 2
890 tempting 2
891 tend 2
892 testing 2
893 text 2
894 therefore 2
895 three 2
896 throughout 2
897 tian 2
898 train 2
899 transformer 2
900 truth 2
901 tuned 2
902 typically 2
903 underlined 2
904 unlike 2
905 unlikely 2
906 until 2
907 url 2
908 usefulness 2
909 user 2
910 usually 2
911 various 2
912 verifiers 2
913 volume 2
914 well 2
915 wide 2
916 widely 2
917 window 2
918 words 2
919 workshop 2
920 world 2
921 wu 2
922 yifei 2
923 you 2
924 zeqi 2
925 zicheng 2

The counting was done with awk, using "Natural language processing with Docker (186)" again after a long time:
https://qiita.com/kaizen_nagoya/items/e29cbaed8370e7913487

When converting from PDF to TXT, some compound words fail to be split apart; the reason is unknown.
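
A minimal sketch of the kind of awk pipeline behind the table above (my reconstruction, not the exact command from the linked article; paper.txt is assumed to be the converted text, and the final filter keeps only words that appear at least twice, matching the rule stated at the top of this article):

```bash
# Lowercase the text, split it into words, count occurrences with awk,
# keep words seen at least twice, sort by count, and print "rank term count".
tr 'A-Z' 'a-z' < paper.txt \
  | tr -cs 'a-z' '\n' \
  | awk 'NF { count[$1]++ }
         END { for (w in count) if (count[w] >= 2) print count[w], w }' \
  | sort -rn \
  | awk '{ print NR, $2, $1 }'
```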
