LLM (Large Language Model) Advent Calendar 2024
https://qiita.com/advent-calendar/2024/llm
This is the article scheduled for posting on December 21.
Matsuo Lab LLM Community "Paper & Hacks Vol. 24"
https://matsuolab-community.connpass.com/event/335384/
Presenter: 下村 晃生, Matsuo Lab LLM Community member (Department of Space Systems Engineering, School of Engineering, Kyushu Institute of Technology; in charge of Session 9 of the Matsuo Lab Large Language Model course)
Theme: Are LLMs aware of their own hallucinations?
Paper link: Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems
https://arxiv.org/abs/2408.16293
Natural language processing with docker (186)
https://qiita.com/kaizen_nagoya/items/e29cbaed8370e7913487
This post lists the references of each reference and builds a word-frequency list (work in progress).
The word list below covers words that appear at least twice; words that appear only once are kept in the Docker environment.
<This article is not yet complete. I will keep adding words and sentences.>
References
[1] Kwangjun Ahn, Xiang Cheng, Hadi Daneshmand, and Suvrit Sra. Transformers learn to implement preconditioned gradient descent for in-context learning. Advances in Neural Information Processing Systems, 36, 2024.
https://arxiv.org/abs/2306.00297
[2] Zeyuan Allen-Zhu and Yuanzhi Li. Physics of Language Models: Part 1, Learning Hierarchical Language Structures. ArXiv e-prints, abs/2305.13673, May 2023. Full version available at http://arxiv.org/abs/2305.13673.
[3] Zeyuan Allen-Zhu and Yuanzhi Li. Physics of Language Models: Part 3.2, Knowledge Manipulation. ArXiv e-prints, abs/2309.14402, September 2023. Full version available at http://arxiv.org/abs/2309.14402.
[4] Zeyuan Allen-Zhu and Yuanzhi Li. Physics of Language Models: Part 3.1, Knowledge Storage and Extraction. In ICML, 2024. Full version available at http://arxiv.org/abs/2309.14316.
[5] Zeyuan Allen-Zhu and Yuanzhi Li. Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws. ArXiv e-prints, abs/2404.05405, April 2024. Full version available at http://arxiv.org/abs/2404.05405.
[6] Cem Anil, Yuhuai Wu, Anders Andreassen, Aitor Lewkowycz, Vedant Misra, Vinay Ramasesh, Ambrose Slone, Guy Gur-Ari, Ethan Dyer, and Behnam Neyshabur. Exploring length generalization in large language models. Advances in Neural Information Processing Systems, 35:38546–38556, 2022.
[7] Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, and Samuel Weinbach. GPT-NeoX-20B: An open-source autoregressive language model. In Proceedings of the ACL Workshop on Challenges & Perspectives in Creating Large Language Models, 2022. URL https://arxiv.org/abs/2204.06745.
[8] Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, et al. Sparks of artificial general intelligence: Early experiments with gpt-4. arXiv preprint arXiv:2303.12712, 2023.
[9] Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, et al. Training verifiers to solve math word problems. arXiv preprint arXiv:2110.14168, 2021.
[10] Edward J Hu, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. Lora: Low-rank adaptation of large language models. In ICLR, 2021.
[11] Jie Huang, Xinyun Chen, Swaroop Mishra, Huaixiu Steven Zheng, Adams Wei Yu, Xinying Song, and Denny Zhou. Large language models cannot self-correct reasoning yet. arXiv preprint arXiv:2310.01798, 2023.
[12] Yifei Li, Zeqi Lin, Shizhuo Zhang, Qiang Fu, Bei Chen, Jian-Guang Lou, and Weizhu Chen. Making language models better reasoners with step-aware verifier. In Anna Rogers, Jordan Boyd-Graber, and Naoaki Okazaki, editors, Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 5315–5333, Toronto, Canada, July 2023. Association for Computational Linguistics. doi: 10.18653/v1/2023.acl-long.291. URL https://aclanthology.org/2023.acl-long.291.
[13] Yifei Li, Zeqi Lin, Shizhuo Zhang, Qiang Fu, Bei Chen, Jian-Guang Lou, and Weizhu Chen. Making language models better reasoners with step-aware verifier. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 5315–5333, 2023.
[14] Bingbin Liu, Sebastien Bubeck, Ronen Eldan, Janardhan Kulkarni, Yuanzhi Li, Anh Nguyen, Rachel Ward, and Yi Zhang. TinyGSM: achieving > 80% on GSM8k with small language models. arXiv preprint arXiv:2312.09241, 2023.
[15] Aman Madaan, Niket Tandon, Prakhar Gupta, Skyler Hallinan, Luyu Gao, Sarah Wiegreffe, Uri Alon, Nouha Dziri, Shrimai Prabhumoye, Yiming Yang, et al. Self-refine: Iterative refinement with self-feedback. Advances in Neural Information Processing Systems, 36, 2024.
[16] Marah Abdin et al. Phi-3 technical report: A highly capable language model locally on your phone. arXiv preprint arXiv:2404.14219, 2024.
[17] John Miller, Karl Krauth, Benjamin Recht, and Ludwig Schmidt. The effect of natural distribution shift on question answering models. In International conference on machine learning, pages 6905–6916. PMLR, 2020.
[18] Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova DasSarma, Tom Henighan, Ben Mann, Amanda Askell, Yuntao Bai, Anna Chen, et al. In-context learning and induction heads. arXiv preprint arXiv:2209.11895, 2022.
[19] Liangming Pan, Michael Saxon, Wenda Xu, Deepak Nathani, Xinyi Wang, and William Yang Wang. Automatically correcting large language models: Surveying the landscape of diverse self-correction strategies. arXiv preprint arXiv:2308.03188, 2023.
[20] Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. Language models are unsupervised multitask learners. OpenAI blog, 1(8):9, 2019.
[21] Maithra Raghu, Thomas Unterthiner, Simon Kornblith, Chiyuan Zhang, and Alexey Dosovitskiy. Do vision transformers see like convolutional neural networks? Advances in neural information processing systems, 34:12116–12128, 2021.
[22] Yiheng Shu and Zhiwei Yu. Distribution shifts are bottlenecks: Extensive evaluation for grounding language models to knowledge bases. In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop, pages 71–88, 2024.
[23] Marta Skreta, Naruki Yoshikawa, Sebastian Arellano-Rubach, Zhi Ji, Lasse Bjørn Kristensen, Kourosh Darvish, Alán Aspuru-Guzik, Florian Shkurti, and Animesh Garg. Errors are useful prompts: Instruction guided task programming with verifier-assisted iterative prompting. arXiv preprint arXiv:2303.14100, 2023.
[24] Jianlin Su, Yu Lu, Shengfeng Pan, Bo Wen, and Yunfeng Liu. Roformer: Enhanced transformer with rotary position embedding, 2021.
[25] Llama Team. The llama 3 herd of models. arXiv preprint arXiv:2407.21783, 2024.
[26] Fei Wang, Chao Shang, Sarthak Jain, Shuai Wang, Qiang Ning, Bonan Min, Vittorio Castelli, Yassine Benajiba, and Dan Roth. From instructions to constraints: Language model alignment with automatic constraint verification. arXiv preprint arXiv:2403.06326, 2024.
[27] Yixuan Weng, Minjun Zhu, Fei Xia, Bin Li, Shizhu He, Shengping Liu, Bin Sun, Kang Liu, and Jun Zhao. Large language models are better reasoners with self-verification. arXiv preprint arXiv:2212.09561, 2022.
[28] Kaiyu Yang, Jia Deng, and Danqi Chen. Generating natural language proofs with verifier-guided search. arXiv preprint arXiv:2205.12443, 2022.
[29] Tian Ye, Zicheng Xu, Yuanzhi Li, and Zeyuan Allen-Zhu. Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process. arXiv e-prints, abs/2407.20311, 2024. Full version available at http://arxiv.org/abs/2407.20311.
[30] Yunxiang Zhang, Muhammad Khalifa, Lajanugen Logeswaran, Jaekyeom Kim, Moontae Lee, Honglak Lee, and Lu Wang. Small language models need strong verifiers to self-correct reasoning. arXiv preprint arXiv:2404.17140, 2024.
[31] Hattie Zhou, Arwen Bradley, Etai Littwin, Noam Razin, Omid Saremi, Josh Susskind, Samy Bengio, and Preetum Nakkiran. What algorithms can transformers learn? a study in length generalization. arXiv preprint arXiv:2310.16028, 2023.
References of the References
1 Kwangjun Ahn
Kwangjun Ahn, Xiang Cheng, Minhak Song, Chulhee Yun, Ali Jadbabaie, and Suvrit Sra. Linear attention is (maybe) all you need (to understand transformer optimization). arXiv preprint arXiv:2310.01082, 2023.
Ekin Akyürek, Dale Schuurmans, Jacob Andreas, Tengyu Ma, and Denny Zhou. What learning algorithm is in-context learning? investigations with linear models. International Conference on Learning Representations, 2022.
Zeyuan Allen-Zhu and Yuanzhi Li. Physics of language models: Part 1, context-free grammar. arXiv preprint arXiv:2305.13673, 2023.
Noga Alon and Joel H Spencer. The probabilistic method. John Wiley & Sons, 2016.
Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, et al. Gpt-neox-20b: An open-source autoregressive language model. Proceedings of BigScience – Workshop on Challenges & Perspectives in Creating Large Language Models, 2022.
Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. Neural Information Processing Systems, 2020.
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1, 2019.
John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of machine learning research, 12(7), 2011.
Benjamin L Edelman, Surbhi Goel, Sham Kakade, and Cyril Zhang. Inductive biases and variable creation in self-attention mechanisms. In International Conference on Machine Learning (ICML), 2022.
Nelson Elhage, Neel Nanda, Catherine Olsson, Tom Henighan, Nicholas Joseph, Ben Mann, Amanda Askell, Yuntao Bai, Anna Chen, Tom Conerly, Nova DasSarma, Dawn Drain, Deep Ganguli, Zac Hatfield-Dodds, Danny Hernandez, Andy Jones, Jackson Kernion, Liane Lovitt, Kamal Ndousse, Dario Amodei, Tom Brown, Jack Clark, Jared Kaplan, Sam McCandlish, and Chris Olah. A mathematical framework for transformer circuits. Transformer Circuits Thread, 2021. https://transformer-circuits.pub/2021/framework/index.html.
Murat A Erdogdu, Lee H Dicker, and Mohsen Bayati. Scaled least squares estimator for glms in large-scale problems. Advances in Neural Information Processing Systems, 29, 2016.
Shivam Garg, Dimitris Tsipras, Percy S Liang, and Gregory Valiant. What can transformers learn in-context? a case study of simple function classes. Advances in Neural Information Processing Systems, 35:30583–30598, 2022.
Angeliki Giannou, Shashank Rajput, Jy-yong Sohn, Kangwook Lee, Jason D Lee, and Dimitris Papailiopoulos. Looped transformers as programmable computers. arXiv preprint arXiv:2301.13196, 2023.
Alex Graves, Greg Wayne, and Ivo Danihelka. Neural turing machines. arXiv preprint arXiv:1410.5401, 2014.
Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural computation, 1997.
Stanisław Jastrzebski, Devansh Arpit, Nicolas Ballas, Vikas Verma, Tong Che, and Yoshua Bengio. Residual connections encourage iterative inference. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=SJa9iHgAZ.
Ke Li and Jitendra Malik. Learning to optimize. In International Conference on Learning Representations, 2017.
Opher Lieber, Or Sharir, Barak Lenz, and Yoav Shoham. Jurassic-1: Technical details and evaluation. White Paper. AI21 Labs, 2021.
Arvind Mahankali, Tatsunori B Hashimoto, and Tengyu Ma. One step of gradient descent is provably the optimal in-context learner with one layer of linear self-attention. arXiv preprint arXiv:2307.03576, 2023.
Sewon Min, Mike Lewis, Luke Zettlemoyer, and Hannaneh Hajishirzi. Metaicl: Learning to learn in context. Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021.
Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova DasSarma, Tom Henighan, Ben Mann, Amanda Askell, Yuntao Bai, Anna Chen, Tom Conerly, Dawn Drain, Deep Ganguli, Zac Hatfield-Dodds, Danny Hernandez, Scott Johnston, Andy Jones, Jackson Kernion, Liane Lovitt, Kamal Ndousse, Dario Amodei, Tom Brown, Jack Clark, Jared Kaplan, Sam McCandlish, and Chris Olah. In-context learning and induction heads. Transformer Circuits Thread, 2022.
Jorge Pérez, Pablo Barceló, and Javier Marinkovic. Attention is turing complete. The Journal of Machine Learning Research, 2021.
Jack W Rae, Sebastian Borgeaud, Trevor Cai, Katie Millican, Jordan Hoffmann, Francis Song, John Aslanides, Sarah Henderson, Roman Ring, Susannah Young, et al. Scaling language models: Methods, analysis & insights from training gopher. arXiv preprint arXiv:2112.11446, 2021.
Imanol Schlag, Kazuki Irie, and Jürgen Schmidhuber. Linear transformers are secretly fast weight programmers. In International Conference on Machine Learning, pages 9355–9366. PMLR, 2021.
Hava T Siegelmann and Eduardo D Sontag. On the computational power of neural nets. In Proceedings of Workshop on Computational learning theory, 1992.
Vladimir Vapnik. The nature of statistical learning theory. Springer science & business media, 1999.
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in neural information processing systems, 2017.
Johannes von Oswald, Eyvind Niklasson, Ettore Randazzo, João Sacramento, Alexander Mordvintsev, Andrey Zhmoginov, and Max Vladymyrov. Transformers learn in-context by gradient descent. In International Conference on Machine Learning, pages 35151–35174. PMLR, 2023.
Colin Wei, Yining Chen, and Tengyu Ma. Statistically meaningful approximation: a case study on approximating turing machines with transformers. Advances in Neural Information Processing Systems, 35:12071–12083, 2022.
Mitchell Wortsman, Jaehoon Lee, Justin Gilmer, and Simon Kornblith. Replacing softmax with relu in vision transformers. arXiv preprint arXiv:2309.08586, 2023.
Sang Michael Xie, Aditi Raghunathan, Percy Liang, and Tengyu Ma. An explanation of in-context learning as implicit bayesian inference. International Conference on Learning Representations, 2021.
Ruiqi Zhang, Spencer Frei, and Peter L Bartlett. Trained transformers learn linear models in-context. arXiv preprint arXiv:2306.09927, 2023.
Haoyu Zhao, Abhishek Panigrahi, Rong Ge, and Sanjeev Arora. Do transformers parse while predicting the masked word? arXiv preprint arXiv:2303.08117, 2023.
2 Zeyuan Allen-Zhu and Yuanzhi Li
[1] Zeyuan Allen-Zhu and Yuanzhi Li. Physics of Language Models: Part 3.1, Knowledge Storage and Extraction. ArXiv e-prints, abs/2309.14316, September 2023. Full version available at http://arxiv.org/abs/2309.14316.
[2] Zeyuan Allen-Zhu and Yuanzhi Li. Physics of Language Models: Part 3.2, Knowledge Manipulation. ArXiv e-prints, abs/2309.14402, September 2023. Full version available at http://arxiv.org/abs/2309.14402.
[3] Zeyuan Allen-Zhu and Yuanzhi Li. Backward feature correction: How deep learning performs deep learning. In COLT, 2023. Full version available at http://arxiv.org/abs/2001.04413.
[4] Zeyuan Allen-Zhu and Yuanzhi Li. Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws. ArXiv e-prints, abs/2404.05405, April 2024. Full version available at http://arxiv.org/abs/2404.05405.
[5] Zeyuan Allen-Zhu, Yuanzhi Li, and Zhao Song. A convergence theory for deep learning via over-parameterization. In ICML, 2019. Full version available at http://arxiv.org/abs/1811.03962.
[6] Sanjeev Arora and Yi Zhang. Do gans actually learn the distribution? an empirical study. arXiv preprint arXiv:1706.08224, 2017.
[7] David Arps, Younes Samih, Laura Kallmeyer, and Hassan Sajjad. Probing for constituency structure in neural language models. arXiv preprint arXiv:2204.06201, 2022.
[8] James K Baker. Trainable grammars for speech recognition. The Journal of the Acoustical Society of America, 65(S1):S132–S132, 1979.
[9] Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, and Samuel Weinbach. GPT-NeoX-20B: An open-source autoregressive language model. In Proceedings of the ACL Workshop on Challenges & Perspectives in Creating Large Language Models, 2022. URL https://arxiv.org/abs/2204.06745.
[10] Gregoire Deletang, Anian Ruoss, Jordi Grau-Moya, Tim Genewein, Li Kevin Wenliang, Elliot Catt, Chris Cundy, Marcus Hutter, Shane Legg, Joel Veness, et al. Neural networks and the chomsky hierarchy. In ICLR, 2023.
[11] Brian DuSell and David Chiang. Learning hierarchical structures with differentiable nondeterministic stacks. In ICLR, 2022.
[12] Nelson Elhage, Neel Nanda, Catherine Olsson, Tom Henighan, Nicholas Joseph, Ben Mann, Amanda Askell, Yuntao Bai, Anna Chen, Tom Conerly, et al. A mathematical framework for transformer circuits. Transformer Circuits Thread, 1, 2021.
[13] Pengcheng He, Xiaodong Liu, Jianfeng Gao, and Weizhu Chen. Deberta: Decoding-enhanced bert with disentangled attention. arXiv preprint arXiv:2006.03654, 2020.
[14] John Hewitt and Christopher D. Manning. A structural probe for finding syntax in word representations. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4129–4138, Minneapolis, Minnesota, June 2019. Association for Computational Linguistics. doi: 10.18653/v1/N19-1419. URL https://aclanthology.org/N19-1419.
[15] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of NAACL-HLT, pages 4171–4186, 2019.
[16] Christopher D Manning, Kevin Clark, John Hewitt, Urvashi Khandelwal, and Omer Levy. Emergent linguistic structure in artificial neural networks trained by self-supervision. Proceedings of the National Academy of Sciences, 117(48):30046–30054, 2020.
[17] Mitchell P. Marcus, Beatrice Santorini, and Mary Ann Marcinkiewicz. Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2):313–330, 1993. URL https://aclanthology.org/J93-2004.
[18] Rowan Hall Maudslay and Ryan Cotterell. Do syntactic probes probe syntax? experiments with jabberwocky probing. arXiv preprint arXiv:2106.02559, 2021.
[19] Milad Moradi and Matthias Samwald. Evaluating the robustness of neural language models to input perturbations. arXiv preprint arXiv:2108.12237, 2021.
[20] Shikhar Murty, Pratyusha Sharma, Jacob Andreas, and Christopher D Manning. Characterizing intrinsic compositionality in transformers with tree projections. In ICLR, 2023.
[21] Neel Nanda, Lawrence Chan, Tom Liberum, Jess Smith, and Jacob Steinhardt. Progress measures for grokking via mechanistic interpretability. arXiv preprint arXiv:2301.05217, 2023.
[22] Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova DasSarma, Tom Henighan, Ben Mann, Amanda Askell, Yuntao Bai, Anna Chen, et al. In-context learning and induction heads. arXiv preprint arXiv:2209.11895, 2022.
[23] OpenAI. Gpt-4 technical report, 2023.
[24] Matt Post and Shane Bergsma. Explicit and implicit syntactic features for text classification. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 866–872, 2013.
[25] Alec Radford, Jeff Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. Language models are unsupervised multitask learners. 2019.
[26] Itiroo Sakai. Syntax in universal translation. In Proceedings of the International Conference on Machine Translation and Applied Language Analysis, 1961.
[27] Hui Shi, Sicun Gao, Yuandong Tian, Xinyun Chen, and Jishen Zhao. Learning bounded context-free-grammar via lstm and the transformer: Difference and the explanations. In Proceedings of the AAAI Conference on Artificial Intelligence, 2022.
[28] Michael Sipser. Introduction to the Theory of Computation. Cengage Learning, 2012.
[29] Jianlin Su, Yu Lu, Shengfeng Pan, Bo Wen, and Yunfeng Liu. Roformer: Enhanced transformer with rotary position embedding, 2021.
[30] Lifu Tu, Garima Lalwani, Spandana Gella, and He He. An empirical study on robustness to spurious correlations using pre-trained language models. Transactions of the Association for Computational Linguistics, 8:621–633, 2020.
[31] David Vilares, Michalina Strzyz, Anders Søgaard, and Carlos Gómez-Rodríguez. Parsing as pretraining. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 9114–9121, 2020.
[32] Kevin Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, and Jacob Steinhardt. Interpretability in the wild: a circuit for indirect object identification in gpt-2 small. arXiv preprint arXiv:2211.00593, 2022.
[33] Zhiyong Wu, Yun Chen, Ben Kao, and Qun Liu. Perturbed masking: Parameter-free probing for analyzing and interpreting bert. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 4166–4176, 2020.
[34] Tian Ye, Zicheng Xu, Yuanzhi Li, and Zeyuan Allen-Zhu. Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process. arXiv preprint, 2024. to appear.
[35] Tian Ye, Zicheng Xu, Yuanzhi Li, and Zeyuan Allen-Zhu. Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems. arXiv preprint, 2024. to appear.
[36] Shizhuo Dylan Zhang, Curt Tigges, Stella Biderman, Maxim Raginsky, and Talia Ringer. Can transformers learn to solve problems recursively? arXiv preprint arXiv:2305.14699, 2023.
[37] Haoyu Zhao, Abhishek Panigrahi, Rong Ge, and Sanjeev Arora. Do transformers parse while predicting the masked word? arXiv preprint arXiv:2303.08117, 2023.
3 Zeyuan Allen-Zhu and Yuanzhi Li
[1] Zeyuan Allen-Zhu and Yuanzhi Li. Physics of Language Models: Part 1, Learning Hierarchical Language Structures. ArXiv e-prints, abs/2305.13673, May 2023. Full version available at http://arxiv.org/abs/2305.13673.
[2] Zeyuan Allen-Zhu and Yuanzhi Li. Physics of Language Models: Part 3.1, Knowledge Storage and Extraction. In ICML, 2024. Full version available at http://arxiv.org/abs/2309.14316.
[3] Zeyuan Allen-Zhu and Yuanzhi Li. Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws. ArXiv e-prints, abs/2404.05405, April 2024. Full version available at http://arxiv.org/abs/2404.05405.
[4] Lukas Berglund, Meg Tong, Max Kaufmann, Mikita Balesni, Asa Cooper Stickland, Tomasz Korbak, and Owain Evans. The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A". arXiv preprint arXiv:2309.12288, September 2023.
[5] Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, and Samuel Weinbach. GPT-NeoX-20B: An open-source autoregressive language model. In Proceedings of the ACL Workshop on Challenges & Perspectives in Creating Large Language Models, 2022. URL https://arxiv.org/abs/2204.06745.
[6] Deng Cai, Yan Wang, Lemao Liu, and Shuming Shi. Recent advances in retrieval-augmented text generation. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 3417–3419, 2022.
[7] Mor Geva, Daniel Khashabi, Elad Segal, Tushar Khot, Dan Roth, and Jonathan Berant. Did aristotle use a laptop? a question answering benchmark with implicit reasoning strategies. Transactions of the Association for Computational Linguistics, 9:346–361, 2021.
[8] Fabian Gloeckle, Badr Youbi Idrissi, Baptiste Roziere, David Lopez-Paz, and Gabriel Synnaeve. Better & faster large language models via multi-token prediction. arXiv preprint arXiv:2404.19737, 2024.
[9] Olga Golovneva, Zeyuan Allen-Zhu, Jason Weston, and Sainbayar Sukhbaatar. Reverse training to nurse the reversal curse. arXiv preprint arXiv:2403.13799, 2024.
[10] Qingyan Guo, Rui Wang, Junliang Guo, Xu Tan, Jiang Bian, and Yujiu Yang. Mitigating reversal curse via semantic-aware permutation training. arXiv preprint arXiv:2403.00758, 2024.
[11] Evan Hernandez, Belinda Z Li, and Jacob Andreas. Measuring and manipulating knowledge representations in language models. arXiv preprint arXiv:2304.00740, 2023.
[12] Edward J Hu, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. LoRA: Low-Rank Adaptation of Large Language Models. In ICLR, 2021.
[13] Albert Q Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lucile Saulnier, et al. Mistral 7b. arXiv preprint arXiv:2310.06825, 2023.
[14] Zhengbao Jiang, Frank F Xu, Luyu Gao, Zhiqing Sun, Qian Liu, Jane Dwivedi-Yu, Yiming Yang, Jamie Callan, and Graham Neubig. Active retrieval augmented generation. arXiv preprint arXiv:2305.06983, 2023.
[15] Mojtaba Komeili, Kurt Shuster, and Jason Weston. Internet-augmented dialogue generation. arXiv preprint arXiv:2107.07566, 2021.
[16] Rémi Lebret, David Grangier, and Michael Auli. Generating text from structured data with application to the biography domain. CoRR, abs/1603.07771, 2016. URL http://arxiv.org/abs/1603.07771.
[17] Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, and Douwe Kiela. Retrieval-augmented generation for knowledge-intensive nlp tasks. In H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, pages 9459–9474. Curran Associates, Inc., 2020. URL https://proceedings.neurips.cc/paper_files/paper/2020/file/6b493230205f780e1bc26945df7481e5-Paper.pdf.
[18] Shangqing Liu, Yu Chen, Xiaofei Xie, Jingkai Siow, and Yang Liu. Retrieval-augmented generation for code summarization via hybrid gnn. arXiv preprint arXiv:2006.05405, 2020.
[19] Yuning Mao, Pengcheng He, Xiaodong Liu, Yelong Shen, Jianfeng Gao, Jiawei Han, and Weizhu Chen. Generation-augmented retrieval for open-domain question answering. arXiv preprint arXiv:2009.08553, 2020.
[20] Tahira Naseem, Srinivas Ravishankar, Nandana Mihindukulasooriya, Ibrahim Abdelaziz, Young-Suk Lee, Pavan Kapanipathi, Salim Roukos, Alfio Gliozzo, and Alexander Gray. A semantics-aware transformer model of relation linking for knowledge base question answering. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 256–262, Online, August 2021. Association for Computational Linguistics.
[21] Anh Nguyen, Nikos Karampatziakis, and Weizhu Chen. Meet in the middle: A new pre-training paradigm. Advances in Neural Information Processing Systems, 36, 2024.
[22] Reham Omar, Omij Mangukiya, Panos Kalnis, and Essam Mansour. Chatgpt versus traditional question answering for knowledge graphs: Current status and future directions towards knowledge graph chatbots. arXiv preprint arXiv:2302.06466, 2023.
[23] OpenAI. Gpt-4 technical report, 2023.
[24] Md Rizwan Parvez, Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, and Kai-Wei Chang. Retrieval augmented code generation and summarization. arXiv preprint arXiv:2108.11601, 2021.
[25] Hao Peng, Xiaozhi Wang, Shengding Hu, Hailong Jin, Lei Hou, Juanzi Li, Zhiyuan Liu, and Qun Liu. Copen: Probing conceptual knowledge in pre-trained language models. arXiv preprint arXiv:2211.04079, 2022.
[26] Fabio Petroni, Tim Rocktäschel, Patrick Lewis, Anton Bakhtin, Yuxiang Wu, Alexander H Miller, and Sebastian Riedel. Language models as knowledge bases? arXiv preprint arXiv:1909.01066, 2019.
[27] Jacob Pfau, William Merrill, and Samuel R Bowman. Let's think dot by dot: Hidden computation in transformer language models. arXiv preprint arXiv:2404.15758, 2024.
[28] Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. Language models are unsupervised multitask learners. OpenAI blog, 1(8):9, 2019.
[29] Ori Ram, Yoav Levine, Itay Dalmedigos, Dor Muhlgay, Amnon Shashua, Kevin Leyton-Brown, and Yoav Shoham. In-context retrieval-augmented language models. arXiv preprint arXiv:2302.00083, 2023.
[30] Kyle Richardson and Ashish Sabharwal. What does my QA model know? devising controlled probes using expert knowledge. Transactions of the Association for Computational Linguistics, 8:572–588, 2020. doi: 10.1162/tacl_a_00331. URL https://aclanthology.org/2020.tacl-1.37.
[31] Karan Singhal, Shekoofeh Azizi, Tao Tu, S Sara Mahdavi, Jason Wei, Hyung Won Chung, Nathan Scales, Ajay Tanwani, Heather Cole-Lewis, Stephen Pfohl, et al. Large language models encode clinical knowledge. arXiv preprint arXiv:2212.13138, 2022.
[32] Shamane Siriwardhana, Rivindu Weerasekera, Elliott Wen, Tharindu Kaluarachchi, Rajib Rana, and Suranga Nanayakkara. Improving the domain adaptation of retrieval augmented generation (rag) models for open domain question answering. Transactions of the Association for Computational Linguistics, 11:1–17, 2023.
[33] Jianlin Su, Yu Lu, Shengfeng Pan, Bo Wen, and Yunfeng Liu. Roformer: Enhanced transformer with rotary position embedding, 2021.
[34] Kai Sun, Yifan Ethan Xu, Hanwen Zha, Yue Liu, and Xin Luna Dong. Head-to-tail: How knowledgeable are large language models (llm)? aka will llms replace knowledge graphs? arXiv preprint arXiv:2308.10168, 2023.
[35] Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, et al. Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971, 2023.
[36] Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. Chain-of-thought prompting elicits reasoning in large language models. Advances in neural information processing systems, 35:24824–24837, 2022.
[37] Tian Ye, Zicheng Xu, Yuanzhi Li, and Zeyuan Allen-Zhu. Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process. arXiv preprint arXiv:2407.20311, 2024.
[38] Tian Ye, Zicheng Xu, Yuanzhi Li, and Zeyuan Allen-Zhu. Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems. arXiv preprint arXiv:2408.16293, 2024.
[39] Chunting Zhou, Pengfei Liu, Puxin Xu, Srini Iyer, Jiao Sun, Yuning Mao, Xuezhe Ma, Avia Efrat, Ping Yu, Lili Yu, et al. Lima: Less is more for alignment. arXiv preprint arXiv:2305.11206, 2023.
Word list
The original purpose of building these word lists was to check, before reading an English paper, that none of its most frequent words are unknown to me.
When I worked on translating international standards, I built such lists to decide candidate translations in advance and to serve as a baseline for using the same translation for the same term in the same context.
After I joined the development of machine-translation software, I used them to check translation accuracy: I prepared expected translations and verified how each term was actually translated in context.
I have not translated many LLM papers, so this time no Japanese translations are attached yet. Once I have built word lists for several papers, I plan to add translations for roughly the top 100 to 1,000 words.
The paper was saved as a PDF and converted directly to a TXT file, so some strings that are not part of the paper body are mixed in; sorry about that. In my experience they stay below 1% of the total and are mostly boilerplate common to papers, so please treat them as noise.
Natural language processing with docker (186)
https://qiita.com/kaizen_nagoya/items/e29cbaed8370e7913487
From next time onward, I will use:
docker run -v /tmp/llm:/tmp/llm -it kaizenjapan/llm /bin/bash
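The frequency table below was produced with awk on the extracted TXT file. As a rough illustration only, here is a minimal sketch of such a counting step; the filename paper.txt and the exact tokenization rules are my assumptions, not necessarily the commands actually used.

```bash
# Minimal sketch of the counting step (assumed input file: paper.txt).
# Lowercase, split on non-letter characters, count with awk,
# sort by frequency, and print rows in the same "rank | term | count |" shape.
tr 'A-Z' 'a-z' < paper.txt \
  | tr -cs 'a-z' '\n' \
  | awk 'NF { count[$1]++ } END { for (w in count) print w, count[w] }' \
  | sort -k2,2nr \
  | awk '{ printf "%d | %s | %d |\n", NR, $1, $2 }'
```

A pipeline of this kind would also explain why single letters such as "s", "e", or "b" appear in the table: splitting on non-letters turns possessives, equation symbols, and reference labels into one-letter tokens.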
rank | term | count |
---|---|---|
1 | the | 540 |
2 | retry | 329 |
3 | to | 252 |
4 | of | 249 |
5 | a | 237 |
6 | s | 221 |
7 | with | 210 |
8 | op | 202 |
9 | and | 199 |
10 | in | 182 |
11 | we | 165 |
12 | data | 160 |
13 | retryrate | 160 |
14 | is | 141 |
15 | for | 140 |
16 | model | 131 |
17 | each | 123 |
18 | as | 119 |
19 | on | 117 |
20 | this | 116 |
21 | can | 103 |
22 | it | 99 |
23 | error | 90 |
24 | e | 88 |
25 | lora | 82 |
26 | that | 81 |
27 | finetune | 80 |
28 | figure | 76 |
29 | box | 70 |
30 | define | 70 |
31 | unnecessary | 69 |
32 | models | 67 |
33 | language | 63 |
34 | pretrain | 60 |
35 | so | 60 |
36 | number | 59 |
37 | qv | 57 |
38 | igsm | 52 |
39 | mask | 52 |
40 | not | 52 |
41 | be | 49 |
42 | correct | 49 |
43 | from | 49 |
44 | pretrained | 49 |
45 | rate | 48 |
46 | accuracy | 47 |
47 | b | 47 |
48 | mistakes | 47 |
49 | such | 47 |
50 | using | 45 |
51 | cheese | 44 |
52 | market | 44 |
53 | or | 42 |
54 | param | 41 |
55 | solution | 41 |
56 | beam | 40 |
57 | math | 40 |
58 | studio | 39 |
59 | arxiv | 38 |
60 | maskretryrate | 38 |
61 | training | 38 |
62 | its | 37 |
63 | are | 36 |
64 | free | 35 |
65 | pretraining | 35 |
66 | parameters | 34 |
67 | reasoning | 34 |
68 | equals | 33 |
69 | has | 33 |
70 | if | 33 |
71 | steps | 33 |
72 | also | 32 |
73 | errors | 32 |
74 | problems | 32 |
75 | retry_weak | 32 |
76 | solutions | 32 |
77 | even | 31 |
78 | an | 30 |
79 | film | 30 |
80 | high | 30 |
81 | upon | 29 |
82 | use | 29 |
83 | regret | 28 |
84 | school | 28 |
85 | g | 27 |
86 | gpt | 27 |
87 | have | 27 |
88 | x | 27 |
89 | r | 26 |
90 | tokens | 26 |
91 | when | 26 |
92 | by | 25 |
93 | jim | 25 |
94 | probing | 25 |
95 | section | 25 |
96 | w | 25 |
97 | finetuning | 24 |
98 | next | 24 |
99 | only | 24 |
100 | process | 24 |
101 | see | 24 |
102 | correction | 23 |
103 | generation | 23 |
104 | learning | 23 |
105 | original | 23 |
106 | result | 23 |
107 | same | 23 |
108 | version | 23 |
109 | after | 22 |
110 | c | 22 |
111 | dist | 22 |
112 | i | 22 |
113 | international | 22 |
114 | parmesan | 22 |
115 | than | 22 |
116 | weight | 22 |
117 | at | 21 |
118 | but | 21 |
119 | cream | 21 |
120 | full | 21 |
121 | jungle | 21 |
122 | mistake | 21 |
123 | our | 21 |
124 | reask | 21 |
125 | backpack | 20 |
126 | district | 20 |
127 | ice | 20 |
128 | inside | 20 |
129 | ood | 20 |
130 | p | 20 |
131 | parameter | 19 |
132 | problem | 19 |
133 | rank | 19 |
134 | results | 19 |
135 | sentence | 19 |
136 | step | 19 |
137 | their | 19 |
138 | time | 19 |
139 | used | 19 |
140 | back | 18 |
141 | distout | 18 |
142 | many | 18 |
143 | more | 18 |
144 | one | 18 |
145 | own | 18 |
146 | central | 17 |
147 | example | 17 |
148 | paper | 17 |
149 | still | 17 |
150 | v | 17 |
151 | which | 17 |
152 | all | 16 |
153 | during | 16 |
154 | grape | 16 |
155 | make | 16 |
156 | no | 16 |
157 | pounds | 16 |
158 | u | 16 |
159 | al | 15 |
160 | d | 15 |
161 | detection | 15 |
162 | how | 15 |
163 | improve | 15 |
164 | k | 15 |
165 | masking | 15 |
166 | retry_miss | 15 |
167 | they | 15 |
168 | context | 14 |
169 | continued | 14 |
170 | datasets | 14 |
171 | decay | 14 |
172 | directly | 14 |
173 | does | 14 |
174 | hard | 14 |
175 | l | 14 |
176 | length | 14 |
177 | part | 14 |
178 | preprint | 14 |
179 | q | 14 |
180 | small | 14 |
181 | weak | 14 |
182 | while | 14 |
183 | appendix | 13 |
184 | corrections | 13 |
185 | fine | 13 |
186 | fresh | 13 |
187 | generate | 13 |
188 | learn | 13 |
189 | li | 13 |
190 | search | 13 |
191 | size | 13 |
192 | synthetic | 13 |
193 | trained | 13 |
194 | wrong | 13 |
195 | y | 13 |
196 | best | 12 |
197 | daypack | 12 |
198 | double | 12 |
199 | label | 12 |
200 | made | 12 |
201 | med | 12 |
202 | pineapple | 12 |
203 | self | 12 |
204 | there | 12 |
205 | very | 12 |
206 | accuracies | 11 |
207 | batch | 11 |
208 | better | 11 |
209 | cannot | 11 |
210 | details | 11 |
211 | et | 11 |
212 | experiments | 11 |
213 | higher | 11 |
214 | however | 11 |
215 | o | 11 |
216 | over | 11 |
217 | perfect | 11 |
218 | test | 11 |
219 | FALSE | 11 |
220 | abs | 10 |
221 | chen | 10 |
222 | college | 10 |
223 | compared | 10 |
224 | different | 10 |
225 | distribution | 10 |
226 | embedding | 10 |
227 | experiment | 10 |
228 | goat | 10 |
229 | inference | 10 |
230 | joe | 10 |
231 | may | 10 |
232 | miss | 10 |
233 | need | 10 |
234 | residential | 10 |
235 | retries | 10 |
236 | skill | 10 |
237 | t | 10 |
238 | uses | 10 |
239 | vs | 10 |
240 | weights | 10 |
241 | where | 10 |
242 | without | 10 |
243 | yuanzhi | 10 |
244 | among | 9 |
245 | approach | 9 |
246 | call | 9 |
247 | dance | 9 |
248 | do | 9 |
249 | generated | 9 |
250 | j | 9 |
251 | large | 9 |
252 | layer | 9 |
253 | llms | 9 |
254 | med_pqigsm | 9 |
255 | stage | 9 |
256 | sum | 9 |
257 | versionp | 9 |
258 | what | 9 |
259 | would | 9 |
260 | zhu | 9 |
261 | TRUE | 9 |
262 | additional | 8 |
263 | allen | 8 |
264 | average | 8 |
265 | because | 8 |
266 | comparing | 8 |
267 | comparison | 8 |
268 | dataset | 8 |
269 | erroneous | 8 |
270 | evaluation | 8 |
271 | f | 8 |
272 | fake | 8 |
273 | followed | 8 |
274 | following | 8 |
275 | future | 8 |
276 | goal | 8 |
277 | including | 8 |
278 | instance | 8 |
279 | internal | 8 |
280 | knowledge | 8 |
281 | line | 8 |
282 | med_qpigsm | 8 |
283 | messenger | 8 |
284 | much | 8 |
285 | product | 8 |
286 | prompting | 8 |
287 | simply | 8 |
288 | times | 8 |
289 | total | 8 |
290 | tuning | 8 |
291 | two | 8 |
292 | verify | 8 |
293 | wang | 8 |
294 | already | 7 |
295 | answer | 7 |
296 | arithmetic | 7 |
297 | compare | 7 |
298 | detect | 7 |
299 | detector | 7 |
300 | easily | 7 |
301 | first | 7 |
302 | generating | 7 |
303 | level | 7 |
304 | like | 7 |
305 | meta | 7 |
306 | needed | 7 |
307 | new | 7 |
308 | org | 7 |
309 | other | 7 |
310 | output | 7 |
311 | perform | 7 |
312 | physics | 7 |
313 | question | 7 |
314 | similar | 7 |
315 | thus | 7 |
316 | trader | 7 |
317 | trainable | 7 |
318 | via | 7 |
319 | zeyuan | 7 |
320 | adding | 6 |
321 | additionally | 6 |
322 | available | 6 |
323 | before | 6 |
324 | compute | 6 |
325 | dosample | 6 |
326 | especially | 6 |
327 | except | 6 |
328 | few | 6 |
329 | harder | 6 |
330 | here | 6 |
331 | immediately | 6 |
332 | important | 6 |
333 | knows | 6 |
334 | linear | 6 |
335 | low | 6 |
336 | making | 6 |
337 | might | 6 |
338 | most | 6 |
339 | n | 6 |
340 | note | 6 |
341 | operations | 6 |
342 | out | 6 |
343 | pear | 6 |
344 | perfectly | 6 |
345 | presented | 6 |
346 | qp | 6 |
347 | random | 6 |
348 | re | 6 |
349 | riverview | 6 |
350 | samples | 6 |
351 | seeds | 6 |
352 | significantly | 6 |
353 | states | 6 |
354 | sufficiently | 6 |
355 | tasks | 6 |
356 | teach | 6 |
357 | those | 6 |
358 | twice | 6 |
359 | update | 6 |
360 | value | 6 |
361 | why | 6 |
362 | work | 6 |
363 | ye | 6 |
364 | accurate | 5 |
365 | achieve | 5 |
366 | achieved | 5 |
367 | amount | 5 |
368 | arts | 5 |
369 | autoregressive | 5 |
370 | based | 5 |
371 | been | 5 |
372 | both | 5 |
373 | campus | 5 |
374 | cell | 5 |
375 | ch | 5 |
376 | choices | 5 |
377 | complexity | 5 |
378 | computations | 5 |
379 | computed | 5 |
380 | contains | 5 |
381 | controlled | 5 |
382 | correctness | 5 |
383 | counts | 5 |
384 | difficulties | 5 |
385 | discover | 5 |
386 | end | 5 |
387 | explore | 5 |
388 | fair | 5 |
389 | find | 5 |
390 | focus | 5 |
391 | grade | 5 |
392 | h | 5 |
393 | http | 5 |
394 | immediate | 5 |
395 | include | 5 |
396 | information | 5 |
397 | ingredient | 5 |
398 | instead | 5 |
399 | into | 5 |
400 | long | 5 |
401 | matrices | 5 |
402 | multi | 5 |
403 | necessary | 5 |
404 | needs | 5 |
405 | neural | 5 |
406 | observation | 5 |
407 | per | 5 |
408 | please | 5 |
409 | prepare | 5 |
410 | present | 5 |
411 | probability | 5 |
412 | provide | 5 |
413 | query | 5 |
414 | randomly | 5 |
415 | regretful | 5 |
416 | seasons | 5 |
417 | setting | 5 |
418 | shortest | 5 |
419 | should | 5 |
420 | show | 5 |
421 | shows | 5 |
422 | solve | 5 |
423 | some | 5 |
424 | statistics | 5 |
425 | these | 5 |
426 | through | 5 |
427 | token | 5 |
428 | type | 5 |
429 | up | 5 |
430 | us | 5 |
431 | verifier | 5 |
432 | were | 5 |
433 | works | 5 |
434 | yet | 5 |
435 | zhang | 5 |
436 | actually | 4 |
437 | advances | 4 |
438 | almost | 4 |
439 | another | 4 |
440 | any | 4 |
441 | association | 4 |
442 | attention | 4 |
443 | carefully | 4 |
444 | cases | 4 |
445 | changes | 4 |
446 | check | 4 |
447 | common | 4 |
448 | computation | 4 |
449 | computational | 4 |
450 | conclusion | 4 |
451 | consider | 4 |
452 | contrast | 4 |
453 | controllable | 4 |
454 | could | 4 |
455 | difference | 4 |
456 | due | 4 |
457 | easy | 4 |
458 | effective | 4 |
459 | exhibit | 4 |
460 | fewer | 4 |
461 | final | 4 |
462 | found | 4 |
463 | fully | 4 |
464 | given | 4 |
465 | gpu | 4 |
466 | graph | 4 |
467 | hidden | 4 |
468 | improvement | 4 |
469 | included | 4 |
470 | introduce | 4 |
471 | just | 4 |
472 | larger | 4 |
473 | lee | 4 |
474 | linguistics | 4 |
475 | liu | 4 |
476 | llama | 4 |
477 | longer | 4 |
478 | makes | 4 |
479 | matrix | 4 |
480 | medop | 4 |
481 | method | 4 |
482 | multiple | 4 |
483 | natural | 4 |
484 | observations | 4 |
485 | obtain | 4 |
486 | oh | 4 |
487 | originalunnecessary | 4 |
488 | pages | 4 |
489 | possible | 4 |
490 | pq | 4 |
491 | prints | 4 |
492 | proceedings | 4 |
493 | processing | 4 |
494 | range | 4 |
495 | ready | 4 |
496 | real | 4 |
497 | regenerate | 4 |
498 | requires | 4 |
499 | round | 4 |
500 | select | 4 |
501 | sense | 4 |
502 | shown | 4 |
503 | since | 4 |
504 | skip | 4 |
505 | statement | 4 |
506 | strong | 4 |
507 | systems | 4 |
508 | tells | 4 |
509 | them | 4 |
510 | then | 4 |
511 | though | 4 |
512 | towards | 4 |
513 | transformers | 4 |
514 | trivial | 4 |
515 | understanding | 4 |
516 | verification | 4 |
517 | vocational | 4 |
518 | weighs | 4 |
519 | whether | 4 |
520 | within | 4 |
521 | your | 4 |
522 | z | 4 |
523 | about | 3 |
524 | accurately | 3 |
525 | acl | 3 |
526 | adamw | 3 |
527 | add | 3 |
528 | again | 3 |
529 | alignment | 3 |
530 | allow | 3 |
531 | although | 3 |
532 | auto | 3 |
533 | automatically | 3 |
534 | banana | 3 |
535 | below | 3 |
536 | between | 3 |
537 | case | 3 |
538 | chance | 3 |
539 | change | 3 |
540 | choose | 3 |
541 | com | 3 |
542 | commercial | 3 |
543 | complements | 3 |
544 | correcting | 3 |
545 | cosine | 3 |
546 | count | 3 |
547 | creating | 3 |
548 | determine | 3 |
549 | did | 3 |
550 | differs | 3 |
551 | discard | 3 |
552 | down | 3 |
553 | eese | 3 |
554 | effectively | 3 |
555 | efficient | 3 |
556 | encourages | 3 |
557 | entire | 3 |
558 | evaluated | 3 |
559 | extreme | 3 |
560 | failure | 3 |
561 | finetuned | 3 |
562 | format | 3 |
563 | gain | 3 |
564 | general | 3 |
565 | go | 3 |
566 | half | 3 |
567 | hardest | 3 |
568 | hardpq | 3 |
569 | head | 3 |
570 | highly | 3 |
571 | https | 3 |
572 | ideally | 3 |
573 | identical | 3 |
574 | illustration | 3 |
575 | implies | 3 |
576 | improves | 3 |
577 | improving | 3 |
578 | indicating | 3 |
579 | intelligence | 3 |
580 | interestingly | 3 |
581 | intuitively | 3 |
582 | keep | 3 |
583 | least | 3 |
584 | let | 3 |
585 | life | 3 |
586 | lin | 3 |
587 | lu | 3 |
588 | maximum | 3 |
589 | methods | 3 |
590 | mlp | 3 |
591 | nearly | 3 |
592 | non | 3 |
593 | notably | 3 |
594 | observe | 3 |
595 | once | 3 |
596 | overall | 3 |
597 | pattern | 3 |
598 | peft | 3 |
599 | performance | 3 |
600 | performs | 3 |
601 | position | 3 |
602 | practice | 3 |
603 | prediction | 3 |
604 | previous | 3 |
605 | produce | 3 |
606 | prompts | 3 |
607 | qiang | 3 |
608 | randomness | 3 |
609 | rather | 3 |
610 | realize | 3 |
611 | reasoners | 3 |
612 | right | 3 |
613 | rows | 3 |
614 | sample | 3 |
615 | sampling | 3 |
616 | save | 3 |
617 | scheduling | 3 |
618 | sentences | 3 |
619 | set | 3 |
620 | shift | 3 |
621 | significant | 3 |
622 | simplify | 3 |
623 | sinternation | 3 |
624 | solving | 3 |
625 | specifically | 3 |
626 | st | 3 |
627 | stating | 3 |
628 | strikethrough | 3 |
629 | success | 3 |
630 | sufficient | 3 |
631 | supermarket | 3 |
632 | table | 3 |
633 | task | 3 |
634 | temperature | 3 |
635 | too | 3 |
636 | top | 3 |
637 | tried | 3 |
638 | truly | 3 |
639 | try | 3 |
640 | tunes | 3 |
641 | understand | 3 |
642 | unless | 3 |
643 | updates | 3 |
644 | versions | 3 |
645 | want | 3 |
646 | was | 3 |
647 | weizhu | 3 |
648 | will | 3 |
649 | writing | 3 |
650 | xu | 3 |
651 | yang | 3 |
652 | yu | 3 |
653 | zhou | 3 |
654 | above | 2 |
655 | abstract | 2 |
656 | achieving | 2 |
657 | acquire | 2 |
658 | acquired | 2 |
659 | actual | 2 |
660 | adaptation | 2 |
661 | adapted | 2 |
662 | against | 2 |
663 | algorithm | 2 |
664 | allowing | 2 |
665 | alters | 2 |
666 | anna | 2 |
667 | annual | 2 |
668 | appeared | 2 |
669 | appears | 2 |
670 | applied | 2 |
671 | apply | 2 |
672 | approaches | 2 |
673 | architecture | 2 |
674 | article | 2 |
675 | aspect | 2 |
676 | aspects | 2 |
677 | augment | 2 |
678 | auxiliary | 2 |
679 | avoid | 2 |
680 | aware | 2 |
681 | backpacks | 2 |
682 | becomes | 2 |
683 | begin | 2 |
684 | bei | 2 |
685 | ben | 2 |
686 | benefit | 2 |
687 | betas | 2 |
688 | bin | 2 |
689 | broken | 2 |
690 | bubeck | 2 |
691 | calculate | 2 |
692 | capability | 2 |
693 | capacity | 2 |
694 | checking | 2 |
695 | cmu | 2 |
696 | comes | 2 |
697 | comparable | 2 |
698 | complex | 2 |
699 | conclusions | 2 |
700 | conference | 2 |
701 | configurations | 2 |
702 | conjecture | 2 |
703 | consists | 2 |
704 | constructed | 2 |
705 | contained | 2 |
706 | corpus | 2 |
707 | correctly | 2 |
708 | corrects | 2 |
709 | correspond | 2 |
710 | cot | 2 |
711 | crucial | 2 |
712 | current | 2 |
713 | daypacks | 2 |
714 | decoder | 2 |
715 | decoding | 2 |
716 | decreased | 2 |
717 | deferred | 2 |
718 | definethefresh | 2 |
719 | demonstrate | 2 |
720 | denote | 2 |
721 | dependencyunused | 2 |
722 | depends | 2 |
723 | description | 2 |
724 | desirable | 2 |
725 | detected | 2 |
726 | detecting | 2 |
727 | difficulty | 2 |
728 | doesn | 2 |
729 | easier | 2 |
730 | efforts | 2 |
731 | either | 2 |
732 | eldan | 2 |
733 | eliminating | 2 |
734 | encourage | 2 |
735 | english | 2 |
736 | ensuring | 2 |
737 | eric | 2 |
738 | essentially | 2 |
739 | evidence | 2 |
740 | exact | 2 |
741 | examples | 2 |
742 | exploring | 2 |
743 | extremely | 2 |
744 | fei | 2 |
745 | finally | 2 |
746 | focuses | 2 |
747 | follow | 2 |
748 | framework | 2 |
749 | fu | 2 |
750 | gao | 2 |
751 | generalization | 2 |
752 | generates | 2 |
753 | generative | 2 |
754 | give | 2 |
755 | grammar | 2 |
756 | ground | 2 |
757 | gsm | 2 |
758 | guang | 2 |
759 | guide | 2 |
760 | guided | 2 |
761 | he | 2 |
762 | help | 2 |
763 | hierarchical | 2 |
764 | hig | 2 |
765 | hu | 2 |
766 | icecreamasp | 2 |
767 | idea | 2 |
768 | implement | 2 |
769 | implemented | 2 |
770 | incentivized | 2 |
771 | increase | 2 |
772 | increases | 2 |
773 | indicates | 2 |
774 | initial | 2 |
775 | insert | 2 |
776 | integers | 2 |
777 | interested | 2 |
778 | introduced | 2 |
779 | introduction | 2 |
780 | issue | 2 |
781 | iterative | 2 |
782 | jian | 2 |
783 | job | 2 |
784 | jun | 2 |
785 | karl | 2 |
786 | last | 2 |
787 | later | 2 |
788 | layers | 2 |
789 | leading | 2 |
790 | leads | 2 |
791 | learned | 2 |
792 | learns | 2 |
793 | less | 2 |
794 | lets | 2 |
795 | limit | 2 |
796 | llm | 2 |
797 | location | 2 |
798 | logic | 2 |
799 | lou | 2 |
800 | lower | 2 |
801 | lr | 2 |
802 | mann | 2 |
803 | manner | 2 |
804 | maskigsm | 2 |
805 | masks | 2 |
806 | mbzuai | 2 |
807 | meaning | 2 |
808 | means | 2 |
809 | medpq | 2 |
810 | meeting | 2 |
811 | michael | 2 |
812 | miller | 2 |
813 | multinomial | 2 |
814 | name | 2 |
815 | now | 2 |
816 | occurs | 2 |
817 | often | 2 |
818 | open | 2 |
819 | otherwise | 2 |
820 | outperform | 2 |
821 | pan | 2 |
822 | papers | 2 |
823 | paramterabstract | 2 |
824 | particularly | 2 |
825 | performed | 2 |
826 | perhaps | 2 |
827 | positional | 2 |
828 | potentially | 2 |
829 | practical | 2 |
830 | predict | 2 |
831 | predicting | 2 |
832 | prepared | 2 |
833 | promising | 2 |
834 | purposes | 2 |
835 | ramp | 2 |
836 | rarely | 2 |
837 | rates | 2 |
838 | realistic | 2 |
839 | reason | 2 |
840 | reasonable | 2 |
841 | recall | 2 |
842 | recently | 2 |
843 | regressive | 2 |
844 | reliably | 2 |
845 | remains | 2 |
846 | remove | 2 |
847 | require | 2 |
848 | research | 2 |
849 | resp | 2 |
850 | rewrite | 2 |
851 | rict | 2 |
852 | ronen | 2 |
853 | rotary | 2 |
854 | rounds | 2 |
855 | rumored | 2 |
856 | safe | 2 |
857 | say | 2 |
858 | sbananaask | 2 |
859 | sees | 2 |
860 | selecting | 2 |
861 | shifts | 2 |
862 | shizhuo | 2 |
863 | short | 2 |
864 | shot | 2 |
865 | showcase | 2 |
866 | simple | 2 |
867 | simulate | 2 |
868 | single | 2 |
869 | skills | 2 |
870 | slightly | 2 |
871 | smaller | 2 |
872 | solely | 2 |
873 | sometimes | 2 |
874 | sop | 2 |
875 | source | 2 |
876 | soy | 2 |
877 | special | 2 |
878 | stands | 2 |
879 | starting | 2 |
880 | strongly | 2 |
881 | structures | 2 |
882 | studies | 2 |
883 | study | 2 |
884 | summarize | 2 |
885 | support | 2 |
886 | sure | 2 |
887 | teaches | 2 |
888 | team | 2 |
889 | technique | 2 |
890 | tempting | 2 |
891 | tend | 2 |
892 | testing | 2 |
893 | text | 2 |
894 | therefore | 2 |
895 | three | 2 |
896 | throughout | 2 |
897 | tian | 2 |
898 | train | 2 |
899 | transformer | 2 |
900 | truth | 2 |
901 | tuned | 2 |
902 | typically | 2 |
903 | underlined | 2 |
904 | unlike | 2 |
905 | unlikely | 2 |
906 | until | 2 |
907 | url | 2 |
908 | usefulness | 2 |
909 | user | 2 |
910 | usually | 2 |
911 | various | 2 |
912 | verifiers | 2 |
913 | volume | 2 |
914 | well | 2 |
915 | wide | 2 |
916 | widely | 2 |
917 | window | 2 |
918 | words | 2 |
919 | workshop | 2 |
920 | world | 2 |
921 | wu | 2 |
922 | yifei | 2 |
923 | you | 2 |
924 | zeqi | 2 |
925 | zicheng | 2 |
Using docker (186) again after a while:
https://qiita.com/kaizen_nagoya/items/e29cbaed8370e7913487
The counts were produced with the awk processing described in that article.
When converting from PDF to TXT, some compound words are not separated properly; the cause is unknown.
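For reference, one way the PDF-to-TXT step can be done is with pdftotext from poppler-utils; this is only an assumption about the toolchain, not a record of what was actually used here. Missing spaces or ligatures in the PDF's text layer are a plausible cause of the fused compound words, and comparing extraction modes can help narrow down where the merging happens.

```bash
# Hedged sketch of a PDF-to-TXT conversion (assumes poppler-utils' pdftotext
# is installed; paper.pdf is a placeholder filename).
pdftotext -enc UTF-8 -nopgbrk paper.pdf paper.txt

# Layout-preserving mode for comparison: if compound words are fused in one
# output but not the other, the merging happens at extraction time rather
# than in the PDF's own text layer.
pdftotext -layout paper.pdf paper-layout.txt
```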