LLM (Large Language Model) Advent Calendar 2024
https://qiita.com/advent-calendar/2024/llm
This is the article scheduled for posting on December 21.
Matsuo Lab LLM Community "Paper & Hacks Vol. 24"
https://matsuolab-community.connpass.com/event/335384/
Presenter: 下村 晃生, Matsuo Lab LLM Community member (Department of Space Systems Engineering, School of Engineering, Kyushu Institute of Technology; in charge of Session 9 of the Matsuo Lab Large Language Model course)
Theme: Are LLMs aware of their own hallucinations?
Paper link: Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems
https://arxiv.org/abs/2408.16293
Natural language processing with docker (186)
https://qiita.com/kaizen_nagoya/items/e29cbaed8370e7913487
This post lists the references of each reference and builds a word-frequency list (work in progress).
The word list below covers words that appear at least twice; words that appear only once are kept in the Docker environment.
<This article is not yet complete. I will keep adding words and sentences.>
References
[1] Kwangjun Ahn, Xiang Cheng, Hadi Daneshmand, and Suvrit Sra. Transformers learn to implement preconditioned gradient descent for in-context learning. Advances in Neural Information Processing Systems, 36, 2024.
https://arxiv.org/abs/2306.00297
[2] Zeyuan Allen-Zhu and Yuanzhi Li. Physics of Language Models: Part 1, Learning Hierarchical Language Structures. ArXiv e-prints, abs/2305.13673, May 2023. Full version available at http://arxiv.org/abs/2305.13673.
[3] Zeyuan Allen-Zhu and Yuanzhi Li. Physics of Language Models: Part 3.2, Knowledge Manipulation. ArXiv e-prints, abs/2309.14402, September 2023. Full version available at http://arxiv.org/abs/2309.14402.
[4] Zeyuan Allen-Zhu and Yuanzhi Li. Physics of Language Models: Part 3.1, Knowledge Storage and Extraction. In ICML, 2024. Full version available at http://arxiv.org/abs/2309.14316.
[5] Zeyuan Allen-Zhu and Yuanzhi Li. Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws. ArXiv e-prints, abs/2404.05405, April 2024. Full version available at http://arxiv.org/abs/2404.05405.
[6] Cem Anil, Yuhuai Wu, Anders Andreassen, Aitor Lewkowycz, Vedant Misra, Vinay Ramasesh, Ambrose Slone, Guy Gur-Ari, Ethan Dyer, and Behnam Neyshabur. Exploring length generalization in large language models. Advances in Neural Information Processing Systems, 35:38546–38556, 2022.
[7] Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, and Samuel Weinbach. GPT-NeoX-20B: An open-source autoregressive language model. In Proceedings of the ACL Workshop on Challenges & Perspectives in Creating Large Language Models, 2022. URL https://arxiv.org/abs/2204.06745.
[8] Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, et al. Sparks of artificial general intelligence: Early experiments with gpt-4. arXiv preprint arXiv:2303.12712, 2023.
[9] Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, et al. Training verifiers to solve math word problems. arXiv preprint arXiv:2110.14168, 2021.
[10] Edward J Hu, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. Lora: Low-rank adaptation of large language models. In ICLR, 2021.
[11] Jie Huang, Xinyun Chen, Swaroop Mishra, Huaixiu Steven Zheng, Adams Wei Yu, Xinying Song, and Denny Zhou. Large language models cannot self-correct reasoning yet. arXiv preprint arXiv:2310.01798, 2023.
[12] Yifei Li, Zeqi Lin, Shizhuo Zhang, Qiang Fu, Bei Chen, Jian-Guang Lou, and Weizhu Chen. Making language models better reasoners with step-aware verifier. In Anna Rogers, Jordan Boyd-Graber, and Naoaki Okazaki, editors, Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 5315–5333, Toronto, Canada, July 2023. Association for Computational Linguistics. doi: 10.18653/v1/2023.acl-long.291. URL https://aclanthology.org/2023.acl-long.291.
[13] Yifei Li, Zeqi Lin, Shizhuo Zhang, Qiang Fu, Bei Chen, Jian-Guang Lou, and Weizhu Chen. Making language models better reasoners with step-aware verifier. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 5315–5333, 2023.
[14] Bingbin Liu, Sebastien Bubeck, Ronen Eldan, Janardhan Kulkarni, Yuanzhi Li, Anh Nguyen, Rachel Ward, and Yi Zhang. TinyGSM: achieving > 80% on GSM8k with small language models. arXiv preprint arXiv:2312.09241, 2023.
[15] Aman Madaan, Niket Tandon, Prakhar Gupta, Skyler Hallinan, Luyu Gao, Sarah Wiegreffe, Uri Alon, Nouha Dziri, Shrimai Prabhumoye, Yiming Yang, et al. Self-refine: Iterative refinement with self-feedback. Advances in Neural Information Processing Systems, 36, 2024.
[16] Marah Abdin et al. Phi-3 technical report: A highly capable language model locally on your phone. arXiv preprint arXiv:2404.14219, 2024.
[17] John Miller, Karl Krauth, Benjamin Recht, and Ludwig Schmidt. The effect of natural distribution shift on question answering models. In International conference on machine learning, pages 6905–6916. PMLR, 2020.
[18] Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova DasSarma, Tom Henighan, Ben Mann, Amanda Askell, Yuntao Bai, Anna Chen, et al. In-context learning and induction heads. arXiv preprint arXiv:2209.11895, 2022.
[19] Liangming Pan, Michael Saxon, Wenda Xu, Deepak Nathani, Xinyi Wang, and William Yang Wang. Automatically correcting large language models: Surveying the landscape of diverse self-correction strategies. arXiv preprint arXiv:2308.03188, 2023.
[20] Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. Language models are unsupervised multitask learners. OpenAI blog, 1(8):9, 2019.
[21] Maithra Raghu, Thomas Unterthiner, Simon Kornblith, Chiyuan Zhang, and Alexey Dosovitskiy. Do vision transformers see like convolutional neural networks? Advances in neural information processing systems, 34:12116–12128, 2021.
[22] Yiheng Shu and Zhiwei Yu. Distribution shifts are bottlenecks: Extensive evaluation for grounding language models to knowledge bases. In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics: Student Research Workshop, pages 71–88, 2024.
[23] Marta Skreta, Naruki Yoshikawa, Sebastian Arellano-Rubach, Zhi Ji, Lasse Bjørn Kristensen, Kourosh Darvish, Alán Aspuru-Guzik, Florian Shkurti, and Animesh Garg. Errors are useful prompts: Instruction guided task programming with verifier-assisted iterative prompting. arXiv preprint arXiv:2303.14100, 2023.
[24] Jianlin Su, Yu Lu, Shengfeng Pan, Bo Wen, and Yunfeng Liu. Roformer: Enhanced transformer with rotary position embedding, 2021.
[25] Llama Team. The llama 3 herd of models. arXiv preprint arXiv:2407.21783, 2024.
[26] Fei Wang, Chao Shang, Sarthak Jain, Shuai Wang, Qiang Ning, Bonan Min, Vittorio Castelli, Yassine Benajiba, and Dan Roth. From instructions to constraints: Language model alignment with automatic constraint verification. arXiv preprint arXiv:2403.06326, 2024.
[27] Yixuan Weng, Minjun Zhu, Fei Xia, Bin Li, Shizhu He, Shengping Liu, Bin Sun, Kang Liu, and Jun Zhao. Large language models are better reasoners with self-verification. arXiv preprint arXiv:2212.09561, 2022.
[28] Kaiyu Yang, Jia Deng, and Danqi Chen. Generating natural language proofs with verifier-guided search. arXiv preprint arXiv:2205.12443, 2022.
[29] Tian Ye, Zicheng Xu, Yuanzhi Li, and Zeyuan Allen-Zhu. Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process. arXiv e-prints, abs/2407.20311, 2024. Full version available at http://arxiv.org/abs/2407.20311.
[30] Yunxiang Zhang, Muhammad Khalifa, Lajanugen Logeswaran, Jaekyeom Kim, Moontae Lee, Honglak Lee, and Lu Wang. Small language models need strong verifiers to self-correct reasoning. arXiv preprint arXiv:2404.17140, 2024.
[31] Hattie Zhou, Arwen Bradley, Etai Littwin, Noam Razin, Omid Saremi, Josh Susskind, Samy Bengio, and Preetum Nakkiran. What algorithms can transformers learn? a study in length generalization. arXiv preprint arXiv:2310.16028, 2023.
References of the References
1 Kwangjun Ahn
Kwangjun Ahn, Xiang Cheng, Minhak Song, Chulhee Yun, Ali Jadbabaie, and Suvrit Sra. Linear attention is (maybe) all you need (to understand transformer optimization). arXiv preprint arXiv:2310.01082, 2023.
Ekin Akyürek, Dale Schuurmans, Jacob Andreas, Tengyu Ma, and Denny Zhou. What learning algorithm is in-context learning? investigations with linear models. International Conference on Learning Representations, 2022.
Zeyuan Allen-Zhu and Yuanzhi Li. Physics of language models: Part 1, context-free grammar. arXiv preprint arXiv:2305.13673, 2023.
Noga Alon and Joel H Spencer. The probabilistic method. John Wiley & Sons, 2016.
Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, et al. Gpt-neox-20b: An open-source autoregressive language model. Proceedings of BigScience – Workshop on Challenges & Perspectives in Creating Large Language Models, 2022.
Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. Neural Information Processing Systems, 2020.
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1, 2019.
John Duchi, Elad Hazan, and Yoram Singer. Adaptive subgradient methods for online learning and stochastic optimization. Journal of machine learning research, 12(7), 2011.
Benjamin L Edelman, Surbhi Goel, Sham Kakade, and Cyril Zhang. Inductive biases and variable creation in self-attention mechanisms. In International Conference on Machine Learning (ICML), 2022.
Nelson Elhage, Neel Nanda, Catherine Olsson, Tom Henighan, Nicholas Joseph, Ben Mann, Amanda Askell, Yuntao Bai, Anna Chen, Tom Conerly, Nova DasSarma, Dawn Drain, Deep Ganguli, Zac Hatfield-Dodds, Danny Hernandez, Andy Jones, Jackson Kernion, Liane Lovitt, Kamal Ndousse, Dario Amodei, Tom Brown, Jack Clark, Jared Kaplan, Sam McCandlish, and Chris Olah. A mathematical framework for transformer circuits. Transformer Circuits Thread, 2021. https://transformer-circuits.pub/2021/framework/index.html.
Murat A Erdogdu, Lee H Dicker, and Mohsen Bayati. Scaled least squares estimator for glms in large-scale problems. Advances in Neural Information Processing Systems, 29, 2016.
Shivam Garg, Dimitris Tsipras, Percy S Liang, and Gregory Valiant. What can transformers learn in-context? a case study of simple function classes. Advances in Neural Information Processing Systems, 35:30583–30598, 2022.
Angeliki Giannou, Shashank Rajput, Jy-yong Sohn, Kangwook Lee, Jason D Lee, and Dimitris Papailiopoulos. Looped transformers as programmable computers. arXiv preprint arXiv:2301.13196, 2023.
Alex Graves, Greg Wayne, and Ivo Danihelka. Neural turing machines. arXiv preprint arXiv:1410.5401, 2014.
Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural computation, 1997.
Stanisław Jastrzebski, Devansh Arpit, Nicolas Ballas, Vikas Verma, Tong Che, and Yoshua Bengio. Residual connections encourage iterative inference. In International Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=SJa9iHgAZ.
Ke Li and Jitendra Malik. Learning to optimize. In International Conference on Learning Representations, 2017.
Opher Lieber, Or Sharir, Barak Lenz, and Yoav Shoham. Jurassic-1: Technical details and evaluation. White Paper. AI21 Labs, 2021.
Arvind Mahankali, Tatsunori B Hashimoto, and Tengyu Ma. One step of gradient descent is provably the optimal in-context learner with one layer of linear self-attention. arXiv preprint arXiv:2307.03576, 2023.
Sewon Min, Mike Lewis, Luke Zettlemoyer, and Hannaneh Hajishirzi. Metaicl: Learning to learn in context. Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2021.
Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova DasSarma, Tom Henighan, Ben Mann, Amanda Askell, Yuntao Bai, Anna Chen, Tom Conerly, Dawn Drain, Deep Ganguli, Zac Hatfield-Dodds, Danny Hernandez, Scott Johnston, Andy Jones, Jackson Kernion, Liane Lovitt, Kamal Ndousse, Dario Amodei, Tom Brown, Jack Clark, Jared Kaplan, Sam McCandlish, and Chris Olah. In-context learning and induction heads. Transformer Circuits Thread, 2022.
Jorge Pérez, Pablo Barceló, and Javier Marinkovic. Attention is turing complete. The Journal of Machine Learning Research, 2021.
Jack W Rae, Sebastian Borgeaud, Trevor Cai, Katie Millican, Jordan Hoffmann, Francis Song, John Aslanides, Sarah Henderson, Roman Ring, Susannah Young, et al. Scaling language models: Methods, analysis & insights from training gopher. arXiv preprint arXiv:2112.11446, 2021.
Imanol Schlag, Kazuki Irie, and Jürgen Schmidhuber. Linear transformers are secretly fast weight programmers. In International Conference on Machine Learning, pages 9355–9366. PMLR, 2021.
Hava T Siegelmann and Eduardo D Sontag. On the computational power of neural nets. In Proceedings of Workshop on Computational learning theory, 1992.
Vladimir Vapnik. The nature of statistical learning theory. Springer science & business media, 1999.
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in neural information processing systems, 2017.
Johannes von Oswald, Eyvind Niklasson, Ettore Randazzo, João Sacramento, Alexander Mordvintsev, Andrey Zhmoginov, and Max Vladymyrov. Transformers learn in-context by gradient descent. In International Conference on Machine Learning, pages 35151–35174. PMLR, 2023.
Colin Wei, Yining Chen, and Tengyu Ma. Statistically meaningful approximation: a case study on approximating turing machines with transformers. Advances in Neural Information Processing Systems, 35:12071–12083, 2022.
Mitchell Wortsman, Jaehoon Lee, Justin Gilmer, and Simon Kornblith. Replacing softmax with relu in vision transformers. arXiv preprint arXiv:2309.08586, 2023.
Sang Michael Xie, Aditi Raghunathan, Percy Liang, and Tengyu Ma. An explanation of in-context learning as implicit bayesian inference. International Conference on Learning Representations, 2021.
Ruiqi Zhang, Spencer Frei, and Peter L Bartlett. Trained transformers learn linear models in-context. arXiv preprint arXiv:2306.09927, 2023.
Haoyu Zhao, Abhishek Panigrahi, Rong Ge, and Sanjeev Arora. Do transformers parse while predicting the masked word? arXiv preprint arXiv:2303.08117, 2023.
2 Zeyuan Allen-Zhu and Yuanzhi Li
[1] Zeyuan Allen-Zhu and Yuanzhi Li. Physics of Language Models: Part 3.1, Knowledge Storage and Extraction. ArXiv e-prints, abs/2309.14316, September 2023. Full version available at http://arxiv.org/abs/2309.14316.
[2] Zeyuan Allen-Zhu and Yuanzhi Li. Physics of Language Models: Part 3.2, Knowledge Manipulation. ArXiv e-prints, abs/2309.14402, September 2023. Full version available at http://arxiv.org/abs/2309.14402.
[3] Zeyuan Allen-Zhu and Yuanzhi Li. Backward feature correction: How deep learning performs deep learning. In COLT, 2023. Full version available at http://arxiv.org/abs/2001.04413.
[4] Zeyuan Allen-Zhu and Yuanzhi Li. Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws. ArXiv e-prints, abs/2404.05405, April 2024. Full version available at http://arxiv.org/abs/2404.05405.
[5] Zeyuan Allen-Zhu, Yuanzhi Li, and Zhao Song. A convergence theory for deep learning via over-parameterization. In ICML, 2019. Full version available at http://arxiv.org/abs/1811.03962.
[6] Sanjeev Arora and Yi Zhang. Do gans actually learn the distribution? an empirical study. arXiv preprint arXiv:1706.08224, 2017.
[7] David Arps, Younes Samih, Laura Kallmeyer, and Hassan Sajjad. Probing for constituency structure in neural language models. arXiv preprint arXiv:2204.06201, 2022.
[8] James K Baker. Trainable grammars for speech recognition. The Journal of the Acoustical Society of America, 65(S1):S132–S132, 1979.
[9] Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, and Samuel Weinbach. GPT-NeoX-20B: An open-source autoregressive language model. In Proceedings of the ACL Workshop on Challenges & Perspectives in Creating Large Language Models, 2022. URL https://arxiv.org/abs/2204.06745.
[10] Gregoire Deletang, Anian Ruoss, Jordi Grau-Moya, Tim Genewein, Li Kevin Wenliang, Elliot Catt, Chris Cundy, Marcus Hutter, Shane Legg, Joel Veness, et al. Neural networks and the chomsky hierarchy. In ICLR, 2023.
[11] Brian DuSell and David Chiang. Learning hierarchical structures with differentiable nondeterministic stacks. In ICLR, 2022.
[12] Nelson Elhage, Neel Nanda, Catherine Olsson, Tom Henighan, Nicholas Joseph, Ben Mann, Amanda Askell, Yuntao Bai, Anna Chen, Tom Conerly, et al. A mathematical framework for transformer circuits. Transformer Circuits Thread, 1, 2021.
[13] Pengcheng He, Xiaodong Liu, Jianfeng Gao, and Weizhu Chen. Deberta: Decoding-enhanced bert with disentangled attention. arXiv preprint arXiv:2006.03654, 2020.
[14] John Hewitt and Christopher D. Manning. A structural probe for finding syntax in word representations. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4129–4138, Minneapolis, Minnesota, June 2019. Association for Computational Linguistics. doi: 10.18653/v1/N19-1419. URL https://aclanthology.org/N19-1419.
[15] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of NAACL-HLT, pages 4171–4186, 2019.
[16] Christopher D Manning, Kevin Clark, John Hewitt, Urvashi Khandelwal, and Omer Levy. Emergent linguistic structure in artificial neural networks trained by self-supervision. Proceedings of the National Academy of Sciences, 117(48):30046–30054, 2020.
[17] Mitchell P. Marcus, Beatrice Santorini, and Mary Ann Marcinkiewicz. Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2):313–330, 1993. URL https://aclanthology.org/J93-2004.
[18] Rowan Hall Maudslay and Ryan Cotterell. Do syntactic probes probe syntax? experiments with jabberwocky probing. arXiv preprint arXiv:2106.02559, 2021.
[19] Milad Moradi and Matthias Samwald. Evaluating the robustness of neural language models to input perturbations. arXiv preprint arXiv:2108.12237, 2021.
[20] Shikhar Murty, Pratyusha Sharma, Jacob Andreas, and Christopher D Manning. Characterizing intrinsic compositionality in transformers with tree projections. In ICLR, 2023.
[21] Neel Nanda, Lawrence Chan, Tom Liberum, Jess Smith, and Jacob Steinhardt. Progress measures for grokking via mechanistic interpretability. arXiv preprint arXiv:2301.05217, 2023.
[22] Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova DasSarma, Tom Henighan, Ben Mann, Amanda Askell, Yuntao Bai, Anna Chen, et al. In-context learning and induction heads. arXiv preprint arXiv:2209.11895, 2022.
[23] OpenAI. Gpt-4 technical report, 2023.
[24] Matt Post and Shane Bergsma. Explicit and implicit syntactic features for text classification. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 866–872, 2013.
[25] Alec Radford, Jeff Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. Language models are unsupervised multitask learners. 2019.
[26] Itiroo Sakai. Syntax in universal translation. In Proceedings of the International Conference on Machine Translation and Applied Language Analysis, 1961.
[27] Hui Shi, Sicun Gao, Yuandong Tian, Xinyun Chen, and Jishen Zhao. Learning bounded context-free-grammar via lstm and the transformer: Difference and the explanations. In Proceedings of the AAAI Conference on Artificial Intelligence, 2022.
[28] Michael Sipser. Introduction to the Theory of Computation. Cengage Learning, 2012.
[29] Jianlin Su, Yu Lu, Shengfeng Pan, Bo Wen, and Yunfeng Liu. Roformer: Enhanced transformer with rotary position embedding, 2021.
[30] Lifu Tu, Garima Lalwani, Spandana Gella, and He He. An empirical study on robustness to spurious correlations using pre-trained language models. Transactions of the Association for Computational Linguistics, 8:621–633, 2020.
[31] David Vilares, Michalina Strzyz, Anders Søgaard, and Carlos Gómez-Rodríguez. Parsing as pretraining. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 9114–9121, 2020.
[32] Kevin Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, and Jacob Steinhardt. Interpretability in the wild: a circuit for indirect object identification in gpt-2 small. arXiv preprint arXiv:2211.00593, 2022.
[33] Zhiyong Wu, Yun Chen, Ben Kao, and Qun Liu. Perturbed masking: Parameter-free probing for analyzing and interpreting bert. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 4166–4176, 2020.
[34] Tian Ye, Zicheng Xu, Yuanzhi Li, and Zeyuan Allen-Zhu. Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process. arXiv preprint, 2024. to appear.
[35] Tian Ye, Zicheng Xu, Yuanzhi Li, and Zeyuan Allen-Zhu. Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems. arXiv preprint, 2024. to appear.
[36] Shizhuo Dylan Zhang, Curt Tigges, Stella Biderman, Maxim Raginsky, and Talia Ringer. Can transformers learn to solve problems recursively? arXiv preprint arXiv:2305.14699, 2023.
[37] Haoyu Zhao, Abhishek Panigrahi, Rong Ge, and Sanjeev Arora. Do transformers parse while predicting the masked word? arXiv preprint arXiv:2303.08117, 2023.
3 Zeyuan Allen-Zhu and Yuanzhi Li
[1] Zeyuan Allen-Zhu and Yuanzhi Li. Physics of Language Models: Part 1, Learning Hierarchical Language Structures. ArXiv e-prints, abs/2305.13673, May 2023. Full version available at http://arxiv.org/abs/2305.13673.
[2] Zeyuan Allen-Zhu and Yuanzhi Li. Physics of Language Models: Part 3.1, Knowledge Storage and Extraction. In ICML, 2024. Full version available at http://arxiv.org/abs/2309.14316.
[3] Zeyuan Allen-Zhu and Yuanzhi Li. Physics of Language Models: Part 3.3, Knowledge Capacity Scaling Laws. ArXiv e-prints, abs/2404.05405, April 2024. Full version available at http://arxiv.org/abs/2404.05405.
[4] Lukas Berglund, Meg Tong, Max Kaufmann, Mikita Balesni, Asa Cooper Stickland, Tomasz Korbak, and Owain Evans. The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A". arXiv preprint arXiv:2309.12288, September 2023.
[5] Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, and Samuel Weinbach. GPT-NeoX-20B: An open-source autoregressive language model. In Proceedings of the ACL Workshop on Challenges & Perspectives in Creating Large Language Models, 2022. URL https://arxiv.org/abs/2204.06745.
[6] Deng Cai, Yan Wang, Lemao Liu, and Shuming Shi. Recent advances in retrieval-augmented text generation. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 3417–3419, 2022.
[7] Mor Geva, Daniel Khashabi, Elad Segal, Tushar Khot, Dan Roth, and Jonathan Berant. Did aristotle use a laptop? a question answering benchmark with implicit reasoning strategies. Transactions of the Association for Computational Linguistics, 9:346–361, 2021.
[8] Fabian Gloeckle, Badr Youbi Idrissi, Baptiste Roziere, David Lopez-Paz, and Gabriel Synnaeve. Better & faster large language models via multi-token prediction. arXiv preprint arXiv:2404.19737, 2024.
[9] Olga Golovneva, Zeyuan Allen-Zhu, Jason Weston, and Sainbayar Sukhbaatar. Reverse training to nurse the reversal curse. arXiv preprint arXiv:2403.13799, 2024.
[10] Qingyan Guo, Rui Wang, Junliang Guo, Xu Tan, Jiang Bian, and Yujiu Yang. Mitigating reversal curse via semantic-aware permutation training. arXiv preprint arXiv:2403.00758, 2024.
[11] Evan Hernandez, Belinda Z Li, and Jacob Andreas. Measuring and manipulating knowledge representations in language models. arXiv preprint arXiv:2304.00740, 2023.
[12] Edward J Hu, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. LoRA: Low-Rank Adaptation of Large Language Models. In ICLR, 2021.
[13] Albert Q Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lucile Saulnier, et al. Mistral 7b. arXiv preprint arXiv:2310.06825, 2023.
[14] Zhengbao Jiang, Frank F Xu, Luyu Gao, Zhiqing Sun, Qian Liu, Jane Dwivedi-Yu, Yiming Yang, Jamie Callan, and Graham Neubig. Active retrieval augmented generation. arXiv preprint arXiv:2305.06983, 2023.
[15] Mojtaba Komeili, Kurt Shuster, and Jason Weston. Internet-augmented dialogue generation. arXiv preprint arXiv:2107.07566, 2021.
[16] Rémi Lebret, David Grangier, and Michael Auli. Generating text from structured data with application to the biography domain. CoRR, abs/1603.07771, 2016. URL http://arxiv.org/abs/1603.07771.
[17] Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, and Douwe Kiela. Retrieval-augmented generation for knowledge-intensive nlp tasks. In H. Larochelle, M. Ranzato, R. Hadsell, M.F. Balcan, and H. Lin, editors, Advances in Neural Information Processing Systems, volume 33, pages 9459–9474. Curran Associates, Inc., 2020. URL https://proceedings.neurips.cc/paper_files/paper/2020/file/6b493230205f780e1bc26945df7481e5-Paper.pdf.
[18] Shangqing Liu, Yu Chen, Xiaofei Xie, Jingkai Siow, and Yang Liu. Retrieval-augmented generation for code summarization via hybrid gnn. arXiv preprint arXiv:2006.05405, 2020.
[19] Yuning Mao, Pengcheng He, Xiaodong Liu, Yelong Shen, Jianfeng Gao, Jiawei Han, and Weizhu Chen. Generation-augmented retrieval for open-domain question answering. arXiv preprint arXiv:2009.08553, 2020.
[20] Tahira Naseem, Srinivas Ravishankar, Nandana Mihindukulasooriya, Ibrahim Abdelaziz, Young-Suk Lee, Pavan Kapanipathi, Salim Roukos, Alfio Gliozzo, and Alexander Gray. A semantics-aware transformer model of relation linking for knowledge base question answering. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pages 256–262, Online, August 2021. Association for Computational Linguistics.
[21] Anh Nguyen, Nikos Karampatziakis, and Weizhu Chen. Meet in the middle: A new pre-training paradigm. Advances in Neural Information Processing Systems, 36, 2024.
[22] Reham Omar, Omij Mangukiya, Panos Kalnis, and Essam Mansour. Chatgpt versus traditional question answering for knowledge graphs: Current status and future directions towards knowledge graph chatbots. arXiv preprint arXiv:2302.06466, 2023.
[23] OpenAI. Gpt-4 technical report, 2023.
[24] Md Rizwan Parvez, Wasi Uddin Ahmad, Saikat Chakraborty, Baishakhi Ray, and Kai-Wei Chang. Retrieval augmented code generation and summarization. arXiv preprint arXiv:2108.11601, 2021.
[25] Hao Peng, Xiaozhi Wang, Shengding Hu, Hailong Jin, Lei Hou, Juanzi Li, Zhiyuan Liu, and Qun Liu. Copen: Probing conceptual knowledge in pre-trained language models. arXiv preprint arXiv:2211.04079, 2022.
[26] Fabio Petroni, Tim Rocktäschel, Patrick Lewis, Anton Bakhtin, Yuxiang Wu, Alexander H Miller, and Sebastian Riedel. Language models as knowledge bases? arXiv preprint arXiv:1909.01066, 2019.
[27] Jacob Pfau, William Merrill, and Samuel R Bowman. Let's think dot by dot: Hidden computation in transformer language models. arXiv preprint arXiv:2404.15758, 2024.
[28] Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. Language models are unsupervised multitask learners. OpenAI blog, 1(8):9, 2019.
[29] Ori Ram, Yoav Levine, Itay Dalmedigos, Dor Muhlgay, Amnon Shashua, Kevin Leyton-Brown, and Yoav Shoham. In-context retrieval-augmented language models. arXiv preprint arXiv:2302.00083, 2023.
[30] Kyle Richardson and Ashish Sabharwal. What does my QA model know? devising controlled probes using expert knowledge. Transactions of the Association for Computational Linguistics, 8:572–588, 2020. doi: 10.1162/tacl_a_00331. URL https://aclanthology.org/2020.tacl-1.37.
[31] Karan Singhal, Shekoofeh Azizi, Tao Tu, S Sara Mahdavi, Jason Wei, Hyung Won Chung, Nathan Scales, Ajay Tanwani, Heather Cole-Lewis, Stephen Pfohl, et al. Large language models encode clinical knowledge. arXiv preprint arXiv:2212.13138, 2022.
[32] Shamane Siriwardhana, Rivindu Weerasekera, Elliott Wen, Tharindu Kaluarachchi, Rajib Rana, and Suranga Nanayakkara. Improving the domain adaptation of retrieval augmented generation (rag) models for open domain question answering. Transactions of the Association for Computational Linguistics, 11:1–17, 2023.
[33] Jianlin Su, Yu Lu, Shengfeng Pan, Bo Wen, and Yunfeng Liu. Roformer: Enhanced transformer with rotary position embedding, 2021.
[34] Kai Sun, Yifan Ethan Xu, Hanwen Zha, Yue Liu, and Xin Luna Dong. Head-to-tail: How knowledgeable are large language models (llm)? aka will llms replace knowledge graphs? arXiv preprint arXiv:2308.10168, 2023.
[35] Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, et al. Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971, 2023.
[36] Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. Chain-of-thought prompting elicits reasoning in large language models. Advances in neural information processing systems, 35:24824–24837, 2022.
[37] Tian Ye, Zicheng Xu, Yuanzhi Li, and Zeyuan Allen-Zhu. Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process. arXiv preprint arXiv:2407.20311, 2024.
[38] Tian Ye, Zicheng Xu, Yuanzhi Li, and Zeyuan Allen-Zhu. Physics of Language Models: Part 2.2, How to Learn From Mistakes on Grade-School Math Problems. arXiv preprint arXiv:2408.16293, 2024.
[39] Chunting Zhou, Pengfei Liu, Puxin Xu, Srini Iyer, Jiao Sun, Yuning Mao, Xuezhe Ma, Avia Efrat, Ping Yu, Lili Yu, et al. Lima: Less is more for alignment. arXiv preprint arXiv:2305.11206, 2023.
Word list
The original purpose of building these word lists was to check, before reading an English paper, that none of its most frequent words are unknown to me.
When I worked on translating international standards, I built such lists to decide candidate translations in advance and to serve as a baseline for using the same translation for the same term in the same context.
After I joined the development of machine-translation software, I used them to check translation accuracy: I prepared expected translations and verified how each term was actually translated in context.
I have not translated many LLM papers, so this time no Japanese translations are attached yet. Once I have built word lists for several papers, I plan to add translations for roughly the top 100 to 1,000 words.
The paper was saved as a PDF and converted directly to a TXT file, so some strings that are not part of the paper body are mixed in; sorry about that. In my experience they stay below 1% of the total and are mostly boilerplate common to papers, so please treat them as noise.
Natural language processing with docker (186)
https://qiita.com/kaizen_nagoya/items/e29cbaed8370e7913487
From next time onward, I will use:
docker run -v /tmp/llm:/tmp/llm -it kaizenjapan/llm /bin/bash
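The frequency table below was produced with awk on the extracted TXT file. As a rough illustration only, here is a minimal sketch of such a counting step; the filename paper.txt and the exact tokenization rules are my assumptions, not necessarily the commands actually used.

```bash
# Minimal sketch of the counting step (assumed input file: paper.txt).
# Lowercase, split on non-letter characters, count with awk,
# sort by frequency, and print rows in the same "rank | term | count |" shape.
tr 'A-Z' 'a-z' < paper.txt \
  | tr -cs 'a-z' '\n' \
  | awk 'NF { count[$1]++ } END { for (w in count) print w, count[w] }' \
  | sort -k2,2nr \
  | awk '{ printf "%d | %s | %d |\n", NR, $1, $2 }'
```

A pipeline of this kind would also explain why single letters such as "s", "e", or "b" appear in the table: splitting on non-letters turns possessives, equation symbols, and reference labels into one-letter tokens.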
rank | term | count |
---|---|---|
1 | the | 540 |
2 | retry | 329 |
3 | to | 252 |
4 | of | 249 |
5 | a | 237 |
6 | s | 221 |
7 | with | 210 |
8 | op | 202 |
9 | and | 199 |
10 | in | 182 |
11 | we | 165 |
12 | data | 160 |
13 | retryrate | 160 |
14 | is | 141 |
15 | for | 140 |
16 | model | 131 |
17 | each | 123 |
18 | as | 119 |
19 | on | 117 |
20 | this | 116 |
21 | can | 103 |
22 | it | 99 |
23 | error | 90 |
24 | e | 88 |
25 | lora | 82 |
26 | that | 81 |
27 | finetune | 80 |
28 | figure | 76 |
29 | box | 70 |
30 | define | 70 |
31 | unnecessary | 69 |
32 | models | 67 |
33 | language | 63 |
34 | pretrain | 60 |
35 | so | 60 |
36 | number | 59 |
37 | qv | 57 |
38 | igsm | 52 |
39 | mask | 52 |
40 | not | 52 |
41 | be | 49 |
42 | correct | 49 |
43 | from | 49 |
44 | pretrained | 49 |
45 | rate | 48 |
46 | accuracy | 47 |
47 | b | 47 |
48 | mistakes | 47 |
49 | such | 47 |
50 | using | 45 |
51 | cheese | 44 |
52 | market | 44 |
53 | or | 42 |
54 | param | 41 |
55 | solution | 41 |
56 | beam | 40 |
57 | math | 40 |
58 | studio | 39 |
59 | arxiv | 38 |
60 | maskretryrate | 38 |
61 | training | 38 |
62 | its | 37 |
63 | are | 36 |
64 | free | 35 |
65 | pretraining | 35 |
66 | parameters | 34 |
67 | reasoning | 34 |
68 | equals | 33 |
69 | has | 33 |
70 | if | 33 |
71 | steps | 33 |
72 | also | 32 |
73 | errors | 32 |
74 | problems | 32 |
75 | retry_weak | 32 |
76 | solutions | 32 |
77 | even | 31 |
78 | an | 30 |
79 | film | 30 |
80 | high | 30 |
81 | upon | 29 |
82 | use | 29 |
83 | regret | 28 |
84 | school | 28 |
85 | g | 27 |
86 | gpt | 27 |
87 | have | 27 |
88 | x | 27 |
89 | r | 26 |
90 | tokens | 26 |
91 | when | 26 |
92 | by | 25 |
93 | jim | 25 |
94 | probing | 25 |
95 | section | 25 |
96 | w | 25 |
97 | finetuning | 24 |
98 | next | 24 |
99 | only | 24 |
100 | process | 24 |
101 | see | 24 |
102 | correction | 23 |
103 | generation | 23 |
104 | learning | 23 |
105 | original | 23 |
106 | result | 23 |
107 | same | 23 |
108 | version | 23 |
109 | after | 22 |
110 | c | 22 |
111 | dist | 22 |
112 | i | 22 |
113 | international | 22 |
114 | parmesan | 22 |
115 | than | 22 |
116 | weight | 22 |
117 | at | 21 |
118 | but | 21 |
119 | cream | 21 |
120 | full | 21 |
121 | jungle | 21 |
122 | mistake | 21 |
123 | our | 21 |
124 | reask | 21 |
125 | backpack | 20 |
126 | district | 20 |
127 | ice | 20 |
128 | inside | 20 |
129 | ood | 20 |
130 | p | 20 |
131 | parameter | 19 |
132 | problem | 19 |
133 | rank | 19 |
134 | results | 19 |
135 | sentence | 19 |
136 | step | 19 |
137 | their | 19 |
138 | time | 19 |
139 | used | 19 |
140 | back | 18 |
141 | distout | 18 |
142 | many | 18 |
143 | more | 18 |
144 | one | 18 |
145 | own | 18 |
146 | central | 17 |
147 | example | 17 |
148 | paper | 17 |
149 | still | 17 |
150 | v | 17 |
151 | which | 17 |
152 | all | 16 |
153 | during | 16 |
154 | grape | 16 |
155 | make | 16 |
156 | no | 16 |
157 | pounds | 16 |
158 | u | 16 |
159 | al | 15 |
160 | d | 15 |
161 | detection | 15 |
162 | how | 15 |
163 | improve | 15 |
164 | k | 15 |
165 | masking | 15 |
166 | retry_miss | 15 |
167 | they | 15 |
168 | context | 14 |
169 | continued | 14 |
170 | datasets | 14 |
171 | decay | 14 |
172 | directly | 14 |
173 | does | 14 |
174 | hard | 14 |
175 | l | 14 |
176 | length | 14 |
177 | part | 14 |
178 | preprint | 14 |
179 | q | 14 |
180 | small | 14 |
181 | weak | 14 |
182 | while | 14 |
183 | appendix | 13 |
184 | corrections | 13 |
185 | fine | 13 |
186 | fresh | 13 |
187 | generate | 13 |
188 | learn | 13 |
189 | li | 13 |
190 | search | 13 |
191 | size | 13 |
192 | synthetic | 13 |
193 | trained | 13 |
194 | wrong | 13 |
195 | y | 13 |
196 | best | 12 |
197 | daypack | 12 |
198 | double | 12 |
199 | label | 12 |
200 | made | 12 |
201 | med | 12 |
202 | pineapple | 12 |
203 | self | 12 |
204 | there | 12 |
205 | very | 12 |
206 | accuracies | 11 |
207 | batch | 11 |
208 | better | 11 |
209 | cannot | 11 |
210 | details | 11 |
211 | et | 11 |
212 | experiments | 11 |
213 | higher | 11 |
214 | however | 11 |
215 | o | 11 |
216 | over | 11 |
217 | perfect | 11 |
218 | test | 11 |
219 | FALSE | 11 |
220 | abs | 10 |
221 | chen | 10 |
222 | college | 10 |
223 | compared | 10 |
224 | different | 10 |
225 | distribution | 10 |
226 | embedding | 10 |
227 | experiment | 10 |
228 | goat | 10 |
229 | inference | 10 |
230 | joe | 10 |
231 | may | 10 |
232 | miss | 10 |
233 | need | 10 |
234 | residential | 10 |
235 | retries | 10 |
236 | skill | 10 |
237 | t | 10 |
238 | uses | 10 |
239 | vs | 10 |
240 | weights | 10 |
241 | where | 10 |
242 | without | 10 |
243 | yuanzhi | 10 |
244 | among | 9 |
245 | approach | 9 |
246 | call | 9 |
247 | dance | 9 |
248 | do | 9 |
249 | generated | 9 |
250 | j | 9 |
251 | large | 9 |
252 | layer | 9 |
253 | llms | 9 |
254 | med_pqigsm | 9 |
255 | stage | 9 |
256 | sum | 9 |
257 | versionp | 9 |
258 | what | 9 |
259 | would | 9 |
260 | zhu | 9 |
261 | TRUE | 9 |
262 | additional | 8 |
263 | allen | 8 |
264 | average | 8 |
265 | because | 8 |
266 | comparing | 8 |
267 | comparison | 8 |
268 | dataset | 8 |
269 | erroneous | 8 |
270 | evaluation | 8 |
271 | f | 8 |
272 | fake | 8 |
273 | followed | 8 |
274 | following | 8 |
275 | future | 8 |
276 | goal | 8 |
277 | including | 8 |
278 | instance | 8 |
279 | internal | 8 |
280 | knowledge | 8 |
281 | line | 8 |
282 | med_qpigsm | 8 |
283 | messenger | 8 |
284 | much | 8 |
285 | product | 8 |
286 | prompting | 8 |
287 | simply | 8 |
288 | times | 8 |
289 | total | 8 |
290 | tuning | 8 |
291 | two | 8 |
292 | verify | 8 |
293 | wang | 8 |
294 | already | 7 |
295 | answer | 7 |
296 | arithmetic | 7 |
297 | compare | 7 |
298 | detect | 7 |
299 | detector | 7 |
300 | easily | 7 |
301 | first | 7 |
302 | generating | 7 |
303 | level | 7 |
304 | like | 7 |
305 | meta | 7 |
306 | needed | 7 |
307 | new | 7 |
308 | org | 7 |
309 | other | 7 |
310 | output | 7 |
311 | perform | 7 |
312 | physics | 7 |
313 | question | 7 |
314 | similar | 7 |
315 | thus | 7 |
316 | trader | 7 |
317 | trainable | 7 |
318 | via | 7 |
319 | zeyuan | 7 |
320 | adding | 6 |
321 | additionally | 6 |
322 | available | 6 |
323 | before | 6 |
324 | compute | 6 |
325 | dosample | 6 |
326 | especially | 6 |
327 | except | 6 |
328 | few | 6 |
329 | harder | 6 |
330 | here | 6 |
331 | immediately | 6 |
332 | important | 6 |
333 | knows | 6 |
334 | linear | 6 |
335 | low | 6 |
336 | making | 6 |
337 | might | 6 |
338 | most | 6 |
339 | n | 6 |
340 | note | 6 |
341 | operations | 6 |
342 | out | 6 |
343 | pear | 6 |
344 | perfectly | 6 |
345 | presented | 6 |
346 | qp | 6 |
347 | random | 6 |
348 | re | 6 |
349 | riverview | 6 |
350 | samples | 6 |
351 | seeds | 6 |
352 | significantly | 6 |
353 | states | 6 |
354 | sufficiently | 6 |
355 | tasks | 6 |
356 | teach | 6 |
357 | those | 6 |
358 | twice | 6 |
359 | update | 6 |
360 | value | 6 |
361 | why | 6 |
362 | work | 6 |
363 | ye | 6 |
364 | accurate | 5 |
365 | achieve | 5 |
366 | achieved | 5 |
367 | amount | 5 |
368 | arts | 5 |
369 | autoregressive | 5 |
370 | based | 5 |
371 | been | 5 |
372 | both | 5 |
373 | campus | 5 |
374 | cell | 5 |
375 | ch | 5 |
376 | choices | 5 |
377 | complexity | 5 |
378 | computations | 5 |
379 | computed | 5 |
380 | contains | 5 |
381 | controlled | 5 |
382 | correctness | 5 |
383 | counts | 5 |
384 | difficulties | 5 |
385 | discover | 5 |
386 | end | 5 |
387 | explore | 5 |
388 | fair | 5 |
389 | find | 5 |
390 | focus | 5 |
391 | grade | 5 |
392 | h | 5 |
393 | http | 5 |
394 | immediate | 5 |
395 | include | 5 |
396 | information | 5 |
397 | ingredient | 5 |
398 | instead | 5 |
399 | into | 5 |
400 | long | 5 |
401 | matrices | 5 |
402 | multi | 5 |
403 | necessary | 5 |
404 | needs | 5 |
405 | neural | 5 |
406 | observation | 5 |
407 | per | 5 |
408 | please | 5 |
409 | prepare | 5 |
410 | present | 5 |
411 | probability | 5 |
412 | provide | 5 |
413 | query | 5 |
414 | randomly | 5 |
415 | regretful | 5 |
416 | seasons | 5 |
417 | setting | 5 |
418 | shortest | 5 |
419 | should | 5 |
420 | show | 5 |
421 | shows | 5 |
422 | solve | 5 |
423 | some | 5 |
424 | statistics | 5 |
425 | these | 5 |
426 | through | 5 |
427 | token | 5 |
428 | type | 5 |
429 | up | 5 |
430 | us | 5 |
431 | verifier | 5 |
432 | were | 5 |
433 | works | 5 |
434 | yet | 5 |
435 | zhang | 5 |
436 | actually | 4 |
437 | advances | 4 |
438 | almost | 4 |
439 | another | 4 |
440 | any | 4 |
441 | association | 4 |
442 | attention | 4 |
443 | carefully | 4 |
444 | cases | 4 |
445 | changes | 4 |
446 | check | 4 |
447 | common | 4 |
448 | computation | 4 |
449 | computational | 4 |
450 | conclusion | 4 |
451 | consider | 4 |
452 | contrast | 4 |
453 | controllable | 4 |
454 | could | 4 |
455 | difference | 4 |
456 | due | 4 |
457 | easy | 4 |
458 | effective | 4 |
459 | exhibit | 4 |
460 | fewer | 4 |
461 | final | 4 |
462 | found | 4 |
463 | fully | 4 |
464 | given | 4 |
465 | gpu | 4 |
466 | graph | 4 |
467 | hidden | 4 |
468 | improvement | 4 |
469 | included | 4 |
470 | introduce | 4 |
471 | just | 4 |
472 | larger | 4 |
473 | lee | 4 |
474 | linguistics | 4 |
475 | liu | 4 |
476 | llama | 4 |
477 | longer | 4 |
478 | makes | 4 |
479 | matrix | 4 |
480 | medop | 4 |
481 | method | 4 |
482 | multiple | 4 |
483 | natural | 4 |
484 | observations | 4 |
485 | obtain | 4 |
486 | oh | 4 |
487 | originalunnecessary | 4 |
488 | pages | 4 |
489 | possible | 4 |
490 | pq | 4 |
491 | prints | 4 |
492 | proceedings | 4 |
493 | processing | 4 |
494 | range | 4 |
495 | ready | 4 |
496 | real | 4 |
497 | regenerate | 4 |
498 | requires | 4 |
499 | round | 4 |
500 | select | 4 |
501 | sense | 4 |
502 | shown | 4 |
503 | since | 4 |
504 | skip | 4 |
505 | statement | 4 |
506 | strong | 4 |
507 | systems | 4 |
508 | tells | 4 |
509 | them | 4 |
510 | then | 4 |
511 | though | 4 |
512 | towards | 4 |
513 | transformers | 4 |
514 | trivial | 4 |
515 | understanding | 4 |
516 | verification | 4 |
517 | vocational | 4 |
518 | weighs | 4 |
519 | whether | 4 |
520 | within | 4 |
521 | your | 4 |
522 | z | 4 |
523 | about | 3 |
524 | accurately | 3 |
525 | acl | 3 |
526 | adamw | 3 |
527 | add | 3 |
528 | again | 3 |
529 | alignment | 3 |
530 | allow | 3 |
531 | although | 3 |
532 | auto | 3 |
533 | automatically | 3 |
534 | banana | 3 |
535 | below | 3 |
536 | between | 3 |
537 | case | 3 |
538 | chance | 3 |
539 | change | 3 |
540 | choose | 3 |
541 | com | 3 |
542 | commercial | 3 |
543 | complements | 3 |
544 | correcting | 3 |
545 | cosine | 3 |
546 | count | 3 |
547 | creating | 3 |
548 | determine | 3 |
549 | did | 3 |
550 | differs | 3 |
551 | discard | 3 |
552 | down | 3 |
553 | eese | 3 |
554 | effectively | 3 |
555 | efficient | 3 |
556 | encourages | 3 |
557 | entire | 3 |
558 | evaluated | 3 |
559 | extreme | 3 |
560 | failure | 3 |
561 | finetuned | 3 |
562 | format | 3 |
563 | gain | 3 |
564 | general | 3 |
565 | go | 3 |
566 | half | 3 |
567 | hardest | 3 |
568 | hardpq | 3 |
569 | head | 3 |
570 | highly | 3 |
571 | https | 3 |
572 | ideally | 3 |
573 | identical | 3 |
574 | illustration | 3 |
575 | implies | 3 |
576 | improves | 3 |
577 | improving | 3 |
578 | indicating | 3 |
579 | intelligence | 3 |
580 | interestingly | 3 |
581 | intuitively | 3 |
582 | keep | 3 |
583 | least | 3 |
584 | let | 3 |
585 | life | 3 |
586 | lin | 3 |
587 | lu | 3 |
588 | maximum | 3 |
589 | methods | 3 |
590 | mlp | 3 |
591 | nearly | 3 |
592 | non | 3 |
593 | notably | 3 |
594 | observe | 3 |
595 | once | 3 |
596 | overall | 3 |
597 | pattern | 3 |
598 | peft | 3 |
599 | performance | 3 |
600 | performs | 3 |
601 | position | 3 |
602 | practice | 3 |
603 | prediction | 3 |
604 | previous | 3 |
605 | produce | 3 |
606 | prompts | 3 |
607 | qiang | 3 |
608 | randomness | 3 |
609 | rather | 3 |
610 | realize | 3 |
611 | reasoners | 3 |
612 | right | 3 |
613 | rows | 3 |
614 | sample | 3 |
615 | sampling | 3 |
616 | save | 3 |
617 | scheduling | 3 |
618 | sentences | 3 |
619 | set | 3 |
620 | shift | 3 |
621 | significant | 3 |
622 | simplify | 3 |
623 | sinternation | 3 |
624 | solving | 3 |
625 | specifically | 3 |
626 | st | 3 |
627 | stating | 3 |
628 | strikethrough | 3 |
629 | success | 3 |
630 | sufficient | 3 |
631 | supermarket | 3 |
632 | table | 3 |
633 | task | 3 |
634 | temperature | 3 |
635 | too | 3 |
636 | top | 3 |
637 | tried | 3 |
638 | truly | 3 |
639 | try | 3 |
640 | tunes | 3 |
641 | understand | 3 |
642 | unless | 3 |
643 | updates | 3 |
644 | versions | 3 |
645 | want | 3 |
646 | was | 3 |
647 | weizhu | 3 |
648 | will | 3 |
649 | writing | 3 |
650 | xu | 3 |
651 | yang | 3 |
652 | yu | 3 |
653 | zhou | 3 |
654 | above | 2 |
655 | abstract | 2 |
656 | achieving | 2 |
657 | acquire | 2 |
658 | acquired | 2 |
659 | actual | 2 |
660 | adaptation | 2 |
661 | adapted | 2 |
662 | against | 2 |
663 | algorithm | 2 |
664 | allowing | 2 |
665 | alters | 2 |
666 | anna | 2 |
667 | annual | 2 |
668 | appeared | 2 |
669 | appears | 2 |
670 | applied | 2 |
671 | apply | 2 |
672 | approaches | 2 |
673 | architecture | 2 |
674 | article | 2 |
675 | aspect | 2 |
676 | aspects | 2 |
677 | augment | 2 |
678 | auxiliary | 2 |
679 | avoid | 2 |
680 | aware | 2 |
681 | backpacks | 2 |
682 | becomes | 2 |
683 | begin | 2 |
684 | bei | 2 |
685 | ben | 2 |
686 | benefit | 2 |
687 | betas | 2 |
688 | bin | 2 |
689 | broken | 2 |
690 | bubeck | 2 |
691 | calculate | 2 |
692 | capability | 2 |
693 | capacity | 2 |
694 | checking | 2 |
695 | cmu | 2 |
696 | comes | 2 |
697 | comparable | 2 |
698 | complex | 2 |
699 | conclusions | 2 |
700 | conference | 2 |
701 | configurations | 2 |
702 | conjecture | 2 |
703 | consists | 2 |
704 | constructed | 2 |
705 | contained | 2 |
706 | corpus | 2 |
707 | correctly | 2 |
708 | corrects | 2 |
709 | correspond | 2 |
710 | cot | 2 |
711 | crucial | 2 |
712 | current | 2 |
713 | daypacks | 2 |
714 | decoder | 2 |
715 | decoding | 2 |
716 | decreased | 2 |
717 | deferred | 2 |
718 | definethefresh | 2 |
719 | demonstrate | 2 |
720 | denote | 2 |
721 | dependencyunused | 2 |
722 | depends | 2 |
723 | description | 2 |
724 | desirable | 2 |
725 | detected | 2 |
726 | detecting | 2 |
727 | difficulty | 2 |
728 | doesn | 2 |
729 | easier | 2 |
730 | efforts | 2 |
731 | either | 2 |
732 | eldan | 2 |
733 | eliminating | 2 |
734 | encourage | 2 |
735 | english | 2 |
736 | ensuring | 2 |
737 | eric | 2 |
738 | essentially | 2 |
739 | evidence | 2 |
740 | exact | 2 |
741 | examples | 2 |
742 | exploring | 2 |
743 | extremely | 2 |
744 | fei | 2 |
745 | finally | 2 |
746 | focuses | 2 |
747 | follow | 2 |
748 | framework | 2 |
749 | fu | 2 |
750 | gao | 2 |
751 | generalization | 2 |
752 | generates | 2 |
753 | generative | 2 |
754 | give | 2 |
755 | grammar | 2 |
756 | ground | 2 |
757 | gsm | 2 |
758 | guang | 2 |
759 | guide | 2 |
760 | guided | 2 |
761 | he | 2 |
762 | help | 2 |
763 | hierarchical | 2 |
764 | hig | 2 |
765 | hu | 2 |
766 | icecreamasp | 2 |
767 | idea | 2 |
768 | implement | 2 |
769 | implemented | 2 |
770 | incentivized | 2 |
771 | increase | 2 |
772 | increases | 2 |
773 | indicates | 2 |
774 | initial | 2 |
775 | insert | 2 |
776 | integers | 2 |
777 | interested | 2 |
778 | introduced | 2 |
779 | introduction | 2 |
780 | issue | 2 |
781 | iterative | 2 |
782 | jian | 2 |
783 | job | 2 |
784 | jun | 2 |
785 | karl | 2 |
786 | last | 2 |
787 | later | 2 |
788 | layers | 2 |
789 | leading | 2 |
790 | leads | 2 |
791 | learned | 2 |
792 | learns | 2 |
793 | less | 2 |
794 | lets | 2 |
795 | limit | 2 |
796 | llm | 2 |
797 | location | 2 |
798 | logic | 2 |
799 | lou | 2 |
800 | lower | 2 |
801 | lr | 2 |
802 | mann | 2 |
803 | manner | 2 |
804 | maskigsm | 2 |
805 | masks | 2 |
806 | mbzuai | 2 |
807 | meaning | 2 |
808 | means | 2 |
809 | medpq | 2 |
810 | meeting | 2 |
811 | michael | 2 |
812 | miller | 2 |
813 | multinomial | 2 |
814 | name | 2 |
815 | now | 2 |
816 | occurs | 2 |
817 | often | 2 |
818 | open | 2 |
819 | otherwise | 2 |
820 | outperform | 2 |
821 | pan | 2 |
822 | papers | 2 |
823 | paramterabstract | 2 |
824 | particularly | 2 |
825 | performed | 2 |
826 | perhaps | 2 |
827 | positional | 2 |
828 | potentially | 2 |
829 | practical | 2 |
830 | predict | 2 |
831 | predicting | 2 |
832 | prepared | 2 |
833 | promising | 2 |
834 | purposes | 2 |
835 | ramp | 2 |
836 | rarely | 2 |
837 | rates | 2 |
838 | realistic | 2 |
839 | reason | 2 |
840 | reasonable | 2 |
841 | recall | 2 |
842 | recently | 2 |
843 | regressive | 2 |
844 | reliably | 2 |
845 | remains | 2 |
846 | remove | 2 |
847 | require | 2 |
848 | research | 2 |
849 | resp | 2 |
850 | rewrite | 2 |
851 | rict | 2 |
852 | ronen | 2 |
853 | rotary | 2 |
854 | rounds | 2 |
855 | rumored | 2 |
856 | safe | 2 |
857 | say | 2 |
858 | sbananaask | 2 |
859 | sees | 2 |
860 | selecting | 2 |
861 | shifts | 2 |
862 | shizhuo | 2 |
863 | short | 2 |
864 | shot | 2 |
865 | showcase | 2 |
866 | simple | 2 |
867 | simulate | 2 |
868 | single | 2 |
869 | skills | 2 |
870 | slightly | 2 |
871 | smaller | 2 |
872 | solely | 2 |
873 | sometimes | 2 |
874 | sop | 2 |
875 | source | 2 |
876 | soy | 2 |
877 | special | 2 |
878 | stands | 2 |
879 | starting | 2 |
880 | strongly | 2 |
881 | structures | 2 |
882 | studies | 2 |
883 | study | 2 |
884 | summarize | 2 |
885 | support | 2 |
886 | sure | 2 |
887 | teaches | 2 |
888 | team | 2 |
889 | technique | 2 |
890 | tempting | 2 |
891 | tend | 2 |
892 | testing | 2 |
893 | text | 2 |
894 | therefore | 2 |
895 | three | 2 |
896 | throughout | 2 |
897 | tian | 2 |
898 | train | 2 |
899 | transformer | 2 |
900 | truth | 2 |
901 | tuned | 2 |
902 | typically | 2 |
903 | underlined | 2 |
904 | unlike | 2 |
905 | unlikely | 2 |
906 | until | 2 |
907 | url | 2 |
908 | usefulness | 2 |
909 | user | 2 |
910 | usually | 2 |
911 | various | 2 |
912 | verifiers | 2 |
913 | volume | 2 |
914 | well | 2 |
915 | wide | 2 |
916 | widely | 2 |
917 | window | 2 |
918 | words | 2 |
919 | workshop | 2 |
920 | world | 2 |
921 | wu | 2 |
922 | yifei | 2 |
923 | you | 2 |
924 | zeqi | 2 |
925 | zicheng | 2 |
Using docker (186) again after a while:
https://qiita.com/kaizen_nagoya/items/e29cbaed8370e7913487
The counts were produced with the awk processing described in that article.
When converting from PDF to TXT, some compound words are not separated properly; the cause is unknown.
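For reference, one way the PDF-to-TXT step can be done is with pdftotext from poppler-utils; this is only an assumption about the toolchain, not a record of what was actually used here. Missing spaces or ligatures in the PDF's text layer are a plausible cause of the fused compound words, and comparing extraction modes can help narrow down where the merging happens.

```bash
# Hedged sketch of a PDF-to-TXT conversion (assumes poppler-utils' pdftotext
# is installed; paper.pdf is a placeholder filename).
pdftotext -enc UTF-8 -nopgbrk paper.pdf paper.txt

# Layout-preserving mode for comparison: if compound words are fused in one
# output but not the other, the merging happens at extraction time rather
# than in the PDF's own text layer.
pdftotext -layout paper.pdf paper-layout.txt
```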