LLM(Large Language Model) Calendar 2024
https://qiita.com/advent-calendar/2024/llm
This article is scheduled for posting on Day 4.
Matsuo Lab LLM Community [Paper & Hacks Vol.26]: An Introduction to Medical LLM Research
https://matsuolab-community.connpass.com/event/336858/
Presenter: Issey Sukeda (助田一晟), Matsuo Lab LLM Community member (CTO of EQUES / The University of Tokyo / lecturer for Session 11 of the Matsuo Lab Large Language Model Course)
Development and bilingual evaluation of Japanese medical large language model within reasonably low computational resources
https://arxiv.org/abs/2409.11783
From Medprompt to o1: Exploration of Run-Time Strategies for Medical Challenge Problems and Beyond
https://arxiv.org/pdf/2411.03590v1
References of arXiv:2411.03590v1
Rishabh Agarwal, Avi Singh, Lei M Zhang, Bernd Bohnet, Stephanie Chan, Ankesh Anand, Zaheer Abbas, Azade Nova, John D Co-Reyes, Eric Chu, et al. Many-shot in-context learning. arXiv preprint arXiv:2404.11018, 2024.
Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, et al. Sparks of artificial general intelligence: Early experiments with GPT-4. arXiv preprint arXiv:2303.12712, 2023.
Niels J Blunch. Position bias in multiple-choice questions. Journal of Marketing Research, 21(2):216–220, 1984.
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. Language models are few-shot learners, 2020.
Elliot Bolton, Abhinav Venigalla, Michihiro Yasunaga, David Hall, Betty Xiong, Tony Lee, Roxana Daneshjou, Jonathan Frankle, Percy Liang, Michael Carbin, et al. BioMedLM: A 2.7B parameter language model trained on biomedical text. arXiv preprint arXiv:2403.18421, 2024.
Rich Caruana, Alexandru Niculescu-Mizil, Geoff Crew, and Alex Ksikes. Ensemble selection from libraries of models. In Proceedings of the Twenty-First International Conference on Machine Learning, page 18, 2004.
Samuel J Gershman, Eric J Horvitz, and Joshua B Tenenbaum. Computational rationality: A converging paradigm for intelligence in brains, minds, and machines. Science, 349(6245), 2015.
Yu Gu, Robert Tinn, Hao Cheng, Michael Lucas, Naoto Usuyama, Xiaodong Liu, Tristan Naumann, Jianfeng Gao, and Hoifung Poon. Domain-specific language model pretraining for biomedical natural language processing. ACM Trans. Comput. Healthcare, 3(1), oct 2021.
Eric Horvitz and John Breese. Ideal partition of resources for metareasoning. arXiv preprint arXiv:2110.09624, 2021.
Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, and Jacob Steinhardt. Measuring massive multitask language understanding. arXiv preprint arXiv:2009.03300, 2020.
Eric J. Horvitz, Gregory F. Cooper, and David E. Heckerman. Reflection and action under scarce resources: Theoretical principles and empirical study. In Proceedings of the 11th International Joint Conference on Artificial Intelligence - Volume 2, IJCAI’89, page 1121–1127, San Francisco, CA, USA, 1989. Morgan Kaufmann Publishers Inc.
Thomas Hartvigsen, Saadia Gabriel, Hamid Palangi, Maarten Sap, Dipankar Ray, and Ece Kamar. Toxigen: A large-scale machine-generated dataset for adversarial and implicit hate speech detection. arXiv preprint arXiv:2203.09509, 2022.
Eric J. Horvitz. Reasoning about beliefs and actions under computational resource constraints. In Proceedings of the Third Conference on Uncertainty in Artificial Intelligence, UAI'87, pages 429–447. AUAI Press, 1987.
Eric Horvitz. Reasoning under varying and uncertain resource constraints. In AAAI, volume 88, pages 111–116. Citeseer, 1988.
Eric Horvitz. Principles and applications of continual computation. Artificial Intelligence, 126(1-2):159–196, 2001.
Eric Horvitz, Yongshao Ruan, Carla Gomes, Henry Kautz, Bart Selman, and Max Chickering. A Bayesian approach to tackling hard computational problems. In Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence, pages 235–244, July 2001.
Di Jin, Eileen Pan, Nassim Oufattole, Wei-Hung Weng, Hanyi Fang, and Peter Szolovits. What disease does this patient have? a large-scale open domain question answering dataset from medical exams. Applied Sciences, 11(14):6421, 2021.
Dongfu Jiang, Xiang Ren, and Bill Yuchen Lin. Llm-blender: Ensembling large language models with pairwise ranking and generative fusion. arXiv preprint arXiv:2306.02561, 2023.
Carlos E Jimenez, John Yang, Alexander Wettig, Shunyu Yao, Kexin Pei, Ofir Press, and Karthik Narasimhan. Swe-bench: Can language models resolve real-world github issues? arXiv preprint arXiv:2310.06770, 2023.
Tiffany H Kung, Morgan Cheatham, Arielle Medenilla, Czarina Sillos, Lorie De Leon, Camille Elepaño, Maria Madriaga, Rimel Aggabao, Giezel Diaz-Candido, James Maningo, et al. Performance of ChatGPT on USMLE: potential for AI-assisted medical education using large language models. PLOS Digital Health, 2(2):e0000198, 2023.
Takeshi Kojima, Shixiang Shane Gu, Machel Reid, Yutaka Matsuo, and Yusuke Iwasawa. Large language models are zero-shot reasoners. Advances in neural information processing systems, 35:22199–22213, 2022.
Jungo Kasai, Yuhei Kasai, Keisuke Sakaguchi, Yutaro Yamada, and Dragomir Radev. Evaluating gpt-4 and chatgpt on japanese medical licensing examinations. arXiv preprint arXiv:2303.18027, 2023.
Miyoung Ko, Jinhyuk Lee, Hyunjae Kim, Gangwoo Kim, and Jaewoo Kang. Look at the first sentence: Position bias in question answering. arXiv preprint arXiv:2004.14602, 2020.
Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. Scaling laws for neural language models. arXiv preprint arXiv:2001.08361, 2020.
Scott Lundberg, Harsha Nori, Marco Tulio Ribeiro, and Guidance AI Team. Guidance: A guidance language for controlling large language models, 2024. GitHub repository.
Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, et al. Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems, 33:9459–9474, 2020.
Renqian Luo, Liai Sun, Yingce Xia, Tao Qin, Sheng Zhang, Hoifung Poon, and Tie-Yan Liu. BioGPT: generative pre-trained transformer for biomedical text generation and mining. Briefings in Bioinformatics, 23(6), September 2022. bbac409.
Harsha Nori, Nicholas King, Scott Mayer McKinney, Dean Carignan, and Eric Horvitz. Capabilities of GPT-4 on medical challenge problems. arXiv preprint arXiv:2303.13375, 2023.
Xuefei Ning, Zinan Lin, Zixuan Zhou, Zifu Wang, Huazhong Yang, and Yu Wang. Skeleton-of-Thought: Large language models can do parallel decoding. Proceedings of ENLSP-III, 2023.
Harsha Nori, Yin Tat Lee, Sheng Zhang, Dean Carignan, Richard Edgar, Nicolo Fusi, Nicholas King, Jonathan Larson, Yuanzhi Li, Weishung Liu, Renqian Luo, Scott Mayer McKinney, Robert Osazuwa Ness, Hoifung Poon, Tao Qin, Naoto Usuyama, Chris White, and Eric Horvitz. Can generalist foundation models outcompete special-purpose tuning? Case study in medicine. arXiv preprint arXiv:2311.16452, 2023.
OpenAI. Gpt-4 technical report, 2023.
OpenAI. Advice on prompting - reasoning, coding, and planning examples. https://platform.openai.com/docs/guides/reasoning/advice-on-prompting?reasoning-prompt-examples=coding-planning, 2024. Accessed: 2024-10-20.
OpenAI. Learning to reason with large language models. https://openai.com/index/learning-to-reason-with-llms/, 2024. Accessed: 2024-10-20.
OpenAI. OpenAI API guide - reasoning. https://platform.openai.com/docs/guides/reasoning, September 2024. Accessed: 2024-10-20.
OpenAI. OpenAI o1 system card. https://cdn.openai.com/o1-system-card-20240917.pdf, September 2024. Accessed: 2024-10-20.
Jacob Pfau, William Merrill, and Samuel R Bowman. Let’s think dot by dot: Hidden computation in transformer language models. arXiv preprint arXiv:2404.15758, 2024.
Ankit Pal, Logesh Kumar Umapathi, and Malaikannan Sankarasubbu. MedMCQA: A large-scale multi-subject multi-choice dataset for medical domain question answering. In Conference on Health, Inference, and Learning, pages 248–260. PMLR, 2022.
Shishir G Patil, Tianjun Zhang, Xin Wang, and Joseph E Gonzalez. Gorilla: Large language model connected with massive APIs. arXiv preprint arXiv:2305.15334, 2023.
Karan Singhal, Shekoofeh Azizi, Tao Tu, S Sara Mahdavi, Jason Wei, Hyung Won Chung, Nathan Scales, Ajay Tanwani, Heather Cole-Lewis, Stephen Pfohl, et al. Large language models encode clinical knowledge. arXiv preprint arXiv:2212.13138, 2022.
Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Eric Hambro, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. Toolformer: Language models can teach themselves to use tools. Advances in Neural Information Processing Systems, 36, 2023.
Charlie Snell, Jaehoon Lee, Kelvin Xu, and Aviral Kumar. Scaling LLM test-time compute optimally can be more effective than scaling model parameters. arXiv preprint arXiv:2408.03314, 2024.
Karan Singhal, Tao Tu, Juraj Gottweis, Rory Sayres, Ellery Wulczyn, Le Hou, Kevin Clark, Stephen Pfohl, Heather Cole-Lewis, Darlene Neal, et al. Towards expert-level medical question answering with large language models. arXiv preprint arXiv:2305.09617, 2023.
Khaled Saab, Tao Tu, Wei-Hung Weng, Ryutaro Tanno, David Stutz, Ellery Wulczyn, Fan Zhang, Tim Strother, Chunjong Park, Elahe Vedadi, et al. Capabilities of gemini models in medicine. arXiv preprint arXiv:2404.18416, 2024.
Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Shaokun Zhang, Erkang Zhu, Beibin Li, Li Jiang, Xiaoyun Zhang, and Chi Wang. Autogen: Enabling next-gen llm applications via multi-agent conversation framework. arXiv preprint arXiv:2308.08155, 2023.
Xinglin Wang, Shaoxiong Feng, Yiwei Li, Peiwen Yuan, Yueqi Zhang, Boyuan Pan, Heda Wang, Yao Hu, and Kan Li. Make every penny count: Difficulty-adaptive self-consistency for cost-efficient reasoning. arXiv preprint arXiv:2408.13457, 2024.
Brandon T Willard and Rémi Louf. Efficient guided generation for LLMs. arXiv preprint arXiv:2307.09702, 2023.
Tianhao Wu, Janice Lan, Weizhe Yuan, Jiantao Jiao, Jason Weston, and Sainbayar Sukhbaatar. Thinking llms: General instruction following with thought generation. arXiv preprint arXiv:2410.10630, 2024.
Yangzhen Wu, Zhiqing Sun, Shanda Li, Sean Welleck, and Yiming Yang. An empirical analysis of compute-optimal inference for problem-solving with language models. arXiv preprint arXiv:2408.00724, 2024.
Guangya Wan, Yuqi Wu, Jie Chen, and Sheng Li. Dynamic self-consistency: Leveraging reasoning paths for efficient llm sampling. arXiv preprint arXiv:2408.17017, 2024.
Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc Le, Ed Chi, Sharan Narang, Aakanksha Chowdhery, and Denny Zhou. Self-consistency improves chain of thought reasoning in language models. arXiv preprint arXiv:2203.11171, 2022.
Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. Chain-of-thought prompting elicits reasoning in large language models. Advances in neural information processing systems, 35:24824–24837, 2022.
xjdr alt. Entropix: A fast entropy-based data pruning tool, 2023. Accessed: 2024-10-21.
Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Tom Griffiths, Yuan Cao, and Karthik Narasimhan. Tree of thoughts: Deliberate problem solving with large language models. Advances in Neural Information Processing Systems, 36, 2024.
Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. React: Synergizing reasoning and acting in language models. arXiv preprint arXiv:2210.03629, 2022.
Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric Xing, et al. Judging LLM-as-a-judge with MT-Bench and Chatbot Arena. Advances in Neural Information Processing Systems, 36:46595–46623, 2023.
Eric Zelikman, Georges Harik, Yijia Shao, Varuna Jayasiri, Nick Haber, and Noah D Goodman. Quiet-star: Language models can teach themselves to think before speaking. arXiv preprint arXiv:2403.09629, 2024.
Eric Zelikman, Yuhuai Wu, Jesse Mu, and Noah Goodman. Star: Bootstrapping reasoning with reasoning. Advances in Neural Information Processing Systems, 35:15476–15488, 2022.
References of arXiv:2409.11783
[1] Qwen2 technical report. 2024.
[2] Takuya Akiba, Makoto Shing, Yujin Tang, Qi Sun, and David Ha. Evolutionary Optimization of Model Merging Recipes. arXiv preprint arXiv:2403.13187v1, 2024.
[3] Ankit Pal, Pasquale Minervini, Andreas Geert Motzfeldt, Aryo Pradipta Gema, and Beatrice Alex. openlifescienceai/open_medical_llm_leaderboard. https://huggingface.co/spaces/openlifescienceai/open_medical_llm_leaderboard, 2024.
[4] Kaj Bostrom and Greg Durrett. Byte pair encoding is suboptimal for language model pretraining. arXiv preprint arXiv:2004.03720, 2020.
[5] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. Advances in neural information processing systems, 33:1877–1901, 2020.
[6] Zeming Chen, Alejandro Hernández-Cano, Angelika Romanou, Antoine Bonnet, Kyle Matoba, Francesco Salvi, Matteo Pagliardini, Simin Fan, Andreas Köpf, Amirkeivan Mohtashami, Alexandre Sallinen, Alireza Sakhaeirad, Vinitra Swamy, Igor Krawczuk, Deniz Bayazit, Axel Marmet, Syrielle Montariol, Mary-Anne Hartley, Martin Jaggi, and Antoine Bosselut. Meditron-70b: Scaling medical pretraining for large language models, 2023.
[7] Daixuan Cheng, Shaohan Huang, and Furu Wei. Adapting large language models via reading comprehension. In The Twelfth International Conference on Learning Representations, 2024.
[8] Clément Christophe, Tathagata Raha, Nasir Hayat, Praveen Kanithi, Ahmed Al-Mahrooqi, Prateek Munjal, Nada Saadi, Hamza Javed, Umar Salman, Svetlana Maslenkova, Marco Pimentel, Ronnie Rajan, and Shadab Khan. Med42-v2 - a suite of clinically-aligned large language models. 2024.
[9] Clément Christophe, Praveen K Kanithi, Prateek Munjal, Tathagata Raha, Nasir Hayat, Ronnie Rajan, Ahmed Al-Mahrooqi, Avani Gupta, Muhammad Umar Salman, Gurpreet Gosal, Bhargav Kanakiya, Charles Chen, Natalia Vassilieva, Boulbaba Ben Amor, Marco AF Pimentel, and Shadab Khan. Med42 – evaluating fine-tuning strategies for medical llms: Full-parameter vs. parameter-efficient approaches. 2024.
[10] Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, and Luke Zettlemoyer. QLoRA: Efficient fine-tuning of quantized LLMs. arXiv preprint arXiv:2305.14314, 2023.
[11] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.
[12] Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, et al. The Llama 3 Herd of Models, 2024.
[13] Clémentine Fourrier, Nathan Habib, Alina Lozovskaya, Konrad Szafer, and Thomas Wolf. Open LLM Leaderboard v2. https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard, 2024.
[14] Kazuki Fujii, Taishi Nakamura, Mengsay Loem, Hiroki Iida, Masanari Ohi, Kakeru Hattori, Hirai Shota, Sakae Mizuki, Rio Yokota, and Naoaki Okazaki. Continual Pre-Training for Cross-Lingual LLM Adaptation: Enhancing Japanese Language Capabilities. arXiv preprint arXiv:2404.17790, 2024.
[15] Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, and Jacob Steinhardt. Measuring massive multitask language understanding. arXiv preprint arXiv:2009.03300, 2020.
[16] Masato Hirakawa, Shintaro Horie, Tomoaki Nakamura, Daisuke Oba, Sam Passaglia, and Akira Sasaki. elyza/llama-3-elyza-jp-8b, 2024.
[17] Edward J Hu, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. LoRA: Low-Rank Adaptation of Large Language Models. In International Conference on Learning Representations, 2021.
[18] Albert Q Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lucile Saulnier, et al. Mistral 7b. arXiv preprint arXiv:2310.06825, 2023.
[19] Albert Q Jiang, Alexandre Sablayrolles, Antoine Roux, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Emma Bou Hanna, Florian Bressand, et al. Mixtral of experts. arXiv preprint arXiv:2401.04088, 2024.
[20] Di Jin, Eileen Pan, Nassim Oufattole, Wei-Hung Weng, Hanyi Fang, and Peter Szolovits. What Disease does this Patient Have? A Large-scale Open Domain Question Answering Dataset from Medical Exams. arXiv preprint arXiv:2009.13081, 2020.
[21] Qiao Jin, Bhuwan Dhingra, Zhengping Liu, William Cohen, and Xinghua Lu. PubMedQA: A Dataset for Biomedical Research Question Answering. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 2567–2577, 2019.
[22] Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. Scaling laws for neural language models. arXiv preprint arXiv:2001.08361, 2020.
[23] Jungo Kasai, Yuhei Kasai, Keisuke Sakaguchi, Yutaro Yamada, and Dragomir Radev. Evaluating GPT-4 and ChatGPT on Japanese Medical Licensing Examinations. arXiv preprint arXiv:2303.18027, 2023.
[24] Yoshimasa Kawazoe, Daisaku Shibata, Emiko Shinohara, Eiji Aramaki, and Kazuhiko Ohe. A clinical specific BERT developed using a huge Japanese clinical text corpus. PLoS ONE, 16(11):e0259763, 2021.
[25] Anastasia Krithara, Anastasios Nentidis, Konstantinos Bougiatiotis, and Georgios Paliouras. BioASQ-QA: A manually curated corpus for Biomedical Question Answering. Scientific Data, 10(1):170, 2023.
[26] Yanis Labrak, Adrien Bazoge, Emmanuel Morin, Pierre-Antoine Gourraud, Mickael Rouvier, and Richard Dufour. BioMistral: A Collection of Open-Source Pretrained Large Language Models for Medical Domains. arXiv preprint arXiv:2402.10373, 2024.
[27] Swallow LLM. Llama 3 Swallow, 2024.
[28] Robert Osazuwa Ness, Katie Matton, Hayden Helm, Sheng Zhang, Junaid Bajwa, Carey E Priebe, and Eric Horvitz. Medfuzz: Exploring the robustness of large language models in medical question answering. arXiv preprint arXiv:2406.06573, 2024.
[29] Harsha Nori, Nicholas King, Scott Mayer McKinney, Dean Carignan, and Eric Horvitz. Capabilities of GPT-4 on medical challenge problems. arXiv preprint arXiv:2303.13375, 2023.
[30] Harsha Nori, Yin Tat Lee, Sheng Zhang, Dean Carignan, Richard Edgar, Nicolo Fusi, Nicholas King, Jonathan Larson, Yuanzhi Li, Weishung Liu, et al. Can generalist foundation models outcompete special-purpose tuning? Case study in medicine. arXiv preprint arXiv:2311.16452, 2023.
[31] OpenAI. ChatML. https://github.com/openai/openai-python/blob/release-v0.28.0/chatml.md, 2023. Accessed 2024-08-09.
[32] Ankit Pal and Malaikannan Sankarasubbu. OpenBioLLMs: Advancing Open-Source Large Language Models for Healthcare and Life Sciences. https://huggingface.co/aaditya/OpenBioLLM-Llama3-70B, 2024.
[33] Ankit Pal, Logesh Kumar Umapathi, and Malaikannan Sankarasubbu. MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. In Gerardo Flores, George H Chen, Tom Pollard, Joyce C Ho, and Tristan Naumann, editors, Proceedings of the Conference on Health, Inference, and Learning, volume 174 of Proceedings of Machine Learning Research, pages 248–260. PMLR, 07–08 Apr 2022.
[34] Sara Pieri, Sahal Shaji Mullappilly, Fahad Shahbaz Khan, Rao Muhammad Anwer, Salman Khan, Timothy Baldwin, and Hisham Cholakkal. BiMediX: Bilingual Medical Mixture of Experts LLM. arXiv preprint arXiv:2402.13253, 2024.
[35] Pengcheng Qiu, Chaoyi Wu, Xiaoman Zhang, Weixiong Lin, Haicheng Wang, Ya Zhang, Yanfeng Wang, and Weidi Xie. Towards building multilingual language model for medicine. arXiv preprint arXiv:2402.13963, 2024.
[36] Khaled Saab, Tao Tu, Wei-Hung Weng, Ryutaro Tanno, David Stutz, Ellery Wulczyn, Fan Zhang, Tim Strother, Chunjong Park, Elahe Vedadi, Juanma Zambrano Chaves, Szu-Yeu Hu, Mike Schaekermann, Aishwarya Kamath, Yong Cheng, David G. T. Barrett, Cathy Cheung, Basil Mustafa, Anil Palepu, Daniel McDuff, Le Hou, Tomer Golany, Luyang Liu, Jean-Baptiste Alayrac, Neil Houlsby, Nenad Tomasev, Jan Freyberg, Charles Lau, Jonas Kemp, Jeremy Lai, Shekoofeh Azizi, Kimberly Kanada, SiWai Man, Kavita Kulkarni, Ruoxi Sun, Siamak Shakeri, Luheng He, Ben Caine, Albert Webson, Natasha Latysheva, Melvin Johnson, Philip Mansfield, Jian Lu, Ehud Rivlin, Jesper Anderson, Bradley Green, Renee Wong, Jonathan Krause, Jonathon Shlens, Ewa Dominowska, S. M. Ali Eslami, Katherine Chou, Claire Cui, Oriol Vinyals, Koray Kavukcuoglu, James Manyika, Jeff Dean, Demis Hassabis, Yossi Matias, Dale Webster, Joelle Barral, Greg Corrado, Christopher Semturs, S. Sara Mahdavi, Juraj Gottweis, Alan Karthikesalingam, and Vivek Natarajan. Capabilities of Gemini Models in Medicine. arXiv preprint arXiv:2404.18416, 2024.
[37] Kei Sawada, Tianyu Zhao, Makoto Shing, Kentaro Mitsui, Akio Kaga, Yukiya Hono, Toshiaki Wakatsuki, and Koh Mitsuda. Release of pre-trained models for the Japanese language. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), May 2024.
[38] Karan Singhal, Shekoofeh Azizi, Tao Tu, S Sara Mahdavi, Jason Wei, Hyung Won Chung, Nathan Scales, Ajay Tanwani, Heather Cole-Lewis, Stephen Pfohl, et al. Large language models encode clinical knowledge. Nature, pages 1–9, 2023.
[39] Karan Singhal, Tao Tu, Juraj Gottweis, Rory Sayres, Ellery Wulczyn, Le Hou, Kevin Clark, Stephen Pfohl, Heather Cole-Lewis, Darlene Neal, et al. Towards expert-level medical question answering with large language models. arXiv preprint arXiv:2305.09617, 2023.
[40] Kaito Sugimoto, Taichi Iki, Yuki Chida, Teruhito Kanazawa, and Akiko Aizawa. JMedRoBERTa: A Japanese pre-trained language model on academic articles in medical sciences. In Proceedings of the 29th Annual Meeting of the Association for Natural Language Processing, 2023.
[41] Issey Sukeda, Risa Kishikawa, and Satoshi Kodera. 70B-parameter large language models in Japanese medical question-answering. arXiv preprint arXiv:2406.14882, 2024.
[42] Issey Sukeda, Masahiro Suzuki, Hiroki Sakaji, and Satoshi Kodera. JMedLoRA: Medical Domain Adaptation on Japanese Large Language Models using Instruction-tuning. In Deep Generative Models for Health Workshop NeurIPS 2023, 2023.
[43] Issey Sukeda, Masahiro Suzuki, Hiroki Sakaji, and Satoshi Kodera. Development and analysis of medical instruction-tuning for Japanese large language models. Artificial Intelligence in Health, 1(2):107–116, 2024.
[44] Rohan Taori, Ishaan Gulrajani, Tianyi Zhang, Yann Dubois, Xuechen Li, Carlos Guestrin, Percy Liang, and Tatsunori B. Hashimoto. Stanford Alpaca: An Instruction-following LLaMA model. https://github.com/tatsu-lab/stanford_alpaca, 2023.
[45] The ModelScope Team. SWIFT: Scalable lightWeight Infrastructure for Fine-Tuning. https://github.com/modelscope/swift, 2024.
[46] Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, et al. Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971, 2023.
[47] Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, et al. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288, 2023.
[48] Tao Tu, Shekoofeh Azizi, Danny Driess, Mike Schaekermann, Mohamed Amin, Pi-Chuan Chang, Andrew Carroll, Charles Lau, Ryutaro Tanno, Ira Ktena, et al. Towards generalist biomedical ai. NEJM AI, 1(3):AIoa2300138, 2024.
[49] Changhan Wang, Kyunghyun Cho, and Jiatao Gu. Neural machine translation with byte-level subwords. In Proceedings of the AAAI conference on artificial intelligence, volume 34, pages 9154–9160, 2020.
[50] Junlin Wang, Jue Wang, Ben Athiwaratkun, Ce Zhang, and James Zou. Mixture-of-Agents Enhances Large Language Model Capabilities. arXiv preprint arXiv:2406.04692, 2024.
[51] Xidong Wang, Guiming Hardy Chen, Dingjie Song, Zhiyi Zhang, Zhihong Chen, Qingying Xiao, Feng Jiang, Jianquan Li, Xiang Wan, Benyou Wang, et al. Cmb: A comprehensive medical benchmark in chinese. arXiv preprint arXiv:2308.08833, 2023.
[52] Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc V Le, Ed H Chi, Sharan Narang, Aakanksha Chowdhery, and Denny Zhou. Self-consistency improves chain of thought reasoning in language models. In The Eleventh International Conference on Learning Representations, 2022.
[53] Jason Wei, Maarten Bosma, Vincent Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M Dai, and Quoc V Le. Finetuned language models are zero-shot learners. In International Conference on Learning Representations, 2022.
[54] Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Fei Xia, Ed Chi, Quoc V Le, Denny Zhou, et al. Chain-of-thought prompting elicits reasoning in large language models. Advances in Neural Information Processing Systems, 35:24824–24837, 2022.
[55] Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, et al. HuggingFace's Transformers: State-of-the-art natural language processing. arXiv preprint arXiv:1910.03771, 2019.
[56] Guangzhi Xiong, Qiao Jin, Zhiyong Lu, and Aidong Zhang. Benchmarking retrieval-augmented generation for medicine. arXiv preprint arXiv:2402.13178, 2024.
[57] Xwin-LM Team. Xwin-LM, September 2023.
[58] Ziqi Yin, Hao Wang, Kaito Horio, Daisuke Kawahara, and Satoshi Sekine. Should We Respect LLMs? A Cross-Lingual Study on the Influence of Prompt Politeness on LLM Performance. arXiv preprint arXiv:2402.14531, 2024.
2411.03590v1 word count
term | count |
---|---|
| 445 |
the | 222 |
and | 184 |
of | 160 |
in | 128 |
to | 121 |
for | 94 |
models | 89 |
on | 85 |
a | 84 |
o | 81 |
with | 72 |
model | 70 |
reasoning | 70 |
| 69 |
performance | 63 |
arxiv | 56 |
preview | 56 |
gpt | 53 |
that | 51 |
medical | 49 |
as | 44 |
is | 44 |
prompting | 42 |
we | 41 |
medprompt | 39 |
by | 35 |
language | 35 |
shot | 34 |
can | 33 |
this | 31 |
time | 31 |
more | 29 |
preprint | 28 |
strategies | 28 |
tokens | 26 |
are | 25 |
benchmarks | 25 |
from | 25 |
results | 25 |
run | 25 |
these | 25 |
accuracy | 24 |
few | 24 |
llms | 24 |
b | 23 |
ensembling | 23 |
multiple | 22 |
cost | 21 |
inference | 21 |
prompt | 21 |
such | 21 |
across | 20 |
at | 20 |
capabilities | 20 |
or | 20 |
large | 19 |
o | 19 |
of | 19 |
learning | 18 |
medqa | 18 |
eric | 17 |
further | 17 |
like | 17 |
ope | 17 |
our | 16 |
techniques | 16 |
using | 16 |
be | 15 |
chain | 15 |
figure | 15 |
information | 15 |
questions | 15 |
research | 15 |
tasks | 15 |
an | 14 |
examples | 14 |
has | 14 |
new | 14 |
nlz | 14 |
question | 14 |
step | 14 |
which | 14 |
have | 12 |
it | 12 |
knowledge | 12 |
methods | 12 |
question | 12 |
resources | 12 |
their | 12 |
use | 12 |
窶 | 12 |
a | 11 |
challenge | 11 |
context | 11 |
dataset | 11 |
each | 11 |
horvitz | 11 |
its | 11 |
self | 11 |
steering | 11 |
thought | 11 |
was | 11 |
when | 11 |
while | 11 |
al | 10 |
answerchoices | 10 |
approach | 10 |
benchmark | 10 |
computational | 10 |
et | 10 |
exam | 10 |
generation | 10 |
leveraging | 10 |
li | 10 |
llm | 10 |
multi | 10 |
openai | 10 |
problem | 10 |
problems | 10 |
shot | 10 |
specialized | 10 |
training | 10 |
zhang | 10 |
advanced | 9 |
also | 9 |
area | 9 |
domain | 9 |
final | 9 |
not | 9 |
processing | 9 |
sampling | 9 |
study | 9 |
systems | 9 |
token | 9 |
tools | 9 |
without | 9 |
wu | 9 |
advances | 8 |
api | 8 |
choice | 8 |
context | 8 |
datasets | 8 |
dynamic | 8 |
effective | 8 |
enhance | 8 |
even | 8 |
external | 8 |
general | 8 |
generating | 8 |
improve | 8 |
into | 8 |
med | 8 |
neural | 8 |
output | 8 |
outputs | 8 |
real | 8 |
remains | 8 |
set | 8 |
technique | 8 |
the | 8 |
wang | 8 |
| 8 |
2409.11783v2 word count
term | count |
---|---|
the | 289 |
| 196 |
in | 185 |
and | 154 |
b | 144 |
of | 140 |
medical | 115 |
to | 112 |
is | 100 |
a | 96 |
llama | 88 |
for | 76 |
models | 72 |
japanese | 65 |
on | 65 |
we | 59 |
model | 56 |
arxiv | 52 |
as | 52 |
llms | 51 |
| 48 |
by | 46 |
language | 45 |
are | 43 |
our | 43 |
english | 42 |
with | 42 |
v | 41 |
that | 39 |
ja | 37 |
llm | 35 |
training | 34 |
from | 32 |
tuning | 31 |
performance | 30 |
this | 30 |
which | 29 |
be | 28 |
dataset | 28 |
not | 28 |
en | 26 |
preprint | 26 |
base | 25 |
evaluation | 25 |
fine | 25 |
parameter | 25 |
benchmarks | 24 |
it | 24 |
the | 24 |
qwen | 23 |
table | 23 |
wang | 23 |
accuracy | 22 |
al | 22 |
data | 22 |
large | 22 |
open | 22 |
竏 | 22 |
igakuqa | 21 |
jmedllm | 21 |
llama | 21 |
zhang | 21 |
prompt | 20 |
shot | 20 |
can | 19 |
medswallow | 19 |
other | 19 |
swallow | 19 |
et | 18 |
https | 18 |
clinical | 17 |
have | 17 |
in | 17 |
medmcqa | 17 |
mfpt | 17 |
mpeft | 17 |
or | 17 |
preferred | 17 |
question | 17 |
source | 17 |
based | 16 |
benchmark | 16 |
each | 16 |
instruction | 16 |
medqa | 16 |
however | 15 |
huggingface | 15 |
instruct | 15 |
samples | 15 |
also | 14 |
centric | 14 |
co | 14 |
experiments | 14 |
input | 14 |
med | 14 |
only | 14 |
ours | 14 |
scores | 14 |
while | 14 |
answering | 13 |
li | 13 |
resources | 13 |
usmle | 13 |
was | 13 |
and | 12 |
been | 12 |
choice | 12 |
domain | 12 |
inference | 12 |
openbiollm | 12 |
their | 12 |
these | 12 |
using | 12 |
an | 11 |
computational | 11 |
few | 11 |
five | 11 |
following | 11 |
has | 11 |
of | 11 |
significant | 11 |
to | 11 |
use | 11 |
used | 11 |
at | 10 |
both | 10 |
chen | 10 |
developed | 10 |
languages | 10 |
medicine | 10 |
more | 10 |
number | 10 |
prompting | 10 |
score | 10 |
settings | 10 |
tasks | 10 |
tokens | 10 |
adaptation | 9 |
com | 9 |
compared | 9 |
conference | 9 |
corpus | 9 |
full | 9 |
generally | 9 |
improvements | 9 |
information | 9 |
jmmlu | 9 |
knowledge | 9 |
method | 9 |
mmlu | 9 |
one | 9 |
practical | 9 |
research | 9 |
sukeda | 9 |
task | 9 |
text | 9 |
two | 9 |
wei | 9 |
窶 | 9 |
available | 8 |
cot | 8 |
e | 8 |
for | 8 |
further | 8 |
hand | 8 |
improvement | 8 |
its | 8 |
jp | 8 |
medical | 8 |
palm | 8 |
parameters | 8 |
questions | 8 |
response | 8 |
results | 8 |
series | 8 |
specific | 8 |
specifically | 8 |
studies | 8 |
translated | 8 |
work | 8 |
| 8 |
$ docker run -v /Users/ogawakiyoshi/llm:/tmp/llm -it kaizenjapan/llm /bin/bash
# tr 'A-Z' 'a-z' < 2411.03590v1.txt > 2411.03590v1s.txt
# awk -f wc.awk 2411.03590v1s.txt > 2411.03590v1.wc
# tr 'A-Z' 'a-z' < 2409.11783v2.txt > 2409.11783v2s.txt
# awk -f wc.awk 2409.11783v2s.txt > 2409.11783v2.wc
$ docker commit c6430640806f kaizenjapan/llm
$ docker push kaizenjapan/llm
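The `wc.awk` script invoked above is not included in the post. A minimal sketch of such a word-frequency script is shown below as a hypothetical reconstruction (the actual `wc.awk` may differ), together with the same lowercase-then-count pipeline on a tiny sample:

```shell
# Hypothetical wc.awk: tally word frequencies and print "term | count |"
# rows in the same format as the tables above. The real wc.awk is not shown.
cat > wc.awk <<'AWK'
{
    # Split each line on runs of non-alphanumeric characters.
    n = split($0, words, /[^[:alnum:]]+/)
    for (i = 1; i <= n; i++)
        if (words[i] != "")
            count[words[i]]++
}
END {
    for (w in count)
        printf "%s | %d |\n", w, count[w]
}
AWK

# Mirror the pipeline from the transcript: lowercase first, then count,
# then sort by descending frequency.
printf 'The cat and the dog and the bird\n' > sample.txt
tr 'A-Z' 'a-z' < sample.txt > samples.txt
awk -f wc.awk samples.txt | sort -t'|' -k2,2 -rn > sample.wc
cat sample.wc
```

Note that this kind of byte-oriented tokenization splits on every non-alphanumeric byte, which would also explain single-letter rows like `o` and the mojibake fragments in the tables: multi-byte UTF-8 characters get broken apart when processed as raw bytes.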