References are arranged alphabetically so that duplicate entries are easy to spot. One of the main thrusts of this collection of papers is OpenAI.
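The alphabetical arrangement described above can be sketched in a few lines of Python. This is a minimal illustration only: it assumes references are held as plain strings, and the normalization rule (lowercasing and stripping punctuation before comparing entries) is an assumption for the sketch, not the exact procedure used for this list.

```python
import re
from collections import defaultdict

def normalize(ref: str) -> str:
    """Collapse case, punctuation, and extra symbols so near-identical entries match."""
    return re.sub(r"[^a-z0-9 ]", "", ref.lower()).strip()

def sort_and_flag(refs: list[str]) -> tuple[list[str], dict[str, list[str]]]:
    """Sort references alphabetically and group likely duplicates by normalized key."""
    ordered = sorted(refs, key=str.lower)
    groups: dict[str, list[str]] = defaultdict(list)
    for ref in ordered:
        groups[normalize(ref)].append(ref)
    # Keep only keys that occur more than once: these are duplicate candidates.
    dupes = {key: entries for key, entries in groups.items() if len(entries) > 1}
    return ordered, dupes

refs = [
    "A. Vaswani et al. Attention is all you need. NeurIPS, 2017.",
    "A. Vaswani, et al. Attention Is All You Need. NeurIPS, 2017.",
    "A. Radford et al. Language models are unsupervised multitask learners. 2019.",
]
ordered, dupes = sort_and_flag(refs)
```

Sorting first means duplicate candidates also sit next to each other when scanning the list by eye, which is the point of alphabetizing.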
The Road to ChatGPT Fandom (ChatGPT推しへの道)
Starting from the book: "Let ChatGPT Do the Tedious Work" (面倒なことはChatGPTにやらせよう) by Karaage (からあげ)
https://qiita.com/kaizen_nagoya/items/f5ce2a18db54b5610e4b
While listening to "Introduction to MCP: Let AI Agents Do the Tedious Work" (MCP入門 〜面倒なことはAIエージェントにやらせよう〜) by Karaage
https://qiita.com/kaizen_nagoya/items/54b648c838fae8d57e38
[Matsuo Lab LLM Community] Let LLMs Do the Tedious Work (面倒なことはLLMにやらせよう), "Beginning LLM", October 17, 2024, AI(9)
https://qiita.com/kaizen_nagoya/items/efdc23fbe67cdae2126e
Design: Seeking a Singular or General Solution with ChatGPT (設計:ChatGPTで特異解か一般解を求める), AI(1)
https://qiita.com/kaizen_nagoya/items/4dec580e16a7c84b0ec4
Learning from Everyone's Usage (みんなの使い方から学ぶ)
https://qiita.com/kaizen_nagoya/items/8e7de492d896df94a36e
AI and Machine Learning: Yesterday, Today, Tomorrow (AI・機械学習 昨日、今日、明日)
https://qiita.com/kaizen_nagoya/items/adb184c8fc7a65ac9756
DNA LLM and genome: a survey of 2,200 papers, by name.
https://qiita.com/kaizen_nagoya/items/ce8a28d6072f340a9d59
DNA LLM: a survey of 2,000 papers.
https://qiita.com/kaizen_nagoya/items/d528200aa52766a51b30
No. | S | R | title and URL | new or qiita |
---|---|---|---|---|
656 | 13 | 11 | a foundation model for chest x-ray interpretation. arXiv preprint arXiv:2401.12208, 2024a. | ||
2058 | 35 | 331 | A Venigalla, Jonathan Frankle, and M Carbin. 2022. Biomedlm: a domain-specific large language model for biomedical text. MosaicML. Accessed: Dec 23, (2022), 2. | ||
215 | 5 | 28 | A. A. Egorov and G. C. Atkinson. Lovis4u: Locus visualisation tool for comparative genomics. bioRxiv, 2024. doi: 10.1101/2024.09.11.612399. URL https://www.biorxiv.org/content/early/2024/09/14/2024.09.11.612399. | ||
213 | 5 | 26 | A. Dubey, A. Jauhri, A. Pandey, A. Kadian, A. Al-Dahle, A. Letman, A. Mathur, A. Schelten, A. Yang, A. Fan, et al. The Llama 3 herd of models. arXiv preprint arXiv:2407.21783, 2024. | ||
547 | 11 | 5 | A. Dubey, A. Jauhri, A. Pandey, A. Kadian, A. Al-Dahle, A. Letman, A. Mathur, A. Schelten, A. Yang, A. Fan, et al. The llama 3 herd of models. arXiv preprint arXiv:2407.21783, 2024. | ||
1131 | 24 | 14 | A. Elnaggar, M. Heinzinger, C. Dallago, G. Rehawi, Y. Wang, L. Jones, T. Gibbs, T. Feher, C. Angerer, M. Steinegger, et al. Prottrans: Toward understanding the language of life through self-supervised learning. IEEE transactions on pattern analysis and machine intelligence, 44(10):7112–7127, 2021. | ||
645 | 13 | 0 | A. Fallahpour, J. Ma, A. Munim, H. Lyu, and B. Wang. Medrax: Medical reasoning agent for chest x-ray, 2025. URL https://arxiv.org/abs/2502.02673. | ||
583 | 12 | 0 | A. Fallahpour, V. Gureghian, G. J. Filion, A. B. Lindner, and A. Pandi. Codontransformer: A multispecies codon optimizer using context-aware neural networks. Nature Communications, 16(1), Apr 2025. doi: 10.1038/s41467-025-58588-7. | ||
1232 | 25 | 56 | A. Ghafarollahi and M. J. Buehler, “Atomagents: Alloy design and discovery through physics-aware multi-modal multi-agent artificial intelligence,” 2024. | ||
1141 | 24 | 24 | A. Gu, K. Goel, and C. Ré. Efficiently modeling long sequences with structured state spaces. arXiv preprint arXiv:2111.00396, 2021. | ||
1369 | 28 | 21 | A. Joulin, E. Grave, P. Bojanowski, M. Douze, H. Jégou, and T. Mikolov. FastText.zip: Compressing text classification models. arXiv preprint arXiv:1612.03651, 2016. | ||
269 | 5 | 84 | A. K. Pathak, N. Bora, M. Badonyi, B. J. Livesey, S. Consortium, J. Ngeow, and J. A. Marsh. Pervasive ancestry bias in variant effect predictors. bioRxiv, pages 2024–05, 2024. | ||
558 | 11 | 16 | A. Kumar, V. Zhuang, R. Agarwal, Y. Su, J. D. Co-Reyes, A. Singh, K. Baumli, S. Iqbal, C. Bishop, R. Roelofs, et al. Training language models to self-correct via reinforcement learning. arXiv preprint arXiv:2409.12917, 2024. | ||
257 | 5 | 70 | A. L. Mitchell, A. Almeida, M. Beracochea, M. Boland, J. Burgin, G. Cochrane, M. R. Crusoe, V. Kale, S. C. Potter, L. J. Richardson, E. Sakharova, M. Scheremetjew, A. Korobeynikov, A. Shlemov, O. Kunyavskaya, A. Lapidus, and R. D. Finn. MGnify: the microbiome analysis resource in 2020. Nucleic Acids Res., 48(D1):D570–D578, Jan. 2020. | ||
1372 | 28 | 24 | A. Lewkowycz, A. Andreassen, D. Dohan, E. Dyer, H. Michalewski, V. Ramasesh, A. Slone, C. Anil, I. Schlag, T. Gutman-Solo, et al. Solving quantitative reasoning problems with language models. Advances in Neural Information Processing Systems, 35:3843–3857, 2022a. | ||
1373 | 28 | 25 | A. Lewkowycz, A. Andreassen, D. Dohan, E. Dyer, H. Michalewski, V. V. Ramasesh, A. Slone, C. Anil, I. Schlag, T. Gutman-Solo, Y. Wu, B. Neyshabur, G. Gur-Ari, and V. Misra. Solving quantitative reasoning problems with language models. In S. Koyejo, S. Mohamed, A. Agarwal, D. Belgrave, K. Cho, and A. Oh, editors, Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems 2022, NeurIPS 2022, New Orleans, LA, USA, November 28 - December 9, 2022, 2022b. URL http://papers.nips.cc/paper_files/paper/2022/hash/18abbeef8cfe9203fdf9053c9c4fe191-Abstract-Conference.html. | ||
243 | 5 | 56 | A. Liu, B. Feng, B. Xue, B. Wang, B. Wu, C. Lu, C. Zhao, C. Deng, C. Zhang, C. Ruan, et al. DeepSeek-v3 technical report. arXiv preprint arXiv:2412.19437, 2024. | ||
245 | 5 | 58 | A. Lozhkov, R. Li, L. B. Allal, F. Cassano, J. Lamy-Poirier, N. Tazi, A. Tang, D. Pykhtar, J. Liu, Y. Wei, et al. Starcoder 2 and the stack v2: The next generation. arXiv preprint arXiv:2402.19173, 2024. | ||
1148 | 24 | 31 | A. Madani, B. Krause, E. R. Greene, S. Subramanian, B. P. Mohr, J. M. Holton, J. L. Olmos Jr, C. Xiong, Z. Z. Sun, R. Socher, et al. Large language models generate functional protein sequences across diverse families. Nature Biotechnology, pages 1–8, 2023. | ||
246 | 5 | 59 | A. Makhzani and B. Frey. k-sparse autoencoders. arXiv, 2014. URL https://arxiv.org/abs/1312.5663. | ||
551 | 11 | 9 | A. P. Gema, J. O. J. Leang, G. Hong, A. Devoto, A. C. M. Mancino, R. Saxena, X. He, Y. Zhao, X. Du, M. R. G. Madani, C. Barale, R. McHardy, J. Harris, J. Kaddour, E. van Krieken, and P. Minervini. Are we done with mmlu? CoRR, abs/2406.04127, 2024. URL https://doi.org/10.48550/arXiv.2406.04127. | ||
268 | 5 | 83 | A. Patel, A. Singhal, A. Wang, A. Pampari, M. Kasowski, and A. Kundaje. DART-Eval: A comprehensive DNA language model evaluation benchmark on regulatory DNA. Advances in Neural Information Processing Systems, 37:62024–62061, 2024. | ||
1368 | 28 | 20 | A. Q. Jiang, A. Sablayrolles, A. Mensch, C. Bamford, D. S. Chaplot, D. d. l. Casas, F. Bressand, G. Lengyel, G. Lample, L. Saulnier, et al. Mistral 7b. arXiv preprint arXiv:2310.06825, 2023. | ||
1367 | 28 | 19 | A. Q. Jiang, S. Welleck, J. P. Zhou, W. Li, J. Liu, M. Jamnik, T. Lacroix, Y. Wu, and G. Lample. Draft, sketch, and prove: Guiding formal theorem provers with informal proofs. arXiv preprint arXiv:2210.12283, 2022. | ||
277 | 5 | 92 | A. R. Quinlan and I. M. Hall. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics, 26(6):841–842, Mar. 2010. | ||
1458 | 31 | 9 | A. Radford and Karthik Narasimhan. Improving language understanding by generative pre-training. 2018. | ||
1156 | 24 | 39 | A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark, et al. Learning transferable visual models from natural language supervision. In International conference on machine learning, pages 8748–8763. PMLR, 2021. | ||
278 | 5 | 93 | A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, I. Sutskever, et al. Language models are unsupervised multitask learners. OpenAI blog, 1(8):9, 2019. | ||
1454 | 31 | 5 | A. Radford, Jeffrey Wu, R. Child, David Luan, Dario Amodei, and Ilya Sutskever. Language models are unsupervised multitask learners. 2019. | ||
1157 | 24 | 40 | A. Ramesh, M. Pavlov, G. Goh, S. Gray, C. Voss, A. Radford, M. Chen, and I. Sutskever. Zero-shot text-to-image generation. In International Conference on Machine Learning, pages 8821–8831. PMLR, 2021. | ||
281 | 5 | 96 | A. Rives, J. Meier, T. Sercu, S. Goyal, Z. Lin, J. Liu, D. Guo, M. Ott, C. L. Zitnick, J. Ma, and R. Fergus. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proceedings of the National Academy of Sciences, 118:e2016239118, 2021. ISSN 0027-8424. doi:10.1073/pnas.2016239118. | ||
254 | 5 | 67 | A. T. Merchant, S. H. King, E. Nguyen, and B. L. Hie. Semantic mining of functional de novo genes from a genomic language model. bioRxiv, 2024. doi: 10.1101/2024.12.17.628962. URL https://www.biorxiv.org/content/early/2024/12/18/2024.12.17.628962. | ||
1200 | 25 | 24 | A. Tamkin, A. Askell, L. Lovitt, E. Durmus, N. Joseph, S. Kravec, K. Nguyen, J. Kaplan, and D. Ganguli, “Evaluating and mitigating discrimination in language model decisions,” 2023. | ||
1194 | 25 | 18 | A. Tamkin, M. Brundage, J. Clark, and D. Ganguli, “Understanding the capabilities, limitations, and societal impact of large language models,” 2021. | ||
296 | 5 | 111 | A. Tareen and J. B. Kinney. Logomaker: beautiful sequence logos in python. Bioinformatics, 36(7):2272–2274, 12 2019. ISSN 1367-4803. doi: 10.1093/bioinformatics/btz921. URL https://doi.org/10.1093/bioinformatics/btz921. | ||
298 | 5 | 113 | A. Templeton, T. Conerly, J. Marcus, J. Lindsey, T. Bricken, B. Chen, A. Pearce, C. Citro, E. Ameisen, A. Jones, H. Cunningham, N. L. Turner, C. McDougall, M. MacDiarmid, C. D. Freeman, T. R. Sumers, E. Rees, J. Batson, A. Jermyn, S. Carter, C. Olah, and T. Henighan. Scaling monosemanticity: Extracting interpretable features from claude 3 sonnet. Transformer Circuits Thread, 2024. URL https://transformer-circuits.pub/2024/scaling-monosemanticity/index.html. | ||
301 | 5 | 116 | A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. Gomez, L. Kaiser, and I. Polosukhin. Attention is all you need. Advances in Neural Information Processing Systems, 2017. | ||
1168 | 24 | 51 | A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin. Attention is all you need. Advances in neural information processing systems, 30, 2017. | ||
299 | 5 | 114 | A. W. Thomas, R. Parnichkun, A. Amini, S. Massaroli, and M. Poli. STAR: Synthesis of tailored architectures. arXiv, 2411.17800, 2024. | ||
1544 | 33 | 0 | A. Yang, B. Yang, B. Hui, B. Zheng, B. Yu, C. Zhou, C. Li, C. Li, D. Liu, F. Huang, G. Dong, H. Wei, H. Lin, J. Tang, J. Wang, J. Yang, J. Tu, J. Zhang, J. Ma, J. Xu, J. Zhou, J. Bai, J. He, J. Lin, K. Dang, K. Lu, K. Chen, K. Yang, M. Li, M. Xue, N. Ni, P. Zhang, P. Wang, R. Peng, R. Men, R. Gao, R. Lin, S. Wang, S. Bai, S. Tan, T. Zhu, T. Li, T. Liu, W. Ge, X. Deng, X. Zhou, X. Ren, X. Zhang, X. Wei, X. Ren, Y. Fan, Y. Yao, Y. Zhang, Y. Wan, Y. Chu, Y. Liu, Z. Cui, Z. Zhang, and Z. Fan. Qwen2 technical report. arXiv preprint arXiv:2407.10671, 2024. https://arxiv.org/pdf/2407.10671 | ||
1625 | 34 | 0 | A. Yang, B. Yang, B. Zhang, B. Hui, B. Zheng, B. Yu, C. Li, D. Liu, F. Huang, H. Wei, H. Lin, J. Yang, J. Tu, J. Zhang, J. Yang, J. Yang, J. Zhou, J. Lin, K. Dang, K. Lu, K. Bao, K. Yang, L. Yu, M. Li, M. Xue, P. Zhang, Q. Zhu, R. Men, R. Lin, T. Li, T. Xia, X. Ren, X. Ren, Y. Fan, Y. Su, Y. Zhang, Y. Wan, Y. Liu, Z. Cui, Z. Zhang, and Z. Qiu. Qwen2.5 technical report. arXiv preprint arXiv:2412.15115, 2024. https://arxiv.org/pdf/2412.15115 | ||
1786 | 35 | 59 | Aakanksha Chowdhery, Sharan Narang, Jacob Devlin, Maarten Bosma, Gaurav Mishra, Adam Roberts, Paul Barham, Hyung Won Chung, Charles Sutton, Sebastian Gehrmann, et al. 2023. Palm: Scaling language modeling with pathways. Journal of Machine Learning Research 24, 240 (2023), 1–113. | ||
1261 | 26 | 24 | Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, et al. The Llama 3 herd of models. arXiv preprint arXiv:2407.21783, 2024. | ||
1649 | 34 | 24 | Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, et al. The Llama 3 herd of models. arXiv preprint arXiv:2407.21783, 2024. | ||
11 | 3 | 3 | Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Aleman, F. L., Almeida, D., Altenschmidt, J., Altman, S., Anadkat, S. et al. (2023). GPT-4 technical report. arXiv preprint arXiv:2303.08774. https://arxiv.org/abs/2303.08774. | ||
1058 | 23 | 4 | Adam Auton, Gonçalo R. Abecasis, David M. Altshuler, Richard M. Durbin, David R. Bentley, Aravinda Chakravarti, Andrew G. Clark, Peter Donnelly, Evan E. Eichler, Paul Flicek, et al. (The 1000 Genomes Project Consortium). A global reference for human genetic variation. Nature, 526(7571):68–74, 2015. | ||
1071 | 23 | 17 | Adam Frankish, Mark Diekhans, Irwin Jungreis, Julien Lagarde, Jane E Loveland, Jonathan M Mudge, Cristina Sisu, James C Wright, Joel Armstrong, If Barnes, Andrew Berry, Alexandra Bignell, Carles Boix, Silvia Carbonell Sala, Fiona Cunningham, Tomás Di Domenico, Sarah Donaldson, Ian T Fiddes, Carlos García Girón, Jose Manuel Gonzalez, Tiago Grego, Matthew Hardy, Thibaut Hourlier, Kevin L Howe, Toby Hunt, Osagie G Izuogu, Rory Johnson, Fergal J Martin, Laura Martínez, Shamika Mohanan, Paul Muir, Fabio C P Navarro, Anne Parker, Baikang Pei, Fernando Pozo, Ferriol Calvet Riera, Magali Ruffier, Bianca M Schmitt, Eloise Stapleton, Marie-Marthe Suner, Irina Sycheva, Barbara Uszczynska-Ratajczak, Maxim Y Wolf, Jinuri Xu, Yucheng T Yang, Andrew Yates, Daniel Zerbino, Yan Zhang, Jyoti S Choudhary, Mark Gerstein, Roderic Guigó, Tim J P Hubbard, Manolis Kellis, Benedict Paten, Michael L Tress, and Paul Flicek. GENCODE 2021. Nucleic Acids Research, 49(D1):D916–D923, January 2021. ISSN 0305-1048. doi: 10.1093/nar/gkaa1087. URL https://doi.org/10.1093/nar/gkaa1087. | ||
646 | 13 | 1 | Adibi, A., Cao, X., Ji, Z., Kaur, J. N., Chen, W., Healey, E., Nuwagira, B., Ye, W., Woollard, G., Xu, M. A., Cui, H., Xi, J., Chang, T., Bikia, V., Zhang, N., Noori, A., Xia, Y., Hossain, M. B., Frank, H. A., Peluso, A., Pu, Y., Shen, S. Z., Wu, J., Fallahpour, A., Mahbub, S., Duncan, R., Zhang, Y., Cao, Y., Xu, Z., Craig, M., Krishnan, R. G., Beheshti, R., Rehg, J. M., Karim, M. E., Coffee, M., Celi, L. A., Fries, J. A., Sadatsafavi, M., Shung, D., McWeeney, S., Dafflon, J., and Jabbour, S. Recent advances, applications and open challenges in machine learning for health: Reflections from research roundtables at ml4h 2024 symposium, 2025. | ||
792 | 15 | 56 | Adina Williams, Nikita Nangia, and Samuel Bowman. A broad-coverage challenge corpus for sentence understanding through inference. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pp. 1112–1122, New Orleans, Louisiana, June 2018. Association for Computational Linguistics. doi: 10.18653/v1/N18-1101. URL https://www.aclweb.org/anthology/N18-1101. | ||
1487 | 31 | 38 | Adina Williams, Nikita Nangia, and Samuel Bowman. A broad-coverage challenge corpus for sentence understanding through inference. pages 1112–1122, 01 2018. doi:10.18653/v1/N18-1101. | ||
1951 | 35 | 224 | Aditya Malusare, Harish Kothandaraman, Dipesh Tamboli, Nadia A. Lanman, and Vaneet Aggarwal. 2023. Understanding the Natural Language of DNA using Encoder-Decoder Foundation Models with Byte-level Precision. arXiv:2311.02333 [cs.LG] | ||
1800 | 35 | 73 | adsorption via spatial atom interaction learning. Nature Communications 14, 1 (2023), 7043. | ||
174 | 4 | 34 | Agarwal, I., Fuller, Z. L., Myers, S. R. & Przeworski, M. Relating pathogenic loss-of-function mutations in humans to their evolutionary fitness costs. eLife 12, e83172 (2023). | ||
439 | 9 | 17 | Agarwal, V. & Shendure, J. Predicting mRNA abundance directly from genomic sequence using deep convolutional neural networks. Cell Rep. 31, 107663 (2020). | ||
499 | 10 | 17 | Agarwal, V. & Shendure, J. Predicting mRNA abundance directly from genomic sequence using deep convolutional neural networks. Cell Rep. 31, 107663 (2020). | ||
1817 | 35 | 90 | Ahmed Elnaggar, Michael Heinzinger, Christian Dallago, Ghalia Rehawi, Yu Wang, Llion Jones, Tom Gibbs, Tamas Feher, Christoph Angerer, Martin Steinegger, Debsindhu Bhowmik, and Burkhard Rost. 2022. ProtTrans: Toward Understanding the Language of Life Through Self-Supervised Learning. IEEE Transactions on Pattern Analysis and Machine Intelligence 44, 10 (2022), 7112–7127. | ||
1068 | 23 | 14 | Ahmed Elnaggar, Michael Heinzinger, Christian Dallago, Ghalia Rehawi, Yu Wang, Llion Jones, Tom Gibbs, Tamas Feher, Christoph Angerer, Martin Steinegger, Debsindhu Bhowmik, and Burkhard Rost. ProtTrans: Toward Understanding the Language of Life Through Self-Supervised Learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 44(10):7112–7127, October 2022. ISSN 0162-8828, 2160-9292, 1939-3539. doi: 10.1109/TPAMI.2021.3095381. URL https://ieeexplore.ieee.org/document/9477085/. | ||
647 | 13 | 2 | Ahn, J. S., Ebrahimian, S., McDermott, S., Lee, S., Naccarato, L., Di Capua, J. F., Wu, M. Y., Zhang, E. W., Muse, V., Miller, B., et al. Association of artificial intelligence–aided chest radiograph interpretation with reader performance and efficiency. JAMA Network Open, 5(8):e2229289–e2229289, 2022. | ||
1546 | 33 | 2 | AI@Meta. Llama 3 model card, 2024. URL https://github.com/meta-llama/llama3/blob/main/MODEL_CARD.md. | ||
543 | 11 | 1 | AI@Meta. Llama 3.1 model card, 2024. URL https://github.com/meta-llama/llama-models/blob/main/models/llama3_1/MODEL_CARD.md. | ||
1047 | 22 | 130 | Ainslie J, Lee-Thorp J, de Jong M. et al. GQA: training generalized multi-query transformer models from multi-head checkpoints. arXiv preprint arXiv:2305.13245, 2023. | ||
1281 | 26 | 44 | Albert Q. Jiang, Alexandre Sablayrolles, Antoine Roux, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Singh Chaplot, Diego de Las Casas, Emma Bou Hanna, Florian Bressand, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Lélio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Sandeep Subramanian, Sophia Yang, Szymon Antoniak, Teven Le Scao, Théophile Gervet, Thibaut Lavril, Thomas Wang, Timothée Lacroix, and William El Sayed. Mixtral of experts. CoRR, abs/2401.04088, 2024a. | ||
1576 | 33 | 32 | Albert Q. Jiang, Alexandre Sablayrolles, Antoine Roux, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Singh Chaplot, Diego de Las Casas, Emma Bou Hanna, Florian Bressand, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Lélio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Sandeep Subramanian, Sophia Yang, Szymon Antoniak, Teven Le Scao, Théophile Gervet, Thibaut Lavril, Thomas Wang, Timothée Lacroix, and William El Sayed. Mixtral of experts. CoRR, abs/2401.04088, 2024. | ||
1668 | 34 | 43 | Albert Q. Jiang, Alexandre Sablayrolles, Antoine Roux, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Singh Chaplot, Diego de Las Casas, Emma Bou Hanna, Florian Bressand, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Lélio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Sandeep Subramanian, Sophia Yang, Szymon Antoniak, Teven Le Scao, Théophile Gervet, Thibaut Lavril, Thomas Wang, Timothée Lacroix, and William El Sayed. Mixtral of experts. CoRR, abs/2401.04088, 2024a. | ||
1280 | 26 | 43 | Albert Q. Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de Las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lucile Saulnier, Lélio Renard Lavaud, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, and William El Sayed. Mistral 7B. CoRR, abs/2310.06825, 2023a. | ||
1575 | 33 | 31 | Albert Q. Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de Las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lucile Saulnier, Lélio Renard Lavaud, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, and William El Sayed. Mistral 7B. CoRR, abs/2310.06825, 2023a. | ||
1667 | 34 | 42 | Albert Q. Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de Las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lucile Saulnier, Lélio Renard Lavaud, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, and William El Sayed. Mistral 7B. CoRR, abs/2310.06825, 2023a. | ||
168 | 4 | 28 | Albuisson, J. et al. Identification of two novel mutations in Shh long-range regulator associated with familial pre-axial polydactyly. Clin. Genet. 79, 371–377 (2011). | ||
781 | 15 | 45 | Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. Language Models are Unsupervised Multitask Learners. OpenAI, 2019. | ||
1994 | 35 | 267 | Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. 2019. Language models are unsupervised multitask learners. OpenAI blog 1, 8 (2019), 9. | ||
780 | 15 | 44 | Alec Radford, Karthik Narasimhan, Tim Salimans, and Ilya Sutskever. Improving Language Understanding by Generative Pre-Training. OpenAI, 2018. | ||
1993 | 35 | 266 | Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever, et al. 2018. Improving language understanding by generative pre-training. | ||
1308 | 26 | 71 | Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever, et al. Improving language understanding by generative pre-training. Technical report, OpenAI, 2018. | ||
1694 | 34 | 69 | Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever, et al. Improving language understanding by generative pre-training. Technical report, OpenAI, 2018. | ||
1265 | 26 | 28 | Alena Fenogenova, Artem Chervyakov, Nikita Martynov, Anastasia Kozlova, Maria Tikhonova, Albina Akhmetgareeva, Anton A. Emelyanov, Denis Shevelev, Pavel Lebedev, Leonid Sinev, Ulyana Isaeva, Katerina Kolomeytseva, Daniil Moskovskiy, Elizaveta Goncharova, Nikita Savushkin, Polina Mikhailova, Denis Dimitrov, Alexander Panchenko, and Sergey Markov. MERA: A comprehensive LLM evaluation in russian. CoRR, abs/2401.04531, 2024. | ||
1568 | 33 | 24 | Alena Fenogenova, Artem Chervyakov, Nikita Martynov, Anastasia Kozlova, Maria Tikhonova, Albina Akhmetgareeva, Anton A. Emelyanov, Denis Shevelev, Pavel Lebedev, Leonid Sinev, Ulyana Isaeva, Katerina Kolomeytseva, Daniil Moskovskiy, Elizaveta Goncharova, Nikita Savushkin, Polina Mikhailova, Denis Dimitrov, Alexander Panchenko, and Sergey Markov. MERA: A comprehensive LLM evaluation in russian. CoRR, abs/2401.04531, 2024. | ||
1652 | 34 | 27 | Alena Fenogenova, Artem Chervyakov, Nikita Martynov, Anastasia Kozlova, Maria Tikhonova, Albina Akhmetgareeva, Anton A. Emelyanov, Denis Shevelev, Pavel Lebedev, Leonid Sinev, Ulyana Isaeva, Katerina Kolomeytseva, Daniil Moskovskiy, Elizaveta Goncharova, Nikita Savushkin, Polina Mikhailova, Denis Dimitrov, Alexander Panchenko, and Sergey Markov. MERA: A comprehensive LLM evaluation in russian. CoRR, abs/2401.04531, 2024. | ||
2046 | 35 | 319 | Alessandra Toniato, Alain C Vaucher, Philippe Schwaller, and Teodoro Laino. 2023. Enhancing diversity in language based models for single-step retrosynthesis. Digital Discovery 2, 2 (2023), 489–501. | ||
789 | 15 | 53 | Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman. Glue: A multi-task benchmark and analysis platform for natural language understanding, 2019. | ||
790 | 15 | 54 | Alex Wang, Yada Pruksachatkun, Nikita Nangia, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman. Superglue: A stickier benchmark for general-purpose language understanding systems, 2020. | ||
791 | 15 | 55 | Alex Warstadt, Amanpreet Singh, and Samuel R Bowman. Neural network acceptability judgments. arXiv preprint arXiv:1805.12471, 2018. | ||
1332 | 26 | 95 | Alex Young, Bei Chen, Chao Li, Chengen Huang, Ge Zhang, Guanwei Zhang, Heng Li, Jiangcheng Zhu, Jianqun Chen, Jing Chang, Kaidong Yu, Peng Liu, Qiang Liu, Shawn Yue, Senbin Yang, Shiming Yang, Tao Yu, Wen Xie, Wenhao Huang, Xiaohui Hu, Xiaoyi Ren, Xinyao Niu, Pengcheng Nie, Yuchi Xu, Yudong Liu, Yue Wang, Yuxuan Cai, Zhenyu Gu, Zhiyuan Liu, and Zonghong Dai. Yi: Open foundation models by 01.AI. CoRR, abs/2403.04652, 2024. | ||
1617 | 33 | 73 | Alex Young, Bei Chen, Chao Li, Chengen Huang, Ge Zhang, Guanwei Zhang, Heng Li, Jiangcheng Zhu, Jianqun Chen, Jing Chang, Kaidong Yu, Peng Liu, Qiang Liu, Shawn Yue, Senbin Yang, Shiming Yang, Tao Yu, Wen Xie, Wenhao Huang, Xiaohui Hu, Xiaoyi Ren, Xinyao Niu, Pengcheng Nie, Yuchi Xu, Yudong Liu, Yue Wang, Yuxuan Cai, Zhenyu Gu, Zhiyuan Liu, and Zonghong Dai. Yi: Open foundation models by 01.AI. CoRR, abs/2403.04652, 2024. | ||
1718 | 34 | 93 | Alex Young, Bei Chen, Chao Li, Chengen Huang, Ge Zhang, Guanwei Zhang, Heng Li, Jiangcheng Zhu, Jianqun Chen, Jing Chang, Kaidong Yu, Peng Liu, Qiang Liu, Shawn Yue, Senbin Yang, Shiming Yang, Tao Yu, Wen Xie, Wenhao Huang, Xiaohui Hu, Xiaoyi Ren, Xinyao Niu, Pengcheng Nie, Yuchi Xu, Yudong Liu, Yue Wang, Yuxuan Cai, Zhenyu Gu, Zhiyuan Liu, and Zonghong Dai. Yi: Open foundation models by 01.AI. CoRR, abs/2403.04652, 2024. | ||
1900 | 35 | 173 | Alexander Lachmann, Denis Torre, Alexandra B Keenan, Kathleen M Jagodnik, Hoyjin J Lee, Lily Wang, Moshe C Silverstein, and Avi Ma’ayan. 2018. Massive mining of publicly available RNA-seq data from human and mouse. Nature communications 9, 1 (2018), 1366. | ||
2003 | 35 | 276 | Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma, and Rob Fergus. 2019. Biological Structure and Function Emerge from Scaling Unsupervised Learning to 250 Million Protein Sequences. PNAS (2019). | ||
1102 | 23 | 48 | Alexander Rives, Joshua Meier, Tom Sercu, Siddharth Goyal, Zeming Lin, Jason Liu, Demi Guo, Myle Ott, C. Lawrence Zitnick, Jerry Ma, and Rob Fergus. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. preprint, Synthetic Biology, April 2019. URL http://biorxiv.org/lookup/doi/10.1101/622803. | ||
1434 | 30 | 17 | Alexandrov L.B., Kim J., Haradhvala N.J., Huang M.N., Tian Ng A.W., Wu Y., Boot A., Covington K.R., Gordenin D.A., Bergstrom E.N., et al. The repertoire of mutational signatures in human cancer. Nature. 2020; 578:94–101. | ||
1091 | 23 | 37 | Ali Madani, Ben Krause, Eric R. Greene, Subu Subramanian, Benjamin P. Mohr, James M. Holton, Jose Luis Olmos, Caiming Xiong, Zachary Z. Sun, Richard Socher, James S. Fraser, and Nikhil Naik. Large language models generate functional protein sequences across diverse families. Nature Biotechnology, January 2023. ISSN 1087-0156, 1546-1696. doi: 10.1038/s41587-022-01618-2. URL https://www.nature.com/articles/s41587-022-01618-2. | ||
1950 | 35 | 223 | Ali Madani, Bryan McCann, Nikhil Naik, Nitish Shirish Keskar, Namrata Anand, Raphael R. Eguchi, Po-Ssu Huang, and Richard Socher. 2020. ProGen: Language Modeling for Protein Generation. arXiv e-prints (March 2020), 2004.03497. | ||
976 | 22 | 59 | Alipanahi B, Delong A, Weirauch MT. et al. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat Biotechnol 2015;33:831–8. 10.1038/nbt.3300. | ||
47 | 3 | 39 | Alipanahi, B., Delong, A., Weirauch, M. T., and Frey, B. J. (2015). Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nature Biotechnology 33, 831–838. | ||
1881 | 35 | 154 | Alistair E. W. Johnson, Lucas Bulgarelli, Lu Shen, Alvin Gayles, Ayad Shammout, Steven Horng, Tom J. Pollard, Benjamin Moody, Brian Gow, Li-wei H. Lehman, Leo Anthony Celi, and Roger G. Mark. 2023. MIMIC-IV, a freely accessible electronic health record dataset. Scientific Data 10 (2023). https://api.semanticscholar.org/CorpusID:255439889 | ||
1883 | 35 | 156 | Alistair E. W. Johnson, Tom J. Pollard, Lu Shen, Li-wei H. Lehman, Mengling Feng, Mohammad Mahdi Ghassemi, Benjamin Moody, Peter Szolovits, Leo Anthony Celi, and Roger G. Mark. 2016. MIMIC-III, a freely accessible critical care database. Scientific Data 3 (2016). https://api.semanticscholar.org/CorpusID:33285731 | ||
1882 | 35 | 155 | Alistair E. W. Johnson, Tom J. Pollard, Seth J. Berkowitz, Nathaniel R. Greenbaum, Matthew P. Lungren, Chih-ying Deng, Roger G. Mark, and Steven Horng. 2019. MIMIC-CXR, a de-identified publicly available database of chest radiographs with free-text reports. Scientific Data 6 (2019). https://api.semanticscholar.org/CorpusID:209342303 | ||
1880 | 35 | 153 | Alistair Johnson, Tom Pollard, Steven Horng, Leo Anthony Celi, and Roger Mark. 2023. MIMIC-IV-Note: Deidentified free-text clinical notes (version 2.2). PhysioNet. https://doi.org/10.13026/1n74-ne17 | ||
403 | 8 | 4 | Alonso, M.E., Pernaute, B., Crespo, M., Gómez-Skarmeta, J.L. & Manzanares, M. Understanding the regulatory genome. Int. J. Develop. Biol. https://ijdb.ehu.eus/article/072428ma (2009). | ||
879 | 21 | 1 | Alsentzer E. et al. (2019) Publicly available clinical bert embeddings. In: Proceedings of the 2nd Clinical Natural Language Processing Workshop, Minneapolis, MN, USA. pp. 72–78. Association for Computational Linguistics. https://www.aclweb.org/anthology/W19-1909. | ||
815 | 17 | 1 | Altschul SF, Gish W, Miller W et al. Basic local alignment search tool. J Mol Biol 1990;215:403–10. | ||
1520 | 32 | 27 | Alves, V. M., Muratov, E., Fourches, D., Strickland, J., Kleinstreuer, N., Andrade, C. H. & Tropsha, A. Predicting chemically-induced skin reactions. Part I: QSAR models of skin sensitization and their application to identify potentially hazardous compounds. Toxicology and applied pharmacology 284, 262–272 (2015). | ||
1473 | 31 | 24 | Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel Bowman. GLUE: A multi-task benchmark and analysis platform for natural language understanding. April 2018. | ||
125 | 3 | 117 | Amberger, J. S., Bocchini, C. A., Schiettecatte, F., Scott, A. F., and Hamosh, A. (2015). OMIM.org: Online Mendelian Inheritance in Man (OMIM®), an online catalog of human genes and genetic disorders. Nucleic Acids Research 43, D789–D798. | ||
2044 | 35 | 317 | Amol Thakkar, Alain C Vaucher, Andrea Byekwaso, Philippe Schwaller, Alessandra Toniato, and Teodoro Laino. 2023. Unbiasing retrosynthesis language models with disconnection prompts. ACS Central Science 9, 7 (2023), 1488–1498. | ||
1742 | 35 | 15 | Amos Bairoch and Rolf Apweiler. 2000. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic acids research 28, 1 (2000), 45–48. | ||
1328 | 26 | 91 | An Yang, Baosong Yang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Zhou, Chengpeng Li, Chengyuan Li, Dayiheng Liu, Fei Huang, Guanting Dong, Haoran Wei, Huan Lin, Jialong Tang, Jialin Wang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Ma, Jianxin Yang, Jin Xu, Jingren Zhou, Jinze Bai, Jinzheng He, Junyang Lin, Kai Dang, Keming Lu, Keqin Chen, Kexin Yang, Mei Li, Mingfeng Xue, Na Ni, Pei Zhang, Peng Wang, Ru Peng, Rui Men, Ruize Gao, Runji Lin, Shijie Wang, Shuai Bai, Sinan Tan, Tianhang Zhu, Tianhao Li, Tianyu Liu, Wenbin Ge, Xiaodong Deng, Xiaohuan Zhou, Xingzhang Ren, Xinyu Zhang, Xipin Wei, Xuancheng Ren, Xuejing Liu, Yang Fan, Yang Yao, Yichang Zhang, Yu Wan, Yunfei Chu, Yuqiong Liu, Zeyu Cui, Zhenru Zhang, Zhifang Guo, and Zhihao Fan. Qwen2 technical report. CoRR, abs/2407.10671, 2024a. | ||
1714 | 34 | 89 | An Yang, Baosong Yang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Zhou, Chengpeng Li, Chengyuan Li, Dayiheng Liu, Fei Huang, Guanting Dong, Haoran Wei, Huan Lin, Jialong Tang, Jialin Wang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Ma, Jianxin Yang, Jin Xu, Jingren Zhou, Jinze Bai, Jinzheng He, Junyang Lin, Kai Dang, Keming Lu, Keqin Chen, Kexin Yang, Mei Li, Mingfeng Xue, Na Ni, Pei Zhang, Peng Wang, Ru Peng, Rui Men, Ruize Gao, Runji Lin, Shijie Wang, Shuai Bai, Sinan Tan, Tianhang Zhu, Tianhao Li, Tianyu Liu, Wenbin Ge, Xiaodong Deng, Xiaohuan Zhou, Xingzhang Ren, Xinyu Zhang, Xipin Wei, Xuancheng Ren, Xuejing Liu, Yang Fan, Yang Yao, Yichang Zhang, Yu Wan, Yunfei Chu, Yuqiong Liu, Zeyu Cui, Zhenru Zhang, Zhifang Guo, and Zhihao Fan. Qwen2 technical report. CoRR, abs/2407.10671, 2024a. | ||
1329 | 26 | 92 | An Yang, Beichen Zhang, Binyuan Hui, Bofei Gao, Bowen Yu, Chengpeng Li, Dayiheng Liu, Jianhong Tu, Jingren Zhou, Junyang Lin, et al. Qwen2.5-Math technical report: Toward mathematical expert model via self-improvement. CoRR, abs/2409.12122, 2024b. | ||
1715 | 34 | 90 | An Yang, Beichen Zhang, Binyuan Hui, Bofei Gao, Bowen Yu, Chengpeng Li, Dayiheng Liu, Jianhong Tu, Jingren Zhou, Junyang Lin, et al. Qwen2.5-Math technical report: Toward mathematical expert model via self-improvement. CoRR, abs/2409.12122, 2024b. | ||
336 | 7 | 24 | An, J. Y. et al. Genome-wide de novo risk score implicates promoter variation in autism spectrum disorder. Science 362, eaat6576 (2018). | ||
339 | 7 | 27 | Andersson, R. et al. An atlas of active enhancers across human cell types and tissues. Nature 507, 455–461 (2014). | ||
784 | 15 | 48 | Andreas Rücklé, Gregor Geigle, Max Glockner, Tilman Beck, Jonas Pfeiffer, Nils Reimers, and Iryna Gurevych. AdapterDrop: On the efficiency of adapters in transformers, 2020. | ||
1758 | 35 | 31 | Andres M Bran and Philippe Schwaller. 2023. Transformers and Large Language Models for Chemistry and Drug Discovery. arXiv:2310.06083 [cs.LG] | ||
1757 | 35 | 30 | Andres M Bran, Sam Cox, Oliver Schilter, Carlo Baldassari, Andrew D White, and Philippe Schwaller. 2023. ChemCrow: Augmenting large-language models with chemistry tools. arXiv:2304.05376 [physics.chem-ph] | ||
1960 | 35 | 233 | Andrew G. McDonald, Sinéad Boyce, and Keith F. Tipton. 2008. ExplorEnz: the primary source of the IUBMB enzyme list. Nucleic Acids Research 37, suppl_1 (09 2008), 593–597. | ||
1061 | 23 | 7 | Andrew J Bannister and Tony Kouzarides. Regulation of chromatin by histone modifications. Cell Research, 21(3):381–395, March 2011. ISSN 1748-7838. doi: 10.1038/cr.2011.22. URL https://doi.org/10.1038/cr.2011.22. | ||
1898 | 35 | 171 | Andriy Kryshtafovych, Torsten Schwede, Maya Topf, Krzysztof Fidelis, and John Moult. 2021. Critical assessment of methods of protein structure prediction (CASP)—Round XIV. Proteins: Structure, Function, and Bioinformatics 89, 12 (2021), 1607–1617. | ||
1084 | 23 | 30 | Andriy Kryshtafovych, Torsten Schwede, Maya Topf, Krzysztof Fidelis, and John Moult. Critical assessment of methods of protein structure prediction (CASP)—Round XIV. Proteins: Structure, Function, and Bioinformatics 89, 12 (2021), 1607–1617. | ||
1471 | 31 | 22 | Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, and François Fleuret. Transformers are RNNs: Fast autoregressive transformers with linear attention. In International Conference on Machine Learning, pages 5156–5165. PMLR, 2020. | ||
603 | 12 | 20 | Angov, E., Legler, P. M. & Mease, R. M. Adjustment of codon usage frequencies by codon harmonization improves protein expression and folding. Methods Mol. Biol. 705, 1–13 (2011). | ||
1980 | 35 | 253 | Ankit Pal, Logesh Kumar Umapathi, and Malaikannan Sankarasubbu. 2022. MedMCQA: A Large-scale Multi-Subject Multi-Choice Dataset for Medical domain Question Answering. arXiv:2203.14371 [cs.CL] | ||
1459 | 31 | 10 | Ankur P. Parikh, Oscar Täckström, Dipanjan Das, and Jakob Uszkoreit. A decomposable attention model for natural language inference. In EMNLP, 2016. | ||
1838 | 35 | 111 | Anna Gaulton, Louisa J Bellis, A Patricia Bento, Jon Chambers, Mark Davies, Anne Hersey, Yvonne Light, Shaun McGlinchey, David Michalovich, Bissan Al-Lazikani, et al. 2012. ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic acids research 40, D1 (2012), D1100–D1107. | ||
544 | 11 | 2 | Anthropic. Claude 3.5 sonnet, 2024. URL https://www.anthropic.com/news/claude-3-5-sonnet. | ||
7 | 2 | 0 | Anthropic. Claude 3.7 sonnet, February 2025. URL https://www.anthropic.com/news/claude-3-7-sonnet. Accessed: 2025-05-13. | https://qiita.com/kaizen_nagoya/items/4364d9c475114353cf2a | |
1243 | 26 | 6 | Anthropic. Introducing Claude, 2023a. URL https://www.anthropic.com/index/introducing-claude. Anthropic. Claude 2. Technical report, Anthropic, 2023b. URL https://www-files.anthropic.com/production/images/Model-Card-Claude-2.pdf. | ||
1631 | 34 | 6 | Anthropic. Introducing Claude, 2023a. URL https://www.anthropic.com/index/introducing-claude. Anthropic. Claude 2. Technical report, Anthropic, 2023b. URL https://www-files.anthropic.com/production/images/Model-Card-Claude-2.pdf. | ||
1244 | 26 | 7 | Anthropic. The Claude 3 model family: Opus, Sonnet, Haiku. Technical report, Anthropic, 2024. URL https://www-cdn.anthropic.com/de8ba9b01c9ab7cbabf5c33b80b7bbc618857627/Model_Card_Claude_3.pdf. | ||
1632 | 1244 | 34 | 7 | Anthropic. The Claude 3 model family: Opus, Sonnet, Haiku. Technical report, Anthropic, 2024. URL https://www-cdn.anthropic.com/de8ba9b01c9ab7cbabf5c33b80b7bbc618857627/Model_Card_Claude_3.pdf. | |
1549 | 1244 | 33 | 5 | Anthropic. The Claude 3 model family: Opus, Sonnet, Haiku. Technical report, Anthropic, 2024. URL https://www-cdn.anthropic.com/de8ba9b01c9ab7cbabf5c33b80b7bbc618857627/Model_Card_Claude_3.pdf. | |
1768 | 35 | 41 | Antje Chang, Lisa Jeske, Sandra Ulbrich, Julia Hofmann, Julia Koblitz, Ida Schomburg, Meina Neumann-Schaal, Dieter Jahn, and Dietmar Schomburg. 2021. BRENDA, the ELIXIR core data resource in 2021: new developments and updates. Nucleic acids research 49, D1 (2021), D498–D508. | ||
1754 | 35 | 27 | Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. 2013. Translating embeddings for modeling multi-relational data. Advances in neural information processing systems 26 (2013). | ||
1621 | 33 | 77 | Aohan Zeng, Bin Xu, Bowen Wang, Chenhui Zhang, Da Yin, Diego Rojas, Guanyu Feng, Hanlin Zhao, Hanyu Lai, Hao Yu, Hongning Wang, Jiadai Sun, Jiajie Zhang, Jiale Cheng, Jiayi Gui, Jie Tang, Jing Zhang, Juanzi Li, Lei Zhao, Lindong Wu, Lucen Zhong, Mingdao Liu, Minlie Huang, Peng Zhang, Qinkai Zheng, Rui Lu, Shuaiqi Duan, Shudan Zhang, Shulin Cao, Shuxun Yang, Weng Lam Tam, Wenyi Zhao, Xiao Liu, Xiao Xia, Xiaohan Zhang, Xiaotao Gu, Xin Lv, Xinghan Liu, Xinyi Liu, Xinyue Yang, Xixuan Song, Xunkai Zhang, Yifan An, Yifan Xu, Yilin Niu, Yuantao Yang, Yueyan Li, Yushi Bai, Yuxiao Dong, Zehan Qi, Zhaoyu Wang, Zhen Yang, Zhengxiao Du, Zhenyu Hou, and Zihan Wang. ChatGLM: A family of large language models from GLM-130B to GLM-4 all tools. CoRR, abs/2406.12793, 2024. | ||
2118 | 35 | 391 | Aohan Zeng, Xiao Liu, Zhengxiao Du, Zihan Wang, Hanyu Lai, Ming Ding, Zhuoyi Yang, Yifan Xu, Wendi Zheng, Xiao Xia, Weng Lam Tam, Zixuan Ma, Yufei Xue, Jidong Zhai, Wenguang Chen, Zhiyuan Liu, Peng Zhang, Yuxiao Dong, and Jie Tang. 2023. GLM-130B: An Open Bilingual Pre-trained Model. In The Eleventh International Conference on Learning Representations (ICLR). | ||
1579 | 33 | 35 | Aran Komatsuzaki, Joan Puigcerver, James Lee-Thorp, Carlos Riquelme Ruiz, Basil Mustafa, Joshua Ainslie, Yi Tay, Mostafa Dehghani, and Neil Houlsby. Sparse upcycling: Training mixture-of-experts from dense checkpoints. In ICLR. OpenReview.net, 2023. | ||
170 | 4 | 30 | Arbini, A. A., Pollak, E. S., Bayleran, J. K., High, K. A. & Bauer, K. A. Severe factor VII deficiency due to a mutation disrupting a hepatocyte nuclear factor 4 binding site in the factor VII promoter. Blood 89, 176–182 (1997). | ||
1963 | 35 | 236 | Arindam Mitra, Luciano Del Corro, Shweti Mahajan, Andres Codas, Clarisse Simoes, Sahaj Agrawal, Xuxi Chen, Anastasia Razdaibiedina, Erik Jones, Kriti Aggarwal, Hamid Palangi, Guoqing Zheng, Corby Rosset, Hamed Khanpour, and Ahmed Awadallah. 2023. Orca 2: Teaching Small Language Models How to Reason. arXiv:2311.11045 [cs.AI] | ||
965 | 22 | 48 | Arisdakessian C, Poirion O, Yunits B. et al. DeepImpute: an accurate, fast, and scalable deep neural network method to impute single-cell RNA-seq data. Genome Biol 2019;20:211. 10.1186/s13059-019-1837-6. | ||
737 | 15 | 1 | Armen Aghajanyan, Luke Zettlemoyer, and Sonal Gupta. Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning. arXiv:2012.13255 [cs], December 2020. URL http://arxiv.org/abs/2012.13255. | ||
151 | 4 | 11 | Armstrong, J., Fiddes, I. T., Diekhans, M. & Paten, B. Whole-genome alignment and comparative annotation. Annu. Rev. Anim. Biosci. 7, 41–64 (2019). | ||
112 | 3 | 104 | Armstrong, J., Hickey, G., Diekhans, M., Fiddes, I. T., Novak, A. M., Deran, A., Fang, Q., Xie, D., Feng, S., Stiller, J. et al. (2020). Progressive Cactus is a multiple-genome aligner for the thousand-genome era. Nature 587, 246–251. | ||
1268 | 26 | 31 | Aryo Pradipta Gema, Joshua Ong Jun Leang, Giwon Hong, Alessio Devoto, Alberto Carlo Maria Mancino, Rohit Saxena, Xuanli He, Yu Zhao, Xiaotang Du, Mohammad Reza Ghasemi Madani, et al. Are we done with mmlu? CoRR, abs/2406.04127, 2024. | ||
1655 | 1268 | 34 | 30 | Aryo Pradipta Gema, Joshua Ong Jun Leang, Giwon Hong, Alessio Devoto, Alberto Carlo Maria Mancino, Rohit Saxena, Xuanli He, Yu Zhao, Xiaotang Du, Mohammad Reza Ghasemi Madani, et al. Are we done with mmlu? CoRR, abs/2406.04127, 2024. | |
1024 | 22 | 107 | Ashburner M, Ball C, Blake J. et al. Gene ontology: tool for the unification of biology. Nat Genet 2000;25:25–9. 10.1038/75556. | ||
1452 | 788 | 31 | 3 | Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems, volume 30. Curran Associates, Inc., 2017. URL https://proceedings.neurips.cc/paper/2017/file/3f5ee243547dee91fbd053c1c4a845aa-Paper.pdf. | |
2198 | 788 | 36 | 38 | Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. Advances in neural information processing systems, 30, 2017. | |
788 | 15 | 52 | Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems, pp. 6000–6010, 2017. | ||
2057 | 788 | 35 | 330 | Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention is All you Need. In Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, December 4-9, 2017, Long Beach, CA, USA. 5998–6008. | |
1320 | 26 | 83 | Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. In NIPS, pp. 5998–6008, 2017. | ||
1613 | 1320 | 33 | 69 | Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. In NIPS, pp. 5998–6008, 2017. | |
1706 | 1320 | 34 | 81 | Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. In NIPS, pp. 5998–6008, 2017. | |
1746 | 35 | 19 | Asma Ben Abacha and Dina Demner-Fushman. 2019. A question-entailment approach to question answering. BMC bioinformatics 20 (2019), 1–23. | ||
1303 | 26 | 66 | Association for Computational Linguistics, 2020. | ||
2116 | 35 | 389 | Atakan Yüksel, Erva Ulusoy, Atabey Ünlü, and Tunca Doğan. 2023. SELFormer: Molecular representation learning via SELFIES language models. Machine Learning: Science and Technology (2023). | ||
2045 | 35 | 318 | Augustin Toma, Patrick R. Lawler, Jimmy Ba, Rahul G. Krishnan, Barry B. Rubin, and Bo Wang. 2023. Clinical Camel: An Open Expert-Level Medical Language Model with Dialogue-Based Knowledge Encoding. arXiv:2305.12031 [cs.CL] | ||
2001 | 35 | 274 | Aviv Regev, Sarah A Teichmann, Eric S Lander, Ido Amit, Christophe Benoist, Ewan Birney, Bernd Bodenmiller, Peter Campbell, Piero Carninci, Menna Clatworthy, et al. 2017. The human cell atlas. elife 6 (2017), e27041. | ||
951 | 22 | 34 | Avsec Ž, Agarwal V, Visentin D. et al. Effective gene expression prediction from sequence by integrating long-range interactions. Nat Methods 2021;18:1196–203. 10.1038/s41592-021-01252-x. | ||
431 | 9 | 9 | Avsec, Ž. et al. Base-resolution models of transcription-factor binding reveal soft motif syntax. Nat. Genet. 53, 354–366 (2021). | ||
491 | 431 | 10 | 9 | Avsec, Ž. et al. Base-resolution models of transcription-factor binding reveal soft motif syntax. Nat. Genet. 53, 354–366 (2021). | |
157 | 4 | 17 | Avsec, Ž. et al. Effective gene expression prediction from sequence by integrating long-range interactions. Nat. Methods 18, 1196–1203 (2021). | ||
441 | 157 | 9 | 19 | Avsec, Ž. et al. Effective gene expression prediction from sequence by integrating long-range interactions. Nat. Methods 18, 1196–1203 (2021). | |
501 | 157 | 10 | 19 | Avsec, Ž. et al. Effective gene expression prediction from sequence by integrating long-range interactions. Nat. Methods 18, 1196–1203 (2021). | |
30 | 3 | 22 | Avsec, Ž., Agarwal, V., Visentin, D., Ledsam, J. R., Grabska-Barwińska, A., Taylor, K. R., Assael, Y., Jumper, J., Kohli, P., and Kelley, D. R. (2021). Effective gene expression prediction from sequence by integrating long-range interactions. Nature Methods 18, 1196–1203. | ||
185 | 4 | 45 | Aw, A. J., McRae, J., Rahmani, E. & Song, Y. S. Highly parameterized polygenic scores tend to overfit to population stratification via random effects. Preprint at bioRxiv https://doi.org/10.1101/2024.01.27.577589 (2024). | ||
393 | 7 | 81 | Azen, R. & Budescu, D. V. The dominance analysis approach for comparing predictors in multiple regression. Psych. Methods 8, 129 (2003). | ||
956 | 22 | 39 | Azher ZL, Suvarna A, Chen JQ. et al. Assessment of emerging pretraining strategies in interpretable multimodal deep learning for cancer prognostication. BioData Min 2023;16:23. 10.1186/s13040-023-00338-w. | ||
1195 | 25 | 19 | B. Buchanan, A. Lohn, M. Musser, and K. Sedova, “Truth, lies, and automation: How language models could change disinformation,” May 2021. | ||
200 | 5 | 14 | B. Bussmann, P. Leask, and N. Nanda. BatchTopK sparse autoencoders. arXiv, 2412.06410, 2024. A. P. Camargo, S. Roux, F. Schulz, M. Babinski, Y. Xu, B. Hu, P. S. G. Chain, S. Nayfach, and N. C. Kyrpides. Identification of mobile genetic elements with genomad. Nat. Biotechnol., 42(8):1303–1312, Aug. 2024. | ||
202 | 5 | 15 | B. Chen, X. Cheng, P. Li, Y.-a. Geng, J. Gong, S. Li, Z. Bei, X. Tan, B. Wang, X. Zeng, et al. xTrimoPGLM: unified 100B-scale pre-trained transformer for deciphering the language of protein. arXiv preprint arXiv:2401.06199, 2024. | ||
265 | 5 | 78 | B. D. Ondov, T. J. Treangen, P. Melsted, A. B. Mallonee, N. H. Bergman, S. Koren, and A. M. Phillippy. Mash: fast genome and metagenome distance estimation using MinHash. Genome Biol., 17(1):132, June 2016. | ||
244 | 5 | 57 | B. J. Livesey and J. A. Marsh. Updated benchmarking of variant effect predictors using deep mutational scanning. Molecular Systems Biology, 19, 8 2023. ISSN 1744-4292. doi: 10.15252/msb.202211474. | ||
1144 | 24 | 27 | B. Lester, R. Al-Rfou, and N. Constant. The power of scale for parameter-efficient prompt tuning. arXiv preprint arXiv:2104.08691, 2021. | ||
1757.5 | 35 | 30.5 | OpenAI. GPT-4o system card. arXiv preprint arXiv:2410.21276, October 2024. URL https://arxiv.org/pdf/2410.21276. | ||
1213 | 25 | 37 | B. Waber, M. Williams, J. S. Carroll, and A. S. Pentland, “A voice is worth a thousand words: The implications of the micro-coding of social signals in speech for trust research,” in Handbook of Research Methods on Trust (G. M. Fergus Lyon and M. N. Saunders, eds.), ch. 23, p. 320, New York: Edward Elgar Publishing, 2011. | ||
562 | 11 | 20 | B. Y. Lin. ZeroEval: A Unified Framework for Evaluating Language Models, July 2024. URL https://github.com/WildEval/ZeroEval. | ||
1055 | 23 | 1 | Babak Alipanahi, Andrew Delong, Matthew T Weirauch, and Brendan J Frey. Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nature Biotechnology, 33(8):831–838, August 2015. ISSN 1087-0156, 1546-1696. doi: 10.1038/nbt.3300. URL https://www.nature.com/articles/nbt.3300. | ||
994 | 22 | 77 | Babjac AN, Lu Z, Emrich SJ. CodonBERT: Using BERT for sentiment analysis to better predict genes with low expression. In: Wang MD, Byung-Jun Yoon P, (eds.), Proceedings of the 14th ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics. Association for Computing Machinery, New York, NY, United States, 2023; 1–6. | ||
648 | 13 | 3 | Baghbanzadeh, N., Fallahpour, A., Parhizkar, Y., Ogidi, F., Roy, S., Ashkezari, S., Khazaie, V. R., Colacci, M., Etemad, A., Afkanpour, A., and Dolatabadi, E. Advancing medical representation learning through high-quality data, 2025. | ||
649 | 13 | 4 | Bahl, S., Ramzan, T., and Maraj, R. Interpretation and documentation of chest x-rays in the acute medical unit. Clinical Medicine, 20(2):s73, 2020. | ||
71 | 3 | 63 | Bai, Z., Zhang, Y.-z., Miyano, S., Yamaguchi, R., Fujimoto, K., Uematsu, S., and Imoto, S. (2022). Identification of bacteriophage genome sequences with representation learning. Bioinformatics. Btac509. | ||
1741 | 35 | 14 | Baichuan. 2023. Baichuan 2: Open Large-scale Language Models. arXiv preprint arXiv:2309.10305 (2023). | ||
929 | 22 | 12 | Baker B, Akkaya I, Zhokhov P. et al. Video pretraining (VPT): learning to act by watching unlabeled online videos. Adv Neural Inf Process Syst 2022;35:24639–54. | ||
650 | 13 | 5 | Baltruschat, I., Steinmeister, L., Nickisch, H., Saalbach, A., Grass, M., Adam, G., Knopp, T., and Ittrich, H. Smart chest x-ray worklist prioritization using artificial intelligence: a clinical workflow simulation. European radiology, 31:3837–3845, 2021. | ||
1423 | 30 | 6 | Bamford S., Dawson E., Forbes S., Clements J., Pettett R., Dogan A., Flanagan A., Teague J., Futreal P.A., Stratton M.R.et al. . The COSMIC (Catalogue of Somatic Mutations in Cancer) database and website. Br. J. Cancer. 2004; 91:355–358. | ||
651 | 13 | 6 | Bannur, S., Bouzid, K., Castro, D. C., Schwaighofer, A., Thieme, A., Bond-Taylor, S., Ilse, M., Pérez-García, F., Salvatelli, V., Sharma, H., Meissen, F., Ranjit, M., Srivastav, S., Gong, J., Codella, N. C. F., Falck, F., Oktay, O., Lungren, M. P., Wetscherek, M. T., Alvarez-Valle, J., and Hyland, S. L. Maira-2: Grounded radiology report generation, 2024. | ||
652 | 13 | 7 | Bansal, H., Israel, D., Zhao, S., Li, S., Nguyen, T., and Grover, A. Medmax: Mixed-modal instruction tuning for training biomedical assistants, 2024. | ||
2117 | 35 | 390 | Barbara Zdrazil, Eloy Felix, Fiona Hunter, Emma J Manners, James Blackshaw, Sybilla Corbett, Marleen de Veij, Harris Ioannidis, David Mendez Lopez, Juan F Mosquera, et al. 2023. The ChEMBL Database in 2023: a drug discovery platform spanning multiple bioactivity data types and time periods. Nucleic Acids Research (2023), gkad1004. | ||
2038 | 35 | 311 | Baris E Suzek, Hongzhan Huang, Peter McGarvey, Raja Mazumder, and Cathy H Wu. 2007. UniRef: comprehensive and non-redundant UniProt reference clusters. Bioinformatics 23, 10 (2007), 1282–1288. | ||
2039 | 35 | 312 | Baris E Suzek, Yuqi Wang, Hongzhan Huang, Peter B McGarvey, Cathy H Wu, and UniProt Consortium. 2015. UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches. Bioinformatics 31, 6 (2015), 926–932. | ||
1340 | 26 | 103 | Barret Zoph, Irwan Bello, Sameer Kumar, Nan Du, Yanping Huang, Jeff Dean, Noam Shazeer, and William Fedus. ST-MoE: Designing stable and transferable sparse expert models. CoRR, abs/2202.08906, 2022. | ||
1726 | 34 | 101 | Barret Zoph, Irwan Bello, Sameer Kumar, Nan Du, Yanping Huang, Jeff Dean, Noam Shazeer, and William Fedus. ST-MoE: Designing stable and transferable sparse expert models. CoRR, abs/2202.08906, 2022. | ||
632 | 12 | 49 | Barrington, C. L. et al. Synonymous codon usage regulates translation initiation. Cell Rep. 42, 113413 (2023). | ||
366 | 7 | 54 | Barroso, E. et al. Identification of the fourth duplication of upstream IHH regulatory elements, in a family with craniosynostosis Philadelphia type, helps to define the phenotypic characterization of these regulatory elements. Am. J. Med. Genet. A 167A, 902–906 (2015). | ||
342 | 7 | 30 | Bartel, D. P. Metazoan microRNAs. Cell 173, 20–51 (2018). | ||
12 | 3 | 4 | Bateman, A., Martin, M.-J., Orchard, S., Magrane, M., Ahmad, S., Alpi, E., Bowler-Barnett, E. H., Britto, R., Cukura, A., Denny, P. et al. (2023). UniProt: the universal protein knowledgebase in 2023. Nucleic Acids Research 51, D523–D531. | ||
752 | 15 | 16 | Behrooz Ghorbani, Song Mei, Theodor Misiakiewicz, and Andrea Montanari. When do neural networks outperform kernel methods? arXiv preprint arXiv:2006.13409, 2020. | ||
1820 | 35 | 93 | Benedek Fabian, Thomas Edlich, Héléna Gaspar, Marwin Segler, Joshua Meyers, Marco Fiscato, and Mohamed Ahmed. 2020. Molecular representation learning with language models and domain-relevant auxiliary tasks. arXiv preprint arXiv:2011.13230 (2020). | ||
25 | 3 | 17 | Benegas, G., Albors, C., Aw, A. J., Ye, C., and Song, Y. S. (2023). GPN-MSA: an alignment- based DNA language model for genome-wide variant effect prediction. bioRxiv preprint. https://www.biorxiv.org/content/10.1101/2023.10.10.561776v2. | ||
408 | 8 | 9 | Benegas, G., Albors, C., Aw, A.J. et al. A DNA language model based on multispecies alignment predicts the effects of genome-wide variants. Nat Biotechnol. https://doi.org/10.1038/s41587-024-02511-w (2025). | ||
148 | 4 | 8 | Benegas, G., Batra, S. S. & Song, Y. S. DNA language models are powerful predictors of genome-wide variant effects. Proc. Natl Acad. Sci. USA 120, e2311219120 (2023). | ||
459 | 9 | 37 | Benegas, G., Batra, S. S. & Song, Y. S. DNA language models are powerful zero-shot predictors of non-coding variant effects. Proc. Natl Acad. Sci. USA 120, e2311219120 (2023). | ||
519 | 10 | 37 | Benegas, G., Batra, S. S. & Song, Y. S. DNA language models are powerful zero-shot predictors of non-coding variant effects. Proc. Natl Acad. Sci. USA 120, e2311219120 (2023). | ||
21 | 3 | 13 | Benegas, G., Batra, S. S., and Song, Y. S. (2023). DNA language models are powerful predictors of genome-wide variant effects. Proceedings of the National Academy of Sciences 120,e2311219120. | ||
1079 | 23 | 25 | Benjamin C. Hitz, Jin-Wook Lee, Otto Jolanki, Meenakshi S. Kagda, Keenan Graham, Paul Sud, Idan Gabdank, J. Seth Strattan, Cricket A. Sloan, Timothy Dreszer, Laurence D. Rowe, Nikhil R. Podduturi, Venkat S. Malladi, Esther T. Chan, Jean M. Davidson, Marcus Ho, Stuart Miyasato, Matt Simison, Forrest Tanaka, Yunhai Luo, Ian Whaling, Eurie L. Hong, Brian T. Lee, Richard Sandstrom, Eric Rynes, Jemma Nelson, Andrew Nishida, Alyssa Ingersoll, Michael Buckley, Mark Frerker, Daniel S Kim, Nathan Boley, Diane Trout, Alex Dobin, Sorena Rahmanian, Dana Wyman, Gabriela Balderrama-Gutierrez, Fairlie Reese, Neva C. Durand, Olga Dudchenko, David Weisz, Suhas S. P. Rao, Alyssa Blackburn, Dimos Gkountaroulis, Mahdi Sadr, Moshe Olshansky, Yossi Eliaz, Dat Nguyen, Ivan Bochkov, Muhammad Saad Shamim, Ragini Mahajan, Erez Aiden, Tom Gingeras, Simon Heath, Martin Hirst, W. James Kent, Anshul Kundaje, Ali Mortazavi, Barbara Wold, and J. Michael Cherry. The ENCODE Uniform Analysis Pipelines. preprint, Bioinformatics, April 2023. URL http://biorxiv.org/lookup/doi/10.1101/2023.04.04.535623. | ||
1088 | 23 | 34 | Benjamin Levy, Zihao Xu, Liyang Zhao, Karl Kremling, Ross Altman, Phoebe Wong, and Chris Tanner. FloraBERT: cross-species transfer learning with attention-based neural networks for gene expression prediction. preprint, In Review, August 2022. URL https://www.researchsquare.com/article/rs-1927200/v1. | ||
1470 | 31 | 21 | Benyou Wang, Donghao Zhao, Christina Lioma, Qiuchi Li, Peng Zhang, and Jakob Grue Simonsen. Encoding word order in complex embeddings. In International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=Hke-WTVtwr. | ||
16 | 3 | 8 | Bepler, T., and Berger, B. (2021). Learning the protein language: Evolution, structure, and function. Cell Systems 12, 654–669. | ||
1505 | 32 | 12 | Bera, S., Dent, J., Gill, G., Stolman, A. & Wu, B. SimGCN for TDC Benchmarks (2022). | ||
476 | 9 | 54 | Bergstra, J., Bardenet, R., Bengio, Y. & Kégl, B. Algorithms for hyper-parameter optimization. in Advances in Neural Information Processing Systems 24 https://papers.nips.cc/paper_files/paper/2011/file/86e8f7ab32cfd12577bc2619bc635690-Paper.pdf (NeurIPS, 2011). | ||
536 | 10 | 54 | Bergstra, J., Bardenet, R., Bengio, Y. & Kégl, B. Algorithms for hyper-parameter optimization. in Advances in Neural Information Processing Systems 24 https://papers.nips.cc/paper_files/paper/2011/file/86e8f7ab32cfd12577bc2619bc635690-Paper.pdf (NeurIPS, 2011). | ||
397 | 7 | 85 | Bergstrom, A. et al. Insights into human genetic variation and population history from 929 diverse genomes. Science 367, eaay5012 (2020). | ||
963 | 22 | 46 | Bernstein NJ, Fong NL, Lam I. et al. Solo: doublet identification in single-cell RNA-seq via semi-supervised deep learning. Cell Syst 2020;11:95–101.e5. 10.1016/j.cels.2020.05.010. | ||
880 | 21 | 2 | Bhasuran B. , Natarajan J. (2018) Automatic extraction of gene-disease associations from literature using joint ensemble learning. PLoS One, 13, e0200699. | ||
1872 | 35 | 145 | Bijay Jassal, Lisa Matthews, Guilherme Viteri, Chuqiao Gong, Pascual Lorente, Antonio Fabregat, Konstantinos Sidiropoulos, Justin Cook, Marc Gillespie, Robin Haw, et al. 2020. The reactome pathway knowledgebase. Nucleic acids research 48, D1 (2020), D498–D503. | ||
2034 | 35 | 307 | Bing Su, Dazhao Du, Zhao Yang, Yujie Zhou, Jiangmeng Li, Anyi Rao, Hao Sun, Zhiwu Lu, and Ji-Rong Wen. 2022. A molecular multimodal foundation model associating molecule graphs with natural language. arXiv preprint arXiv:2209.05481 (2022). | ||
1321 | 26 | 84 | Binghai Wang, Rui Zheng, Lu Chen, Yan Liu, Shihan Dou, Caishuang Huang, Wei Shen, Senjie Jin, Enyu Zhou, Chenyu Shi, et al. Secrets of RLHF in large language models part II: Reward modeling. CoRR, abs/2401.06080, 2024a. | ||
1707 | 34 | 82 | Binghai Wang, Rui Zheng, Lu Chen, Yan Liu, Shihan Dou, Caishuang Huang, Wei Shen, Senjie Jin, Enyu Zhou, Chenyu Shi, et al. Secrets of RLHF in large language models part II: Reward modeling. CoRR, abs/2401.06080, 2024a. | ||
1278 | 26 | 41 | Binyuan Hui, Jian Yang, Zeyu Cui, Jiaxi Yang, Dayiheng Liu, Lei Zhang, Tianyu Liu, Jiajun Zhang, Bowen Yu, Keming Lu, et al. Qwen2.5-Coder technical report. CoRR, abs/2409.12186, 2024. | ||
1665 | 34 | 40 | Binyuan Hui, Jian Yang, Zeyu Cui, Jiaxi Yang, Dayiheng Liu, Lei Zhang, Tianyu Liu, Jiajun Zhang, Bowen Yu, Keming Lu, et al. Qwen2.5-Coder technical report. CoRR, abs/2409.12186, 2024. | ||
635 | 12 | 52 | Bio.Data.CodonTable module—Biopython 1.75 documentation. https://biopython.org/docs/1.75/api/Bio.Data.CodonTable.html. | ||
1751 | 35 | 24 | Bishwaranjan Bhattacharjee, Aashka Trivedi, Masayasu Muraoka, Muthukumaran Ramasubramanian, Takuma Udagawa, Iksha Gurung, Rong | ||
370 | 7 | 58 | Blake, J. A. et al. The Mouse Genome Database (MGD): premier model organism resource for mammalian genomics and genetics. Nucleic Acids Res. 39, D842–D848 (2011). | ||
111 | 3 | 103 | Blanchette, M., Kent, W. J., Riemer, C., Elnitski, L., Smit, A. F., Roskin, K. M., Baertsch, R., Rosenbloom, K., Clawson, H., Green, E. D. et al. (2004). Aligning multiple genomic sequences with the threaded blockset aligner. Genome Research 14, 708–715. | ||
415 | 8 | 16 | Bloomfield, D. et al. AI and biosecurity: the need for governance. Science 385, 831–833 (2024). | ||
1239 | 26 | 2 | Bo Adler, Niket Agarwal, Ashwath Aithal, Dong H. Anh, Pallab Bhattacharya, Annika Brundyn, Jared Casper, Bryan Catanzaro, Sharon Clay, Jonathan Cohen, Sirshak Das, Ayush Dattagupta, Olivier Delalleau, Leon Derczynski, Yi Dong, Daniel Egert, Ellie Evans, Aleksander Ficek, Denys Fridman, Shaona Ghosh, Boris Ginsburg, Igor Gitman, Tomasz Grzegorzek, Robert Hero, Jining Huang, Vibhu Jawa, Joseph Jennings, Aastha Jhunjhunwala, John Kamalu, Sadaf Khan, Oleksii Kuchaiev, Patrick LeGresley, Hui Li, Jiwei Liu, Zihan Liu, Eileen Peters Long, Ameya Mahabaleshwarkar, Somshubra Majumdar, James Maki, Miguel Martinez, Maer Rodrigues de Melo, Ivan Moshkov, Deepak Narayanan, Sean Narenthiran, Jesus Navarro, Phong Nguyen, Osvald Nitski, Vahid Noroozi, Guruprasad Nutheti, Christopher Parisien, Jupinder Parmar, Mostofa Patwary, Krzysztof Pawelec, Wei Ping, Shrimai Prabhumoye, Rajarshi Roy, Trisha Saar, Vasanth Rao Naik Sabavat, Sanjeev Satheesh, Jane Polak Scowcroft, Jason D. Sewall, Pavel Shamis, Gerald Shen, Mohammad Shoeybi, Dave Sizer, Misha Smelyanskiy, Felipe Soares, Makesh Narsimhan Sreedhar, Dan Su, Sandeep Subramanian, Shengyang Sun, Shubham Toshniwal, Hao Wang, Zhilin Wang, Jiaxuan You, Jiaqi Zeng, Jimmy Zhang, Jing Zhang, Vivienne Zhang, Yian Zhang, and Chen Zhu. Nemotron-4 340B technical report. CoRR, abs/2406.11704, 2024. | ||
1627 | 34 | 2 | Bo Adler, Niket Agarwal, Ashwath Aithal, Dong H. Anh, Pallab Bhattacharya, Annika Brundyn, Jared Casper, Bryan Catanzaro, Sharon Clay, Jonathan Cohen, Sirshak Das, Ayush Dattagupta, Olivier Delalleau, Leon Derczynski, Yi Dong, Daniel Egert, Ellie Evans, Aleksander Ficek, Denys Fridman, Shaona Ghosh, Boris Ginsburg, Igor Gitman, Tomasz Grzegorzek, Robert Hero, Jining Huang, Vibhu Jawa, Joseph Jennings, Aastha Jhunjhunwala, John Kamalu, Sadaf Khan, Oleksii Kuchaiev, Patrick LeGresley, Hui Li, Jiwei Liu, Zihan Liu, Eileen Peters Long, Ameya Mahabaleshwarkar, Somshubra Majumdar, James Maki, Miguel Martinez, Maer Rodrigues de Melo, Ivan Moshkov, Deepak Narayanan, Sean Narenthiran, Jesus Navarro, Phong Nguyen, Osvald Nitski, Vahid Noroozi, Guruprasad Nutheti, Christopher Parisien, Jupinder Parmar, Mostofa Patwary, Krzysztof Pawelec, Wei Ping, Shrimai Prabhumoye, Rajarshi Roy, Trisha Saar, Vasanth Rao Naik Sabavat, Sanjeev Satheesh, Jane Polak Scowcroft, Jason D. Sewall, Pavel Shamis, Gerald Shen, Mohammad Shoeybi, Dave Sizer, Misha Smelyanskiy, Felipe Soares, Makesh Narsimhan Sreedhar, Dan Su, Sandeep Subramanian, Shengyang Sun, Shubham Toshniwal, Hao Wang, Zhilin Wang, Jiaxuan You, Jiaqi Zeng, Jimmy Zhang, Jing Zhang, Vivienne Zhang, Yian Zhang, and Chen Zhu. Nemotron-4 340B technical report. CoRR, abs/2406.11704, 2024. | ||
1770 | 35 | 43 | Bo Chen, Xingyi Cheng, Yangli-ao Geng, Shen Li, Xin Zeng, Boyan Wang, Jing Gong, Chiming Liu, Aohan Zeng, Yuxiao Dong, et al. 2023. xTrimoPGLM: Unified 100b-scale pre-trained transformer for deciphering the language of protein. bioRxiv (2023), 2023–07. | ||
753 | 15 | 17 | Bogdan Gliwa, Iwona Mochol, Maciej Biesek, and Aleksander Wawer. Samsum corpus: A human-annotated dialogue dataset for abstractive summarization. CoRR, abs/1911.12237, 2019. URL http://arxiv.org/abs/1911.12237. | ||
1000 | 22 | 83 | Bolton E, Hall D, Yasunaga M. et al. Stanford CRFM introduces PubMedGPT 2.7B. 2022. | ||
919 | 22 | 2 | Bommasani R, Hudson DA, Adeli E. et al. On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258, 2021. | ||
53 | 3 | 45 | Bommasani, R., Hudson, D. A. et al. (2021). On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258. https://arxiv.org/abs/2108.07258. | ||
1542 | 32 | 49 | Boral, N., Ghosh, P., Goswami, A. & Bhattacharyya, M. Accountable prediction of drug ADMET properties with molecular descriptors. bioRxiv, 2022–06 (2022). | ||
816 | 17 | 2 | Boratyn GM, Camacho C, Cooper PS et al. Blast: a more efficient report with usability improvements. Nucleic Acids Res 2013;41:W29–W33. | ||
817 | 17 | 3 | Borgeaud S, Mensch A, Hoffmann J et al. Improving language models by retrieving from trillions of tokens. In: International conference on machine learning, Baltimore, Maryland, USA, p. 2206–40. PMLR, 2022. | ||
181 | 4 | 41 | Borgeaud, S. et al. Improving language models by retrieving from trillions of tokens. In Proc. 39th International Conference on Machine Learning (eds Chaudhuri, K. et al.) 2206–2240 (PMLR, 2022). | ||
2114 | 35 | 387 | Botao Yu, Frazier N Baker, Ziqi Chen, Xia Ning, and Huan Sun. 2024. Llasmol: Advancing large language models for chemistry with a large-scale, comprehensive, high-quality instruction tuning dataset. arXiv preprint arXiv:2402.09391 (2024). | ||
1835 | 35 | 108 | Bowen Gao, Bo Qiang, Haichuan Tan, Minsi Ren, Yinjun Jia, Minsi Lu, Jingjing Liu, Weiying Ma, and Yanyan Lan. 2023. DrugCLIP: Contrastive Protein-Molecule Representation Learning for Virtual Screening. arXiv:2310.06367 [cs.LG] | ||
1301 | 26 | 64 | Bowen Peng, Jeffrey Quesnelle, Honglu Fan, and Enrico Shippole. YaRN: Efficient context window extension of large language models. CoRR, abs/2309.00071, 2023. | ||
1598 | 33 | 54 | Bowen Peng, Jeffrey Quesnelle, Honglu Fan, and Enrico Shippole. YaRN: Efficient context window extension of large language models. CoRR, abs/2309.00071, 2023. | ||
1687 | 34 | 62 | Bowen Peng, Jeffrey Quesnelle, Honglu Fan, and Enrico Shippole. YaRN: Efficient context window extension of large language models. CoRR, abs/2309.00071, 2023. | ||
1250 | 26 | 13 | Boxi Cao, Keming Lu, Xinyu Lu, Jiawei Chen, Mengjie Ren, Hao Xiang, Peilin Liu, Yaojie Lu, Ben He, Xianpei Han, Le Sun, Hongyu Lin, and Bowen Yu. Towards scalable automated alignment of LLMs: A survey. CoRR, abs/2406.01252, 2024. | ||
1555 | 33 | 11 | Boxi Cao, Keming Lu, Xinyu Lu, Jiawei Chen, Mengjie Ren, Hao Xiang, Peilin Liu, Yaojie Lu, Ben He, Xianpei Han, Le Sun, Hongyu Lin, and Bowen Yu. Towards scalable automated alignment of LLMs: A survey. CoRR, abs/2406.01252, 2024. | ||
1638 | 34 | 13 | Boxi Cao, Keming Lu, Xinyu Lu, Jiawei Chen, Mengjie Ren, Hao Xiang, Peilin Liu, Yaojie Lu, Ben He, Xianpei Han, Le Sun, Hongyu Lin, and Bowen Yu. Towards scalable automated alignment of LLMs: A survey. CoRR, abs/2406.01252, 2024. | ||
1862 | 35 | 135 | Bozhen Hu, Jun Xia, Jiangbin Zheng, Cheng Tan, Yufei Huang, Yongjie Xu, and Stan Z Li. 2022. Protein language models and structure prediction: Connection and progression. arXiv preprint arXiv:2211.16742 (2022). | ||
1797 | 35 | 70 | Bradley E., Birney Ewan, Dunham Ian, Green Eric D., Gunter Chris, Snyder Michael, et al. 2012. An integrated encyclopedia of DNA | ||
944 | 22 | 27 | Brandes N, Ofer D, Peleg Y. et al. Proteinbert: a universal deep-learning model of protein sequence and function. Bioinformatics 2022;38:2102–10. 10.1093/bioinformatics/btac020. | ||
146 | 4 | 6 | Brandes, N., Goldman, G., Wang, C. H., Ye, C. J. & Ntranos, V. Genome-wide prediction of disease variant effects with a deep protein language model. Nat. Genet. 55, 1512–1522 (2023). | ||
20 | 3 | 12 | Brandes, N., Goldman, G., Wang, C. H., Ye, C. J., and Ntranos, V. (2023). Genome-wide prediction of disease variant effects with a deep protein language model. Nature Genetics. https://doi.org/10.1038/s41588-023-01465-0. doi:10.1038/s41588-023-01465-0. | ||
461 | 9 | 39 | Braun, S. et al. Decoding a cancer-relevant splicing decision in the ron proto-oncogene using high-throughput mutagenesis. Nat. Commun. 9, 3315 (2018). | ||
521 | 10 | 39 | Braun, S. et al. Decoding a cancer-relevant splicing decision in the ron proto-oncogene using high-throughput mutagenesis. Nat. Commun. 9, 3315 (2018). | ||
881 | 21 | 3 | Bravo À. et al. (2015) Extraction of relations between genes and diseases from text and large-scale data analysis: implications for translational research. BMC Bioinformatics, 16, 55. | ||
734 | 14 | 33 | Breiman, L. Random Forests. Machine Learning 45, 5–32 (2001). | ||
964 | 22 | 47 | Brendel M, Su C, Bai Z. et al. Application of deep learning on single-cell RNA sequencing data analysis: a review. Genomics Proteomics Bioinformatics 2022;20:814–35. 10.1016/j.gpb.2022.11.011. | ||
763 | 15 | 27 | Brian Lester, Rami Al-Rfou, and Noah Constant. The Power of Scale for Parameter-Efficient Prompt Tuning. arXiv:2104.08691 [cs], April 2021. URL http://arxiv.org/abs/2104.08691. arXiv: 2104.08691. | ||
400 | 8 | 1 | Brixi, G. et al. Genome modeling and design across all domains of life with Evo 2. 2025.02.18.638918. Preprint at https://doi.org/10.1101/2025.02.18.638918 (2025). | ||
1416 | 29 | 7 | Brookes,A.J., Lehväslaiho,H., Siegfried,M., Boehm,J.G., Yuan,Y.P., Sarkar,C.M., Bork,P. and Ortigao,F. (2000) HGBASE: a database of SNPs and other variations in and around human genes. Nucleic Acids Res., 28, 356–360. | ||
818 | 17 | 4 | Brown T, Mann B, Ryder N et al. Language models are few-shot learners. Advances in Neural Information Processing Systems 2020;33:1877–901. | ||
985 | 22 | 68 | Brown TB, Mann B, Ryder N. et al. Language models are few-shot learners. Adv Neural Inf Process Syst 2020;33:1877–901. | ||
611 | 12 | 28 | Brown, T. B. et al. Language models are few-shot learners. https://doi.org/10.48550/ARXIV.2005.14165 (2020). | ||
424 | 9 | 2 | Brown, T. et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 33, 1877–1901 (2020). | ||
484 | 10 | 2 | Brown, T. et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 33, 1877–1901 (2020). | ||
33 | 3 | 25 | Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J. D., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D., Wu, J., Winter, C., Hesse, C., Chen, M., Sigler, E., Litwin, M., Gray, S., Chess, B., Clark, J., Berner, C., McCandlish, S., Radford, A., Sutskever, I., and Amodei, D. Language models are few-shot learners. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., and Lin, H., eds. Advances in Neural Information Processing Systems vol. 33. Curran Associates, Inc. (2020):( 1877–1901). https://proceedings.neurips.cc/paper_files/paper/2020/file/1457c0d6bfcb4967418bfb8ac142f64a-Paper.pdf. | ||
593 | 12 | 10 | Brule, C. E. & Grayhack, E. J. Synonymous codons: choose wisely for expression. Trends Genet 33, 283–297 (2017). | ||
392 | 7 | 80 | Budescu, D. V. Dominance analysis: a new approach to the problem of relative importance of predictors in multiple regression. Psych. Bull. 114, 542 (1993). | ||
1414 | 29 | 5 | Buetow,K.H., Edmonson,M.N. and Cassidy,A.B. (1999) Reliable identification of large numbers of candidate SNPs from public EST data. Nature Genet., 21, 323–325. | ||
477 | 9 | 55 | Byrska-Bishop, M. et al. High-coverage whole-genome sequencing of the expanded 1000 genomes project cohort including 602 trios. Cell 185, 3426–3440 (2022). | ||
537 | 10 | 55 | Byrska-Bishop, M. et al. High-coverage whole-genome sequencing of the expanded 1000 genomes project cohort including 602 trios. Cell 185, 3426–3440 (2022). | ||
1808 | 35 | 81 | C Domínguez Conde, C Xu, LB Jarvis, DB Rainbow, SB Wells, T Gomes, SK Howlett, O Suchanek, K Polanski, HW King, et al. 2022. Cross-tissue immune cell analysis reveals tissue-specific features in humans. Science 376, 6594 (2022), eabl5197. | ||
188 | 5 | 2 | C. D. Allis and T. Jenuwein. The molecular hallmarks of epigenetic control. Nature Reviews Genetics, 17(8):487–500, 2016. | ||
211 | 5 | 24 | C. Darwin. On the Origin of Species by Means of Natural Selection, or the Preservation of Favoured Races in the Struggle for Life. John Murray, London, 1859. | ||
1145 | 24 | 28 | C. Li, M. Zhang, and Y. He. The stability-efficiency dilemma: Investigating sequence length warmup for training GPT models. In Advances in Neural Information Processing Systems, 2022. | ||
580 | 11 | 38 | C. S. Xia, Y. Deng, S. Dunn, and L. Zhang. Agentless: Demystifying llm-based software engineering agents. arXiv preprint, 2024. | ||
574 | 11 | 32 | C. Snell, J. Lee, K. Xu, and A. Kumar. Scaling llm test-time compute optimally can be more effective than scaling model parameters, 2024. URL https://arxiv.org/abs/2408.03314. | ||
401 | 8 | 2 | Callaway, E. Biggest-ever AI biology model writes DNA on demand. Nature 638, 868–869 (2025). | ||
2095 | 35 | 368 | Canwen Xu, Daya Guo, Nan Duan, and Julian McAuley. 2023. Baize: An open-source chat model with parameter-efficient tuning on self-chat data. arXiv preprint arXiv:2304.01196 (2023). | ||
1035 | 22 | 118 | Cao ZJ, Gao G. Multi-omics single-cell data integration and regulatory inference with graph-linked embedding. Nat Biotechnol 2022;40:1458–66. 10.1038/s41587-022-01284-4. | ||
1815 | 35 | 88 | Carl Edwards, ChengXiang Zhai, and Heng Ji. 2021. Text2mol: Cross-modal molecule retrieval with natural language queries. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing. 595–607. | ||
1813 | 35 | 86 | Carl Edwards, Tuan Lai, Kevin Ros, Garrett Honke, Kyunghyun Cho, and Heng Ji. 2022. Translation between Molecules and Natural Language. arXiv:2204.11817 [cs.CL] | ||
1978 | 35 | 251 | Carlos Outeiral and Charlotte M Deane. 2024. Codon language embeddings provide strong signals for use in protein engineering. Nature Machine Intelligence 6, 2 (2024), 170–179. | ||
1411 | 29 | 2 | Carulli,J.P., Artinger,M., Swain,P.M., Root,C.D., Chee,L., Tulig,C., Guerin,J., Osborne,M., Stein,G., Lian,J. and Lomedico,P.T. (1998) High throughput analysis of differential gene expression. J. Cell. Biochem., 30–31 (Suppl.), 286–296. | ||
811 | 16 | 12 | Castelli FM. KEGGutils v04.1. Computer software. 2022. Zenodo. https://doi.org/10.5281/zenodo.7482523. | ||
2084 | 35 | 357 | Cathy H Wu, Anastasia Nikolskaya, Hongzhan Huang, Lai-Su L Yeh, Darren A Natale, Cholanayakanahalli R Vinayaka, Zhang-Zhi Hu, Raja Mazumder, Sandeep Kumar, Panagiotis Kourtesis, et al. 2004. PIRSF: family classification system at the Protein Information Resource. Nucleic acids research 32, suppl_1 (2004), D112–D114. | ||
1412 | 29 | 3 | Cavalli-Sforza,L.L. (1998) The DNA revolution in population genetics. Trends Genet., 14, 60–65. | ||
653 | 13 | 8 | Chambon, P., Bluethgen, C., Delbrouck, J.-B., der Sluijs, R. V., Połacin, M., Chaves, J. M. Z., Abraham, T. M., Purohit, S., Langlotz, C. P., and Chaudhari, A. Roentgen: Vision-language foundation model for chest x-ray generation, 2022. | ||
654 | 13 | 9 | Chambon, P., Delbrouck, J.-B., Sounack, T., Huang, S.-C., Chen, Z., Varma, M., Truong, S. Q., Chuong, C. T., and Langlotz, C. P. Chexpert plus: Augmenting a large chest x-ray dataset with text radiology reports, patient demographics and additional image formats, 2024. | ||
615 | 12 | 32 | Chandra, S. et al. The high mutational sensitivity of ccda antitoxin is linked to codon optimality. Mol Biol Evol 39, (2022). | ||
1418 | 30 | 1 | Chang K., Creighton C.J., Davis C., Donehower L., Drummond J., Wheeler D., Ally A., Balasundaram M., Birol I., Butterfield Y.S.N. et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat. Genet. 2013; 45:1113–1120. | ||
1949 | 35 | 222 | Chang Ma, Haiteng Zhao, Lin Zheng, Jiayi Xin, Qintong Li, Lijun Wu, Zhihong Deng, Yang Lu, Qi Liu, and Lingpeng Kong. 2023. Retrieved Sequence Augmentation for Protein Representation Learning. bioRxiv (2023), 2023–02. | ||
1322 | 26 | 85 | Changhan Wang, Kyunghyun Cho, and Jiatao Gu. Neural machine translation with byte-level subwords. In AAAI, pp. 9154–9160. AAAI Press, 2020. | ||
1708 | 34 | 83 | Changhan Wang, Kyunghyun Cho, and Jiatao Gu. Neural machine translation with byte-level subwords. In AAAI, pp. 9154–9160. AAAI Press, 2020. | ||
2115 | 35 | 388 | Chaohao Yuan, Songyou Li, Geyan Ye, Yikun Zhang, Long-Kai Huang, Wenbing Huang, Wei Liu, Jianhua Yao, and Yu Rong. 2024. Functional Protein Design with Local Domain Alignment. arXiv preprint arXiv:2404.16866 (2024). | ||
1492 | 31 | 43 | Chaojun Xiao, Haoxi Zhong, Zhipeng Guo, Cunchao Tu, Zhiyuan Liu, Maosong Sun, Tianyang Zhang, Xianpei Han, Zhen hu, Heng Wang, and Jianfeng Xu. Cail2019-scm: A dataset of similar case matching in legal domain. 11 2019. | ||
2083 | 35 | 356 | Chaoyi Wu, Weixiong Lin, Xiaoman Zhang, Ya Zhang, Yanfeng Wang, and Weidi Xie. 2023. PMC-LLaMA: Towards Building Open-source Language Models for Medicine. arXiv:2304.14454 [cs.CL] | ||
1294 | 26 | 57 | characters: Attaining arbitrary role-play via self-alignment. CoRR, abs/2401.12474, 2024b. | ||
1073 | 23 | 19 | Charles P. Fulco, Joseph Nasser, Thouis R. Jones, Glen Munson, Drew T. Bergman, Vidya Subramanian, Sharon R. Grossman, Rockwell Anyoha, Benjamin R. Doughty, Tejal A. Patwardhan, Tung H. Nguyen, Michael Kane, Elizabeth M. Perez, Neva C. Durand, Caleb A. Lareau, Elena K. Stamenova, Erez Lieberman Aiden, Eric S. Lander, and Jesse M. Engreitz. Activity-by-contact model of enhancer–promoter regulation from thousands of CRISPR perturbations. Nature Genetics, 51(12):1664–1669, December 2019. ISSN 1546-1718. doi: 10.1038/s41588-019-0538-0. URL https://www.nature.com/articles/s41588-019-0538-0. Number: 12 Publisher: Nature Publishing Group. | ||
1353 | 28 | 5 | ChatGLM3 Team. Chatglm3 series: Open bilingual chat llms, 2023. URL https://github.com/THUDM/ChatGLM3. M. Chen, J. Tworek, H. Jun, Q. Yuan, H. P. de Oliveira Pinto, J. Kaplan, H. Edwards, Y. Burda, N. Joseph, G. Brockman, A. Ray, R. Puri, G. Krueger, M. Petrov, H. Khlaaf, G. Sastry, P. Mishkin, B. Chan, S. Gray, N. Ryder, M. Pavlov, A. Power, L. Kaiser, M. Bavarian, C. Winter, P. Tillet, F. P. Such, D. Cummings, M. Plappert, F. Chantzis, E. Barnes, A. Herbert-Voss, W. H. Guss, A. Nichol, A. Paino, N. Tezak, J. Tang, I. Babuschkin, S. Balaji, S. Jain, W. Saunders, C. Hesse, A. N. Carr, J. Leike, J. Achiam, V. Misra, E. Morikawa, A. Radford, M. Knight, M. Brundage, | ||
2049 | 35 | 322 | Chau Tran, Siddharth Khadkikar, and Aleksey Porollo. 2023. Survey of Protein Sequence Embedding Models. International Journal of Molecular Sciences 24, 4 (2023), 3775. | ||
954 | 22 | 37 | Chen B, Cheng X, Geng Y. et al. xTrimoPGLM: unified 100B-scale pre-trained transformer for deciphering the language of protein. arXiv preprint arXiv:2401.06199, 2024. | ||
975 | 22 | 58 | Chen J, Hu Z, Sun S. et al. Interpretable RNA foundation model from unannotated data for highly accurate RNA structure and function predictions. arXiv preprint arXiv:2204.00300, 2022. | ||
972 | 22 | 55 | Chen K, Zhao H, Yang Y. Capturing large genomic contexts for accurately predicting enhancer-promoter interactions. Brief Bioinform 2022; 23:bbab577. 10.1093/bib/bbab577. | ||
819 | 17 | 5 | Chen M, Tworek J, Jun H et al. Evaluating large language models trained on code. arXiv, arXiv:2107.03374, 2021, preprint: not peer reviewed. | ||
1497 | 32 | 4 | Chen, H., Si, Y., Wen, J., Hu, C., Xia, E., Wang, Y. & Wang, O. P110α inhibitor alpelisib exhibits a synergistic effect with pyrotinib and reverses pyrotinib resistant in HER2+ breast cancer. Neoplasia 43, 100913 (2023). | ||
1494 | 32 | 1 | Chen, J., Hu, Y., Wang, Y., Lu, Y., Cao, X., Lin, M., Xu, H., Wu, J., Xiao, C., Sun, J., et al. TrialBench: Multi-modal artificial intelligence-ready clinical trial datasets. arXiv preprint arXiv:2407.00631 (2024). | ||
440 | 9 | 18 | Chen, K. M., Wong, A. K., Troyanskaya, O. G. & Zhou, J. A sequence-based global map of regulatory activity for deciphering human genetics. Nat. Genet. 54, 940–949 (2022). | ||
500 | 10 | 18 | Chen, K. M., Wong, A. K., Troyanskaya, O. G. & Zhou, J. A sequence-based global map of regulatory activity for deciphering human genetics. Nat. Genet. 54, 940–949 (2022). | ||
73 | 3 | 65 | Chen, K., Zhou, Y., Ding, M., Wang, Y., Ren, Z., and Yang, Y. (2024). Self-supervised learning on millions of primary RNA sequences from 72 vertebrates improves sequence-based RNA splicing prediction. Briefings in Bioinformatics 25, bbae163. | ||
705 | 14 | 4 | Chen, M. et al. Evaluating Large Language Models Trained on Code. Preprint at https://doi.org/10.48550/arXiv.2107.03374 (2021). | ||
163 | 4 | 23 | Chen, S. et al. A genomic mutational constraint map using variation in 76,156 human genomes. Nature 625, 92–100 (2024). | ||
733 | 14 | 32 | Chen, T. & Guestrin, C. XGBoost: A Scalable Tree Boosting System. in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 785–794 (Association for Computing Machinery, New York, NY, USA, 2016). doi:10.1145/2939672.2939785. | ||
1519 | 32 | 26 | Chen, X., Dougherty, T., Hong, C., Schibler, R., Zhao, Y. C., Sadeghi, R., Matasci, N., Wu, Y.-C. & Kerman, I. Predicting antibody developability from sequence using machine learning. biorxiv, 2020–06 (2020). | ||
655 | 13 | 10 | Chen, Z., Varma, M., Delbrouck, J.-B., Paschali, M., Blankemeier, L., Van Veen, D., Valanarasu, J. M. J., Youssef, A., Cohen, J. P., Reis, E. P., et al. Chexagent: Towards | ||
657 | 13 | 12 | Chen, Z., Varma, M., Xu, J., Paschali, M., Veen, D. V., Johnston, A., Youssef, A., Blankemeier, L., Bluethgen, C., Altmayer, S., Valanarasu, J. M. J., Muneer, M. S. E., Reis, E. P., Cohen, J. P., Olsen, C., Abraham, T. M., Tsai, E. B., Beaulieu, C. F., Jitsev, J., Gatidis, S., Delbrouck, J.-B., Chaudhari, A. S., and Langlotz, C. P. A vision-language foundation model to enhance efficiency of chest x-ray interpretation, 2024b. | ||
1987 | 35 | 260 | Cheng Peng, Xi Yang, Aokun Chen, Kaleb E. Smith, Nima PourNejatian, Anthony B. Costa, Cheryl Martin, Mona G. Flores, Ying Zhang, Tanja Magoc, Gloria Lipori, Duane A. Mitchell, Naykky S. Ospina, Mustafa M. Ahmed, William R. Hogan, Elizabeth A. Shenkman, Yi Guo, Jiang Bian, and Yonghui Wu. 2023. A study of generative large language model for medical research and healthcare. npj Digital Medicine 6, 1 (2023). https://doi.org/10.1038/s41746-023-00958-w | ||
1276 | 26 | 39 | Cheng-Ping Hsieh, Simeng Sun, Samuel Kriman, Shantanu Acharya, Dima Rekesh, Fei Jia, Yang Zhang, and Boris Ginsburg. RULER: What’s the real context size of your long-context language models? CoRR, abs/2404.06654, 2024. | ||
1663 | 34 | 38 | Cheng-Ping Hsieh, Simeng Sun, Samuel Kriman, Shantanu Acharya, Dima Rekesh, Fei Jia, Yang Zhang, and Boris Ginsburg. RULER: What’s the real context size of your long-context language models? CoRR, abs/2404.06654, 2024. | ||
1461 | 31 | 12 | Cheng-Zhi Anna Huang, Ashish Vaswani, Jakob Uszkoreit, Noam Shazeer, I. Simon, C. Hawthorne, Andrew M. Dai, M. Hoffman, M. Dinculescu, and D. Eck. Music transformer. arXiv: Learning, 2018. | ||
107 | 3 | 99 | Cheng, X., Chen, B., Li, P., Gong, J., Tang, J., and Song, L. (2024). Training compute-optimal protein language models. bioRxiv preprint. https://www.biorxiv.org/content/10.1101/2024.06.06.597716v1. | ||
2113 | 35 | 386 | Chengxuan Ying, Mingqi Yang, Shuxin Zheng, Guolin Ke, Shengjie Luo, Tianle Cai, Chenglin Wu, Yuxin Wang, Yanming Shen, and Di He. 2021. First place solution of KDD Cup 2021 & OGB large-scale challenge graph prediction track. arXiv preprint arXiv:2106.08279 (2021). | ||
1812 | 35 | 85 | ChenRui Duan, Zelin Zang, Yongjie Xu, Hang He, Zihan Liu, Zijia Song, Ju-Sheng Zheng, and Stan Z. Li. 2024. FGBERT: Function-Driven Pre-trained Gene Language Model for Metagenomics. arXiv:2402.16901 [q-bio.GN] | ||
1242 | 26 | 5 | Chenxin An, Fei Huang, Jun Zhang, Shansan Gong, Xipeng Qiu, Chang Zhou, and Lingpeng Kong. Training-free long-context scaling of large language models. CoRR, abs/2402.17463, 2024. | ||
1548 | 33 | 4 | Chenxin An, Fei Huang, Jun Zhang, Shansan Gong, Xipeng Qiu, Chang Zhou, and Lingpeng Kong. Training-free long-context scaling of large language models. CoRR, abs/2402.17463, 2024. | ||
1630 | 34 | 5 | Chenxin An, Fei Huang, Jun Zhang, Shansan Gong, Xipeng Qiu, Chang Zhou, and Lingpeng Kong. Training-free long-context scaling of large language models. CoRR, abs/2402.17463, 2024. | ||
2195 | 36 | 35 | Chi Sun, Xipeng Qiu, Yige Xu, and Xuanjing Huang. How to fine-tune bert for text classification?, 2020. | ||
84 | 3 | 76 | Chiang, W.-L., Li, Z., Lin, Z., Sheng, Y., Wu, Z., Zhang, H., Zheng, L., Zhuang, S., Zhuang, Y., Gonzalez, J. E., Stoica, I., and Xing, E. P. Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality (2023). https://lmsys.org/blog/2023-03-30-vicuna/. | ||
1920 | 35 | 193 | Chin-Yew Lin. 2004. Rouge: A package for automatic evaluation of summaries. In Text summarization branches out. 74–81. | ||
609 | 12 | 26 | Cho, K., van Merrienboer, B., Bahdanau, D. & Bengio, Y. On the properties of neural machine translation: Encoder-decoder approaches. https://doi.org/10.48550/ARXIV.1409.1259 (2014). | ||
1028 | 22 | 111 | Choromanski K, Likhosherstov V, Dohan D. et al. Rethinking attention with performers. arXiv preprint arXiv:2009.14794, 2020. | ||
1650 | 34 | 25 | Choudhary, Dhruv Mahajan, Diego Garcia-Olano, Diego Perino, Dieuwke Hupkes, Egor Lakomkin, Ehab AlBadawy, Elina Lobanova, Emily Dinan, Eric Michael Smith, Filip Radenovic, Frank Zhang, Gabriel Synnaeve, Gabrielle Lee, Georgia Lewis Anderson, Graeme Nail, Grégoire Mialon, Guan Pang, Guillem Cucurell, Hailey Nguyen, Hannah Korevaar, Hu Xu, Hugo Touvron, Iliyan Zarov, Imanol Arrieta Ibarra, Isabel M. Kloumann, Ishan Misra, Ivan Evtimov, Jade Copet, Jaewon Lee, Jan Geffert, Jana Vranes, Jason Park, Jay Mahadeokar, Jeet Shah, Jelmer van der Linde, Jennifer Billock, Jenny Hong, Jenya Lee, Jeremy Fu, Jianfeng Chi, Jianyu Huang, Jiawen Liu, Jie Wang, Jiecao Yu, Joanna Bitton, Joe Spisak, Jongsoo Park, Joseph Rocca, Joshua Johnstun, Joshua Saxe, Junteng Jia, Kalyan Vasuden Alwala, Kartikeya Upasani, Kate Plawiak, Ke Li, Kenneth Heafield, Kevin Stone, and et al. The Llama 3 herd of models. CoRR, abs/2407.21783, 2024. | ||
820 | 17 | 6 | Chowdhery A, Narang S, Devlin J et al. Palm: scaling language modeling with pathways. arXiv, arXiv:2204.02311, 2022, preprint: not peer reviewed. | ||
466 | 9 | 44 | Chowdhery, A. et al. PaLM: scaling language modeling with pathways. J. Mach. Learn. Res. 24, 11324–11436 (2023). | ||
526 | 10 | 44 | Chowdhery, A. et al. PaLM: scaling language modeling with pathways. J. Mach. Learn. Res. 24, 11324–11436 (2023). | ||
1016 | 22 | 99 | Chowdhury R, Bouatta N, Biswas S. et al. Single-sequence protein structure prediction using a language model and deep learning. Nat Biotechnol 2022;40:1617–23. 10.1038/s41587-022-01432-w. | ||
1803 | 35 | 76 | Christian Dallago, Jody Mou, Kadina E Johnston, Bruce J Wittmann, Nicholas Bhattacharya, Samuel Goldman, Ali Madani, and Kevin K Yang. 2021. | ||
2025 | 35 | 298 | Christian JA Sigrist, Edouard De Castro, Lorenzo Cerutti, Béatrice A Cuche, Nicolas Hulo, Alan Bridge, Lydie Bougueleret, and Ioannis Xenarios. 2012. New and continuing developments at PROSITE. Nucleic acids research 41, D1 (2012), D344–D347. | ||
2197 | 36 | 37 | Christina V Theodoris, Ling Xiao, Anant Chopra, Mark D Chaffin, Zeina R Al Sayed, Matthew C Hill, Helene Mantineo, Elizabeth M Brydon, Zexian Zeng, X Shirley Liu, and Patrick T Ellinor. Transfer learning enables predictions in network biology. Nature, 618(7965):616–624, June 2023. | ||
1976 | 35 | 249 | Christine A Orengo, Alex D Michie, Susan Jones, David T Jones, Mark B Swindells, and Janet M Thornton. 1997. CATH–a hierarchic classification of protein domain structures. Structure 5, 8 (1997), 1093–1109. | ||
1057 | 23 | 3 | Christof Angermueller, Heather J. Lee, Wolf Reik, and Oliver Stegle. DeepCpG: accurate prediction of single-cell DNA methylation states using deep learning. Genome Biology, 18(1):67, December 2017. ISSN 1474-760X. doi: 10.1186/s13059-017-1189-z. URL http://genomebiology.biomedcentral.com/articles/10.1186/s13059-017-1189-z. | ||
806 | 16 | 7 | Christudas B. cURL and Postman. In: Practical Microservices Architectural Patterns: Event-Based Java Microservices with Spring Boot and Spring Cloud. Berkeley, CA: Apress. 2019;847–55. | ||
78 | 3 | 70 | Chu, Y., Yu, D., Li, Y., Huang, K., Shen, Y., Cong, L., Zhang, J., and Wang, M. (2024). A 5’ UTR language model for decoding untranslated regions of mRNA and function predictions. Nature Machine Intelligence 6, 449–460. | ||
1455 | 31 | 6 | Chulhee Yun, Srinadh Bhojanapalli, Ankit Singh Rawat, Sashank Reddi, and Sanjiv Kumar. Are transformers universal approximators of sequence-to-sequence functions? In International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=ByxRM0Ntvr. | ||
764 | 15 | 28 | Chunyuan Li, Heerad Farkhoor, Rosanne Liu, and Jason Yosinski. Measuring the Intrinsic Dimension of Objective Landscapes. arXiv:1804.08838 [cs, stat], April 2018a. URL http://arxiv.org/abs/1804.08838. arXiv: 1804.08838. | ||
1036 | 22 | 119 | Ciciani M, Demozzi M, Pedrazzoli E. et al. Automated identification of sequence-tailored Cas9 proteins using massive metagenomic data. Nat Commun 2022;13:6474. 10.1038/s41467-022-34213-9. | ||
604 | 12 | 21 | Claassens, N. J. et al. Improving heterologous membrane protein production in Escherichia coli by combining transcriptional tuning and codon usage algorithms. PLoS ONE 12, e0184355 (2017). | ||
751 | 15 | 15 | Claire Gardent, Anastasia Shimorina, Shashi Narayan, and Laura Perez-Beltrachini. The webnlg challenge: Generating text from rdf data. In Proceedings of the 10th International Conference on Natural Language Generation, pp. 124–133, 2017. | ||
624 | 12 | 41 | Clarke, T. F. 4th & Clark, P. L. Rare codons cluster. PLoS ONE 3, e3412 (2008). | ||
967 | 22 | 50 | Clement L. Statistical methods for quantitative MS-based proteomics: part I. Preprocessing. | ||
812 | 16 | 13 | Cock PJA. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Computer software. PyPi; 2009. | ||
639 | 12 | 56 | Codon Usage Database. https://www.kazusa.or.jp/codon/. | ||
363 | 7 | 51 | Coe, B. P. et al. Refining analyses of copy number variation identifies specific genes associated with developmental delay. Nat. Genet. 46, 1063–1071 (2014). | ||
659 | 13 | 14 | Cohen, J. P., Viviano, J. D., Bertin, P., Morrison, P., Torabian, P., Guarrera, M., Lungren, M. P., Chaudhari, A., Brooks, R., Hashir, M., and Bertrand, H. TorchXRayVision: A library of chest X-ray datasets and models. In Medical Imaging with Deep Learning, 2022. | ||
1961 | 35 | 234 | Colin Megill, Bruce Martin, Charlotte Weaver, Sidney Bell, Lia Prins, Seve Badajoz, Brian McCandless, Angela Oliveira Pisco, Marcus Kinsella, Fiona Griffin, et al. 2021. Cellxgene: a performant, scalable exploration platform for high dimensional sparse matrices. bioRxiv (2021), 2021–04. | ||
1464 | 31 | 15 | Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, W. Li, and Peter J. Liu. Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res., 21: 140:1–140:67, 2020. | ||
1995 | 35 | 268 | Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J Liu. 2020. Exploring the limits of transfer learning with a unified text-to-text transformer. The Journal of Machine Learning Research 21, 1 (2020), 5485–5551. | ||
2188 | 36 | 28 | Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J Liu. Exploring the limits of transfer learning with a unified text-to-text transformer. The Journal of Machine Learning Research, 21(1):5485–5551, 2020. | ||
1325 | 26 | 88 | Colin White, Samuel Dooley, Manley Roberts, Arka Pal, Benjamin Feuer, Siddhartha Jain, Ravid Shwartz-Ziv, Neel Jain, Khalid Saifullah, Siddartha Naidu, Chinmay Hegde, Yann LeCun, Tom Goldstein, Willie Neiswanger, and Micah Goldblum. LiveBench: A challenging, contamination-free LLM benchmark. CoRR, abs/2406.19314, 2024. | ||
1711 | 34 | 86 | Colin White, Samuel Dooley, Manley Roberts, Arka Pal, Benjamin Feuer, Siddhartha Jain, Ravid Shwartz-Ziv, Neel Jain, Khalid Saifullah, Siddartha Naidu, Chinmay Hegde, Yann LeCun, Tom Goldstein, Willie Neiswanger, and Micah Goldblum. LiveBench: A challenging, contamination-free LLM benchmark. CoRR, abs/2406.19314, 2024. | ||
337 | 7 | 25 | Collins, R. L. et al. A structural variation reference for medical and population genetics. Nature 581, 444–451 (2020). | ||
1413 | 29 | 4 | Collins,F.S. (1999) Shattuck lecture–medical and societal consequences of the Human Genome Project. N. Engl. J. Med., 341, 28–37. | ||
407 | 8 | 8 | Comfort, N. Genetics: we are the 98%. Nature 520, 615–616 (2015). | ||
404 | 8 | 5 | Consens, M. E. et al. Transformers and genome language models. Nat. Machine Intell. https://www.nature.com/articles/s42256-025-01007-9 (2025). | ||
136 | 3 | 128 | Consortium, E. P. et al. (2012). An integrated encyclopedia of DNA elements in the human genome. Nature 489, 57–74. | ||
450 | 9 | 28 | Consortium, G. P. et al. A global reference for human genetic variation. Nature 526, 68 (2015). | ||
510 | 10 | 28 | Consortium, G. P. et al. A global reference for human genetic variation. Nature 526, 68 (2015). | ||
372 | 7 | 60 | Consortium, G. T. The Genotype–Tissue Expression (GTEx) project. Nat. Genet. 45, 580–585 (2013). | ||
464 | 9 | 42 | Consortium, G. The gtex consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318–1330 (2020). | ||
524 | 10 | 42 | Consortium, G. The gtex consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318–1330 (2020). | ||
607 | 12 | 24 | Constant, D. A. et al. Deep learning-based codon optimization with large-scale synonymous variant datasets enables generalized tunable protein expression. Preprint at bioRxiv https://doi.org/10.1101/2023.02.11.528149 (2023). | ||
1440 | 30 | 23 | Cooper G.M., Stone E.A., Asimenos G., Green E.D., Batzoglou S., Sidow A. Distribution and intensity of constraint in mammalian genomic sequence. Genome Res. 2005; 15:901–913. | ||
364 | 7 | 52 | Cooper, G. M. et al. A copy number variation morbidity map of developmental delay. Nat. Genet. 43, 838–846 (2011). | ||
87 | 3 | 79 | Cornman, A., West-Roberts, J., Camargo, A. P., Roux, S., Beracochea, M., Mirdita, M., Ovchinnikov, S., and Hwang, Y. (2024). The OMG dataset: An Open MetaGenomic corpus for mixed-modality genomic language modeling. bioRxiv preprint (2024–08). https://www.biorxiv.org/content/10.1101/2024.08.14.607850v1. | ||
959 | 22 | 42 | Cui H, Wang C, Maan H. et al. scGPT: towards building a foundation model for single-cell multi-omics using generative AI. Nature Methods 2024;1–11. | ||
1049 | 22 | 132 | Cui P, Athey S. Stable learning establishes some common ground between causal inference and machine learning. Nat Mach Intell 2022;4:110–5. 10.1038/s42256-022-00445-z. | ||
707 | 14 | 6 | Cui, H. et al. scGPT: toward building a foundation model for single-cell multi-omics using generative AI. Nat Methods 1–11 (2024) doi:10.1038/s41592-024-02201-0. | ||
1119 | 24 | 2 | D. Bahdanau, K. Cho, and Y. Bengio. Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473, 2014. | ||
195 | 5 | 9 | D. Bloomfield, J. Pannu, A. W. Zhu, M. Y. Ng, A. Lewis, E. Bendavid, S. M. Asch, T. Hernandez-Boussard, A. Cicero, and T. Inglesby. AI and biosecurity: The need for governance. Science, 385(6711):831–833, 2024. doi:10.1126/science.adq1977. URL https://www.science.org/doi/abs/10.1126/science.adq1977. | ||
1137 | 24 | 20 | D. Gankin, A. Karollus, M. Grosshauser, K. Klemon, J. Hingerl, and J. Gagneur. Species-aware DNA language modeling. bioRxiv, pages 2023–01, 2023. | ||
1362 | 28 | 14 | D. Guo, Q. Zhu, D. Yang, Z. Xie, K. Dong, W. Zhang, G. Chen, X. Bi, Y. Wu, Y. K. Li, F. Luo, Y. Xiong, and W. Liang. Deepseek-coder: When the large language model meets programming – the rise of code intelligence, 2024. | ||
554 | 11 | 12 | D. Hendrycks, C. Burns, S. Basart, A. Zou, M. Mazeika, D. Song, and J. Steinhardt. Measuring massive multitask language understanding. arXiv preprint arXiv:2009.03300, 2020. | ||
1363 | 28 | 15 | D. Hendrycks, C. Burns, S. Basart, A. Zou, M. Mazeika, D. Song, and J. Steinhardt. Measuring massive multitask language understanding. arXiv preprint arXiv:2009.03300, 2020. | ||
1364 | 28 | 16 | D. Hendrycks, C. Burns, S. Kadavath, A. Arora, S. Basart, E. Tang, D. Song, and J. Steinhardt. Measuring mathematical problem solving with the math dataset. arXiv preprint arXiv:2103.03874, 2021. | ||
228 | 5 | 41 | D. Hyatt, G.-L. Chen, P. F. Locascio, M. L. Land, F. W. Larimer, and L. J. Hauser. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics, 11:119, Mar. 2010. | ||
1153 | 24 | 36 | D. K. Pokholok, C. T. Harbison, S. Levine, F. Lewitter, D. K. Gifford, and R. A. Young. Genome-wide map of nucleosome acetylation and methylation in yeast. Cell, 122(4):517–527, 2005. | ||
1125 | 24 | 8 | D. M. Church, V. A. Schneider, T. Graves, K. Auger, F. Cunningham, N. Bouk, H.-C. Chen, R. Agarwala, W. M. McLaren, G. R. Ritchie, et al. Modernizing reference genome assemblies. PLoS biology, 9(7): e1001091, 2011. | ||
272 | 5 | 87 | D. Piya, N. Nolan, M. L. Moore, L. A. R. Hernandez, B. F. Cress, R. Young, A. P. Arkin, and V. K. Mutalik. Systematic and scalable genome-wide essentiality mapping to identify nonessential genes in phages. PLOS Biology, 21:e3002416, 12 2023. ISSN 1545-7885. doi: 10.1371/journal.pbio.3002416. | ||
570 | 11 | 28 | D. Rein, B. L. Hou, A. C. Stickland, J. Petty, R. Y. Pang, J. Dirani, J. Michael, and S. R. Bowman. GPQA: A graduate-level google-proof q&a benchmark. arXiv preprint arXiv:2311.12022, 2023. | ||
573 | 11 | 31 | D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, T. Hubert, L. Baker, M. Lai, A. Bolton, Y. Chen, T. P. Lillicrap, F. Hui, L. Sifre, G. van den Driessche, T. Graepel, and D. Hassabis. Mastering the game of go without human knowledge. Nat., 550(7676):354–359, 2017b. doi: 10.1038/NATURE24270. URL https://doi.org/10.1038/nature24270. | ||
572 | 11 | 30 | D. Silver, T. Hubert, J. Schrittwieser, I. Antonoglou, M. Lai, A. Guez, M. Lanctot, L. Sifre, D. Kumaran, T. Graepel, T. P. Lillicrap, K. Simonyan, and D. Hassabis. Mastering chess and shogi by self-play with a general reinforcement learning algorithm. CoRR, abs/1712.01815, 2017a. URL http://arxiv.org/abs/1712.01815. | ||
1221 | 25 | 45 | D. Van Veen, C. Van Uden, L. Blankemeier, J.-B. Delbrouck, A. Aali, C. Bluethgen, A. Pareek, M. Polacin, E. P. Reis, A. Seehofnerová, et al., “Adapted large language models can outperform medical experts in clinical text summarization,” Nature medicine, vol. 30, no. 4, pp. 1134–1142, 2024. | ||
1162 | 24 | 45 | D. W. Romero, A. Kuzina, E. J. Bekkers, J. M. Tomczak, and M. Hoogendoorn. Ckconv: Continuous kernel convolution for sequential data. arXiv preprint arXiv:2102.02611, 2021b. | ||
1161 | 24 | 44 | D. W. Romero, R.-J. Bruintjes, J. M. Tomczak, E. J. Bekkers, M. Hoogendoorn, and J. C. van Gemert. Flexconv: Continuous kernel convolutions with differentiable kernel sizes. arXiv preprint arXiv:2110.08059, 2021a. | ||
1136 | 24 | 19 | D. Y. Fu, E. L. Epstein, E. Nguyen, A. W. Thomas, M. Zhang, T. Dao, A. Rudra, and C. Ré. Simple hardware-efficient long convolutions for sequence modeling. arXiv preprint arXiv:2302.06646, 2023. | ||
219 | 5 | 32 | D. G. Gibson, G. A. Benders, C. Andrews-Pfannkoch, E. A. Denisova, H. Baden-Tillson, J. Zaveri, T. B. Stockwell, A. Brownley, D. W. Thomas, M. A. Algire, C. Merryman, L. Young, V. N. Noskov, J. I. Glass, J. C. Venter, C. A. Hutchison, and H. O. Smith. Complete chemical synthesis, assembly, and cloning of a Mycoplasma genitalium genome. Science, 319(5867):1215–1220, 2008. | ||
267 | 5 | 82 | D. H. Parks, M. Chuvochina, C. Rinke, A. J. Mussig, P.-A. Chaumeil, and P. Hugenholtz. GTDB: an ongoing census of bacterial and archaeal diversity through a phylogenetically consistent, rank normalized and complete genome-based taxonomy. Nucleic Acids Res., 50(D1):D785–D794, Jan. 2022. | ||
101 | 3 | 93 | Dai, Z., Yang, Z., Yang, Y., Carbonell, J. G., Le, Q. V., and Salakhutdinov, R. Transformer-XL: Attentive Language Models beyond a Fixed-Length Context. In: Korhonen, A., Traum, D. R., and Màrquez, L., eds. Proceedings of the 57th Conference of the Association for Computational Linguistics, ACL 2019, Florence, Italy, July 28–August 2, 2019, Volume 1: Long Papers. Association for Computational Linguistics (2019): 2978–2988. https://doi.org/10.18653/v1/p19-1285. doi:10.18653/v1/P19-1285. | ||
974 | 22 | 57 | Dalla-Torre H, Gonzalez L, Mendoza J. et al. The nucleotide transformer: building and evaluating robust foundation models for human genomics. bioRxiv 2023;2023–01. | ||
405 | 8 | 6 | Dalla-Torre, H. et al. Nucleotide transformer: building and evaluating robust foundation models for human genomics. Nat. Methods 22, 287–297 (2025). | ||
149 | 4 | 9 | Dalla-Torre, H. et al. Nucleotide Transformer: building and evaluating robust foundation models for human genomics. Nat. Methods https://doi.org/10.1038/s41592-024-02523-z (2024). | ||
712 | 14 | 11 | Dalla-Torre, H. et al. The Nucleotide Transformer: Building and Evaluating Robust Foundation Models for Human Genomics. Preprint at https://doi.org/10.1101/2023.01.11.523679 (2023). | ||
24 | 3 | 16 | Dalla-Torre, H., Gonzalez, L., Mendoza Revilla, J., Lopez Carranza, N., Henryk Grywaczewski, A., Oteri, F., Dallago, C., Trop, E., Sirelkhatim, H., Richard, G. et al. (2023). The Nucleotide Transformer: Building and Evaluating Robust Foundation Models for Human Genomics. bioRxiv preprint. https://www.biorxiv.org/content/10.1101/2023.01.11.523679v3. | ||
1257 | 26 | 20 | Damai Dai, Chengqi Deng, Chenggang Zhao, R. X. Xu, Huazuo Gao, Deli Chen, Jiashi Li, Wangding Zeng, Xingkai Yu, Y. Wu, Zhenda Xie, Y. K. Li, Panpan Huang, Fuli Luo, Chong Ruan, Zhifang Sui, and Wenfeng Liang. DeepSeekMoE: Towards ultimate expert specialization in mixture-of-experts language models. CoRR, abs/2401.06066, 2024. | ||
1564 | 33 | 20 | Damai Dai, Chengqi Deng, Chenggang Zhao, R. X. Xu, Huazuo Gao, Deli Chen, Jiashi Li, Wangding Zeng, Xingkai Yu, Y. Wu, Zhenda Xie, Y. K. Li, Panpan Huang, Fuli Luo, Chong Ruan, Zhifang Sui, and Wenfeng Liang. DeepSeekMoE: Towards ultimate expert specialization in mixture-of-experts language models. CoRR, abs/2401.06066, 2024. | ||
1645 | 34 | 20 | Damai Dai, Chengqi Deng, Chenggang Zhao, R. X. Xu, Huazuo Gao, Deli Chen, Jiashi Li, Wangding Zeng, Xingkai Yu, Y. Wu, Zhenda Xie, Y. K. Li, Panpan Huang, Fuli Luo, Chong Ruan, Zhifang Sui, and Wenfeng Liang. DeepSeekMoE: Towards ultimate expert specialization in mixture-of-experts language models. CoRR, abs/2401.06066, 2024. | ||
2040 | 35 | 313 | Damian Szklarczyk, Rebecca Kirsch, Mikaela Koutrouli, Katerina Nastou, Farrokh Mehryary, Radja Hachilif, Annika L Gable, Tao Fang, Nadezhda T Doncheva, Sampo Pyysalo, et al. 2023. The STRING database in 2023: protein–protein association networks and functional enrichment analyses for any sequenced genome of interest. Nucleic acids research 51, D1 (2023), D638–D646. | ||
2017 | 35 | 290 | Damiano Sgarbossa, Umberto Lupo, and Anne-Florence Bitbol. 2023. Generative power of a protein language model trained on multiple sequence alignments. Elife 12 (2023), e79854. | ||
1273 | 26 | 36 | Dan Hendrycks, Collin Burns, Saurav Kadavath, Akul Arora, Steven Basart, Eric Tang, Dawn Song, and Jacob Steinhardt. Measuring mathematical problem solving with the MATH dataset. In NeurIPS Datasets and Benchmarks, 2021b. | ||
1572 | 33 | 28 | Dan Hendrycks, Collin Burns, Saurav Kadavath, Akul Arora, Steven Basart, Eric Tang, Dawn Song, and Jacob Steinhardt. Measuring mathematical problem solving with the MATH dataset. In NeurIPS Datasets and Benchmarks, 2021b. | ||
1660 | 34 | 35 | Dan Hendrycks, Collin Burns, Saurav Kadavath, Akul Arora, Steven Basart, Eric Tang, Dawn Song, and Jacob Steinhardt. Measuring mathematical problem solving with the MATH dataset. In NeurIPS Datasets and Benchmarks, 2021b. | ||
1855 | 35 | 128 | Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, and Jacob Steinhardt. 2021. Measuring Massive Multitask Language Understanding. arXiv:2009.03300 [cs.CY] | ||
1272 | 26 | 35 | Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, and Jacob Steinhardt. Measuring massive multitask language understanding. In ICLR. OpenReview.net, 2021a. | ||
1571 | 33 | 27 | Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, and Jacob Steinhardt. Measuring massive multitask language understanding. In ICLR. OpenReview.net, 2021a. | ||
1659 | 34 | 34 | Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, and Jacob Steinhardt. Measuring massive multitask language understanding. In ICLR. OpenReview.net, 2021a. | ||
2122 | 35 | 395 | Dan Zhang, Ziniu Hu, Sining Zhoubian, Zhengxiao Du, Kaiyu Yang, Zihan Wang, Yisong Yue, Yuxiao Dong, and Jie Tang. 2024. Sciglm: Training scientific language models with self-reflective instruction annotation and tuning. arXiv preprint arXiv:2401.07950 (2024). | ||
1051 | 22 | 134 | Danaee P, Rouches M, Wiley M. et al. bpRNA: large-scale automated annotation and analysis of RNA secondary structure. Nucleic Acids Res 2018;46:5381–94. 10.1093/nar/gky285. | ||
745 | 15 | 9 | Daniel Cer, Mona Diab, Eneko Agirre, Inigo Lopez-Gazpio, and Lucia Specia. Semeval-2017 task 1: Semantic textual similarity multilingual and crosslingual focused evaluation. Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), 2017. doi: 10.18653/v1/s17-2001. URL http://dx.doi.org/10.18653/v1/S17-2001. | ||
1856 | 35 | 129 | Daniel Hesslow, Niccoló Zanichelli, Pascal Notin, Iacopo Poli, and Debora Marks. 2022. RITA: a Study on Scaling Up Generative Protein Sequence Models. arXiv (May 2022), arXiv:2205.05789. | ||
1906 | 35 | 179 | Daniel Levine, Sacha Lévy, Syed Asad Rizvi, Nazreen Pallikkavaliyaveetil, Xingyu Chen, David Zhang, Sina Ghadermarzi, Ruiming Wu, Zihe Zheng, Ivan Vrkic, et al. 2023. Cell2sentence: Teaching large language models the language of biology. bioRxiv (2023), 2023–09. | ||
1939 | 35 | 212 | Daniel Mark Lowe. 2012. Extraction of chemical structures and reactions from the literature. Ph.D. Dissertation. University of Cambridge. | ||
779 | 15 | 43 | Daniel Povey, Gaofeng Cheng, Yiming Wang, Ke Li, Hainan Xu, Mahsa Yarmohammadi, and Sanjeev Khudanpur. Semi-orthogonal low-rank matrix factorization for deep neural networks. In Interspeech, pp. 3743–3747, 2018. | ||
1992 | 35 | 265 | Daniil Polykovskiy, Alexander Zhebrak, Benjamin Sanchez-Lengeling, Sergey Golovanov, Oktai Tatanov, Stanislav Belyaev, Rauf Kurbanov, Aleksey Artamonov, Vladimir Aladinskiy, Mark Veselov, et al. 2020. Molecular sets (MOSES): a benchmarking platform for molecular generation models. Frontiers in pharmacology 11 (2020), 565644. | ||
1046 | 22 | 129 | Dao T, Fu D, Ermon S. et al. Flashattention: fast and memory-efficient exact attention with io-awareness. Adv Neural Inf Process Syst 2022;35:16344–59. | ||
2124 | 35 | 397 | Daoan Zhang, Weitong Zhang, Bing He, Jianguo Zhang, Chenchen Qin, and Jianhua Yao. 2023. DNAGPT: A Generalized Pretrained Tool for Multiple DNA Sequence Analysis Tasks. bioRxiv (2023), 2023–07. | ||
1857 | 35 | 130 | David Hiscock and Chris Upton. 2000. Viral Genome DataBase: storing and analyzing genes and proteins from complete viral genomes. Bioinformatics 16, 5 (2000), 484–485. | ||
605 | 12 | 22 | David K. Yang, Samuel L. Goldman, Eli Weinstein, and Debora Marks. Generative models for codon prediction and optimization. https://mlcb.github.io/mlcb2019_proceedings/papers/paper_29.pdf. | ||
2079 | 35 | 352 | David L Wheeler, Tanya Barrett, Dennis A Benson, Stephen H Bryant, Kathi Canese, Vyacheslav Chetvernin, Deanna M Church, Michael DiCuccio, Ron Edgar, Scott Federhen, et al. 2007. Database resources of the national center for biotechnology information. Nucleic acids research 36, suppl_1 (2007), D13–D21. | ||
1895 | 35 | 168 | David R Krathwohl. 2002. A revision of Bloom’s taxonomy: An overview. Theory into practice 41, 4 (2002), 212–218. | ||
1082 | 23 | 28 | David R. Kelley, Jasper Snoek, and John L. Rinn. Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks. Genome Research, 26(7):990–999, July 2016. ISSN 1549-5469. doi: 10.1101/gr.200535.115. | ||
1083 | 23 | 29 | David R. Kelley, Yakir A. Reshef, Maxwell Bileschi, David Belanger, Cory Y. McLean, and Jasper Snoek. Sequential regulatory activity prediction across chromosomes with convolutional neural networks. Genome Research, 28(5):739–750, May 2018. ISSN 1088-9051, 1549-5469. doi: 10.1101/gr.227819.117. URL http://genome.cshlp.org/lookup/doi/10.1101/gr.227819.117. | ||
1311 | 26 | 74 | David Rein, Betty Li Hou, Asa Cooper Stickland, Jackson Petty, Richard Yuanzhe Pang, Julien Dirani, Julian Michael, and Samuel R. Bowman. GPQA: A graduate-level Google-proof Q&A benchmark. CoRR, abs/2311.12022, 2023. | ||
1606 | 33 | 62 | David Rein, Betty Li Hou, Asa Cooper Stickland, Jackson Petty, Richard Yuanzhe Pang, Julien Dirani, Julian Michael, and Samuel R. Bowman. GPQA: A graduate-level Google-proof Q&A benchmark. CoRR, abs/2311.12022, 2023. | ||
1697 | 34 | 72 | David Rein, Betty Li Hou, Asa Cooper Stickland, Jackson Petty, Richard Yuanzhe Pang, Julien Dirani, Julian Michael, and Samuel R. Bowman. GPQA: A graduate-level Google-proof Q&A benchmark. CoRR, abs/2311.12022, 2023. | ||
2082 | 35 | 355 | David S Wishart, Yannick D Feunang, An C Guo, Elvis J Lo, Ana Marcu, Jason R Grant, Tanvir Sajed, Daniel Johnson, Carin Li, Zinat Sayeeda, et al. 2018. DrugBank 5.0: a major update to the DrugBank database for 2018. Nucleic acids research 46, D1 (2018), D1074–D1082. | ||
2077 | 35 | 350 | David Weininger. 1988. SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. Journal of chemical information and computer sciences 28, 1 (1988), 31–36. | ||
382 | 7 | 70 | Davis, C. A. et al. The Encyclopedia of DNA elements (ENCODE): data portal update. Nucleic Acids Res. 46, D794–D801 (2018). | ||
354 | 7 | 42 | Davydov, E. V. et al. Identifying a high fraction of the human genome to be under selective constraint using GERP++. PLoS Comput. Biol. 6, e1001025 (2010). | ||
55 | 3 | 47 | de Almeida, B. P., Dalla-Torre, H., Richard, G., Blum, C., Hexemer, L., Gélard, M., Mendoza-Revilla, J., Pandey, P., Laurent, S., Lopez, M. et al. (2024). SegmentNT: annotating the genome at single-nucleotide resolution with DNA foundation models. bioRxiv preprint. https://www.biorxiv.org/content/10.1101/2024.03.14.584712v2. | ||
434 | 9 | 12 | de Almeida, B. P., Reiter, F., Pagani, M. & Stark, A. DeepSTARR predicts enhancer activity from dna sequence and enables the de novo design of synthetic enhancers. Nat. Genet. 54, 613–624 (2022). | ||
494 | 10 | 12 | de Almeida, B. P., Reiter, F., Pagani, M. & Stark, A. DeepSTARR predicts enhancer activity from dna sequence and enables the de novo design of synthetic enhancers. Nat. Genet. 54, 613–624 (2022). | ||
42 | 3 | 34 | de Almeida, B. P., Reiter, F., Pagani, M., and Stark, A. (2022). DeepSTARR predicts enhancer activity from DNA sequence and enables the de novo design of synthetic enhancers. Nature Genetics 54, 613–624. | ||
773 | 15 | 37 | Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. arXiv preprint, 2019. | ||
542 | 11 | 0 | DeepSeek-AI, D. Guo, D. Yang, H. Zhang, J. Song, R. Zhang, R. Xu, Q. Zhu, S. Ma, P. Wang, X. Bi, X. Zhang, X. Yu, Y. Wu, Z. F. Wu, Z. Gou, Z. Shao, Z. Li, Z. Gao, A. Liu, B. Xue, B. Wang, B. Wu, B. Feng, C. Lu, C. Zhao, C. Deng, C. Zhang, C. Ruan, D. Dai, D. Chen, D. Ji, E. Li, F. Lin, F. Dai, F. Luo, G. Hao, G. Chen, G. Li, H. Zhang, H. Bao, H. Xu, H. Wang, H. Ding, H. Xin, H. Gao, H. Qu, H. Li, J. Guo, J. Li, J. Wang, J. Chen, J. Yuan, J. Qiu, J. Li, J. L. Cai, J. Ni, J. Liang, J. Chen, K. Dong, K. Hu, K. Gao, K. Guan, K. Huang, K. Yu, L. Wang, L. Zhang, L. Zhao, L. Wang, L. Zhang, L. Xu, L. Xia, M. Zhang, M. Zhang, M. Tang, M. Li, M. Wang, M. Li, N. Tian, P. Huang, P. Zhang, Q. Wang, Q. Chen, Q. Du, R. Ge, R. Zhang, R. Pan, R. Wang, R. J. Chen, R. L. Jin, R. Chen, S. Lu, S. Zhou, S. Chen, S. Ye, S. Wang, S. Yu, S. Zhou, S. Pan, S. S. Li, S. Zhou, S. Wu, S. Ye, T. Yun, T. Pei, T. Sun, T. Wang, W. Zeng, W. Zhao, W. Liu, W. Liang, W. Gao, W. Yu, W. Zhang, W. L. Xiao, W. An, X. Liu, X. Wang, X. Chen, X. Nie, X. Cheng, X. Liu, X. Xie, X. Liu, X. Yang, X. Li, X. Su, X. Lin, X. Q. Li, X. Jin, X. Shen, X. Chen, X. Sun, X. Wang, X. Song, X. Zhou, X. Wang, X. Shan, Y. K. Li, Y. Q. Wang, Y. X. Wei, Y. Zhang, Y. Xu, Y. Li, Y. Zhao, Y. Sun, Y. Wang, Y. Yu, Y. Zhang, Y. Shi, Y. Xiong, Y. He, Y. Piao, Y. Wang, Y. Tan, Y. Ma, Y. Liu, Y. Guo, Y. Ou, Y. Wang, Y. Gong, Y. Zou, Y. He, Y. Xiong, Y. Luo, Y. You, Y. Liu, Y. Zhou, Y. X. Zhu, Y. Xu, Y. Huang, Y. Li, Y. Zheng, Y. Zhu, Y. Ma, Y. Tang, Y. Zha, Y. Yan, Z. Z. Ren, Z. Ren, Z. Sha, Z. Fu, Z. Xu, Z. Xie, Z. Zhang, Z. Hao, Z. Ma, Z. Yan, Z. Wu, Z. Gu, Z. Zhu, Z. Liu, Z. Li, Z. Xie, Z. Song, Z. Pan, Z. Huang, Z. Xu, Z. Zhang, and Z. Zhang. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning. 1 2025. URL https://arxiv.org/pdf/2501.12948. | ||
1358 | 28 | 10 | DeepSeek-AI. Deepseek LLM: scaling open-source language models with longtermism. CoRR, abs/2401.02954, 2024. doi: 10.48550/ARXIV.2401.02954. URL https://doi.org/10.48550/arXiv.2401.02954. | ||
1198 | 25 | 22 | Deloitte, “Deloitte acquires gryphon scientific business to expand security science and public health capabilities,” 2024. https://www2.deloitte.com/us/en/pages/about-deloitte/articles/press-releases/deloitte-acquires-gryphon-scientific-business-to-expand-security-science-and-public-health-capabilities.html. | ||
735 | 14 | 34 | DeLong, Elizabeth R., et al. “Comparing the Areas under Two or More Correlated Receiver Operating Characteristic Curves: A Nonparametric Approach.” Biometrics, vol. 44, no. 3, 1988, pp. 837–45. JSTOR, https://doi.org/10.2307/2531595. | ||
1429 | 30 | 12 | den Dunnen J.T., Dalgleish R., Maglott D.R., Hart R.K., Greenblatt M.S., McGowan-Jordan J., Roux A.F., Smith T., Antonarakis S.E., Taschner P.E. HGVS recommendations for the description of sequence variants: 2016 update. Hum. Mutat. 2016; 37:564–569. | ||
876 | 20 | 4 | den Dunnen JT, Antonarakis SE. Mutation nomenclature extensions and suggestions to describe complex mutations: a discussion. Hum. Mutation. 2000;15:7–12. | ||
863 | 19 | 2 | den Dunnen JT, Dalgleish R, Maglott DR, Hart RK, Greenblatt MS, McGowan-Jordan J, Roux AF, Smith T, Antonarakis SE, Taschner PE. HGVS Recommendations for the Description of Sequence Variants: 2016 Update. Hum Mutat. 2016. https://doi.org/10.1002/humu.22981. (PMID 26931183.) | ||
588 | 12 | 5 | Deng, Y., de Lima Hedayioglu, F., Kalfon, J., Chu, D. & von der Haar, T. Hidden patterns of codon usage bias across kingdoms. J. R. Soc. Interface 17, 20190819 (2020). | ||
2162 | 36 | 2 | Dennis A Benson, Mark Cavanaugh, Karen Clark, Ilene Karsch-Mizrachi, David J Lipman, James Ostell, and Eric W Sayers. GenBank. Nucleic acids research, 41(D1):D36–D42, 2012. | ||
1074 | 23 | 20 | Dennis Gankin, Alexander Karollus, Martin Grosshauser, Kristian Klemon, Johannes Hingerl, and Julien Gagneur. Species-aware DNA language modeling. bioRxiv, pp. 2023.01.26.525670, January 2023. doi: 10.1101/2023.01.26.525670. URL http://biorxiv.org/content/early/2023/01/27/2023.01.26.525670.abstract. | ||
987 | 22 | 70 | Denny V, Krötzsch M. Wikidata: a free collaborative knowledgebase. Communications of the ACM, 2014;57:78–85. | ||
418 | 8 | 19 | Derraz, B. et al. New regulatory thinking is needed for AI-based personalised drug and cell therapies in precision oncology. Npj Precis. Oncol. 8, 1–11 (2024). | ||
882 | 21 | 4 | Devlin J. et al. (2019) Bert: pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, MN, USA. pp. 4171–4186. Association for Computational Linguistics. https://www.aclweb.org/anthology/N19-1423. | ||
940 | 22 | 23 | Devlin J, Chang MW, Lee K. et al. Bert: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018. | ||
613 | 12 | 30 | Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: pre-training of deep bidirectional Transformers for language understanding. https://doi.org/10.48550/ARXIV.1810.04805 (2018) | ||
423 | 9 | 1 | Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. Bert: pre-training of deep bidirectional transformers for language understanding. Preprint at https://arxiv.org/abs/1810.04805 (2018). | ||
483 | 10 | 1 | Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. Bert: pre-training of deep bidirectional transformers for language understanding. Preprint at https://arxiv.org/abs/1810.04805 (2018). | ||
718 | 14 | 17 | Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Preprint at https://doi.org/10.48550/arXiv.1810.04805 (2019). | ||
52 | 3 | 44 | Devlin, J., Chang, M.-W., Lee, K., and Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. In: Burstein, J., Doran, C., and Solorio, T., eds. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). Minneapolis, Minnesota: Association for Computational Linguistics (2019):(4171–4186). https://aclanthology.org/N19-1423. doi:10.18653/v1/N19-1423. | ||
328 | 7 | 16 | di Iulio, J. et al. The human noncoding genome defined by genetic diversity. Nat. Genet. 50, 333–337 (2018). | ||
1875 | 35 | 148 | Di Jin, Eileen Pan, Nassim Oufattole, Wei-Hung Weng, Hanyi Fang, and Peter Szolovits. 2020. What Disease does this Patient Have? A Large-scale Open Domain Question Answering Dataset from Medical Exams. arXiv:2009.13081 | ||
2123 | 35 | 396 | Di Zhang, Wei Liu, Qian Tan, Jingdan Chen, Hang Yan, Yuliang Yan, Jiatong Li, Weiran Huang, Xiangyu Yue, Dongzhan Zhou, et al. 2024. Chemllm: A chemical large language model. arXiv preprint arXiv:2402.06852 (2024). | ||
761 | 15 | 25 | Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization, 2017. | ||
1787 | 35 | 60 | Dimitrios Christofidellis, Giorgio Giannone, Jannis Born, Ole Winther, Teodoro Laino, and Matteo Manica. 2023. Unifying Molecular and Textual Representations via Multi-task Language Modelling. arXiv:2301.12586 [cs.LG] | ||
1805 | 35 | 78 | Dina Demner-Fushman, Marc D Kohli, Marc B Rosenman, Sonya E Shooshan, Laritza Rodriguez, Sameer Antani, George R Thoma, and Clement J McDonald. 2016. Preparing a collection of radiology examinations for distribution and retrieval. Journal of the American Medical Informatics Association 23, 2 (2016), 304–310. | ||
762 | 15 | 26 | Dmitry Lepikhin, HyoukJoong Lee, Yuanzhong Xu, Dehao Chen, Orhan Firat, Yanping Huang, Maxim Krikun, Noam Shazeer, and Zhifeng Chen. Gshard: Scaling giant models with conditional computation and automatic sharding, 2020. | ||
1287 | 26 | 50 | Dmitry Lepikhin, HyoukJoong Lee, Yuanzhong Xu, Dehao Chen, Orhan Firat, Yanping Huang, Maxim Krikun, Noam Shazeer, and Zhifeng Chen. GShard: Scaling giant models with conditional computation and automatic sharding. CoRR, abs/2006.16668, 2020. | ||
1674 | 34 | 49 | Dmitry Lepikhin, HyoukJoong Lee, Yuanzhong Xu, Dehao Chen, Orhan Firat, Yanping Huang, Maxim Krikun, Noam Shazeer, and Zhifeng Chen. GShard: Scaling giant models with conditional computation and automatic sharding. CoRR, abs/2006.16668, 2020. | ||
883 | 21 | 5 | Doğan R.I. et al. (2014) NCBI disease corpus: a resource for disease name recognition and concept normalization. J. Biomed. Inform., 47, 1–10. | ||
2180 | 36 | 20 | Dohoon Lee, Jeewon Yang, and Sun Kim. Learning the histone codes with large genomic windows and three-dimensional chromatin interactions using transformer. Nature Communications, 13(1): 6678, 2022. | ||
1744 | 35 | 17 | Dominique Beaini, Shenyang Huang, Joao Alex Cunha, Gabriela Moisescu-Pareja, Oleksandr Dymov, Samuel Maddrell-Mander, Callum McLean, Frederik Wenkel, Luis Müller, Jama Hussein Mohamud, et al. 2023. Towards Foundational Models for Molecular Learning on Large-Scale Multi-Task Datasets. arXiv preprint arXiv:2310.04292 (2023). | ||
1771 | 35 | 44 | Dong Chen, Kaifu Gao, Duc Duy Nguyen, Xin Chen, Yi Jiang, Guo-Wei Wei, and Feng Pan. 2021. Algebraic graph-assisted bidirectional transformers for molecular property prediction. Nature communications 12, 1 (2021), 3521. | ||
926 | 22 | 9 | Dong K, Zhang S. Deciphering spatial domains from spatially resolved transcriptomics with an adaptive graph attention auto-encoder. Nat Commun 2022;13:1739. 10.1038/s41467-022-29439-6. | ||
2103 | 35 | 376 | Dongyu Xue, Han Zhang, Dongling Xiao, Yukang Gong, Guohui Chuai, Yu Sun, Hao Tian, Hua Wu, Yukun Li, and Qi Liu. 2020. X-MOL: large-scale pre-training for molecular understanding and diverse molecular analysis. bioRxiv (2020), 2020–12. | ||
1885 | 35 | 158 | Donna Karolchik, Robert Baertsch, Mark Diekhans, Terrence S Furey, Angie Hinrichs, YT Lu, Krishna M Roskin, Matt Schwartz, Charles W Sugnet, Daryl J Thomas, et al. 2003. The UCSC genome browser database. Nucleic acids research 31, 1 (2003), 51–54. | ||
349 | 7 | 37 | Drinane, M. C., Sherman, J. A., Hall, A. E., Simons, M. & Mulligan-Kehoe, M. J. Plasminogen and plasmin activity in patients with coronary artery disease. J. Thromb. Haemost. 4, 1288–1295 (2006). | ||
641 | 12 | 58 | Dynamic Time Warping. in Information Retrieval for Music and Motion, 69–84 (Springer Berlin Heidelberg, Berlin, Heidelberg, 2007). | ||
189 | 5 | 3 | E. Almazrouei, H. Alobeidli, A. Alshamsi, A. Cappelli, R. Cojocaru, M. Debbah, É. Goffinet, D. Hesslow, J. Launay, Q. Malartic, et al. The falcon series of open language models. arXiv preprint arXiv:2311.16867, 2023. A. Andonian, Q. Anthony, S. Biderman, S. Black, P. Gali, L. Gao, E. Hallahan, J. Levy-Kramer, C. Leahy, | ||
252 | 5 | 65 | E. C. Meng, T. D. Goddard, E. F. Pettersen, G. S. Couch, Z. J. Pearson, J. H. Morris, and T. E. Ferrin. Ucsf chimerax: Tools for structure building and analysis. Protein Science, 32(11):e4792, 2023. doi: https://doi.org/10.1002/pro.4792. URL https://onlinelibrary.wiley.com/doi/abs/10.1002/pro.4792. | ||
799 | 16 | 0 | E. Huckvale and H. N. Moseley. kegg_pull: a software package for the RESTful access and pulling from the Kyoto Encyclopedia of Gene and Genomes. BMC Bioinformatics, 24:1–17, 12 2023. ISSN 14712105. doi: 10.1186/s12859-023-05208-0. URL https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-023-05208-0. | ||
736 | 15 | 0 | E. J. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, and W. Chen. Lora: Low-rank adaptation of large language models, 2021. URL https://arxiv.org/abs/2106.09685. | ||
1117 | 24 | 0 | E. Nguyen, M. Poli, M. Faizi, A. W. Thomas, C. B. Sykes, M. Wornow, A. Patel, C. Rabideau, S. Massaroli, Y. Bengio, S. Ermon, S. A. Baccus, and C. Ré. Hyenadna: Long-range genomic sequence modeling at single nucleotide resolution. ArXiv, 6 2023. ISSN 2331-8422. URL https://arxiv.org/pdf/2306.15794. | ||
261 | 5 | 74 | E. Nijkamp, J. A. Ruffolo, E. N. Weinstein, N. Naik, and A. Madani. ProGen2: exploring the boundaries of protein language models. Cell Systems, 14(11):968–978, 2023. | ||
276 | 5 | 91 | E. Proux-Wéra, D. Armisén, K. P. Byrne, and K. H. Wolfe. A pipeline for automated annotation of yeast genome sequences by a conserved-synteny approach. BMC Bioinformatics, 13:1–12, 2012. | ||
1228 | 25 | 52 | E. Schmidt, “Ai will transform science.” https://www.technologyreview.com/2023/07/05/1075865/eric-schmidt-ai-will-transform-science/, 2023. Accessed: 2024-08-07. | ||
295 | 5 | 110 | E. Szathmáry and J. M. Smith. The major evolutionary transitions. Nature, 374(6519):227–232, 1995. | ||
1493 | 32 | 0 | E. Wang, S. Schmidgall, P. F. Jaeger, F. Zhang, R. Pilgrim, Y. Matias, J. Barral, D. Fleet, and S. Azizi. Txgemma: Efficient and agentic llms for therapeutics. https://arxiv.org/pdf/2504.06196 | ||
259 | 5 | 72 | E. Nguyen, M. Poli, M. G. Durrant, B. Kang, D. Katrekar, D. B. Li, L. J. Bartie, A. W. Thomas, S. H. King, G. Brixi, J. Sullivan, M. Y. Ng, A. Lewis, A. Lou, S. Ermon, S. A. Baccus, T. Hernandez-Boussard, C. Ré, P. D. Hsu, and B. L. Hie. Sequence modeling and design from molecular to genome scale with evo. Science, 386(6723): eado9336, Nov. 2024a. | ||
260 | 5 | 73 | E. Nguyen, M. Poli, M. Faizi, A. Thomas, M. Wornow, C. Birch-Sykes, S. Massaroli, A. Patel, C. Rabideau, Y. Bengio, et al. HyenaDNA: Long-range genomic sequence modeling at single nucleotide resolution. Advances in neural information processing systems, 36, 2024b. | ||
1827 | 35 | 100 | EA Feingold, PJ Good, MS Guyer, S Kamholz, L Liefer, K Wetterstrand, FS Collins, TR Gingeras, D Kampa, EA Sekinger, et al. 2004. The ENCODE (ENCyclopedia of DNA elements) project. Science 306, 5696 (2004), 636–640. | ||
1241 | 26 | 4 | Ebtesam Almazrouei, Hamza Alobeidli, Abdulaziz Alshamsi, Alessandro Cappelli, Ruxandra Cojocaru, Mérouane Debbah, Etienne Goffinet, Daniel Hesslow, Julien Launay, Quentin Malartic, Daniele Mazzotta, Badreddine Noune, Baptiste Pannier, and Guilherme Penedo. The Falcon series of open language models. CoRR, abs/2311.16867, 2023. | ||
1629 | 34 | 4 | Ebtesam Almazrouei, Hamza Alobeidli, Abdulaziz Alshamsi, Alessandro Cappelli, Ruxandra Cojocaru, Mérouane Debbah, Etienne Goffinet, Daniel Hesslow, Julien Launay, Quentin Malartic, Daniele Mazzotta, Badreddine Noune, Baptiste Pannier, and Guilherme Penedo. The Falcon series of open language models. CoRR, abs/2311.16867, 2023. | ||
1302 | 26 | 65 | Edoardo Maria Ponti, Goran Glavas, Olga Majewska, Qianchu Liu, Ivan Vulic, and Anna Korhonen. XCOPA: A multilingual dataset for causal commonsense reasoning. In EMNLP (1), pp. 2362–2376. Association for Computational Linguistics, 2020. | ||
1599 | 33 | 55 | Edoardo Maria Ponti, Goran Glavas, Olga Majewska, Qianchu Liu, Ivan Vulic, and Anna Korhonen. XCOPA: A multilingual dataset for causal commonsense reasoning. In EMNLP (1), pp. 2362–2376. Association for Computational Linguistics, 2020. | ||
1688 | 34 | 63 | Edoardo Maria Ponti, Goran Glavas, Olga Majewska, Qianchu Liu, Ivan Vulic, and Anna Korhonen. XCOPA: A multilingual dataset for causal commonsense reasoning. In EMNLP (1), pp. 2362–2376. Association for Computational Linguistics, 2020. | ||
2173 | 36 | 13 | Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. Lora: Low-rank adaptation of large language models, 2021. | ||
1428 | 30 | 11 | Eilbeck K., Lewis S.E., Mungall C.J., Yandell M., Stein L., Durbin R., Ashburner M. The Sequence ontology: a tool for the unification of genome annotations. Genome Biol. 2005; 6:R44. | ||
1445 | 30 | 28 | Eisenhauer E.A., Therasse P., Bogaerts J., Schwartz L.H., Sargent D., Ford R., Dancey J., Arbuck S., Gwyther S., Mooney M. et al. New response evaluation criteria in solid tumours: revised RECIST guideline (version 1.1). Eur. J. Cancer. 2009; 45:228–247. | ||
795 | 15 | 59 | Elad Ben Zaken, Shauli Ravfogel, and Yoav Goldberg. Bitfit: Simple parameter-efficient fine-tuning for transformer-based masked language-models, 2021. | ||
1798 | 35 | 71 | ENCODE Project Consortium. 2012. An integrated encyclopedia of DNA elements in the human genome. Nature 489, 7414 (2012), 57–74. | ||
1013 | 22 | 96 | Elnaggar A, Heinzinger M, Dallago C. et al. Prottrans: toward understanding the language of life through self-supervised learning. IEEE Trans Pattern Anal Mach Intell 2021;44:7112–27. 10.1109/TPAMI.2021.3095381. | ||
427 | 9 | 5 | Elnaggar, A. et al. ProtTrans: towards cracking the language of life’s code through self-supervised deep learning and high performance computing. IEEE Trans. Pattern Anal. Mach. Intell. 44, 7112–7127 (2022). | ||
487 | 10 | 5 | Elnaggar, A. et al. ProtTrans: towards cracking the language of life’s code through self-supervised deep learning and high performance computing. IEEE Trans. Pattern Anal. Mach. Intell. 44, 7112–7127 (2022). | ||
821 | 17 | 7 | Ely JW, Osheroff JA, Chambliss ML et al. Answering physicians’ clinical questions: obstacles and potential solutions. J Am Med Inform Assoc 2005;12:217–24. | ||
1790 | 35 | 63 | Emily Clough and Tanya Barrett. 2016. The gene expression omnibus database. Statistical Genomics: Methods and Protocols (2016), 93–110. | ||
1755 | 35 | 28 | Emmanuel Boutet, Damien Lieberherr, Michael Tognolli, Michel Schneider, and Amos Bairoch. 2007. UniProtKB/Swiss-Prot: the manually annotated section of the UniProt KnowledgeBase. In Plant bioinformatics: methods and protocols. Springer, 89–112. | ||
1756 | 35 | 29 | Emmanuel Boutet, Damien Lieberherr, Michael Tognolli, Michel Schneider, Parit Bansal, Alan J Bridge, Sylvain Poux, Lydie Bougueleret, and Ioannis Xenarios. 2016. UniProtKB/Swiss-Prot, the manually annotated section of the UniProt KnowledgeBase: how to use the entry view. Plant bioinformatics: methods and protocols (2016), 23–54. | ||
2167 | 36 | 7 | ENCODE Project Consortium et al. An integrated encyclopedia of dna elements in the human genome. Nature, 489(7414):57, 2012. | ||
1796 | 35 | 69 | ENCODE Project Consortium (overall coordination: Ian Dunham, Anshul Kundaje; writing group: Bernstein et al.). An integrated encyclopedia of DNA elements in the human genome. Nature 489 (2012), 57–74. | ||
1069 | 23 | 15 | ENCODE Project Consortium. An integrated encyclopedia of DNA elements in the human genome. Nature, 489(7414):57–74, September 2012. ISSN 1476-4687. doi: 10.1038/nature11247. | ||
1132 | 24 | 15 | ENCODE Project Consortium. An integrated encyclopedia of dna elements in the human genome. Nature, 489(7414):57, 2012. | ||
453 | 9 | 31 | ENCODE. An integrated encyclopedia of dna elements in the human genome. Nature 489, 57–74 (2012). | ||
513 | 10 | 31 | ENCODE. An integrated encyclopedia of dna elements in the human genome. Nature 489, 57–74 (2012). | ||
1133 | 24 | 16 | ENCODE Project Consortium. Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature, 583:699–710, 2020. | ||
1834 | 35 | 107 | Dennis Gankin, Alexander Karollus, Martin Grosshauser, Kristian Klemon, Johannes Hingerl, and Julien Gagneur. 2023. Species-aware DNA language modeling. bioRxiv (2023), 2023–01. | ||
1338 | 26 | 101 | Enyu Zhou, Guodong Zheng, Bing Wang, Zhiheng Xi, Shihan Dou, Rong Bao, Wei Shen, Limao Xiong, Jessica Fan, Yurong Mou, Rui Zheng, Tao Gui, Qi Zhang, and Xuanjing Huang. RMB: Comprehensively benchmarking reward models in LLM alignment. CoRR, abs/2410.09893, 2024. | ||
1724 | 34 | 99 | Enyu Zhou, Guodong Zheng, Bing Wang, Zhiheng Xi, Shihan Dou, Rong Bao, Wei Shen, Limao Xiong, Jessica Fan, Yurong Mou, Rui Zheng, Tao Gui, Qi Zhang, and Xuanjing Huang. RMB: Comprehensively benchmarking reward models in LLM alignment. CoRR, abs/2410.09893, 2024. | ||
1220 | 25 | 44 | Epic Systems Corporation, “Epic and microsoft bring gpt-4 to ehrs,” Epic, 2023. | ||
1222 | 25 | 46 | Epic, “Epic and microsoft bring gpt-4 to ehrs,” 2023. | ||
1038 | 22 | 121 | Eraslan G, Avsec Ž, Gagneur J. et al. Deep learning: new computational modeling techniques for genomics. Nat Rev Genet 2019;20:389–403. 10.1038/s41576-019-0122-6. | ||
435 | 9 | 13 | Eraslan, G., Avsec, Ž., Gagneur, J. & Theis, F. J. Deep learning: new computational modelling techniques for genomics. Nat. Rev. Genet. 20, 389–403 (2019). | ||
495 | 10 | 13 | Eraslan, G., Avsec, Ž., Gagneur, J. & Theis, F. J. Deep learning: new computational modelling techniques for genomics. Nat. Rev. Genet. 20, 389–403 (2019). | ||
660 | 13 | 15 | Erdal, B. S., Gupta, V., Demirer, M., Fair, K. H., White, R. D., Blair, J., Deichert, B., Lafleur, L., Qin, M. M., Bericat, D., and Genereaux, B. Integration and implementation strategies for ai algorithm deployment with smart routing rules and workflow management, 2023. | ||
1968 | 35 | 241 | Eric Nguyen, Michael Poli, Marjan Faizi, Armin Thomas, Callum Birch-Sykes, Michael Wornow, Aman Patel, Clayton Rabideau, Stefano Massaroli, Yoshua Bengio, et al. 2023. Hyenadna: Long-range genomic sequence modeling at single nucleotide resolution. arXiv preprint arXiv:2306.15794 (2023). | ||
1097 | 23 | 43 | Eric Nguyen, Michael Poli, Marjan Faizi, Armin Thomas, Callum Birch-Sykes, Michael Wornow, Aman Patel, Clayton Rabideau, Stefano Massaroli, Yoshua Bengio, Stefano Ermon, Stephen A. Baccus, and Chris Ré. HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution. 2023. doi: 10.48550/ARXIV.2306.15794. URL https://arxiv.org/abs/2306.15794. | ||
2183 | 36 | 23 | Eric Nguyen, Michael Poli, Marjan Faizi, Armin Thomas, Michael Wornow, Callum Birch-Sykes, Stefano Massaroli, Aman Patel, Clayton Rabideau, Yoshua Bengio, et al. Hyenadna: Long-range genomic sequence modeling at single nucleotide resolution. Advances in neural information processing systems, 36, 2024. | ||
1967 | 35 | 240 | Eric Nguyen, Michael Poli, Matthew G Durrant, Armin W Thomas, Brian Kang, Jeremy Sullivan, Madelena Y Ng, Ashley Lewis, Aman Patel, Aaron Lou, et al. 2024. Sequence modeling and design from molecular to genome scale with Evo. bioRxiv (2024), 2024–02. | ||
1969 | 35 | 242 | Erik Nijkamp, Jeffrey A Ruffolo, Eli N Weinstein, Nikhil Naik, and Ali Madani. 2023. ProGen2: exploring the boundaries of protein language models. Cell Systems 14, 11 (2023), 968–978. | ||
661 | 13 | 16 | Eriksen, A. V., Möller, S., and Ryg, J. Use of gpt-4 to diagnose complex clinical cases, 2024. | ||
2164 | 36 | 4 | Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, et al. On the opportunities and risks of foundation models, 2022. | ||
980 | 22 | 63 | Ernst J, Kheradpour P, Mikkelsen TS. et al. Mapping and analysis of chromatin state dynamics in nine human cell types. Nature 2011;473:43–9. 10.1038/nature09906. | ||
394 | 7 | 82 | Ernst, J. et al. Mapping and analysis of chromatin state dynamics in nine human cell types. Nature 473, 43–49 (2011). | ||
1537 | 32 | 44 | Euclia. https://github.com/euclia/public-models. 2023. | ||
1888 | 35 | 161 | Eunji Kim, Dongseon Lee, Youngchun Kwon, Min Sik Park, and Youn-Suk Choi. 2021. Valid, plausible, and diverse retrosynthesis using tied two-way transformers with latent variables. Journal of Chemical Information and Modeling 61, 1 (2021), 123–133. | ||
1266 | 26 | 29 | Evan Frick, Peter Jin, Tianle Li, Karthik Ganesan, Jian Zhang, Jiantao Jiao, and Banghua Zhu. Athene-70b: Redefining the boundaries of post-training for open models, July 2024a. URL https://nexusflow.ai/blogs/athene. | ||
1653 | 34 | 28 | Evan Frick, Peter Jin, Tianle Li, Karthik Ganesan, Jian Zhang, Jiantao Jiao, and Banghua Zhu. Athene-70b: Redefining the boundaries of post-training for open models, July 2024a. URL https://nexusflow.ai/blogs/athene. | ||
1267 | 26 | 30 | Evan Frick, Tianle Li, Connor Chen, Wei-Lin Chiang, Anastasios Nikolas Angelopoulos, Jiantao Jiao, Banghua Zhu, Joseph E. Gonzalez, and Ion Stoica. How to evaluate reward models for RLHF. CoRR, abs/2410.14872, 2024b. | ||
1654 | 34 | 29 | Evan Frick, Tianle Li, Connor Chen, Wei-Lin Chiang, Anastasios Nikolas Angelopoulos, Jiantao Jiao, Banghua Zhu, Joseph E. Gonzalez, and Ion Stoica. How to evaluate reward models for RLHF. CoRR, abs/2410.14872, 2024b. | ||
1959 | 35 | 232 | Eyal Mazuz, Guy Shtar, Bracha Shapira, and Lior Rokach. 2023. Molecule generation using transformers and policy gradient reinforcement learning. Scientific Reports 13, 1 (2023), 8799. | ||
1126 | 24 | 9 | F. Cunningham, J. E. Allen, J. Allen, J. Alvarez-Jarreta, M. R. Amode, I. M. Armean, O. Austine-Orimoloye, A. G. Azov, I. Barnes, R. Bennett, et al. Ensembl 2022. Nucleic acids research, 50(D1):D988–D995, 2022. | ||
1054 | 23 | 0 | F. I. Marin, F. Teufel, M. Horlacher, D. Madsen, D. Pultz, O. Winther, and W. Boomsma. Bend: Benchmarking dna language models on biologically meaningful tasks. 12th International Conference on Learning Representations, ICLR 2024, 11 2023. URL https://arxiv.org/pdf/2311.12570. | ||
229 | 5 | 42 | F. Jacob and J. Monod. Genetic regulatory mechanisms in the synthesis of proteins. Journal of Molecular Biology, 3(3):318–356, 1961. | ||
255 | 5 | 68 | F. Meyer, D. Paarmann, M. D’Souza, R. Olson, E. M. Glass, M. Kubal, T. Paczian, A. Rodriguez, R. Stevens, A. Wilke, J. Wilkening, and R. A. Edwards. The metagenomics RAST server - a public resource for the automatic phylogenetic and functional analysis of metagenomes. BMC Bioinformatics, 9(1):386, Sept. 2008. | ||
1388 | 28 | 40 | F. Shi, M. Suzgun, M. Freitag, X. Wang, S. Srivats, S. Vosoughi, H. W. Chung, Y. Tay, S. Ruder, D. Zhou, D. Das, and J. Wei. Language models are multilingual chain-of-thought reasoners. In The Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1-5, 2023. OpenReview.net, 2023. URL https://openreview.net/pdf?id=fR3wGCk-IXp. | ||
1389 | 28 | 41 | F. Song, B. Yu, M. Li, H. Yu, F. Huang, Y. Li, and H. Wang. Preference ranking optimization for human alignment. arXiv preprint arXiv:2306.17492, 2023. | ||
1170 | 24 | 53 | F. Yang, W. Wang, F. Wang, Y. Fang, D. Tang, J. Huang, H. Lu, and J. Yao. scBERT as a large-scale pretrained deep language model for cell type annotation of single-cell RNA-seq data. Nature Machine Intelligence, 4(10):852–866, 2022. | ||
270 | 5 | 85 | F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, et al. Scikit-learn: Machine learning in Python. The Journal of Machine Learning Research, 12:2825–2830, 2011. | ||
1285 | 26 | 48 | Fajri Koto, Nurul Aisyah, Haonan Li, and Timothy Baldwin. Large language models only pass primary school exams in Indonesia: A comprehensive test on IndoMMLU. In EMNLP, pp. 12359–12374. Association for Computational Linguistics, 2023. | ||
1580 | 33 | 36 | Fajri Koto, Nurul Aisyah, Haonan Li, and Timothy Baldwin. Large language models only pass primary school exams in Indonesia: A comprehensive test on IndoMMLU. In EMNLP, pp. 12359–12374. Association for Computational Linguistics, 2023. | ||
1672 | 34 | 47 | Fajri Koto, Nurul Aisyah, Haonan Li, and Timothy Baldwin. Large language models only pass primary school exams in Indonesia: A comprehensive test on IndoMMLU. In EMNLP, pp. 12359–12374. Association for Computational Linguistics, 2023. | ||
644 | 12 | 61 | Fallahpour, A. et al. CodonTransformer: a multispecies codon optimizer using context-aware neural networks. Adibvafa/CodonTransformer. https://doi.org/10.5281/ZENODO.15000833 (Zenodo, 2025). | ||
636 | 12 | 53 | Fallahpour, A., Alinoori, M., Afkanpour, A. & Krishnan, A. EHRMamba: towards generalizable and scalable foundation models for Electronic Health Records. https://doi.org/10.48550/ARXIV.2405.14567 (2024). | ||
662 | 13 | 17 | Fallahpour, A., Alinoori, M., Ye, W., Cao, X., Afkanpour, A., and Krishnan, A. Ehrmamba: Towards generalizable and scalable foundation models for electronic health records, 2024. | ||
2105 | 35 | 378 | Fan Yang, Wenchuan Wang, Fang Wang, Yuan Fang, Duyu Tang, Junzhou Huang, Hui Lu, and Jianhua Yao. 2022. scBERT as a large-scale pretrained deep language model for cell type annotation of single-cell RNA-seq data. Nature Machine Intelligence 4, 10 (2022), 852–866. | ||
2085 | 35 | 358 | Fang Wu, Dragomir Radev, and Stan Z Li. 2023. Molformer: Motif-based transformer on 3d heterogeneous molecular graphs. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 37. 5312–5320. | ||
1251 | 26 | 14 | Federico Cassano, John Gouwar, Daniel Nguyen, Sydney Nguyen, Luna Phipps-Costin, Donald Pinckney, Ming-Ho Yee, Yangtian Zi, Carolyn Jane Anderson, Molly Q. Feldman, Arjun Guha, Michael Greenberg, and Abhinav Jangda. MultiPL-E: A scalable and polyglot approach to benchmarking neural code generation. IEEE Trans. Software Eng., 49(7):3675–3691, 2023. | ||
1556 | 33 | 12 | Federico Cassano, John Gouwar, Daniel Nguyen, Sydney Nguyen, Luna Phipps-Costin, Donald Pinckney, Ming-Ho Yee, Yangtian Zi, Carolyn Jane Anderson, Molly Q. Feldman, Arjun Guha, Michael Greenberg, and Abhinav Jangda. MultiPL-E: A scalable and polyglot approach to benchmarking neural code generation. IEEE Trans. Software Eng., 49(7):3675–3691, 2023. | ||
1639 | 34 | 14 | Federico Cassano, John Gouwar, Daniel Nguyen, Sydney Nguyen, Luna Phipps-Costin, Donald Pinckney, Ming-Ho Yee, Yangtian Zi, Carolyn Jane Anderson, Molly Q. Feldman, Arjun Guha, Michael Greenberg, and Abhinav Jangda. MultiPL-E: A scalable and polyglot approach to benchmarking neural code generation. IEEE Trans. Software Eng., 49(7):3675–3691, 2023. | ||
1109 | 23 | 55 | Felix Teufel, Magnús Halldór Gíslason, José Juan Almagro Armenteros, Alexander Rosenberg Johansen, Ole Winther, and Henrik Nielsen. GraphPart: homology partitioning for biological sequence analysis. NAR Genomics and Bioinformatics, 5(4):lqad088, October 2023. ISSN 2631-9268. doi: 10.1093/nargab/lqad088. URL https://academic.oup.com/nargab/article/doi/10.1093/nargab/lqad088/7318077. | ||
1956 | 35 | 229 | Fergal J Martin, M Ridwan Amode, Alisha Aneja, Olanrewaju Austine-Orimoloye, Andrey G Azov, If Barnes, Arne Becker, Ruth Bennett, Andrew Berry, Jyothish Bhai, et al. 2023. Ensembl 2023. Nucleic acids research 51, D1 (2023), D933–D941. | ||
953 | 22 | 36 | Ferruz N, Schmidt S, Höcker B. ProtGPT2 is a deep unsupervised language model for protein design. Nat Commun 2022;13:4348. 10.1038/s41467-022-32007-7. | ||
598 | 12 | 15 | Ferruz, N., Schmidt, S. & Höcker, B. ProtGPT2 is a deep unsupervised language model for protein design. Nat. Commun. 13, 4348 (2022). | ||
1015 | 22 | 98 | Feynman R. The Character of Physical Law, with New Foreword. MIT Press, Cambridge, Massachusetts, USA, 2017, 10.7551/mitpress/11068.001.0001. | ||
804 | 16 | 5 | Fielding RT. Representational state transfer. Architectural Styles and the Design of Network-Based Software Architectures. Doctoral dissertation. University of California Irvine, Irvine, CA, USA; 2000. | ||
121 | 3 | 113 | Findlay, G. M., Daza, R. M., Martin, B., Zhang, M. D., Leith, A. P., Gasperini, M., Janizek, J. D., Huang, X., Starita, L. M., and Shendure, J. (2018). Accurate classification of BRCA1 variants with saturation genome editing. Nature 562, 217–222. | ||
997 | 22 | 80 | Fiorini N, Leaman R, Lipman DJ. et al. How user intelligence is improving PubMed. Nat Biotechnol 2018;36:937–45. 10.1038/nbt.4267. | ||
180 | 4 | 40 | Fishman, V. et al. GENA-LM: a family of open-source foundational models for long DNA sequences. Preprint at bioRxiv https://doi.org/10.1101/2023.06.12.544594 (2023). | ||
446 | 9 | 24 | Fishman, V. et al. Gena-lm: A family of open-source foundational models for long dna sequences. Preprint at bioRxiv https://doi.org/10.1101/2023.06.12.544594 (2023). | ||
506 | 10 | 24 | Fishman, V. et al. Gena-lm: A family of open-source foundational models for long dna sequences. Preprint at bioRxiv https://doi.org/10.1101/2023.06.12.544594 (2023). | ||
75 | 3 | 67 | Fishman, V., Kuratov, Y., Petrov, M., Shmelev, A., Shepelin, D., Chekanov, N., Kardymon, O., and Burtsev, M. (2023). GENA-LM: A Family of Open-Source Foundational Models for Long DNA Sequences. bioRxiv preprint. https://www.biorxiv.org/content/early/2023/06/13/2023.06.12.544594. doi:10.1101/2023.06.12.544594. | ||
1804 | 35 | 77 | FLIP: Benchmark tasks in fitness landscape inference for proteins. bioRxiv (2021), 2021–11. | ||
1504 | 32 | 11 | Fontenot, R., Kathad, U., McDermott, J., Sturtevant, D., Sharma, P. & Carr, P. Predicting a Compound’s Blood-Brain-Barrier Permeability with Lantern Pharma’s AI and ML Platform, RADR 2023. | ||
1421 | 30 | 4 | Forbes S.A., Beare D., Boutselakis H., Bamford S., Bindal N., Tate J., Cole C.G., Ward S., Dawson E., Ponting L. et al. COSMIC: somatic cancer genetics at high-resolution. Nucleic Acids Res. 2017; 45:D777–D783. | ||
925 | 22 | 8 | Forster DT, Li SC, Yashiroda Y. et al. BIONIC: biological network integration using convolutions. Nat Methods 2022;19:1250–61. 10.1038/s41592-022-01616-x. | ||
119 | 3 | 111 | Fowler, D. M., Adams, D. J., Gloyn, A. L., Hahn, W. C., Marks, D. S., Muffley, L. A., Neal, J. T., Roth, F. P., Rubin, A. F., Starita, L. M., and Hurles, M. E. (2023). An Atlas of Variant Effects to understand the genome at nucleotide resolution. Genome Biology 24, 147. | ||
1998 | 35 | 271 | Frank P Ramsey. 1923. Tractatus Logico-Philosophicus. | ||
1432 | 30 | 15 | Frankish A., Diekhans M., Jungreis I., Lagarde J., Loveland J.E., Mudge J.M., Sisu C., Wright J.C., Armstrong J., Barnes I. et al. GENCODE 2021. Nucleic Acids Res. 2021; 49:D916–D923. | ||
19 | 3 | 11 | Frazer, J., Notin, P., Dias, M., Gomez, A., Min, J. K., Brock, K., Gal, Y., and Marks, D. S. (2021). Disease variant prediction with deep generative models of evolutionary data. Nature 599, 91–95. | ||
1498 | 32 | 5 | Fritsch, C., Huang, A., Chatenay-Rivauday, C., Schnell, C., Reddy, A., Liu, M., Kauffmann, A., Guthy, D., Erdmann, D., De Pover, A., et al. Characterization of the novel and specific PI3Kα inhibitor NVP-BYL719 and development of the patient stratification strategy for clinical trials. Molecular cancer therapeutics 13, 1117–1129 (2014). | ||
1020 | 22 | 103 | Fu L, Cao Y, Wu J. et al. UFold: fast and accurate RNA secondary structure prediction with deep learning. Nucleic Acids Res 2022;50:e14–4. 10.1093/nar/gkab1074. | ||
606 | 12 | 23 | Fu, H. et al. Codon optimization with deep learning to enhance protein expression. Sci. Rep. 10, 17617 (2020). | ||
1526 | 32 | 33 | Fu, T., Huang, K., Xiao, C., Glass, L. M. & Sun, J. Hint: Hierarchical interaction network for clinical-trial-outcome predictions. Patterns 3 (2022). | ||
853 | 18 | 4 | Fujibuchi, W., Goto, S., Migimatsu, H., Uchiyama, I., Ogiwara, A., Akiyama, Y. and Kanehisa, M. (1998) DBGET/LinkDB: an integrated database retrieval system. Pac. Symp. Biocomput., 683–694. | ||
140 | 4 | 0 | G. Benegas, C. Albors, A. J. Aw, C. Ye, and Y. S. Song. A dna language model based on multispecies alignment predicts the effects of genome-wide variants. Nature Biotechnology, pages 1–6, 1 2025. ISSN 15461696. doi: 10.1038/s41587-024-02511-w. URL https://www.nature.com/articles/s41587-024-02511-w. https://www.researchgate.net/publication/387673319_A_DNA_language_model_based_on_multispecies_alignment_predicts_the_effects_of_genome-wide_variants | https://qiita.com/kaizen_nagoya/items/6e8858c2395dcc98804a | |
193 | 5 | 7 | G. Benegas, C. Albors, A. J. Aw, C. Ye, and Y. S. Song. A dna language model based on multispecies alignment predicts the effects of genome-wide variants. Nature Biotechnology, pages 1–6, 2025. | ||
8 | 3 | 0 | G. Benegas, C. Ye, C. Albors, J. C. Li, and Y. S. Song. Genomic language models: Opportunities and challenges. ArXiv, page arXiv:2407.11435v2, 9 2024. ISSN 2331-8422. URL https://pmc.ncbi.nlm.nih.gov/articles/PMC11275703/ http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=PMC11275703. | https://pubmed.ncbi.nlm.nih.gov/39753409/ https://qiita.com/kaizen_nagoya/items/f797330e64e0c7d05f39 | |
192 | 5 | 6 | G. Benegas, S. S. Batra, and Y. S. Song. DNA language models are powerful predictors of genome-wide variant effects. Proceedings of the National Academy of Sciences, 120(44):e2311219120, 2023. doi: 10.1073/pnas.2311219120. URL https://www.pnas.org/doi/abs/10.1073/pnas.2311219120. | ||
1120 | 24 | 3 | G. Benegas, S. S. Batra, and Y. S. Song. DNA language models are powerful zero-shot predictors of noncoding variant effects. bioRxiv, pages 2022–08, 2022. | ||
186 | 5 | 0 | G. Brixi, M. G. Durrant, J. Ku, M. Poli, G. Brockman, D. Chang, G. A. Gonzalez, S. H. King, D. B. Li, A. T. Merchant, M. Naghipourfar, E. Nguyen, C. Ricci-Tam, D. W. Romero, G. Sun, A. Taghibakshi, A. Vorontsov, B. Yang, M. Deng, L. Gorton, N. Nguyen, N. K. Wang, E. Adams, S. A. Baccus, S. Dillmann, S. Ermon, D. Guo, R. Ilango, K. Janik, A. X. Lu, R. Mehta, M. R.Mofrad, M. Y. Ng, J. Pannu, C. Ré, J. C. Schmok, J. S. John, J. Sullivan, K. Zhu, G. Zynda, D. Balsam, P. Collison, A. B. Costa, T. Hernandez-Boussard, E. Ho, M.-Y. Liu, T. McGrath, K. Powell, D. P. Burke, H. Goodarzi, P. D. Hsu, and B. L. Hie. Genome modeling and design across all domains of life with evo 2. bioRxiv, 2025. doi: 10.1101/2025.02.18.638918. URL https://www.biorxiv.org/content/early/2025/02/21/2025.02.18.638918. | https://qiita.com/kaizen_nagoya/items/eecda74f758008633ee2 | |
208 | 5 | 21 | G. Csárdi, T. Nepusz, K. Müller, S. Horvát, V. Traag, F. Zanini, and D. Noom. igraph for R: R interface of the igraph library for graph theory and network analysis, Dec. 2024. | ||
216 | 5 | 29 | G. M. Findlay, R. M. Daza, B. Martin, M. D. Zhang, A. P. Leith, M. Gasperini, J. D. Janizek, X. Huang, L. M. Starita, and J. Shendure. Accurate classification of BRCA1 variants with saturation genome editing. Nature, 562:217–222, 10 2018. ISSN 0028-0836. doi: 10.1038/s41586-018-0461-z. | ||
247 | 5 | 60 | G. Marçais and C. Kingsford. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics, 27(6):764–770, 2011. | ||
251 | 5 | 64 | G. Mendel. Versuche über Pflanzen-Hybriden. Verhandlungen des naturforschenden Vereines in Brünn, 4:3–47, 1866. | ||
253 | 5 | 66 | G. Meng, Y. Li, C. Yang, and S. Liu. MitoZ: a toolkit for animal mitochondrial genome assembly, annotation and visualization. Nucleic Acids Research, 47(11):e63–e63, 2019. | ||
1262 | 26 | 25 | Gabriel Synnaeve, Gabrielle Lee, Georgia Lewis Anderson, Graeme Nail, Grégoire Mialon, Guan Pang, Guillem Cucurell, Hailey Nguyen, Hannah Korevaar, Hu Xu, Hugo Touvron, Iliyan Zarov, Imanol Arrieta Ibarra, Isabel M. Kloumann, Ishan Misra, Ivan Evtimov, Jade Copet, Jaewon Lee, Jan Geffert, Jana Vranes, Jason Park, Jay Mahadeokar, Jeet Shah, Jelmer van der Linde, Jennifer Billock, Jenny Hong, Jenya Lee, Jeremy Fu, Jianfeng Chi, Jianyu Huang, Jiawen Liu, Jie Wang, Jiecao Yu, Joanna Bitton, Joe Spisak, Jongsoo Park, Joseph Rocca, Joshua Johnstun, Joshua Saxe, Junteng Jia, | ||
2192 | 36 | 32 | Gabrielle D Smith, Wan Hern Ching, Paola Cornejo-Páramo, and Emily S Wong. Decoding enhancer complexity with machine learning and high-throughput discovery. Genome Biol., 24(1):116, May 2023. | ||
316 | 7 | 4 | Ganna, A. et al. Quantifying the impact of rare and ultra-rare coding variation across the phenotypic spectrum. Am. J. Hum. Genet. 102, 1204–1211 (2018). | ||
822 | 17 | 8 | Gao L, Madaan A, Zhou S et al. Pal: program-aided language models. arXiv, arXiv:2211.10435, 2022, preprint: not peer reviewed. | ||
171 | 4 | 31 | Gao, H. et al. The landscape of tolerated genetic variation in humans and primates. Science 380, eabn8153 (2023). | ||
90 | 3 | 82 | Gao, L., Biderman, S., Black, S., Golding, L., Hoppe, T., Foster, C., Phang, J., He, H., Thite, A., Nabeshima, N., Presser, S., and Leahy, C. (2020). The Pile: An 800GB Dataset of Diverse Text for Language Modeling. arXiv preprint arXiv:2101.00027. https://arxiv.org/abs/2101.00027. | ||
58 | 3 | 50 | Garau-Luis, J. J., Bordes, P., Gonzalez, L., Roller, M., de Almeida, B. P., Hexemer, L., Blum, C., Laurent, S., Grzegorzewski, J., Lang, M. et al. (2024). Multi-modal transfer learning between biological foundation models. arXiv preprint arXiv:2406.14150. | ||
2028 | 35 | 301 | Garrett A Soukup. 2001. Nucleic acids: General properties. eLS (2001). | ||
1783 | 35 | 56 | Gayane Chilingaryan, Hovhannes Tamoyan, Ani Tevosyan, Nelly Babayan, Lusine Khondkaryan, Karen Hambardzumyan, Zaven Navoyan, Hrant Khachatrian, and Armen Aghajanyan. 2022. Bartsmiles: Generative masked language models for molecular representations. arXiv preprint arXiv:2211.16349 (2022). | ||
2043 | 35 | 316 | Gemini Team. 2023. Gemini: A Family of Highly Capable Multimodal Models. arXiv:2312.11805 [cs.CL]. Igor V Tetko, Pavel Karpov, Ruud Van Deursen, and Guillaume Godin. 2020. State-of-the-art augmented NLP transformer models for direct and single-step retrosynthesis. Nature Communications 11 (2020), 5575. | ||
1269 | 26 | 32 | Gemini Team. Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context. Technical report, Google, 2024. URL https://storage.googleapis.com/deepmind-media/gemini/gemini_v1_5_report.pdf. | ||
1656 | 34 | 31 | Gemini Team. Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context. Technical report, Google, 2024. URL https://storage.googleapis.com/deepmind-media/gemini/gemini_v1_5_report.pdf. | ||
1270 | 26 | 33 | Gemma Team, Morgane Riviere, Shreya Pathak, Pier Giuseppe Sessa, Cassidy Hardin, Surya Bhupatiraju, Léonard Hussenot, Thomas Mesnard, Bobak Shahriari, Alexandre Ramé, et al. Gemma 2: Improving open language models at a practical size. CoRR, abs/2408.00118, 2024. | ||
1657 | 34 | 32 | Gemma Team, Morgane Riviere, Shreya Pathak, Pier Giuseppe Sessa, Cassidy Hardin, Surya Bhupatiraju, Léonard Hussenot, Thomas Mesnard, Bobak Shahriari, Alexandre Ramé, et al. Gemma 2: Improving open language models at a practical size. CoRR, abs/2408.00118, 2024. | ||
2150 | 35 | 423 | Gengmo Zhou, Zhifeng Gao, Qiankun Ding, Hang Zheng, Hongteng Xu, Zhewei Wei, Linfeng Zhang, and Guolin Ke. 2023. Uni-Mol: a universal 3D molecular representation learning framework. (2023). | ||
714 | 14 | 13 | Genome Reference Consortium. Genome reference consortium human build 38 (grch38). National Center for Biotechnology Information, 2013. URL https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.26/ | ||
1139 | 24 | 22 | Genome Reference Consortium. Genome reference consortium human build 38 (grch38). National Center for Biotechnology Information, 2013. URL https://www.ncbi.nlm.nih.gov/assembly/GCF_000001405.26/. | ||
1792 | 35 | 65 | 1000 Genomes Project Consortium et al. 2015. A global reference for human genetic variation. Nature 526, 7571 (2015), 68. | ||
1965 | 35 | 238 | Geraldene Munsamy, Sebastian Lindner, Philipp Lorenz, and Noelia Ferruz. 2022. ZymCTRL: a conditional language model for the controllable generation of artificial enzymes. In Machine Learning for Structural Biology Workshop. NeurIPS 2022. | ||
884 | 21 | 6 | Gerner M. et al. (2010) LINNAEUS: a species name identification system for biomedical literature. BMC Bioinformatics, 11, 85. | ||
709 | 14 | 8 | Gershman, A. et al. Epigenetic patterns in a complete human genome. Science 376, eabj5089 (2022). | ||
1514 | 32 | 21 | Gfeller, D., Schmidt, J., Croce, G., Guillaume, P., Bobisse, S., Genolet, R., Queiroz, L., Cesbron, J., Racle, J. & Harari, A. Improved predictions of antigen presentation and TCR recognition with MixMHCpred2.2 and PRIME2.0 reveal potent SARS-CoV-2 CD8+ T-cell epitopes. Cell Systems 14, 72–83 (2023). | ||
813 | 16 | 14 | Giampieri E. keggrest. Computer software. PyPi; 2013. | ||
1093 | 23 | 39 | Gil A. McVean, David M. Altshuler (Co-Chair), Richard M. Durbin (Co-Chair), Gonçalo R. Abecasis, David R. Bentley, Aravinda Chakravarti, Andrew G. Clark, Peter Donnelly, Evan E. Eichler, Paul Flicek, Stacey B. Gabriel, Richard A. Gibbs, Eric D. Green, Matthew E. Hurles, Bartha M. Knoppers, Jan O. Korbel, Eric S. Lander, Charles Lee, Hans Lehrach, Elaine R. Mardis, Gabor T. Marth, Gil A. McVean, Deborah A. Nickerson, Jeanette P. Schmidt, Stephen T. Sherry, Jun Wang, Richard K. Wilson, Richard A. Gibbs (Principal Investigator), Huyen Dinh, Christie Kovar, Sandra Lee, Lora Lewis, Donna Muzny, Jeff Reid, Min Wang, Jun Wang (Principal Investigator), Xiaodong Fang, Xiaosen Guo, Min Jian, Hui Jiang, Xin Jin, Guoqing Li, Jingxiang Li, Yingrui Li, Zhuo Li, Xiao Liu, Yao Lu, Xuedi Ma, Zhe Su, Shuaishuai Tai, Meifang Tang, Bo Wang, Guangbiao Wang, Honglong Wu, Renhua Wu, Ye Yin, Wenwei Zhang, Jiao Zhao, Meiru Zhao, Xiaole Zheng, Yan Zhou, Eric S. Lander (Principal Investigator), David M. Altshuler, Stacey B. Gabriel (Co-Chair), Namrata Gupta, Paul Flicek (Principal Investigator), Laura Clarke, Rasko Leinonen, Richard E. Smith, Xiangqun Zheng-Bradley, David R. Bentley (Principal Investigator), | ||
724 | 14 | 23 | Gillioz, A., Casas, J., Mugellini, E. & Khaled, O. A. Overview of the Transformer-based Models for NLP Tasks. in Annals of Computer Science and Information Systems vol. 21 179–183 (2020). | ||
885 | 21 | 7 | Giorgi J.M., Bader G.D. (2018) Transfer learning for biomedical named entity recognition with neural networks. Bioinformatics, 34, 4087. | ||
642 | 12 | 59 | Giorgino, T. Computing and visualizing dynamic time warping alignments in R: The dtw Package. J. Stat. Softw. 31, 1–24 (2009). | ||
1025 | 22 | 108 | Gligorijević V, Renfrew PD, Kosciolek T. et al. Structure-based protein function prediction using graph convolutional networks. Nat Commun 2021;12:3168. 10.1038/s41467-021-23303-9. | ||
857 | 18 | 8 | Goad, W.B. and Kanehisa, M.I. (1982) Pattern recognition in nucleic acid sequences. I. A general method for finding local homologies and symmetries. Nucleic Acids Res., 10, 247–263. | ||
2053 | 35 | 326 | Gökçe Uludoğan, Elif Ozkirimli, Kutlu O. Ulgen, Nilgün Karalı, and Arzucan Özgür. 2022. Exploiting Pretrained Biochemical Language Models for Targeted Drug Design. arXiv:2209.00981 [cs.LG] | ||
141 | 4 | 1 | Goldfeder, R. L., Wall, D. P., Khoury, M. J., Ioannidis, J. P. & Ashley, E. A. Human genome sequencing at the population scale: a primer on high-throughput DNA sequencing and analysis. Am. J. Epidemiol. 186, 1000–1009 (2017). | ||
383 | 7 | 71 | Goldmann, J. M. et al. Germline de novo mutation clusters arise during oocyte aging in genomic regions with high double-strand-break incidence. Nat. Genet. 50, 487–492 (2018). | ||
1747 | 35 | 20 | Gonzalo Benegas, Carlos Albors, Alan J Aw, Chengzhong Ye, and Yun S Song. 2023. GPN-MSA: an alignment-based DNA language model for genome-wide variant effect prediction. bioRxiv (2023), 2023–10. | ||
1748 | 35 | 21 | Gonzalo Benegas, Sanjit Singh Batra, and Yun S Song. 2022. DNA language models are powerful zero-shot predictors of genome-wide variant effects. bioRxiv (2022), 2022–08. | ||
1749 | 35 | 22 | Gonzalo Benegas, Sanjit Singh Batra, and Yun S Song. 2023. DNA language models are powerful predictors of genome-wide variant effects. Proceedings of the National Academy of Sciences 120, 44 (2023), e2311219120. | ||
1062 | 23 | 8 | Gonzalo Benegas, Sanjit Singh Batra, and Yun S. Song. DNA language models are powerful zero-shot predictors of genome-wide variant effects. bioRxiv, pp. 2022.08.22.504706, January 2023. doi: 10.1101/2022.08.22.504706. URL http://biorxiv.org/content/early/2023/04/12/2022.08.22.504706.abstract. | ||
552 | 11 | 10 | Google. Our next-generation model: Gemini 1.5, 2024. URL https://blog.google/technology/ai/google-gemini-next-generation-model-february-2024. | ||
643 | 12 | 60 | Górska, A., Plucinska, M., Pedersen, L., Kielpinski, L., Tehler, D. & Hagedorn, P. XNAString: efficient manipulation of modified oligonucleotide sequences. R package version 1.14.0. https://doi.org/10.18129/B9.BIOC.XNASTRING. (2024). | ||
138 | 3 | 130 | Grešová, K., Martinek, V., Čechák, D., Šimeček, P., and Alexiou, P. (2023). Genomic benchmarks: a collection of datasets for genomic sequence classification. BMC Genomic Data 24, Article number: 25. | ||
355 | 7 | 43 | Greenway, S. C. et al. De novo copy number variants identify new genes and loci in isolated sporadic tetralogy of Fallot. Nat. Genet. 41, 931–935 (2009). | ||
794 | 15 | 58 | Greg Yang and Edward J. Hu. Feature Learning in Infinite-Width Neural Networks. arXiv:2011.14522 [cond-mat], May 2021. URL http://arxiv.org/abs/2011.14522. | ||
1578 | 33 | 34 | Gregory Kamradt. Needle in a haystack - pressure testing LLMs, 2023. URL https://github.com/gkamradt/LLMTest_NeedleInAHaystack. | ||
1004 | 22 | 87 | Gresova K, Martinek V, Cechak D. et al. Genomic benchmarks: a collection of datasets for genomic sequence classification. BMC Genomic Data, 2023;24:25. | ||
410 | 8 | 11 | Gresova, K., Martinek, V., Cechak, D., Simecek, P. & Alexiou, P. Genomic benchmarks: a collection of datasets for genomic sequence classification. BMC Genomic Data 24, 25 (2023). | ||
132 | 3 | 124 | Grimm, D. G., Azencott, C.-A., Aicheler, F., Gieraths, U., MacArthur, D. G., Samocha, K. E., Cooper, D. N., Stenson, P. D., Daly, M. J., Smoller, J. W. et al. (2015). The evaluation of tools used to predict the impact of missense variants is hindered by two types of circularity. Human Mutation 36, 513–523. | ||
723 | 14 | 22 | Grinsztajn, L., Oyallon, E. & Varoquaux, G. Why do tree-based models still outperform deep learning on typical tabular data? Advances in Neural Information Processing Systems 35, 507–520 (2022). | ||
942 | 22 | 25 | Gu Y, Tinn R, Cheng H. et al. Domain-specific language model pretraining for biomedical natural language processing. ACM Trans Comput Healthc 2021;3:1–23. | ||
105 | 3 | 97 | Gu, A., and Dao, T. (2023). Mamba: Linear-time sequence modeling with selective state spaces. arXiv preprint arXiv:2312.00752. https://arxiv.org/abs/2312.00752. | ||
103 | 3 | 95 | Gu, A., Goel, K., and Re, C. Efficiently modeling long sequences with structured state spaces. In: International Conference on Learning Representations (2022). | ||
2059 | 35 | 332 | Guangyu Wang, Guoxing Yang, Zongxin Du, Longjun Fan, and Xiaohu Li. 2023. ClinicalGPT: Large Language Models Finetuned with Diverse Medical Data and Comprehensive Evaluation. arXiv:2306.09968 [cs.CL] | ||
1566 | 33 | 22 | Guanting Dong, Hongyi Yuan, Keming Lu, Chengpeng Li, Mingfeng Xue, Dayiheng Liu, Wei Wang, Zheng Yuan, Chang Zhou, and Jingren Zhou. How abilities in large language models are affected by supervised fine-tuning data composition. CoRR, abs/2310.05492, 2023. | ||
1259 | 26 | 22 | Guanting Dong, Keming Lu, Chengpeng Li, Tingyu Xia, Bowen Yu, Chang Zhou, and Jingren Zhou. Self-play with execution feedback: Improving instruction-following capabilities of large language models. CoRR, abs/2406.13542, 2024. | ||
1567 | 33 | 23 | Guanting Dong, Keming Lu, Chengpeng Li, Tingyu Xia, Bowen Yu, Chang Zhou, and Jingren Zhou. Self-play with execution feedback: Improving instruction-following capabilities of large language models. CoRR, abs/2406.13542, 2024. | ||
1647 | 34 | 22 | Guanting Dong, Keming Lu, Chengpeng Li, Tingyu Xia, Bowen Yu, Chang Zhou, and Jingren Zhou. Self-play with execution feedback: Improving instruction-following capabilities of large language models. CoRR, abs/2406.13542, 2024. | ||
2096 | 35 | 369 | Guikun Xu, Yongquan Jiang, PengChuan Lei, Yan Yang, and Jim Chen. [n. d.]. GTMGC: Using Graph Transformer to Predict Molecule’s Ground-State Conformation. In The Twelfth International Conference on Learning Representations. | ||
10 | 3 | 2 | Gulati, A., Qin, J., Chiu, C.-C., Parmar, N., Zhang, Y., Yu, J., Han, W., Wang, S., Zhang, Z., Wu, Y. et al. (2020). Conformer: Convolution-augmented transformer for speech recognition. arXiv preprint arXiv:2005.08100. https://arxiv.org/abs/2005.08100. | ||
1017 | 22 | 100 | Guo Y, Wu J, Ma H. et al. Self-supervised pre-training for protein embeddings using tertiary structures. Proceedings of the AAAI Conference on Artificial Intelligence 2022;36:6801–9. 10.1609/aaai.v36i6.20636. | ||
663 | 13 | 18 | Guo, D., Yang, D., Zhang, H., Song, J., Zhang, R., Xu, R., Zhu, Q., Ma, S., Wang, P., Bi, X., et al. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning. arXiv preprint arXiv:2501.12948, 2025. | ||
376 | 7 | 64 | Guo, M. H. et al. Inferring compound heterozygosity from large-scale exome sequencing data. Nat. Genet. https://doi.org/10.1038/s41588-023-01608-3 (2023). | ||
2093 | 35 | 366 | Guoli Xiong, Zhenxing Wu, Jiacai Yi, Li Fu, Zhijiang Yang, Changyu Hsieh, Mingzhu Yin, Xiangxiang Zeng, Chengkun Wu, Aiping Lu, et al. 2021. ADMETlab 2.0: an integrated online platform for accurate and comprehensive predictions of ADMET properties. Nucleic Acids Research 49, W1 (2021), W5–W14. | ||
1465 | 31 | 16 | Guolin Ke, Di He, and T. Liu. Rethinking positional encoding in language pre-training. ArXiv, abs/2006.15595, 2020. | ||
135 | 3 | 127 | Gupta, A., Lal, A., Gunsalus, L. M., Biancalani, T., and Eraslan, G. (2023). Polygraph: A software framework for the systematic assessment of synthetic regulatory DNA elements. bioRxiv preprint. https://www.biorxiv.org/content/10.1101/2023.11.27.568764v2. | ||
327 | 7 | 15 | Gussow, A. B. et al. Orion: Detecting regions of the human non-coding genome that are intolerant to variation using population genetics. PLoS ONE 12, e0181604 (2017). | ||
823 | 17 | 9 | Guu K, Lee K, Tung Z et al. Retrieval augmented language model pre-training. In: International conference on machine learning, p. 3929–3938. PMLR, 2020. | ||
69 | 3 | 61 | Gwak, H.-J., and Rho, M. (2022). ViBE: a hierarchical BERT model to identify eukaryotic viruses using metagenome sequencing data. Briefings in Bioinformatics 23. doi:10.1093/bib/bbac204. Bbac204. | ||
1839 | 35 | 112 | H BIELKA GDR, N Sharon, and EW Australia. 1984. Nomenclature and symbolism for amino acids and peptides. Pure and Applied Chemistry 56 (1984), 595–624. | ||
1234 | 25 | 58 | H. Cai, X. Cai, J. Chang, S. Li, L. Yao, C. Wang, Z. Gao, H. Wang, Y. Li, M. Lin, S. Yang, J. Wang, M. Xu, J. Huang, F. Xi, J. Zhuang, Y. Yin, Y. Li, C. Chen, Z. Cheng, Z. Zhao, L. Zhang, and G. Ke, “Sciassess: Benchmarking llm proficiency in scientific literature analysis,” 2024. | ||
209 | 5 | 22 | H. Cunningham, A. Ewart, L. Riggs, R. Huben, and L. Sharkey. Sparse autoencoders find highly interpretable features in language models. arXiv, 2309.08600, 2023. | ||
422 | 9 | 0 | H. Dalla-Torre, L. Gonzalez, J. Mendoza-Revilla, N. L. Carranza, A. H. Grzywaczewski, F. Oteri, C. Dallago, E. Trop, B. P. de Almeida, H. Sirelkhatim, G. Richard, M. Skwark, K. Beguir, M. Lopez, and T. Pierrot. Nucleotide transformer: building and evaluating robust foundation models for human genomics. Nature Methods, 22:287–297, 2024. ISSN 1548-7105. doi: 10.1038/s41592-024-02523-z. URL https://www.nature.com/articles/s41592-024-02523-z. | https://qiita.com/kaizen_nagoya/items/1c147c2b095364f04ef7 | |
1127 | 24 | 10 | H. Dalla-Torre, L. Gonzalez, J. Mendoza-Revilla, N. L. Carranza, A. H. Grzywaczewski, F. Oteri, C. Dallago, E. Trop, H. Sirelkhatim, G. Richard, M. Skwark, K. Beguir, M. Lopez, and T. Pierrot. The Nucleotide Transformer: Building and evaluating robust foundation models for human genomics. bioRxiv, 2023. | ||
482 | 10 | 0 | H. Dalla-Torre, L. Gonzalez, J. Mendoza-Revilla, N. Lopez Carranza, A. H. Grzywaczewski, F. Oteri, C. Dallago, E. Trop, B. P. De Almeida, H. Sirelkhatim, G. Richard, M. Skwark, K. Beguir, M. Lopez, and T. Pierrot. Nucleotide transformer: building and evaluating robust foundation models for human genomics. Nature Methods, 22(2):287–297, Feb. 2025. ISSN 1548-7091, 1548-7105. doi: 10.1038/s41592-024-02523-z. URL https://www.nature.com/articles/s41592-024-02523-z. | ||
701 | 14 | 0 | H. Feng, L. Wu, B. Zhao, C. Huff, J. Zhang, J. Wu, L. Lin, P. Wei, and C. Wu. Benchmarking DNA foundation models for genomic sequence classification. bioRxiv, 2024. doi: 10.1101/2024.08.16.608288. URL https://doi.org/10.1101/2024.08.16.608288. | ||
226 | 5 | 39 | H. Huang, C. Hu, J. Na, S. N. Hart, R. D. Gnanaolivu, M. Abozaid, T. Rao, Y. A. Tecleab, T. Pesaran, et al. Functional evaluation and clinical classification of BRCA2 variants. Nature, pages 1–10, 2025. | ||
559 | 11 | 17 | H. Li, Y. Zhang, F. Koto, Y. Yang, H. Zhao, Y. Gong, N. Duan, and T. Baldwin. CMMLU: Measuring massive multitask language understanding in Chinese. arXiv preprint arXiv:2306.09212, 2023. | ||
561 | 11 | 19 | H. Lightman, V. Kosaraju, Y. Burda, H. Edwards, B. Baker, T. Lee, J. Leike, J. Schulman, I. Sutskever, and K. Cobbe. Let’s verify step by step. arXiv preprint arXiv:2305.20050, 2023. | ||
1374 | 28 | 26 | H. Lightman, V. Kosaraju, Y. Burda, H. Edwards, B. Baker, T. Lee, J. Leike, J. Schulman, I. Sutskever, and K. Cobbe. Let’s verify step by step. arXiv preprint arXiv:2305.20050, 2023. | ||
1376 | 28 | 28 | H. Luo, Q. Sun, C. Xu, P. Zhao, J. Lou, C. Tao, X. Geng, Q. Lin, S. Chen, and D. Zhang. Wizardmath: Empowering mathematical reasoning for large language models via reinforced evol-instruct. arXiv preprint arXiv:2308.09583, 2023. | ||
1215 | 25 | 39 | H. Nori, N. King, S. M. McKinney, D. Carignan, and E. Horvitz, “Capabilities of gpt-4 on medical challenge problems,” arXiv preprint arXiv:2303.13375, 2023. | ||
1216 | 25 | 40 | H. Nori, Y. T. Lee, S. Zhang, D. Carignan, R. Edgar, N. Fusi, N. King, J. Larson, Y. Li, W. Liu, et al., “Can generalist foundation models outcompete special-purpose tuning? case study in medicine,” arXiv preprint arXiv:2311.16452, 2023. | ||
1191 | 25 | 15 | H. Suresh and J. Guttag, “A framework for understanding sources of harm throughout the machine learning life cycle,” in Equity and Access in Algorithms, Mechanisms, and Optimization, EAAMO ’21, ACM, Oct. 2021. | ||
1392 | 28 | 44 | H. Touvron, L. Martin, K. Stone, P. Albert, A. Almahairi, Y. Babaei, N. Bashlykov, S. Batra, P. Bhargava, S. Bhosale, D. Bikel, L. Blecher, C. Canton-Ferrer, M. Chen, G. Cucurull, D. Esiobu, J. Fernandes, J. Fu, W. Fu, B. Fuller, C. Gao, V. Goswami, N. Goyal, A. Hartshorn, S. Hosseini, R. Hou, H. Inan, M. Kardas, V. Kerkez, M. Khabsa, I. Kloumann, A. Korenev, P. S. Koura, M. Lachaux, T. Lavril, J. Lee, D. Liskovich, Y. Lu, Y. Mao, X. Martinet, T. Mihaylov, P. Mishra, I. Molybog, Y. Nie, A. Poulton, J. Reizenstein, R. Rungta, K. Saladi, A. Schelten, R. Silva, E. M. Smith, R. Subramanian, X. E. Tan, B. Tang, R. Taylor, A. Williams, J. X. Kuan, P. Xu, Z. Yan, I. Zarov, Y. Zhang, A. Fan, M. Kambadur, S. Narang, A. Rodriguez, R. Stojnic, S. Edunov, and T. Scialom. Llama 2: Open foundation and fine-tuned chat models. CoRR, abs/2307.09288, 2023. doi: 10.48550/arXiv.2307.09288. URL https://doi.org/10.48550/arXiv.2307.09288. | ||
1400 | 28 | 52 | H. Xia, T. Ge, P. Wang, S.-Q. Chen, F. Wei, and Z. Sui. Speculative decoding: Exploiting speculative execution for accelerating seq2seq generation. In H. Bouamor, J. Pino, and K. Bali, editors, Findings of the Association for Computational Linguistics: EMNLP 2023, pages 3909–3925, Singapore, Dec. 2023. Association for Computational Linguistics. doi: 10.18653/v1/2023.findings-emnlp.257. URL https://aclanthology.org/2023.findings-emnlp.257. | ||
1401 | 28 | 53 | H. Xia, Z. Yang, Q. Dong, P. Wang, Y. Li, T. Ge, T. Liu, W. Li, and Z. Sui. Unlocking efficiency in large language model inference: A comprehensive survey of speculative decoding. arXiv preprint arXiv:2401.07851, 2024. | ||
581 | 11 | 39 | H. Xin, Z. Z. Ren, J. Song, Z. Shao, W. Zhao, H. Wang, B. Liu, L. Zhang, X. Lu, Q. Du, W. Gao, Q. Zhu, D. Yang, Z. Gou, Z. F. Wu, F. Luo, and C. Ruan. Deepseek-prover-v1.5: Harnessing proof assistant feedback for reinforcement learning and monte-carlo tree search, 2024. URL https://arxiv.org/abs/2408.08152. | ||
210 | 5 | 23 | H. Dalla-Torre, L. Gonzalez, J. Mendoza-Revilla, N. Lopez Carranza, A. H. Grzywaczewski, F. Oteri, C. Dallago, E. Trop, B. P. de Almeida, H. Sirelkhatim, et al. Nucleotide Transformer: Building and evaluating robust foundation models for human genomics. Nature Methods, pages 1–11, 2024. | ||
886 | 21 | 8 | Habibi M. et al. (2017) Deep learning with word embeddings improves biomedical named entity recognition. Bioinformatics, 33, i37–i48. | ||
1729 | 35 | 2 | Hadi Abdine, Michail Chatzianastasis, Costas Bouyioukos, and Michalis Vazirgiannis. 2023. Prot2Text: Multimodal Protein’s Function Generation with GNNs and Transformers. arXiv:2307.14367 [q-bio.QM] | ||
856 | 18 | 7 | Haft,D.H.,Badretdin,A.,Coulouris,G.,DiCuccio,M.,Durkin,A.S., Jovenitti,E.,Li,W.,Mersha,M.,O’Neill,K.R.,Virothaisakun,J.,et al. (2024) RefSeq and the prokaryotic genome annotation pipeline in the age of metagenomes.Nucleic Acids Res.,52,D762–D769. | ||
2141 | 35 | 414 | Haiteng Zhao, Shengchao Liu, Chang Ma, Hannan Xu, Jie Fu, Zhi-Hong Deng, Lingpeng Kong, and Qi Liu. 2023. GIMLET: A Unified Graph-Text Model for Instruction-Based Molecule Zero-Shot Learning. bioRxiv (2023), 2023–05. | ||
335 | 7 | 23 | Halldorsson, B. V. et al. Characterizing mutagenic effects of recombination through a sequence-level genetic map. Science 363, eaau1043 (2019). | ||
329 | 7 | 17 | Halldorsson, B. V. et al. The sequences of 150,119 genomes in the UK Biobank. Nature 607, 732–740 (2022). | ||
6 | 1 | 4 | Hamosh A, Scott AF, Amberger JS, Bocchini CA, McKusick VA. Online Mendelian Inheritance in Man (OMIM®), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res. 2005;33:514–517. doi: 10.1093/nar/gki033. | ||
1908 | 35 | 181 | Han Li, Dan Zhao, and Jianyang Zeng. 2022. KPGT: knowledge-guided pre-training of graph transformer for molecular property prediction. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 857–867. | ||
1442 | 30 | 25 | Hanahan D. Hallmarks of cancer: new dimensions. Cancer Discov. 2022; 12:31–46. | ||
1444 | 30 | 27 | Hanahan D., Weinberg R.A. Hallmarks of cancer: the next generation. Cell. 2011; 144:646–674. | ||
1443 | 30 | 26 | Hanahan D., Weinberg R.A. The hallmarks of cancer. Cell. 2000; 100:57–70. | ||
1517 | 32 | 24 | Haneczok, J. & Delijewski, M. Machine learning enabled identification of potential SARS-CoV-2 3CLpro inhibitors based on fixed molecular fingerprints and Graph-CNN neural representations. Journal of Biomedical Informatics 119, 103821 (2021). | ||
1772 | 35 | 45 | Hanjie Chen, Zhouxiang Fang, Yash Singla, and Mark Dredze. 2024. Benchmarking Large Language Models on Answering and Explaining Challenging Medical Questions. arXiv:2402.18060 [cs.CL] | ||
2097 | 35 | 370 | Hanwen Xu and Sheng Wang. 2022. ProTranslator: zero-shot protein function prediction using textual description. In International Conference on Research in Computational Molecular Biology. Springer, 279–294. | ||
2098 | 35 | 371 | Hanwen Xu, Addie Woicik, Hoifung Poon, Russ B Altman, and Sheng Wang. 2023. Multilingual translation for zero-shot biomedical classification using BioTranslator. Nature Communications 14, 1 (2023), 738. | ||
1942 | 35 | 215 | Hanyu Luo, Cheng Chen, Wenyu Shan, Pingjian Ding, and Lingyun Luo. 2022. iEnhancer-BERT: A novel transfer learning architecture based on DNA-Language model for identifying enhancers and their strength. In International Conference on Intelligent Computing. Springer, 153–165. | ||
1943 | 35 | 216 | Hanyu Luo, Wenyu Shan, Cheng Chen, Pingjian Ding, and Lingyun Luo. 2023. Improving language model of human genome for DNA–protein binding prediction based on task-specific pre-training. Interdisciplinary Sciences: Computational Life Sciences 15, (2023), 32–43. | ||
1033 | 22 | 116 | Hao M, Gong J, Zeng X. et al. Large-scale foundation model on single-cell transcriptomics. Nat Methods 2024;21:1481–1491. | ||
1326 | 26 | 89 | Hao Xiang, Bowen Yu, Hongyu Lin, Keming Lu, Yaojie Lu, Xianpei Han, Le Sun, Jingren Zhou, and Junyang Lin. Aligning large language models via self-steering optimization. CoRR, abs/2410.17131, 2024. | ||
1712 | 34 | 87 | Hao Xiang, Bowen Yu, Hongyu Lin, Keming Lu, Yaojie Lu, Xianpei Han, Le Sun, Jingren Zhou, and Junyang Lin. Aligning large language models via self-steering optimization. CoRR, abs/2410.17131, 2024. | ||
2060 | 35 | 333 | Haochun Wang, Chi Liu, Nuwa Xi, Zewen Qiang, Sendong Zhao, Bing Qin, and Ting Liu. 2023. HuaTuo: Tuning LLaMA Model with Chinese Medical Knowledge. arXiv:2304.06975 [cs.CL] | ||
1581 | 33 | 37 | Haonan Li, Yixuan Zhang, Fajri Koto, Yifei Yang, Hai Zhao, Yeyun Gong, Nan Duan, and Timothy Baldwin. CMMLU: Measuring massive multitask language understanding in Chinese. CoRR, abs/2306.09212, 2023. | ||
2119 | 35 | 392 | Haoyang Zeng, Matthew D Edwards, Ge Liu, and David K Gifford. 2016. Convolutional neural network architectures for predicting DNA–protein binding. Bioinformatics 32, 12 (2016), i121–i127. | ||
419 | 8 | 20 | Harishbhai Tilala, M. et al. Ethical considerations in the use of artificial intelligence and machine learning in health care: a comprehensive review. Cureus 16, e62443 (2024). | ||
388 | 7 | 76 | Harrow, J. et al. GENCODE: the reference human genome annotation for The ENCODE Project. Genome Res. 22, 1760–1774 (2012). | ||
451 | 9 | 29 | Harrow, J. et al. GENCODE: the reference human genome annotation for the encode project. Genome Res. 22, 1760–1774 (2012). | ||
511 | 10 | 29 | Harrow, J. et al. GENCODE: the reference human genome annotation for the encode project. Genome Res. 22, 1760–1774 (2012). | ||
133 | 3 | 125 | Hartl, D. L., Clark, A. G., and Clark, A. G. Principles of population genetics vol. 116. Sinauer associates Sunderland, MA (1997). | ||
948 | 22 | 31 | Hayes T, Rao R, Akin H. et al. Simulating 500 million years of evolution with a language model. bioRxiv 600583, 2024. | ||
109 | 3 | 101 | Hayes, T., Rao, R., Akin, H., Sofroniew, N. J., Oktay, D., Lin, Z., Verkuil, R., Tran, V. Q., Deaton, J., Wiggert, M. et al. (2024). Simulating 500 million years of evolution with a language model. bioRxiv preprint. https://www.biorxiv.org/content/10.1101/2024.07.01.600583v1. | ||
85 | 3 | 77 | He, Y., Fang, P., Shan, Y., Pan, Y., Wei, Y., Chen, Y., Chen, Y., Liu, Y., Zeng, Z., Zhou, Z. et al. (2024). LucaOne: Generalized Biological Foundation Model with Unified Nucleic Acid and Protein Language. bioRxiv preprint ( 2024–05). https://www.biorxiv.org/content/10.1101/2024.05.10.592927v1. | ||
409 | 8 | 10 | National Institutes of Health (US) & Biological Sciences Curriculum Study. Understanding Human Genetic Variation. in NIH Curriculum Supplement Series [Internet] (National Institutes of Health, 2007). | ||
139 | 3 | 131 | Helfrich, G. (2024). The harms of terminology: why we should reject so-called “frontier AI”. AI and Ethics ( 1–7). | ||
472 | 9 | 50 | Hendrycks, D. & Gimpel, K. Gaussian error linear units (gelus). Preprint at https://arxiv.org/abs/1606.08415 (2016). | ||
532 | 10 | 50 | Hendrycks, D. & Gimpel, K. Gaussian error linear units (gelus). Preprint at https://arxiv.org/abs/1606.08415 (2016). | ||
1543 | 32 | 50 | Hendrycks, D., Burns, C., Basart, S., Zou, A., Mazeika, M., Song, D. & Steinhardt, J. Measuring massive multitask language understanding. arXiv preprint arXiv:2009.03300 (2020). | ||
1763 | 35 | 36 | Hengxing Cai, Xiaochen Cai, Junhan Chang, Sihang Li, Lin Yao, Changxin Wang, Zhifeng Gao, Yongge Li, Mujie Lin, Shuwen Yang, et al. 2024. Sciassess: Benchmarking llm proficiency in scientific literature analysis. arXiv preprint arXiv:2403.01976 (2024). | ||
1764 | 35 | 37 | Hengxing Cai, Xiaochen Cai, Shuwen Yang, Jiankun Wang, Lin Yao, Zhifeng Gao, Junhan Chang, Sihang Li, Mingjun Xu, Changxin Wang, et al. 2024. Uni-SMART: Universal Science Multimodal Analysis and Research Transformer. arXiv preprint arXiv:2403.10301 (2024). | ||
1826 | 35 | 99 | Henri A Favre and Warren H Powell. 2013. Nomenclature of organic chemistry: IUPAC recommendations and preferred names 2013. Royal Society of Chemistry. | ||
2104 | 35 | 377 | Hideki Yamaguchi and Yutaka Saito. 2022. EvoOpt: an MSA-guided, fully unsupervised sequence optimization pipeline for protein design. In Machine Learning for Structural Biology Workshop, NeurIPS. | ||
1365 | 28 | 17 | High-flyer. HAI-LLM: An efficient and lightweight tool for training large models, 2023. URL https://www.high-flyer.cn/en/blog/hai-llm. | ||
1048 | 22 | 131 | Hijma P, Heldens S, Sclocco A. et al. Optimization techniques for GPU programming. ACM Comput Surv 2023;55:1–81. 10.1145/3570638. | ||
320 | 7 | 8 | Hindorff, L. A. et al. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc. Natl Acad. Sci. USA 106, 9362–9367 (2009). | ||
1728 | 35 | 1 | Hisham Abdel-Aty and Ian R Gould. 2022. Large-scale distributed training of transformers for chemical fingerprinting. Journal of Chemical Information and Modeling 62, 20 (2022), 4852–4862. | ||
67 | 3 | 59 | Hoarfrost, A., Aptekmann, A., Farfañuk, G., and Bromberg, Y. (2022). Deep learning of a bacterial and archaeal universal language of life enables transfer learning and illuminates microbial dark matter. Nature Communications 13, 2606. | ||
467 | 9 | 45 | Hoffmann, J. et al. Training compute-optimal large language models. in 36th Conference on Neural Information Processing Systems https://proceedings.neurips.cc/paper_files/paper/2022/file/c1e2faff6f588870935f114ebe04a3e5-Paper-Conference.pdf (NeurIPS, 2022). | ||
527 | 10 | 45 | Hoffmann, J. et al. Training compute-optimal large language models. in 36th Conference on Neural Information Processing Systems https://proceedings.neurips.cc/paper_files/paper/2022/file/c1e2faff6f588870935f114ebe04a3e5-Paper-Conference.pdf (NeurIPS, 2022). | ||
864 | 19 | 3 | Holmes JB, Moyer E, Phan L, Maglott D, Kattman B. SPDI: data model for variants and applications at NCBI. Bioinformatics. 2020. https://doi.org/10.1093/bioinformatics/btz856. (PMID 31738401.) | ||
389 | 7 | 77 | Hon, C. C. et al. An atlas of human long non-coding RNAs with accurate 5′ ends. Nature 543, 199–204 (2017). | ||
969 | 22 | 52 | Hong L, Sun S, Zheng L, Tan Q X, and Li Y. fastmsa: Accelerating multiple sequence alignment with dense retrieval on protein language. bioRxiv 2021;2021–12. | ||
1909 | 35 | 182 | Hong-Liang Li, Yi-He Pang, and Bin Liu. 2021. BioSeq-BLM: a platform for analyzing DNA, RNA and protein sequences based on biological language models. Nucleic acids research 49, 22 (2021), e129–e129. | ||
2151 | 35 | 424 | Hong-Yu Zhou, Yunxiang Fu, Zhicheng Zhang, Cheng Bian, and Yizhou Yu. 2023. Protein Representation Learning via Knowledge Enhanced Primary Structure Modeling. arXiv e-prints (Jan. 2023), arXiv:2301.13154. arXiv:2301.13154 [cs.LG] | ||
2125 | 35 | 398 | Hongbo Zhang, Junying Chen, Feng Jiang, Fei Yu, Zhihong Chen, Jianquan Li, Guiming Chen, Xiangbo Wu, Zhiyi Zhang, Qingying Xiao, Xiang Wan, Benyou Wang, and Haizhou Li. 2023. HuatuoGPT, towards Taming Language Model to Be a Doctor. arXiv:2305.15075 [cs.CL] | ||
2094 | 35 | 367 | Honglin Xiong, Sheng Wang, Yitao Zhu, Zihao Zhao, Yuxiao Liu, Linlin Huang, Qian Wang, and Dinggang Shen. 2023. DoctorGLM: Fine-tuning your Chinese Doctor is not a Herculean Task. arXiv:2304.01097 [cs.CL] | ||
2022 | 35 | 295 | Hoo-Chang Shin, Yang Zhang, Evelina Bakhturina, Raul Puri, Mostofa Patwary, Mohammad Shoeybi, and Raghav Mani. 2020. BioMegatron: Larger Biomedical Domain Language Model. arXiv:2010.06060 | ||
824 | 17 | 10 | Hou W, Ji Z. GeneTuring tests GPT models in genomics. bioRxiv 2023:2023–03. | ||
938 | 22 | 21 | Howard J, Ruder S. Universal language model fine-tuning for text classification. arXiv preprint arXiv:1801.06146, 2018. | ||
26 | 3 | 18 | Hsu, C., Nisonoff, H., Fannjiang, C., and Listgarten, J. (2022). Learning protein fitness models from evolutionary and assay-labeled data. Nature Biotechnology 40, 1114–1122. | ||
36 | 3 | 28 | Hsu, C., Verkuil, R., Liu, J., Lin, Z., Hie, B., Sercu, T., Lerer, A., and Rives, A. Learning inverse folding from millions of predicted structures. In: International Conference on Machine Learning. PMLR (2022): 8946–8970. | ||
1346 | 27 | 5 | https://api.together.xyz/signin?redirectUrl=%2Fplayground%2Ftogethercomputer%2FStripedHyena-Hessian-7B | ||
1347 | 27 | 6 | https://github.com/HazyResearch/flash-fft-conv | ||
1342 | 27 | 1 | https://huggingface.co/togethercomputer/StripedHyena-Hessian-7B | ||
716 | 14 | 15 | Hu, E. J. et al. LoRA: Low-Rank Adaptation of Large Language Models. Preprint at https://doi.org/10.48550/arXiv.2106.09685 (2021). | ||
1507 | 32 | 14 | Hu, W., Liu, B., Gomes, J., Zitnik, M., Liang, P., Pande, V. & Leskovec, J. Strategies for pre-training graph neural networks. arXiv preprint arXiv:1905.12265 (2019). | ||
1502 | 32 | 9 | Hu, X., Xia, M., Wang, J., Yu, H., Chai, J., Zhang, Z., Sun, Y., Su, J. & Sun, L. Dual PI3K/mTOR inhibitor PKI-402 suppresses the growth of ovarian cancer cells by degradation of Mcl-1 through autophagy. Biomedicine & Pharmacotherapy 129, 110397 (2020). | ||
1511 | 32 | 18 | Huang, D., Chowdhuri, S. R., Li, A., Li, A., Agrawal, A., Gano, K. & Zhu, A. A Unified System for Molecular Property Predictions: Oloren ChemEngine and its Applications (2022). | ||
664 | 13 | 19 | Huang, J., Neill, L., Wittbrodt, M., Melnick, D., Klug, M., Thompson, M., Bailitz, J., Loftus, T., Malik, S., Phull, A., et al. Generative artificial intelligence for chest radiograph interpretation in the emergency department. JAMA network open, 6(10):e2336100–e2336100, 2023. | ||
1508 | 32 | 15 | Huang, K., Fu, T., Glass, L. M., Zitnik, M., Xiao, C. & Sun, J. DeepPurpose: a deep learning library for drug–target interaction prediction. Bioinformatics 36, 5545–5547 (2020). | ||
1922 | 35 | 195 | Huaqing Liu, Shuxian Zhou, Peiyi Chen, Jiahui Liu, Ku-Geng Huo, and Lanqing Han. 2024. Exploring Genomic Large Language Models: Bridging the Gap between Natural Language and Gene Sequences. bioRxiv (2024), 2024–02. | ||
727 | 14 | 26 | Hubert, L. & Arabie, P. Comparing partitions. Journal of Classification 2, 193–218 (1985). | ||
1426 | 30 | 9 | Hudson T.J., Anderson W., Aretz A., Barker A.D., Bell C., Bernabé R.R., Bhan M.K., Calvo F., Eerola I., Gerhard D.S.et al. . International network of cancer genome projects. Nature. 2010; 464:993–998. | ||
918 | 22 | 1 | Hughes JP, Rees S, Kalindjian SB. et al. Principles of early drug discovery. Br J Pharmacol 2011;162:1239–49. 10.1111/j.1476-5381.2010.01127.x. | ||
1066 | 23 | 12 | Hugo Dalla-Torre, Liam Gonzalez, Javier Mendoza-Revilla, Nicolas Lopez Carranza, Adam Henryk Grzywaczewski, Francesco Oteri, Christian Dallago, Evan Trop, Hassan Sirelkhatim, Guillaume Richard, Marcin Skwark, Karim Beguir, Marie Lopez, and Thomas Pierrot. The Nucleotide Transformer: Building and Evaluating Robust Foundation Models for Human Genomics. bioRxiv, pp. 2023.01.11.523679, January 2023. doi: 10.1101/2023.01.11.523679. URL http://biorxiv.org/content/early/2023/01/15/2023.01.11.523679.abstract. | ||
1802 | 35 | 75 | Hugo Dalla-Torre, Liam Gonzalez, Javier Mendoza-Revilla, Nicolas Lopez Carranza, Adam Henryk Grzywaczewski, Francesco Oteri, Christian Dallago, Evan Trop, Bernardo P de Almeida, Hassan Sirelkhatim, et al. 2023. The nucleotide transformer: Building and evaluating robust foundation models for human genomics. bioRxiv (2023), 2023–01. | ||
2168 | 36 | 8 | Hugo Dalla-Torre, Liam Gonzalez, Javier Mendoza-Revilla, Nicolas Lopez Carranza, Adam Henryk Grzywaczewski, Francesco Oteri, Christian Dallago, Evan Trop, Hassan Sirelkhatim, Guillaume Richard, Marcin Skwark, Karim Beguir, Marie Lopez, and Thomas Pierrot. The nucleotide transformer: Building and evaluating robust foundation models for human genomics. bioRxiv, 2023. doi: 10.1101/2023.01.11.523679. URL https://www.biorxiv.org/content/early/2023/03/09/2023.01.11.523679. | ||
1319 | 26 | 82 | Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, Dan Bikel, Lukas Blecher, Cristian Canton-Ferrer, Moya Chen, Guillem Cucurull, David Esiobu, Jude Fernandes, Jeremy Fu, Wenyin Fu, Brian Fuller, Cynthia Gao, Vedanuj Goswami, Naman Goyal, Anthony Hartshorn, Saghar Hosseini, Rui Hou, Hakan Inan, Marcin Kardas, Viktor Kerkez, Madian Khabsa, Isabel Kloumann, Artem Korenev, Punit Singh Koura, Marie-Anne Lachaux, Thibaut Lavril, Jenya Lee, Diana Liskovich, Yinghai Lu, Yuning Mao, Xavier Martinet, Todor Mihaylov, Pushkar Mishra, Igor Molybog, Yixin Nie, Andrew Poulton, Jeremy Reizenstein, Rashi Rungta, Kalyan Saladi, Alan Schelten, Ruan Silva, Eric Michael Smith, Ranjan Subramanian, Xiaoqing Ellen Tan, Binh Tang, Ross Taylor, Adina Williams, Jian Xiang Kuan, Puxin Xu, Zheng Yan, Iliyan Zarov, Yuchen Zhang, Angela Fan, Melanie Kambadur, Sharan Narang, Aurélien Rodriguez, Robert Stojnic, Sergey Edunov, and Thomas Scialom. Llama 2: Open foundation and fine-tuned chat models. CoRR, abs/2307.09288, 2023b. | ||
1705 | 34 | 80 | Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, Dan Bikel, Lukas Blecher, Cristian Canton-Ferrer, Moya Chen, Guillem Cucurull, David Esiobu, Jude Fernandes, Jeremy Fu, Wenyin Fu, Brian Fuller, Cynthia Gao, Vedanuj Goswami, Naman Goyal, Anthony Hartshorn, Saghar Hosseini, Rui Hou, Hakan Inan, Marcin Kardas, Viktor Kerkez, Madian Khabsa, Isabel Kloumann, Artem Korenev, Punit Singh Koura, Marie-Anne Lachaux, Thibaut Lavril, Jenya Lee, Diana Liskovich, Yinghai Lu, Yuning Mao, Xavier Martinet, Todor Mihaylov, Pushkar Mishra, Igor Molybog, Yixin Nie, Andrew Poulton, Jeremy Reizenstein, Rashi Rungta, Kalyan Saladi, Alan Schelten, Ruan Silva, Eric Michael Smith, Ranjan Subramanian, Xiaoqing Ellen Tan, Binh Tang, Ross Taylor, Adina Williams, Jian Xiang Kuan, Puxin Xu, Zheng Yan, Iliyan Zarov, Yuchen Zhang, Angela Fan, Melanie Kambadur, Sharan Narang, Aurélien Rodriguez, Robert Stojnic, Sergey Edunov, and Thomas Scialom. Llama 2: Open foundation and fine-tuned chat models. CoRR, abs/2307.09288, 2023b. | ||
1318 | 26 | 81 | Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurélien Rodriguez, Armand Joulin, Edouard Grave, and Guillaume Lample. LLaMA: Open and efficient foundation language models. CoRR, abs/2302.13971, 2023a. | ||
1704 | 34 | 79 | Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurélien Rodriguez, Armand Joulin, Edouard Grave, and Guillaume Lample. LLaMA: Open and efficient foundation language models. CoRR, abs/2302.13971, 2023a. | ||
1612 | 33 | 68 | Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurélien Rodriguez, Armand Joulin, Edouard Grave, and Guillaume Lample. LLaMA: Open and efficient foundation language models. CoRR, abs/2302.13971, 2023. | ||
2047 | 35 | 320 | Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, Aurélien Rodriguez, Armand Joulin, Edouard Grave, and Guillaume Lample. 2023. LLaMA: Open and Efficient Foundation Language Models. CoRR (2023). | ||
2048 | 35 | 321 | Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, et al. 2023. Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971 (2023). | ||
1282 | 26 | 45 | Huiqiang Jiang, Yucheng Li, Chengruidong Zhang, Qianhui Wu, Xufang Luo, Surin Ahn, Zhenhua Han, Amir H Abdi, Dongsheng Li, Chin-Yew Lin, Yuqing Yang, and Lili Qiu. Minference 1.0: Accelerating pre-filling for long-context llms via dynamic sparse attention. arXiv preprint arXiv:2407.02490, 2024b. | ||
1669 | 34 | 44 | Huiqiang Jiang, Yucheng Li, Chengruidong Zhang, Qianhui Wu, Xufang Luo, Surin Ahn, Zhenhua Han, Amir H Abdi, Dongsheng Li, Chin-Yew Lin, Yuqing Yang, and Lili Qiu. Minference 1.0: Accelerating pre-filling for long-context llms via dynamic sparse attention. arXiv preprint arXiv:2407.02490, 2024b. | ||
1485 | 31 | 36 | Hussein Al-Natsheh. UDL at SemEval-2017 Task 1: Semantic textual similarity estimation of English sentence pairs using regression model over pairwise features. 08 2017. | ||
865 | 19 | 4 | Hutchins BI, Baker KL, Davis MT, Diwersy MA, Haque E, Harriman RM, Hoppe TA, Leicht SA, Meyer P, Santangelo GM. The NIH Open Citation Collection: A public access, broad coverage resource. PLoS Biol. 2019. https://doi.org/10.1371/journal.pbio.3000385. (PMID 31600197.) | ||
1343 | 27 | 2 | Hyena Hierarchy: Towards Larger Convolutional Language Models Michael Poli, Stefano Massaroli, Eric Nguyen, Daniel Y. Fu, Tri Dao, Stephen Baccus, Yoshua Bengio, Stefano Ermon, Christopher Ré https://arxiv.org/abs/2302.10866 | ||
665 | 13 | 20 | Hyland, S. L., Bannur, S., Bouzid, K., Castro, D. C., Ranjit, M., Schwaighofer, A., Pérez-García, F., Salvatelli, V., Srivastav, S., Thieme, A., Codella, N., Lungren, M. P., Wetscherek, M. T., Oktay, O., and Alvarez-Valle, J. MAIRA-1: A specialised large multimodal model for radiology report generation, 2024. | ||
1889 | 35 | 162 | Hyunjae Kim, Hyeon Hwang, Jiwoo Lee, Sihyeon Park, Dain Kim, Taewhoo Lee, Chanwoong Yoon, Jiwoong Sohn, Donghee Choi, and Jaewoo Kang. 2024. Small Language Models Learn Enhanced Reasoning Skills from Medical Textbooks. arXiv:2404.00376 [cs.CL] | ||
1890 | 35 | 163 | Hyunseung Kim, Jonggeol Na, and Won Bo Lee. 2021. Generative chemical transformer: neural machine learning of molecular geometric structures from chemical language via attention. Journal of chemical information and modeling 61, 12 (2021), 5804–5814. | ||
302 | 5 | 117 | I. E. Vorontsov, I. A. Eliseeva, A. Zinkevich, M. Nikonov, S. Abramov, A. Boytsov, V. Kamenets, A. Kasianova, S. Kolmykov, I. S. Yevshin, A. Favorov, Y. A. Medvedeva, A. Jolma, F. Kolpakov, V. J. Makeev, and I. V. Kulakovskiy. HOCOMOCO in 2024: a rebuild of the curated collection of binding models for human and mouse transcription factors. Nucleic Acids Research, 52(D1):D154–D163, November 2023. ISSN 0305-1048. doi: 10.1093/nar/gkad1077. URL https://doi.org/10.1093/nar/gkad1077. | ||
1375 | 28 | 27 | I. Loshchilov and F. Hutter. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101, 2017. | ||
1214 | 25 | 38 | I. Pentina, B. Guo, and W. P. Fan, “Friend, mentor, lover: Does chatbot engagement lead to psychological dependence?,” Journal of Service Management, 2023. | ||
1202 | 25 | 26 | I. Pentina, T. Hancock, and T. Xie, “Exploring relationship development with social chatbots: A mixed-method study of replika,” Computers in Human Behavior, vol. 140, p. 107600, 2023. | ||
1188 | 25 | 12 | I. Solaiman, Z. Talat, W. Agnew, L. Ahmad, D. Baker, S. L. Blodgett, C. Chen, H. D. I. au2, J. Dodge, I. Duan, E. Evans, F. Friedrich, A. Ghosh, U. Gohar, S. Hooker, Y. Jernite, R. Kalluri, A. Lusoli, A. Leidinger, M. Lin, X. Lin, S. Luccioni, J. Mickel, M. Mitchell, J. Newman, A. Ovalle, M.-T. Png, S. Singh, A. Strait, L. Struppek, and A. Subramonian, “Evaluating the social impact of generative ai systems in systems and society,” 2024. | ||
203 | 5 | 16 | I.-M. A. Chen, K. Chu, K. Palaniappan, A. Ratner, J. Huang, M. Huntemann, P. Hajek, S. Ritter, N. Varghese, R. Seshadri, S. Roux, T. Woyke, E. A. Eloe-Fadrosh, N. N. Ivanova, and N. C. Kyrpides. The IMG/M data management and analysis system v.6.0: new tools and advanced capabilities. Nucleic Acids Res., 49(D1):D751–D763, Jan. 2021. | ||
284 | 5 | 99 | I. Sarropoulos, R. Marin, M. Cardoso-Moreira, and H. Kaessmann. Developmental dynamics of lncRNAs across mammalian organs and species. Nature, 571(7766):510–514, July 2019. | ||
1419 | 30 | 2 | ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium Pan-cancer analysis of whole genomes. Nature. 2020; 578:82–93. | ||
1837 | 35 | 110 | Iker García-Ferrero, Rodrigo Agerri, Aitziber Atutxa Salazar, Elena Cabrio, Iker de la Iglesia, Alberto Lavelli, Bernardo Magnini, Benjamin Molinet, Johana Ramirez-Romero, German Rigau, Jose Maria Villa-Gonzalez, Serena Villata, and Andrea Zaninello. 2024. Medical mT5: An Open-Source Multilingual Text-to-Text LLM for The Medical Domain. arXiv:2404.07613 | ||
772 | 15 | 36 | Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101, 2017. | ||
2182 | 36 | 22 | Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization, 2019. | ||
1481 | 31 | 32 | Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization. arXiv e-prints, art. arXiv:1711.05101, November 2017. | ||
1 | 0 | 0 | Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model https://arxiv.org/abs/2505.23579 | https://qiita.com/kaizen_nagoya/items/0718b214043a614deee0 | |
1366 | 28 | 18 | Inflection AI. Inflection-2, 2023. URL https://inflection.ai/inflection-2. | ||
596 | 12 | 13 | Ingraham, J. B. et al. Illuminating protein space with a programmable generative model. Nature 623, 1070–1078 (2023). | ||
35 | 3 | 27 | Ingraham, J., Garg, V., Barzilay, R., and Jaakkola, T. Generative models for graph-based protein design. In: Wallach, H., Larochelle, H., Beygelzimer, A., d'Alché-Buc, F., Fox, E., and Garnett, R., eds. Advances in Neural Information Processing Systems vol. 32. Curran Associates, Inc. (2019):https://proceedings.neurips.cc/paper_files/paper/2019/file/f3a4ff4839c56a5f460c88cce3666a2b-Paper.pdf. | ||
2088 | 35 | 361 | Ioannis Xenarios, Lukasz Salwinski, Xiaoqun Joyce Duan, Patrick Higney, Sul-Min Kim, and David Eisenberg. 2002. DIP, the Database of Interacting Proteins: a research tool for studying cellular networks of protein interactions. Nucleic acids research 30, 1 (2002), 303–305. | ||
666 | 13 | 21 | Irvin, J., Rajpurkar, P., Ko, M., Yu, Y., Ciurea-Ilcus, S., Chute, C., Marklund, H., Haghgoo, B., Ball, R., Shpanskaya, K., Seekins, J., Mong, D. A., Halabi, S. S., Sandberg, J. K., Jones, R., Larson, D. B., Langlotz, C. P., Patel, B. N., Lungren, M. P., and Ng, A. Y. Chexpert: A large chest radiograph dataset with uncertainty labels and expert comparison, 2019. | ||
1436 | 30 | 19 | Islam S.M.A., Díaz-Gay M., Wu Y., Barnes M., Vangara R., Bergstrom E.N., He Y., Vella M., Wang J., Teague J.W. et al. Uncovering novel mutational signatures by de novo extraction with SigProfilerExtractor. Cell Genomics. 2022; 2:https://doi.org/10.1016/j.xgen.2022.100179. | ||
1745 | 35 | 18 | Iz Beltagy, Kyle Lo, and Arman Cohan. 2019. SciBERT: A Pretrained Language Model for Scientific Text. arXiv:1903.10676 [cs.CL] | ||
1201 | 25 | 25 | J. A. Goldstein, G. Sastry, M. Musser, R. DiResta, M. Gentzel, and K. Sedova, “Generative language models and automated influence operations: Emerging threats and potential mitigations,” 2023. | ||
187 | 5 | 1 | J. Abramson, J. Adler, J. Dunger, R. Evans, T. Green, A. Pritzel, O. Ronneberger, L. Willmore, A. J. Ballard, J. Bambrick, et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3. Nature,2024. | ||
2 | 1 | 0 | J. Amberger, C. A. Bocchini, A. F. Scott, and A. Hamosh. Mckusick’s online mendelian inheritance in man (omim®). Nucleic Acids Research, 37:D793, 2008. ISSN 03051048. doi: 10.1093/NAR/GKN665. URL https://pmc.ncbi.nlm.nih.gov/articles/PMC2686440/. | https://qiita.com/kaizen_nagoya/items/c599d867201d1ffb1f4d | |
1350 | 28 | 2 | J. Austin, A. Odena, M. Nye, M. Bosma, H. Michalewski, D. Dohan, E. Jiang, C. Cai, M. Terry, Q. Le, et al. Program synthesis with large language models. arXiv preprint arXiv:2108.07732, 2021. | ||
1352 | 28 | 4 | J. Bai, S. Bai, Y. Chu, Z. Cui, K. Dang, X. Deng, Y. Fan, W. Ge, Y. Han, F. Huang, et al. Qwen technical report. arXiv preprint arXiv:2309.16609, 2023. | ||
| | | C. Burns, P. Izmailov, J. H. Kirchner, B. Baker, L. Gao, L. Aschenbrenner, Y. Chen, A. Ecoffet, M. Joglekar, J. Leike, et al. Weak-to-strong generalization: Eliciting strong capabilities with weak supervision. arXiv preprint arXiv:2312.09390, 2023. | ||
225 | 5 | 38 | J. C. Hingerl, A. Karollus, and J. Gagneur. Flashzoi: An enhanced Borzoi model for accelerated genomic analysis. bioRxiv, pages 2024–12, 2024. | ||
206 | 5 | 19 | J. Cheng, G. Novati, J. Pan, C. Bycroft, A. Žemgulytė, T. Applebaum, A. Pritzel, L. H. Wong, M. Zielinski, T. Sargeant, et al. Accurate proteome-wide missense variant effect prediction with AlphaMissense. Science, 381(6664):eadg7492, 2023. | ||
1130 | 24 | 13 | J. Devlin, M.-W. Chang, K. Lee, and K. Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018. | ||
1453 | 31 | 4 | J. Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. In NAACL-HLT, 2019. | ||
199 | 5 | 13 | J. E. Brownell, J. Zhou, T. Ranalli, R. Kobayashi, D. G. Edmondson, S. Y. Roth, and C. D. Allis. Tetrahymena histone acetyltransferase A: a homolog to yeast Gcn5p linking histone acetylation to gene activation. Cell, 84(6):843–851, 1996. | ||
1077 | 23 | 23 | J. Gorodkin. Comparing two K-category assignments by a K-category correlation coefficient. Computational Biology and Chemistry, 28(5):367–374, December 2004. ISSN 1476-9271. doi: 10.1016/j.compbiolchem.2004.09.006. URL https://www.sciencedirect.com/science/article/pii/S1476927104000799. | ||
861 | 19 | 0 | J. Kans. Entrez direct: E-utilities on the unix command line - entrez programming utilities help - ncbi bookshelf, 4 2013. URL https://www.ncbi.nlm.nih.gov/books/NBK179288/. | ||
232 | 5 | 45 | J. Kaplan, S. McCandlish, T. Henighan, T. B. Brown, B. Chess, R. Child, S. Gray, A. Radford, J. Wu, and D. Amodei. Scaling laws for neural language models, 2020. URL https://arxiv.org/abs/2001.08361. | ||
236 | 5 | 49 | J. Ku, E. Nguyen, D. Romero, G. Brixi, B. Yang, A. Vorontsov, A. Taghibakhshi, A. Lu, D. Burke, G. Brockman, S. Massaroli, C. Re, P. Hsu, B. Hie, S. Ermon, and M. Poli. Systems and algorithms for convolutional multi-hybrid language models at scale. 2025. | ||
878 | 21 | 0 | J. Lee, W. Yoon, S. Kim, D. Kim, S. Kim, C. H. So, and J. Kang. Biobert: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics, 36:1234–1240, 2 2020. ISSN 1367-4803. doi: 10.1093/BIOINFORMATICS/BTZ682. URL https://dx.doi.org/10.1093/bioinformatics/btz682. | ||
1233 | 25 | 57 | J. M. Laurent, J. D. Janizek, M. Ruzo, M. M. Hinks, M. J. Hammerling, S. Narayanan, M. Ponnapati, A. D. White, and S. G. Rodriques, “Lab-bench: Measuring capabilities of language models for biology research,” 2024. | ||
1149 | 24 | 32 | J. Meier, R. Rao, R. Verkuil, J. Liu, T. Sercu, and A. Rives. Language models enable zero-shot prediction of the effects of mutations on protein function. Advances in Neural Information Processing Systems, 34: 29287–29303, 2021. | ||
250 | 5 | 63 | J. Meier, R. Rao, R. Verkuil, J. Liu, T. Sercu, and A. Rives. Language models enable zero-shot prediction of the effects of mutations on protein function. bioRxiv, 2021. doi: 10.1101/2021.07.09.450648. URL https://www.biorxiv.org/content/10.1101/2021.07.09.450648v1. | ||
256 | 5 | 69 | J. Mistry, S. Chuguransky, L. Williams, M. Qureshi, G. A. Salazar, E. L. Sonnhammer, S. C. Tosatto, L. Paladin, S. Raj, L. J. Richardson, et al. Pfam: The protein families database in 2021. Nucleic acids research, 49(D1): D412–D419, 2021 | ||
1150 | 24 | 33 | J. Nasser, D. T. Bergman, C. P. Fulco, P. Guckelberger, B. R. Doughty, T. A. Patwardhan, T. R. Jones, T. H. Nguyen, J. C. Ulirsch, F. Lekschas, K. Mualim, H. M. Natri, E. M. Weeks, G. Munson, M. Kane, H. Y. Kang, A. Cui, J. P. Ray, T. M. Eisenhaure, R. L. Collins, K. Dey, H. Pfister, A. L. Price, C. B. Epstein, A. Kundaje, R. J. Xavier, M. J. Daly, H. Huang, H. K. Finucane, N. Hacohen, E. S. Lander, and J. M. Engreitz. Genome-wide enhancer maps link risk variants to disease genes. Nature, 593:238–243, 2021. | ||
233 | 5 | 46 | J. R. Karr, J. C. Sanghvi, D. N. Macklin, M. V. Gutschow, J. M. Jacobs, B. Bolival, N. Assad-Garcia, J. I. Glass, and M. W. Covert. A whole-cell computational model predicts phenotype from genotype. Cell, 150(2):389–401, 2012. | ||
1160 | 24 | 43 | J. Rogers and R. A. Gibbs. Comparative primate genomics: emerging patterns of genome content and dynamics. Nature Reviews Genetics, 15(5):347–359, 2014. | ||
1387 | 28 | 39 | J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017. | ||
1386 | 28 | 38 | J. Schulman, P. Moritz, S. Levine, M. Jordan, and P. Abbeel. High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:1506.02438, 2015. | ||
1385 | 28 | 37 | J. Schulman. Approximating kl divergence, 2020. URL http://joschu.net/blog/kl-approx.html. | ||
289 | 5 | 104 | J. Shine and L. Dalgarno. The 3′-terminal sequence of Escherichia coli 16S ribosomal RNA: complementarity to nonsense triplets and ribosome binding sites. Proceedings of the National Academy of Sciences, 71(4): 1342–1346, 1974. | ||
290 | 5 | 105 | J. Söding, A. Biegert, and A. N. Lupas. The HHpred interactive server for protein homology detection and structure prediction. Nucleic Acids Research, 33(suppl_2):W244–W248, 2005. | ||
292 | 5 | 107 | J. Su, M. Ahmed, Y. Lu, S. Pan, W. Bo, and Y. Liu. Roformer: Enhanced transformer with rotary position embedding. Neurocomputing, 568:127063, 2024. | ||
1449 | 31 | 0 | J. Su, Y. Lu, S. Pan, A. Murtadha, B. Wen, and Y. Liu. Roformer: Enhanced transformer with rotary position embedding, 2023. URL https://arxiv.org/abs/2104.09864. | ||
1164 | 24 | 47 | J. T. Smith, A. Warrington, and S. W. Linderman. Simplified state space layers for sequence modeling. arXiv preprint arXiv:2208.04933, 2022. | ||
575 | 11 | 33 | J. Uesato, N. Kushman, R. Kumar, F. Song, N. Siegel, L. Wang, A. Creswell, G. Irving, and I. Higgins. Solving math word problems with process- and outcome-based feedback. arXiv preprint arXiv:2211.14275, 2022. | ||
1227 | 25 | 51 | J. Varghese and J.-L. Chapiro, “Systematic analysis of chatgpt, google search and llama 2 for clinical decision support tasks,” Nature Communications, vol. 15, no. 1, p. 46411, 2024. Accessed: 2024-08-07. | ||
304 | 5 | 119 | J. Watson and F. Crick. Molecular structure of nucleic acids: A structure for deoxyribose nucleic acid. Nature, 171:737–738, 4 1953. ISSN 0028-0836. doi: 10.1038/171737a0. | ||
1169 | 24 | 52 | J. Wei, M. Bosma, V. Y. Zhao, K. Guu, A. W. Yu, B. Lester, N. Du, A. M. Dai, and Q. V. Le. Finetuned language models are zero-shot learners. arXiv preprint arXiv:2109.01652, 2021. | ||
1397 | 28 | 49 | J. Wei, X. Wang, D. Schuurmans, M. Bosma, B. Ichter, F. Xia, E. H. Chi, Q. V. Le, and D. Zhou. Chain-of-thought prompting elicits reasoning in large language models. In NeurIPS, 2022. URL http://papers.nips.cc/paper_files/paper/2022/hash/9d5609613524ecf4f15af0f7b31abca4-Abstract-Conference.html. | ||
1171 | 24 | 54 | J. Yu, Z. Wang, V. Vasudevan, L. Yeung, M. Seyedhosseini, and Y. Wu. Coca: Contrastive captioners are image-text foundation models. arXiv preprint arXiv:2205.01917, 2022. | ||
1174 | 24 | 57 | J. Zhou and O. G. Troyanskaya. Predicting effects of noncoding variants with deep learning–based sequence model. Nature methods, 12(10):931–934, 2015. | ||
582 | 11 | 40 | J. Zhou, T. Lu, S. Mishra, S. Brahma, S. Basu, Y. Luan, D. Zhou, and L. Hou. Instruction-following evaluation for large language models. arXiv preprint arXiv:2311.07911, 2023. | ||
204 | 5 | 17 | J. Chen, Z. Hu, S. Sun, Q. Tan, Y. Wang, Q. Yu, L. Zong, L. Hong, J. Xiao, T. Shen, I. King, and Y. Li. Interpretable RNA foundation model from unannotated data for highly accurate RNA structure and function predictions, 2022. | ||
242 | 5 | 55 | J. Linder, D. Srivastava, H. Yuan, V. Agarwal, and D. R. Kelley. Predicting RNA-seq coverage from DNA sequence as a unifying model of gene regulation. Nature Genetics, pages 1–13, 2025. | ||
282 | 5 | 97 | J. T. Robinson, H. Thorvaldsdottir, D. Turner, and J. P. Mesirov. igv.js: an embeddable JavaScript implementation of the Integrative Genomics Viewer (IGV). Bioinformatics, 39(1):btac830, 12 2022. ISSN 1367-4811. doi: 10.1093/bioinformatics/btac830. URL https://doi.org/10.1093/bioinformatics/btac830. | ||
1245 | 26 | 8 | Jacob Austin, Augustus Odena, Maxwell I. Nye, Maarten Bosma, Henryk Michalewski, David Dohan, Ellen Jiang, Carrie J. Cai, Michael Terry, Quoc V. Le, and Charles Sutton. Program synthesis with large language models. CoRR, abs/2108.07732, 2021. | ||
1550 | 33 | 6 | Jacob Austin, Augustus Odena, Maxwell I. Nye, Maarten Bosma, Henryk Michalewski, David Dohan, Ellen Jiang, Carrie J. Cai, Michael Terry, Quoc V. Le, and Charles Sutton. Program synthesis with large language models. CoRR, abs/2108.07732, 2021. | ||
1633 | 34 | 8 | Jacob Austin, Augustus Odena, Maxwell I. Nye, Maarten Bosma, Henryk Michalewski, David Dohan, Ellen Jiang, Carrie J. Cai, Michael Terry, Quoc V. Le, and Charles Sutton. Program synthesis with large language models. CoRR, abs/2108.07732, 2021. | ||
2176 | 36 | 16 | Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of NAACL-HLT, pp. 4171–4186, 2019. | ||
1807 | 35 | 80 | Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018). | ||
1806 | 35 | 79 | Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In NAACL, Jill Burstein, Christy Doran, and Thamar Solorio (Eds.). Association for Computational Linguistics, 4171–4186. | ||
748 | 15 | 12 | Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding, 2019a. | ||
1067 | 23 | 13 | Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. 2018. doi: 10.48550/ARXIV.1810.04805. URL https://arxiv.org/abs/1810.04805. Publisher: arXiv Version Number: 2. | ||
749 | 15 | 13 | Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv:1810.04805 [cs], May 2019b. URL http://arxiv.org/abs/1810.04805. arXiv: 1810.04805. | ||
2080 | 35 | 353 | Jacob White. 2020. PubMed 2.0. Medical reference services quarterly 39, 4 (2020), 382–387. | ||
667 | 13 | 22 | Jaech, A., Kalai, A., Lerer, A., Richardson, A., El-Kishky, A., Low, A., Helyar, A., Madry, A., Beutel, A., Carney, A., et al. Openai o1 system card. arXiv preprint arXiv:2412.16720, 2024. | ||
458 | 9 | 36 | Jaganathan, K. et al. Predicting splicing from primary sequence with deep learning. Cell 176, 535–548 (2019). | ||
518 | 10 | 36 | Jaganathan, K. et al. Predicting splicing from primary sequence with deep learning. Cell 176, 535–548 (2019). | ||
158 | 4 | 18 | Jaganathan, K. et al. Predicting splicing from primary sequence with deep learning. Cell 176, 535–548.e24 (2019). | ||
31 | 3 | 23 | Jaganathan, K., Panagiotopoulou, S. K., McRae, J. F., Darbandi, S. F., Knowles, D., Li, Y. I., Kosmicki, J. A., Arbelaez, J., Cui, W., Schwartz, G. B. et al. (2019). Predicting splicing from primary sequence with deep learning. Cell 176, 535–548. | ||
147 | 4 | 7 | Jagota, M. et al. Cross-protein transfer learning substantially improves disease variant prediction. Genome Biol. 24, 182 (2023). | ||
601 | 12 | 18 | Jain, R., Jain, A., Mauro, E., LeShane, K. & Densmore, D. ICOR: improving codon optimization with recurrent neural networks. BMC Bioinforma. 24, 132 (2023). | ||
1818 | 35 | 91 | Janan T Eppig, Cynthia L Smith, Judith A Blake, Martin Ringwald, James A Kadin, Joel E Richardson, and Carol J Bult. 2017. Mouse Genome Informatics (MGI): resources for mining mouse genetic, genomic, and biological data in support of primary and translational research. Systems Genetics: Methods and Protocols (2017), 47–73. | ||
1848 | 35 | 121 | Janna Hastings, Gareth Owen, Adriano Dekker, Marcus Ennis, Namrata Kale, Venkatesh Muthukrishnan, Steve Turner, Neil Swainston, Pedro Mendes, and Christoph Steinbeck. 2016. ChEBI in 2016: Improved services and an expanding collection of metabolites. Nucleic acids research 44, D(2016), D1214–D1219. | ||
1284 | 26 | 47 | Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. Scaling laws for neural language models. CoRR, abs/2001.08361, 2020. | ||
1671 | 34 | 46 | Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. Scaling laws for neural language models. CoRR, abs/2001.08361, 2020. | ||
668 | 13 | 23 | Javan, R., Kim, T., and Mostaghni, N. Gpt-4 vision: Multimodal evolution of chatgpt and potential role in radiology. Cureus, 16(8):e68298, 2024. | ||
1339 | 26 | 102 | Jeffrey Zhou, Tianjian Lu, Swaroop Mishra, Siddhartha Brahma, Sujoy Basu, Yi Luan, Denny Zhou, and Le Hou. Instruction-following evaluation for large language models. CoRR, abs/2311.07911, 2023. | ||
1624 | 33 | 80 | Jeffrey Zhou, Tianjian Lu, Swaroop Mishra, Siddhartha Brahma, Sujoy Basu, Yi Luan, Denny Zhou, and Le Hou. Instruction-following evaluation for large language models. CoRR, abs/2311.07911, 2023. | ||
1725 | 34 | 100 | Jeffrey Zhou, Tianjian Lu, Swaroop Mishra, Siddhartha Brahma, Sujoy Basu, Yi Luan, Denny Zhou, and Le Hou. Instruction-following evaluation for large language models. CoRR, abs/2311.07911, 2023. | ||
776 | 15 | 40 | Jekaterina Novikova, Ondřej Dušek, and Verena Rieser. The E2E dataset: New challenges for end-to-end generation. arXiv preprint arXiv:1706.09254, 2017. | ||
1040 | 22 | 123 | Jeliazkov JR, del Alamo D, Karpiak JD. ESMFold hallucinates native-like protein sequences. bioRxiv 2023; 2023–05. | ||
1897 | 35 | 170 | Jens Kringelum, Sonny Kim Kjaerulff, Søren Brunak, Ole Lund, Tudor I Oprea, and Olivier Taboureau. 2016. ChemProt-3.0: a global chemical biology diseases mapping. Database 2016 (2016), bav123. | ||
2006 | 35 | 279 | Jerret Ross, Brian Belgodere, Vijil Chenthamarakshan, Inkit Padhi, Youssef Mroueh, and Payel Das. 2022. Large-scale chemical language representations capture molecular structure and properties. Nature Machine Intelligence 4, 12 (2022), 1256–1264. | ||
1110 | 23 | 56 | Jesse Vig, Ali Madani, Lav R. Varshney, Caiming Xiong, Richard Socher, and Nazneen Fatema Rajani. BERTology Meets Biology: Interpreting Attention in Protein Language Models, March 2021. URL http://arxiv.org/abs/2006.15222. arXiv:2006.15222 [cs, q-bio]. | ||
943 | 22 | 26 | Ji Y, Zhou Z, Liu H. et al. DNABERT: pre-trained bidirectional encoder representations from transformers model for DNA-language in genome. Bioinformatics 2021;37:2112–20. 10.1093/bioinformatics/btab083. | ||
825 | 17 | 11 | Ji Z, Lee N, Frieske R et al. Survey of hallucination in natural language generation. ACM Comput Surv 2023;55:1–38. | ||
412 | 8 | 13 | Ji, Y., Zhou, Z., Liu, H. & Davuluri, R. V. DNABERT: pre-trained bidirectional encoder representations from transformers model for DNA-language in genome. Bioinformatics 37, 2112–2120 (2021). | ||
442 | 9 | 20 | Ji, Y., Zhou, Z., Liu, H. & Davuluri, R. V. Dnabert: pre-trained bidirectional encoder representations from transformers model for DNA-language in genome. Bioinformatics 37, 2112–2120 (2021). | ||
502 | 10 | 20 | Ji, Y., Zhou, Z., Liu, H. & Davuluri, R. V. Dnabert: pre-trained bidirectional encoder representations from transformers model for DNA-language in genome. Bioinformatics 37, 2112–2120 (2021). | ||
63 | 3 | 55 | Ji, Y., Zhou, Z., Liu, H., and Davuluri, R. V. (2021). DNABERT: pre-trained bidirectional encoder representations from Transformers model for DNA-language in genome. Bioinformatics 37, 2112–2120. | ||
1448 | 30 | 31 | Jia F., Teer J.K., Knepper T.C., Lee J.K., Zhou H.H., He Y.J., McLeod H.L. Discordance of somatic mutations between Asian and Caucasian patient populations with gastric cancer. Mol. Diagn. Ther. 2017; 21:179–185. | ||
991 | 22 | 74 | Jia G, Li Y, Zhang H. et al. Estimating heritability and genetic correlations from large health datasets in the absence of genetic data. Nat Commun 2019;10:5508. 10.1038/s41467-019-13455-0. | ||
990 | 22 | 73 | Jia G, Li Y, Zhong X. et al. The high-dimensional space of human diseases built from diagnosis records and mapped to genetic loci. Nat Comput Sci 2023;3:403–17. 10.1038/s43588-023-00453-y. | ||
1915 | 35 | 188 | Jiahao Li, Zhourun Wu, Wenhao Lin, Jiawei Luo, Jun Zhang, Qingcai Chen, and Junjie Chen. 2023. iEnhancer-ELM: improve enhancer identification by extracting position-related multiscale contextual information based on enhancer language models. Bioinformatics Advances 3, 1 (2023), vbad043. | ||
1330 | 26 | 93 | Jian Yang, Jiaxi Yang, Ke Jin, Yibo Miao, Lei Zhang, Liqun Yang, Zeyu Cui, Yichang Zhang, Binyuan Hui, and Junyang Lin. Evaluating and aligning CodeLLMs on human preference. CoRR, abs/2412.05210, 2024c. | ||
1716 | 34 | 91 | Jian Yang, Jiaxi Yang, Ke Jin, Yibo Miao, Lei Zhang, Liqun Yang, Zeyu Cui, Yichang Zhang, Binyuan Hui, and Junyang Lin. Evaluating and aligning CodeLLMs on human preference. CoRR, abs/2412.05210, 2024c. | ||
2152 | 35 | 425 | Jian Zhou and Olga G Troyanskaya. 2015. Predicting effects of noncoding variants with deep learning–based sequence model. Nature methods 12, 10 (2015), 931–934. | ||
1114 | 23 | 60 | Jian Zhou and Olga G Troyanskaya. Predicting effects of noncoding variants with deep learning–based sequence model. Nature Methods, 12(10):931–934, October 2015. ISSN 1548-7091,1548-7105. doi: 10.1038/nmeth.3547. URL https://www.nature.com/articles/nmeth.3547. | ||
744 | 15 | 8 | Jian-Feng Cai, Emmanuel J Candès, and Zuowei Shen. A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 20(4):1956–1982, 2010. | ||
1843 | 35 | 116 | Jiang Guo, A Santiago Ibanez-Lopez, Hanyu Gao, Victor Quach, Connor W Coley, Klavs F Jensen, and Regina Barzilay. 2021. Automated chemical reaction extraction from scientific literature. Journal of chemical information and modeling 62, 9 (2021), 2035–2045. | ||
704 | 14 | 3 | Jiang, A. Q. et al. Mistral 7B. Preprint at https://doi.org/10.48550/arXiv.2310.06825 (2023). | ||
340 | 7 | 28 | Jiang, Y. et al. SEdb: a comprehensive human super-enhancer database. Nucleic Acids Res. 47, D235–D243 (2019). | ||
669 | 13 | 24 | Jiang, Y., Black, K. C., Geng, G., Park, D., Ng, A. Y., and Chen, J. H. MedAgentBench: Dataset for benchmarking LLMs as agents in medical applications, 2025. | ||
2036 | 35 | 309 | Jiangming Sun, Nina Jeliazkova, Vladimir Chupakhin, Jose-Felipe Golib-Dzib, Ola Engkvist, Lars Carlsson, Jörg Wegner, Hugo Ceulemans, Ivan Georgiev, Vedrin Jeliazkov, et al. 2017. ExCAPE-DB: an integrated large scale dataset facilitating Big Data analysis in chemogenomics. Journal of cheminformatics 9 (2017), 1–9. | ||
1316 | 26 | 79 | Jianlin Su, Murtadha H. M. Ahmed, Yu Lu, Shengfeng Pan, Wen Bo, and Yunfeng Liu. RoFormer: Enhanced Transformer with rotary position embedding. Neurocomputing, 568:127063, 2024. | ||
1610 | 33 | 66 | Jianlin Su, Murtadha H. M. Ahmed, Yu Lu, Shengfeng Pan, Wen Bo, and Yunfeng Liu. RoFormer: Enhanced Transformer with rotary position embedding. Neurocomputing, 568:127063, 2024. | ||
1702 | 34 | 77 | Jianlin Su, Murtadha H. M. Ahmed, Yu Lu, Shengfeng Pan, Wen Bo, and Yunfeng Liu. RoFormer: Enhanced Transformer with rotary position embedding. Neurocomputing, 568:127063, 2024. | ||
2194 | 36 | 34 | Jianlin Su, Yu Lu, Shengfeng Pan, Ahmed Murtadha, Bo Wen, and Yunfeng Liu. RoFormer: Enhanced transformer with rotary position embedding. arXiv preprint arXiv:2104.09864, 2021. | ||
1315 | 26 | 78 | Jianlin Su. The magical effect of the Bias term: RoPE + Bias = better length extrapolation, 2023. URL https://spaces.ac.cn/archives/9577. | ||
1609 | 33 | 65 | Jianlin Su. The magical effect of the Bias term: RoPE + Bias = better length extrapolation, 2023. URL https://spaces.ac.cn/archives/9577. | ||
1701 | 34 | 76 | Jianlin Su. The magical effect of the Bias term: RoPE + Bias = better length extrapolation, 2023. URL https://spaces.ac.cn/archives/9577. | ||
1490 | 31 | 41 | Jianlin Su. WoBERT: Word-based Chinese BERT model (ZhuiyiAI). Technical report, 2020. URL https://github.com/ZhuiyiTechnology/WoBERT. | ||
1914 | 35 | 187 | Jianquan Li, Xidong Wang, Xiangbo Wu, Zhiyi Zhang, Xiaolong Xu, Jie Fu, Prayag Tiwari, Xiang Wan, and Benyou Wang. 2023. Huatuo-26M, a Large-scale Chinese Medical QA Dataset. arXiv:2305.01526 [cs.CL] | ||
2107 | 35 | 380 | Jianyi Yang, Ambrish Roy, and Yang Zhang. 2012. BioLiP: a semi-manually curated database for biologically relevant ligand–protein interactions. Nucleic acids research 41, D1 (2012), D1096–D1103. | ||
1913 | 35 | 186 | Jiao Li, Yueping Sun, Robin J Johnson, Daniela Sciaky, Chih-Hsuan Wei, Robert Leaman, Allan Peter Davis, Carolyn J Mattingly, Thomas C Wiegers, and Zhiyong Lu. 2016. BioCreative V CDR task corpus: a resource for chemical disease relation extraction. Database 2016 (2016). | ||
1912 | 35 | 185 | Jiatong Li, Yunqing Liu, Wenqi Fan, Xiao-Yong Wei, Hui Liu, Jiliang Tang, and Qing Li. 2023. Empowering Molecule Discovery for Molecule-Caption Translation with Large Language Models: A ChatGPT Perspective. arXiv:2306.06615 [cs.CL] | ||
1291 | 26 | 54 | Jiawei Liu, Chunqiu Steven Xia, Yuyao Wang, and Lingming Zhang. Is your code generated by ChatGPT really correct? Rigorous evaluation of large language models for code generation. In NeurIPS, 2023. | ||
1678 | 34 | 53 | Jiawei Liu, Chunqiu Steven Xia, Yuyao Wang, and Lingming Zhang. Is your code generated by ChatGPT really correct? Rigorous evaluation of large language models for code generation. In NeurIPS, 2023. | ||
1586 | 33 | 42 | Jiawei Liu, Chunqiu Steven Xia, Yuyao Wang, and Lingming Zhang. Is your code generated by ChatGPT really correct? Rigorous evaluation of large language models for code generation. In NeurIPS, 2023a. | ||
755 | 15 | 19 | Jihun Ham and Daniel D. Lee. Grassmann discriminant analysis: a unifying view on subspace-based learning. In ICML, pp. 376–383, 2008. URL https://doi.org/10.1145/1390156.1390204. | ||
2190 | 36 | 30 | Joel Rozowsky, Jiahao Gao, Beatrice Borsari, et al. The EN-TEx resource of multi-tissue personal epigenomes & variant-impact models. Cell, 186(7):1493–1511.e40, March 2023. | ||
670 | 13 | 25 | Jimenez, C. E., Yang, J., Wettig, A., Yao, S., Pei, K., Press, O., and Narasimhan, K. SWE-bench: Can language models resolve real-world GitHub issues? arXiv preprint arXiv:2310.06770, 2023. | ||
742 | 15 | 6 | Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E. Hinton. Layer normalization, 2016. | ||
826 | 17 | 12 | Jin Q, Leaman R, Lu Z. Retrieve, summarize, and verify: how will ChatGPT impact information seeking from the medical literature? J Am Soc Nephrol 2023a;34:1302–4. | ||
827 | 17 | 13 | Jin Q, Wang Z, Floudas CS et al. Matching patients to clinical trials with large language models. arXiv, arXiv:2307.15051, 2023b, preprint: not peer reviewed. | ||
828 | 17 | 14 | Jin Q, Yuan Z, Xiong G et al. Biomedical question answering: a survey of approaches and challenges. ACM Comput Surv 2022;55:1–36. | ||
1050 | 22 | 133 | Jin Q, Yuan Z, Xiong G. et al. Biomedical question answering: a survey of approaches and challenges. ACM Comput Surv 2022;55:1–36. 10.1145/3490238. | ||
2035 | 35 | 308 | Jin Su, Chenchen Han, Yuyang Zhou, Junjie Shan, Xibin Zhou, and Fajie Yuan. 2023. SaProt: Protein language modeling with structure-aware vocabulary. bioRxiv (2023), 2023–10. | ||
721 | 14 | 20 | Jin, J. et al. iDNA-ABF: multi-scale deep biological language learning model for the interpretable prediction of DNA methylations. Genome Biology 23, 219 (2022). | ||
852 | 18 | 3 | Jin, Z., Sato, Y., Kawashima, M. and Kanehisa, M. (2023) KEGG tools for classification and analysis of viral proteins. Protein Sci., 32, e4840. | ||
2106 | 35 | 379 | Jingfeng Yang, Hongye Jin, Ruixiang Tang, Xiaotian Han, Qizhang Feng, Haoming Jiang, Bing Yin, and Xia Hu. 2023. Harnessing the power of llms in practice: A survey on chatgpt and beyond. arXiv preprint arXiv:2304.13712 (2023). | ||
1769 | 35 | 42 | Jinho Chang and Jong Chul Ye. 2024. Bidirectional generation of structure and properties through a single molecular foundation model. Nature Communications 15, 1 (2024), 2323. | ||
1593 | 33 | 49 | Jinjie Ni, Fuzhao Xue, Xiang Yue, Yuntian Deng, Mahir Shah, Kabir Jain, Graham Neubig, and Yang You. MixEval: Deriving wisdom of the crowd from LLM benchmark mixtures. CoRR, abs/2406.06565, 2024. | ||
1552 | 33 | 8 | Jinze Bai, Shuai Bai, Shusheng Yang, Shijie Wang, Sinan Tan, Peng Wang, Junyang Lin, Chang Zhou, and Jingren Zhou. Qwen-VL: A frontier large vision-language model with versatile abilities. CoRR, abs/2308.12966, 2023b. | ||
1246 | 26 | 9 | Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, Xiaodong Deng, Yang Fan, Wenbin Ge, Yu Han, Fei Huang, Binyuan Hui, Luo Ji, Mei Li, Junyang Lin, Runji Lin, Dayiheng Liu, Gao Liu, Chengqiang Lu, Keming Lu, Jianxin Ma, Rui Men, Xingzhang Ren, Xuancheng Ren, Chuanqi Tan, Sinan Tan, Jianhong Tu, Peng Wang, Shijie Wang, Wei Wang, Shengguang Wu, Benfeng Xu, Jin Xu, An Yang, Hao Yang, Jian Yang, Shusheng Yang, Yang Yao, Bowen Yu, Hongyi Yuan, Zheng Yuan, Jianwei Zhang, Xingxuan Zhang, Yichang Zhang, Zhenru Zhang, Chang Zhou, Jingren Zhou, Xiaohuan Zhou, and Tianhang Zhu. Qwen technical report. CoRR, abs/2309.16609, 2023. | ||
1551 | 33 | 7 | Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, Xiaodong Deng, Yang Fan, Wenbin Ge, Yu Han, Fei Huang, Binyuan Hui, Luo Ji, Mei Li, Junyang Lin, Runji Lin, Dayiheng Liu, Gao Liu, Chengqiang Lu, Keming Lu, Jianxin Ma, Rui Men, Xingzhang Ren, Xuancheng Ren, Chuanqi Tan, Sinan Tan, Jianhong Tu, Peng Wang, Shijie Wang, Wei Wang, Shengguang Wu, Benfeng Xu, Jin Xu, An Yang, Hao Yang, Jian Yang, Shusheng Yang, Yang Yao, Bowen Yu, Hongyi Yuan, Zheng Yuan, Jianwei Zhang, Xingxuan Zhang, Yichang Zhang, Zhenru Zhang, Chang Zhou, Jingren Zhou, Xiaohuan Zhou, and Tianhang Zhu. Qwen technical report. CoRR, abs/2309.16609, 2023a. | ||
1634 | 34 | 9 | Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, Xiaodong Deng, Yang Fan, Wenbin Ge, Yu Han, Fei Huang, Binyuan Hui, Luo Ji, Mei Li, Junyang Lin, Runji Lin, Dayiheng Liu, Gao Liu, Chengqiang Lu, Keming Lu, Jianxin Ma, Rui Men, Xingzhang Ren, Xuancheng Ren, Chuanqi Tan, Sinan Tan, Jianhong Tu, Peng Wang, Shijie Wang, Wei Wang, Shengguang Wu, Benfeng Xu, Jin Xu, An Yang, Hao Yang, Jian Yang, Shusheng Yang, Yang Yao, Bowen Yu, Hongyi Yuan, Zheng Yuan, Jianwei Zhang, Xingxuan Zhang, Yichang Zhang, Zhenru Zhang, Chang Zhou, Jingren Zhou, Xiaohuan Zhou, and Tianhang Zhu. Qwen technical report. CoRR, abs/2309.16609, 2023. | ||
1799 | 35 | 72 | Jiyu Cui, Fang Wu, Wen Zhang, Lifeng Yang, Jianbo Hu, Yin Fang, Peng Ye, Qiang Zhang, Xian Suo, Yiming Mo, et al. 2023. Direct prediction of gas | ||
2189 | 36 | 29 | Joel Rozowsky, Jiahao Gao, Beatrice Borsari, Yucheng T Yang, Timur Galeev, Gamze Gürsoy, Charles B Epstein, Kun Xiong, Jinrui Xu, Tianxiao Li, Jason Liu, Keyang Yu, Ana Berthel, Zhanlin Chen, Fabio Navarro, Maxwell S Sun, James Wright, Justin Chang, Christopher J F Cameron, Noam Shoresh, Elizabeth Gaskell, Jorg Drenkow, Jessika Adrian, Sergey Aganezov, François Aguet, Gabriela Balderrama-Gutierrez, Samridhi Banskota, Guillermo Barreto Corona, Sora Chee, Surya B Chhetri, Gabriel Conte Cortez Martins, Cassidy Danyko, Carrie A Davis, Daniel Farid, Nina P Farrell, Idan Gabdank, Yoel Gofin, David U Gorkin, Mengting Gu, Vivian Hecht, Benjamin C Hitz, Robbyn Issner, Yunzhe Jiang, Melanie Kirsche, Xiangmeng Kong, Bonita R Lam, Shantao Li, Bian Li, Xiqi Li, Khine Zin Lin, Ruibang Luo, Mark Mackiewicz, Ran Meng, et al. The EN-TEx resource of multi-tissue personal epigenomes & variant-impact models. Cell, 186(7):1493–1511.e40, March 2023. | ||
2078 | 35 | 351 | Johannes Welbl, Nelson F Liu, and Matt Gardner. 2017. Crowdsourcing multiple choice science questions. arXiv preprint arXiv:1707.06209 (2017). | ||
2193 | 36 | 33 | John A Stamatoyannopoulos, Michael Snyder, Ross Hardison, Bing Ren, Thomas Gingeras, David M Gilbert, Mark Groudine, Michael Bender, Rajinder Kaul, Theresa Canfield, et al. An encyclopedia of mouse dna elements (mouse encode). Genome biology, 13(8):1–5, 2012. | ||
1869 | 35 | 142 | John J Irwin, Khanh G Tang, Jennifer Young, Chinzorig Dandarchuluun, Benjamin R Wong, Munkhzul Khurelbaatar, Yurii S Moroz, John Mayfield, and Roger A Sayle. 2020. ZINC20—a free ultralarge-scale chemical database for ligand discovery. Journal of chemical information and modeling 60, 12 (2020), 6065–6073. | ||
1868 | 35 | 141 | John J Irwin, Teague Sterling, Michael M Mysinger, Erin S Bolstad, and Ryan G Coleman. 2012. ZINC: a free tool to discover chemistry for biology. Journal of chemical information and modeling 52, 7 (2012), 1757–1768. | ||
1884 | 35 | 157 | John Jumper, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ronneberger, Kathryn Tunyasuvunakool, Russ Bates, Augustin Žídek, Anna Potapenko, et al. 2021. Highly accurate protein structure prediction with AlphaFold. Nature 596, 7873 (2021), 583–589. | ||
131 | 3 | 123 | Johnson, A. D., Handsaker, R. E., Pulit, S. L., Nizzari, M., O’Donnell, C. J., and de Bakker, P. I. (2017). CAGI: The Critical Assessment of Genome Interpretation. Genome Biology 18, 1–5. | ||
599 | 12 | 16 | Johnson, S. R. et al. Computational scoring and experimental evaluation of enzymes generated by neural networks. Nat. Biotechnol. https://doi.org/10.1038/s41587-024-02214-2 (2024). | ||
1450 | 31 | 1 | Jonas Gehring, Michael Auli, David Grangier, Denis Yarats, and Yann N Dauphin. Convolutional sequence to sequence learning. In International Conference on Machine Learning, pages 1243–1252. PMLR, 2017. | ||
778 | 15 | 42 | Jonas Pfeiffer, Aishwarya Kamath, Andreas Rücklé, Kyunghyun Cho, and Iryna Gurevych. AdapterFusion: Non-destructive task composition for transfer learning, 2021. | ||
1072 | 23 | 18 | Jonathan Frazer, Pascal Notin, Mafalda Dias, Aidan Gomez, Joseph K. Min, Kelly Brock, Yarin Gal, and Debora S. Marks. Disease variant prediction with deep generative models of evolutionary data. Nature, 599(7883):91–95, November 2021. ISSN 1476-4687. doi: 10.1038/s41586-021-04043-8. URL https://doi.org/10.1038/s41586-021-04043-8. | ||
1274 | 26 | 37 | Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, et al. Training computeoptimal large language models. CoRR, abs/2203.15556, 2022. | ||
1661 | 34 | 36 | Jordan Hoffmann, Sebastian Borgeaud, Arthur Mensch, Elena Buchatskaya, Trevor Cai, Eliza Rutherford, Diego de Las Casas, Lisa Anne Hendricks, Johannes Welbl, Aidan Clark, et al. Training computeoptimal large language models. CoRR, abs/2203.15556, 2022. | ||
41 | 3 | 33 | Jores, T., Tonnies, J., Wrightsman, T., Buckler, E. S., Cuperus, J. T., Fields, S., and Queitsch, C. (2021). Synthetic promoter designs enabled by a comprehensive analysis of plant core promoters. Nature Plants 7, 842–855. | ||
1240 | 26 | 3 | Joshua Ainslie, James Lee-Thorp, Michiel de Jong, Yury Zemlyanskiy, Federico Lebrón, and Sumit Sanghai. GQA: Training generalized multi-query Transformer models from multi-head checkpoints. In EMNLP, pp. 4895–4901. Association for Computational Linguistics, 2023. | ||
1547 | 33 | 3 | Joshua Ainslie, James Lee-Thorp, Michiel de Jong, Yury Zemlyanskiy, Federico Lebrón, and Sumit Sanghai. GQA: Training generalized multi-query Transformer models from multi-head checkpoints. In EMNLP, pp. 4895–4901. Association for Computational Linguistics, 2023. | ||
1628 | 34 | 3 | Joshua Ainslie, James Lee-Thorp, Michiel de Jong, Yury Zemlyanskiy, Federico Lebrón, and Sumit Sanghai. GQA: Training generalized multi-query Transformer models from multi-head checkpoints. In EMNLP, pp. 4895–4901. Association for Computational Linguistics, 2023. | ||
1962 | 35 | 235 | Joshua Meier, Roshan Rao, Robert Verkuil, Jason Liu, Tom Sercu, and Alexander Rives. 2021. Language models enable zero-shot prediction of the effects of mutations on protein function. bioRxiv (2021). | ||
1447 | 30 | 30 | Ju D., Hui D., Hammond D.A., Wonkam A., Tishkoff S.A. Importance of including Non-European populations in large Human genetic studies to enhance precision medicine. Annu. Rev. Biomed. Data Sci. 2022; 5:321–339. | ||
1446 | 30 | 29 | Jubb H.C., Saini H.K., Verdonk M.L., Forbes S.A. COSMIC-3D provides structural perspectives on cancer genetics for drug discovery. Nat. Genet. 2018; 50:1200–1202. | ||
374 | 7 | 62 | Jubb, A. M. et al. Achaete-scute like 2 (ascl2) is a target of Wnt signalling and is upregulated in intestinal neoplasia. Oncogene 25, 3445–3457 (2006). | ||
1010 | 22 | 93 | Jumper J, Evans R, Pritzel A. et al. Highly accurate protein structure prediction with AlphaFold. Nature 2021;596:583–9. 10.1038/s41586-021-03819-2. | ||
425 | 9 | 3 | Jumper, J. et al. Highly accurate protein structure prediction with alphafold. Nature 596, 583–589 (2021). | ||
485 | 10 | 3 | Jumper, J. et al. Highly accurate protein structure prediction with alphafold. Nature 596, 583–589 (2021). | ||
1781 | 35 | 54 | Jun Cheng, Guido Novati, Joshua Pan, Clare Bycroft, Akvilė Žemgulytė, Taylor Applebaum, Alexander Pritzel, Lai Hong Wong, Michal Zielinski, Tobias Sargeant, et al. 2023. Accurate proteome-wide missense variant effect prediction with AlphaMissense. Science 381, 6664 (2023), eadg7492. | ||
1780 | 35 | 53 | Jun Cheng, Muhammed Hasan Çelik, Thi Yen Duong Nguyen, Žiga Avsec, and Julien Gagneur. 2019. CAGI 5 splicing challenge: improved exon skipping and intron retention predictions with MMSplice. Human mutation 40, 9 (2019), 1243–1251. | ||
1059 | 23 | 5 | Jun Wang, Lachlan J. M. Coin, Lin Fang, Xiaosen Guo, Xin Jin, et al. (The 1000 Genomes Project Consortium). A global reference for human genetic variation. Nature, 526(7571):68–74, October 2015. ISSN 1476-4687. doi: 10.1038/nature15393. URL https://www.nature.com/articles/nature15393. | ||
2089 | 35 | 362 | Jun Xia, Yanqiao Zhu, Yuanqi Du, Y Liu, and SZ Li. 2023. A Systematic Survey of Chemical Pre-trained Models. IJCAI. | ||
1910 | 35 | 183 | Juncai Li and Xiaofei Jiang. 2021. Mol-BERT: an effective molecular representation with BERT for molecular property prediction. Wireless Communications and Mobile Computing 2021 (2021), 1–7. | ||
1923 | 35 | 196 | June M. Liu, Donghao Li, He Cao, Tianhe Ren, Zeyi Liao, and Jiamin Wu. 2023. ChatCounselor: A Large Language Models for Mental Health Support. arXiv:2309.15461 [cs.CL] | ||
345 | 7 | 33 | Jung, R. G. et al. Association between plasminogen activator inhibitor-1 and cardiovascular events: a systematic review and meta-analysis. Thromb. J. 16, 12 (2018). | ||
1296 | 26 | 59 | Junho Myung, Nayeon Lee, Yi Zhou, Jiho Jin, Rifki Afina Putri, Dimosthenis Antypas, Hsuvas Borkakoty, Eunsu Kim, Carla Pérez-Almendros, Abinew Ali Ayele, Víctor Gutiérrez-Basulto, Yazmín Ibáñez-García, Hwaran Lee, Shamsuddeen Hassan Muhammad, Ki-Woong Park, Anar Sabuhi Rzayev, Nina White, Seid Muhie Yimam, Mohammad Taher Pilehvar, Nedjma Ousidhoum, José Camacho-Collados, and Alice Oh. BLEnD: A benchmark for LLMs on everyday knowledge in diverse cultures and languages. CoRR, abs/2406.09948, 2024. | ||
1682 | 34 | 57 | Junho Myung, Nayeon Lee, Yi Zhou, Jiho Jin, Rifki Afina Putri, Dimosthenis Antypas, Hsuvas Borkakoty, Eunsu Kim, Carla Pérez-Almendros, Abinew Ali Ayele, Víctor Gutiérrez-Basulto, Yazmín Ibáñez-García, Hwaran Lee, Shamsuddeen Hassan Muhammad, Ki-Woong Park, Anar Sabuhi Rzayev, Nina White, Seid Muhie Yimam, Mohammad Taher Pilehvar, Nedjma Ousidhoum, José Camacho-Collados, and Alice Oh. BLEnD: A benchmark for LLMs on everyday knowledge in diverse cultures and languages. CoRR, abs/2406.09948, 2024. | ||
1911 | 35 | 184 | Junnan Li, Dongxu Li, Silvio Savarese, and Steven Hoi. 2023. BLIP-2: Bootstrapping language-image pre-training with frozen image encoders and large language models. arXiv preprint:2301.12597 (2023). | ||
1876 | 35 | 149 | Junru Jin, Yingying Yu, Ruheng Wang, Xin Zeng, Chao Pang, Yi Jiang, Zhongshen Li, Yutong Dai, Ran Su, Quan Zou, et al. 2022. iDNA-ABF: multi-scale deep biological language learning model for the interpretable prediction of DNA methylations. Genome biology 23, 1 (2022), 1–23. | ||
2175 | 36 | 15 | Junru Jin, Yingying Yu, Ruheng Wang, Xin Zeng, Chao Pang, Yi Jiang, Zhongshen Li, Yutong Dai, Ran Su, Quan Zou, Kenta Nakai, and Leyi Wei. iDNA-ABF: multi-scale deep biological language learning model for the interpretable prediction of DNA methylations. Genome Biol., 23(1):219, October 2022. | ||
1356 | 28 | 8 | K. Cobbe, V. Kosaraju, M. Bavarian, M. Chen, H. Jun, L. Kaiser, M. Plappert, J. Tworek, J. Hilton, R. Nakano, et al. Training verifiers to solve math word problems. arXiv preprint arXiv:2110.14168, 2021. | ||
1140 | 24 | 23 | K. Gresova, V. Martinek, D. Cechak, P. Simecek, and P. Alexiou. Genomic Benchmarks: A collection of datasets for genomic sequence classification. bioRxiv, 2022. | ||
306 | 5 | 121 | K. K. Yang, N. Fusi, and A. X. Lu. Convolutions are competitive with transformers for protein sequence pretraining. Cell Systems, 15(3):286–294, 2024. | ||
1205 | 25 | 29 | K. Kavukcuoglu, “Real-world challenges for AGI,” Nov 2021. | ||
248 | 5 | 61 | K. Marcker and F. Sanger. N-formyl-methionyl-S-RNA. Journal of Molecular Biology, 8(6):835–IN8, 1964. ISSN 0022-2836. doi: https://doi.org/10.1016/S0022-2836(64)80164-9. URL https://www.sciencedirect.com/science/article/pii/S0022283664801649. | ||
1381 | 28 | 33 | K. Paster, M. D. Santos, Z. Azerbayev, and J. Ba. Openwebmath: An open dataset of high-quality mathematical web text. CoRR, abs/2310.06786, 2023. doi: 10.48550/ARXIV.2310.06786. URL https://doi.org/10.48550/arXiv.2310.06786. | ||
275 | 5 | 90 | K. S. Pollard, M. J. Hubisz, K. R. Rosenbloom, and A. Siepel. Detection of nonneutral substitution rates on mammalian phylogenies. Genome Research, 20(1):110–21, 2010. doi: 10.1101/gr.097857.109. | ||
1219 | 25 | 43 | K. Saab, T. Tu, W.-H. Weng, R. Tanno, D. Stutz, E. Wulczyn, F. Zhang, T. Strother, C. Park, E. Vedadi, J. Z. Chaves, S.-Y. Hu, M. Schaekermann, A. Kamath, Y. Cheng, D. G. T. Barrett, C. Cheung, B. Mustafa, A. Palepu, D. McDuff, L. Hou, T. Golany, L. Liu, J. baptiste Alayrac, N. Houlsby, N. Tomasev, J. Freyberg, C. Lau, J. Kemp, J. Lai, S. Azizi, K. Kanada, S. Man, K. Kulkarni, R. Sun, S. Shakeri, L. He, B. Caine, A. Webson, N. Latysheva, M. Johnson, P. Mansfield, J. Lu, E. Rivlin, J. Anderson, B. Green, R. Wong, J. Krause, J. Shlens, E. Dominowska, S. M. A. Eslami, K. Chou, C. Cui, O. Vinyals, K. Kavukcuoglu, J. Manyika, J. Dean, D. Hassabis, Y. Matias, D. Webster, J. Barral, G. Corrado, C. Semturs, S. S. Mahdavi, J. Gottweis, A. Karthikesalingam, and V. Natarajan, “Capabilities of gemini models in medicine,” 2024. | ||
1217 | 25 | 41 | K. Singhal, S. Azizi, T. Tu, S. S. Mahdavi, J. Wei, H. W. Chung, N. Scales, A. Tanwani, H. Cole-Lewis, S. Pfohl, P. Payne, M. Seneviratne, P. Gamble, C. Kelly, N. Scharli, A. Chowdhery, P. Mansfield, B. A. y Arcas, D. Webster, G. S. Corrado, Y. Matias, K. Chou, J. Gottweis, N. Tomasev, Y. Liu, A. Rajkomar, J. Barral, C. Semturs, A. Karthikesalingam, and V. Natarajan, “Large language models encode clinical knowledge,” 2022. | ||
1218 | 25 | 42 | K. Singhal, T. Tu, J. Gottweis, R. Sayres, E. Wulczyn, L. Hou, K. Clark, S. Pfohl, H. Cole-Lewis, D. Neal, M. Schaekermann, A. Wang, M. Amin, S. Lachgar, P. Mansfield, S. Prakash, B. Green, E. Dominowska, B. A. y Arcas, N. Tomasev, Y. Liu, R. Wong, C. Semturs, S. S. Mahdavi, J. Barral, D. Webster, G. S. Corrado, Y. Matias, S. Azizi, A. Karthikesalingam, and V. Natarajan, “Towards expert-level medical question answering with large language models,” 2023. | ||
1185 | 25 | 9 | K. T. Mai, S. Bray, T. Davies, and L. D. Griffin, “Warning: Humans cannot reliably detect speech deepfakes,” PLoS One, vol. 18, p. e0285333, Aug. 2023. | ||
300 | 5 | 115 | K. Tunyasuvunakool, J. Adler, Z. Wu, T. Green, M. Zielinski, A. Žídek, A. Bridgland, A. Cowie, C. Meyer, A. Laydon, et al. Highly accurate protein structure prediction for the human proteome. Nature, 596(7873): 590–596, 2021. | ||
1407 | 28 | 59 | K. Zheng, J. M. Han, and S. Polu. miniF2F: a cross-system benchmark for formal Olympiad-level mathematics. arXiv preprint arXiv:2109.00110, 2021. | ||
983 | 22 | 66 | Kaddour J, Harris J, Mozes M. et al. Challenges and applications of large language models. arXiv preprint arXiv:2307.10169, 2023. | ||
1530 | 32 | 37 | Kalemati, M., Zamani Emani, M. & Koohi, S. BiComp-DTA: Drug-target binding affinity prediction through complementary biological-related and compression-based featurization approach. PLOS Computational Biology 19, e1011036 (2023). | ||
1263 | 26 | 26 | Kalyan Vasuden Alwala, Kartikeya Upasani, Kate Plawiak, Ke Li, Kenneth Heafield, Kevin Stone, and et al. The Llama 3 herd of models. CoRR, abs/2407.21783, 2024. | ||
344 | 7 | 32 | Kanai, M. et al. Insights from complex trait fine-mapping across diverse populations. Preprint at medRxiv https://doi.org/10.1101/2021.09.03.21262975 (2021). | ||
993 | 22 | 76 | Kanakarajan RK, Kundumani B, Sankarasubbu M. BioELECTRA: Pretrained biomedical text encoder using discriminators. In: Demner-Fushman D, Cohen KB, Ananiadou S, Tsujii J, (eds.), Proceedings of the 20th Workshop on Biomedical Language Processing. Association for Computational Linguistics, Online, 2021;143–154. | ||
802 | 16 | 3 | Kanehisa M, Furumichi M, Sato Y, Ishiguro-Watanabe M, Tanabe M. KEGG: integrating viruses and cellular organisms. Nucleic Acids Res. 2021;49:D545–51. | ||
801 | 16 | 2 | Kanehisa M. Toward understanding the origin and evolution of cellular organisms. Protein Sci. 2019;28:1947–51. | ||
851 | 18 | 2 | Kanehisa, M., Furumichi, M., Sato, Y., Kawashima, M. and Ishiguro-Watanabe, M. (2023) KEGG for taxonomy-based analysis of pathways and genomes. Nucleic Acids Res., 51, D587–D592. | ||
855 | 18 | 6 | Kanehisa, M., Sato, Y. and Kawashima, M. (2022) KEGG mapping tools for uncovering hidden features in biological data. Protein Sci., 31, 47–53. | ||
850 | 18 | 1 | Kanehisa, M. (2019) Toward understanding the origin and evolution of cellular organisms. Protein Sci., 28, 1947–1951. | ||
854 | 18 | 5 | Kanehisa, M. and Sato, Y. (2020) KEGG Mapper for inferring cellular functions from protein sequences. Protein Sci., 29, 28–35. | ||
2145 | 35 | 418 | Kangjie Zheng, Siyu Long, Tianyu Lu, Junwei Yang, Xinyu Dai, Ming Zhang, Zaiqing Nie, Wei-Ying Ma, and Hao Zhou. 2024. Multi-Scale Protein Language Model for Unified Molecular Modeling. bioRxiv (2024), 2024–03. | ||
829 | 17 | 15 | Kaplan J, McCandlish S, Henighan T et al. Scaling laws for neural language models. arXiv, arXiv:2001.08361, 2020, preprint: not peer reviewed. | ||
2026 | 35 | 299 | Karan Singhal, Shekoofeh Azizi, Tao Tu, S Sara Mahdavi, Jason Wei, Hyung Won Chung, Nathan Scales, Ajay Tanwani, Heather Cole-Lewis, Stephen Pfohl, et al. 2023. Large language models encode clinical knowledge. Nature 620, 7972 (2023), 172–180. | ||
2027 | 35 | 300 | Karan Singhal, Tao Tu, Juraj Gottweis, Rory Sayres, Ellery Wulczyn, Le Hou, Kevin Clark, Stephen Pfohl, Heather Cole-Lewis, Darlene Neal, Mike Schaekermann, Amy Wang, Mohamed Amin, Sami Lachgar, Philip Mansfield, Sushant Prakash, Bradley Green, Ewa Dominowska, Blaise Aguera y Arcas, Nenad Tomasev, Yun Liu, Renee Wong, Christopher Semturs, S. Sara Mahdavi, Joelle Barral, Dale Webster, Greg S. Corrado, Yossi Matias, Shekoofeh Azizi, Alan Karthikesalingam, and Vivek Natarajan. 2023. Towards Expert-Level Medical Question Answering with Large Language Models. arXiv:2305.09617 [cs.CL] | ||
1438 | 30 | 21 | Karczewski K.J., Francioli L.C., Tiao G., Cummings B.B., Alföldi J., Wang Q., Collins R.L., Laricchia K.M., Ganna A., Birnbaum D.P. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature. 2020; 581:434–443. | ||
317 | 7 | 5 | Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020). | ||
173 | 4 | 33 | Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443(2020). | ||
127 | 3 | 119 | Karczewski, K. J., Francioli, L. C., Tiao, G., Cummings, B. B., Alföldi, J., Wang, Q., Collins, R. L., Laricchia, K. M., Ganna, A., Birnbaum, D. P., Gauthier, L. D., Brand, H., Solomonson, M., Watts, N. A., Rhodes, D., Singer-Berk, M., England, E. M., Seaby, E. G., Kosmicki, J. A., Walters, R. K., Tashman, K., Farjoun, Y., Banks, E., Poterba, T., Consortium, G. A. D., and MacArthur, D. G. (2020). The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443. doi:10.1038/s41586-020-2308-7. | ||
756 | 15 | 20 | Karen Hambardzumyan, Hrant Khachatrian, and Jonathan May. WARP: Word-level Adversarial ReProgramming. arXiv:2101.00121 [cs], December 2020. URL http://arxiv.org/abs/2101.00121. arXiv: 2101.00121. | ||
1523 | 32 | 30 | Karim, A., Lee, M., Balle, T. & Sattar, A. CardioTox net: a robust predictor for hERG channel blockade based on deep learning meta-feature ensembles. Journal of Cheminformatics 13, 1–13 (2021). | ||
1256 | 26 | 19 | Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, Christopher Hesse, and John Schulman. Training verifiers to solve math word problems. CoRR, abs/2110.14168, 2021. | ||
1563 | 33 | 19 | Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, Christopher Hesse, and John Schulman. Training verifiers to solve math word problems. CoRR, abs/2110.14168, 2021. | ||
1644 | 34 | 19 | Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, Christopher Hesse, and John Schulman. Training verifiers to solve math word problems. CoRR, abs/2110.14168, 2021. | ||
95 | 3 | 87 | Karnuta, J. M., and Scacheri, P. C. (2018). Enhancers: bridging the gap between gene control and human disease. Human Molecular Genetics 27, R219–R227. | ||
380 | 7 | 68 | Karolchik, D. et al. The UCSC Table Browser data retrieval tool. Nucleic Acids Res. 32, D493–D496 (2004). | ||
74 | 3 | 66 | Karollus, A., Hingerl, J., Gankin, D., Grosshauser, M., Klemon, K., and Gagneur, J. (2024). Species-aware DNA language models capture regulatory elements and their evolution. Genome Biology 25, 83. | ||
1078 | 23 | 24 | Katarína Grešová, Vlastimil Martinek, David Čechák, Petr Šimeček, and Panagiotis Alexiou. Genomic benchmarks: a collection of datasets for genomic sequence classification. BMC Genomic Data, 24(1):25, May 2023. ISSN 2730-6844. doi: 10.1186/s12863-023-01123-8. URL https://doi.org/10.1186/s12863-023-01123-8. | ||
2172 | 36 | 12 | Katarína Grešová, Vlastimil Martinek, David Čechák, Petr Šimeček, and Panagiotis Alexiou. Genomic benchmarks: a collection of datasets for genomic sequence classification. BMC Genomic Data, 24(1):25, 2023. | ||
1766 | 35 | 39 | Kathi Canese and Sarah Weis. 2013. PubMed: the bibliographic database. The NCBI handbook 2, 1 (2013). | ||
1064 | 23 | 10 | Kathleen M. Chen, Aaron K. Wong, Olga G. Troyanskaya, and Jian Zhou. A sequence-based global map of regulatory activity for deciphering human genetics. Nature Genetics, 54(7):940–949, July 2022. ISSN 1061-4036, 1546-1718. doi: 10.1038/s41588-022-01102-2. URL https://www.nature.com/articles/s41588-022-01102-2. | ||
800 | 16 | 1 | Kawashima S, Katayama T, Sato Y, Kanehisa M. KEGG API: a web service using SOAP/WSDL to Access the KEGG System. Genome Inform. 2003;14:673. | ||
1828 | 35 | 101 | Kehua Feng, Keyan Ding, Weijie Wang, Xiang Zhuang, Zeyuan Wang, Ming Qin, Yu Zhao, Jianhua Yao, Qiang Zhang, and Huajun Chen. 2024. SciKnowEval: Evaluating Multi-level Scientific Knowledge of Large Language Models. arXiv preprint arXiv:2406.09098 (2024). | ||
1312 | 26 | 75 | Keisuke Sakaguchi, Ronan Le Bras, Chandra Bhagavatula, and Yejin Choi. WinoGrande: An adversarial winograd schema challenge at scale. Commun. ACM, 64(9):99–106, 2021. | ||
1608 | 33 | 64 | Keisuke Sakaguchi, Ronan Le Bras, Chandra Bhagavatula, and Yejin Choi. WinoGrande: An adversarial winograd schema challenge at scale. Commun. ACM, 64(9):99–106, 2021. | ||
1698 | 34 | 73 | Keisuke Sakaguchi, Ronan Le Bras, Chandra Bhagavatula, and Yejin Choi. WinoGrande: An adversarial winograd schema challenge at scale. Commun. ACM, 64(9):99–106, 2021. | ||
1275 | 26 | 38 | Keith Hoskin. The “awful idea of accountability”: Inscribing people into the measurement of objects. Accountability: Power, ethos and the technologies of managing, 1996. | ||
1662 | 34 | 37 | Keith Hoskin. The “awful idea of accountability”: Inscribing people into the measurement of objects. Accountability: Power, ethos and the technologies of managing, 1996. | ||
437 | 9 | 15 | Kelley, D. R. Cross-species regulatory sequence activity prediction. PLOS Comput. Biol. 16, e1008050 (2020). | ||
497 | 10 | 15 | Kelley, D. R. Cross-species regulatory sequence activity prediction. PLOS Comput. Biol. 16, e1008050 (2020). | ||
438 | 9 | 16 | Kelley, D. R. et al. Sequential regulatory activity prediction across chromosomes with convolutional neural networks. Genome Res. 28, 739–750 (2018). | ||
498 | 10 | 16 | Kelley, D. R. et al. Sequential regulatory activity prediction across chromosomes with convolutional neural networks. Genome Res. 28, 739–750 (2018). | ||
50 | 3 | 42 | Kelley, D. R., Reshef, Y. A., Bileschi, M., Belanger, D., McLean, C. Y., and Snoek, J. (2018). Sequential regulatory activity prediction across chromosomes with convolutional neural networks. Genome Research 28, 739–750. | ||
49 | 3 | 41 | Kelley, D. R., Snoek, J., and Rinn, J. L. (2016). Basset: learning the regulatory code of the accessible genome with deep convolutional neural networks. Genome Research 26, 990–999. | ||
1954 | 35 | 227 | Kelong Mao, Xi Xiao, Tingyang Xu, Yu Rong, Junzhou Huang, and Peilin Zhao. 2021. Molecular graph enhanced transformer for retrosynthesis prediction. Neurocomputing 457 (2021), 193–202. | ||
1293 | 26 | 56 | Keming Lu, Bowen Yu, Chang Zhou, and Jingren Zhou. Large language models are superpositions of all characters: Attaining arbitrary role-play via self-alignment. CoRR, abs/2401.12474, 2024b. | ||
1589 | 33 | 45 | Keming Lu, Bowen Yu, Chang Zhou, and Jingren Zhou. Large language models are superpositions of all characters: Attaining arbitrary role-play via self-alignment. CoRR, abs/2401.12474, 2024b. | ||
1680 | 34 | 55 | Keming Lu, Bowen Yu, Chang Zhou, and Jingren Zhou. Large language models are superpositions of all characters: Attaining arbitrary role-play via self-alignment. CoRR, abs/2401.12474, 2024b. | ||
1292 | 26 | 55 | Keming Lu, Bowen Yu, Fei Huang, Yang Fan, Runji Lin, and Chang Zhou. Online merging optimizers for boosting rewards and mitigating tax in alignment. CoRR, abs/2405.17931, 2024a. | ||
1588 | 33 | 44 | Keming Lu, Bowen Yu, Fei Huang, Yang Fan, Runji Lin, and Chang Zhou. Online merging optimizers for boosting rewards and mitigating tax in alignment. CoRR, abs/2405.17931, 2024a. | ||
1679 | 34 | 54 | Keming Lu, Bowen Yu, Fei Huang, Yang Fan, Runji Lin, and Chang Zhou. Online merging optimizers for boosting rewards and mitigating tax in alignment. CoRR, abs/2405.17931, 2024a. | ||
1590 | 33 | 46 | Keming Lu, Hongyi Yuan, Zheng Yuan, Runji Lin, Junyang Lin, Chuanqi Tan, Chang Zhou, and Jingren Zhou. #InsTag: Instruction tagging for analyzing supervised fine-tuning of large language models. In ICLR. OpenReview.net, 2024c. | ||
2166 | 36 | 6 | Ken Chen, Huiying Zhao, and Yuedong Yang. Capturing large genomic contexts for accurately predicting enhancer-promoter interactions. Brief. Bioinform., 23(2), March 2022. | ||
1773 | 35 | 46 | Ken Chen, Yue Zhou, Maolin Ding, Yu Wang, Zhixiang Ren, and Yuedong Yang. 2023. Self-supervised learning on millions of pre-mRNA sequences improves sequence-based RNA splicing prediction. bioRxiv (2023), 2023–01. | ||
178 | 4 | 38 | Kent, W. J. et al. The human genome browser at UCSC. Genome Res. 12, 996–1006 (2002). | ||
386 | 7 | 74 | Kent, W. J., Zweig, A. S., Barber, G., Hinrichs, A. S. & Karolchik, D. BigWig and BigBed: enabling browsing of large distributed datasets. Bioinformatics 26, 2204–2207 (2010). | ||
1457 | 31 | 8 | Kevin Clark, Minh-Thang Luong, Quoc V. Le, and Christopher D. Manning. ELECTRA: Pre-training text encoders as discriminators rather than generators. In ICLR, 2020. URL https://openreview.net/pdf?id=r1xMH1BtvB. | ||
1865 | 35 | 138 | Kexin Huang, Jaan Altosaar, and Rajesh Ranganath. 2020. ClinicalBERT: Modeling Clinical Notes and Predicting Hospital Readmission. arXiv:1904.05342 [cs.CL] | ||
2099 | 35 | 372 | Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. 2018. How powerful are graph neural networks? arXiv preprint arXiv:1810.00826 (2018). | ||
594 | 12 | 11 | Khakzad, H. et al. A new age in protein design empowered by deep learning. Cell Syst. 14, 925–939 (2023). | ||
887 | 21 | 9 | Kim J.-D. et al. (2004) Introduction to the bio-entity recognition task at JNLPBA. In: Proceedings of the International Joint Workshop on Natural Language Processing in Biomedicine and its Applications (NLPBA/BioNLP), Geneva, Switzerland. pp. 73–78. COLING. https://www.aclweb.org/anthology/W04-1213. | ||
866 | 19 | 5 | Kim S, Thiessen PA, Cheng T, Yu B, Bolton EE. An update on PUG-REST: RESTful interface for programmatic access to PubChem. Nucleic Acids Res. 2018. https://doi.org/10.1093/nar/gky294. (PMID 29718389.) | ||
478 | 9 | 56 | Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 37, 907–915 (2019). | ||
538 | 10 | 56 | Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 37, 907–915 (2019). | ||
671 | 13 | 26 | Kim, Y., Park, C., Jeong, H., Chan, Y. S., Xu, X., McDuff, D., Lee, H., Ghassemi, M., Breazeal, C., and Park, H. W. Mdagents: An adaptive collaboration of llms for medical decision-making. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. | ||
96 | 3 | 88 | King, J. L., and Jukes, T. H. (1969). Non-darwinian evolution. Science 164, 788–798. doi:10.1126/science.164.3881.788. | ||
474 | 9 | 52 | Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. Preprint at https://arxiv.org/abs/1412.6980v5 (2015). | ||
534 | 10 | 52 | Kingma, D. P. & Ba, J. Adam: a method for stochastic optimization. Preprint at https://arxiv.org/abs/1412.6980v5 (2015). | ||
1529 | 32 | 36 | Kinnings, S. L., Liu, N., Tonge, P. J., Jackson, R. M., Xie, L. & Bourne, P. E. A machine learning-based method to improve docking scoring functions and its application to drug repurposing. Journal of chemical information and modeling 51, 408–419 (2011). | ||
120 | 3 | 112 | Kircher, M., Xiong, C., Martin, B., Schubach, M., Inoue, F., Bell, R. J. A., Costello, J. F., Shendure, J., and Ahituv, N. (2019). Saturation mutagenesis of twenty disease-associated regulatory elements at single base-pair resolution. Nature Communications 10. doi:10.1038/s41467-019-11526-w. | ||
672 | 13 | 27 | Kirillov, A., Mintun, E., Ravi, N., Mao, H., Rolland, C., Gustafson, L., Xiao, T., Whitehead, S., Berg, A. C., Lo, W.-Y., Dollár, P., and Girshick, R. Segment anything, 2023. | ||
1031 | 22 | 114 | Kiselev VY, Yiu A, Hemberg M. Scmap: projection of single-cell RNA-seq data across data sets. Nat Methods 2018;15:359–62. 10.1038/nmeth.4644. | ||
1478 | 31 | 29 | Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, 2002. doi:10.3115/1073083.1073135. | ||
1981 | 35 | 254 | Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: a method for automatic evaluation of machine translation. In Proceedings of the 40th annual meeting of the Association for Computational Linguistics. 311–318. | ||
1012 | 22 | 95 | Klausen MS, Jespersen MC, Nielsen H. et al. NetSurfP-2.0: improved prediction of protein structural features by integrated deep learning. Proteins 2019;87:520–7. 10.1002/prot.25674. | ||
365 | 7 | 53 | Klopocki, E. et al. Copy-number variations involving the IHH locus are associated with syndactyly and craniosynostosis. Am. J. Hum. Genet. 88, 70–75 (2011). | ||
387 | 7 | 75 | Koenig, Z. et al. A harmonized public resource of deeply sequenced diverse human genomes. Preprint at bioRxiv https://doi.org/10.1101/2023.01.23.525248 (2023). | ||
1435 | 30 | 18 | Koh G., Degasperi A., Zou X., Momen S., Nik-Zainal S. Mutational signatures: emerging concepts, caveats and clinical applications. Nat. Rev. Cancer. 2021; 21:619–637. | ||
1524 | 32 | 31 | Korotcov, A., Tkachenko, V., Russo, D. P. & Ekins, S. Comparison of deep learning with multiple machine learning methods and metrics using diverse drug discovery data sets. Molecular pharmaceutics 14, 4462–4475 (2017). | ||
888 | 21 | 10 | Krallinger M. et al. (2015) The chemdner corpus of chemicals and drugs and its annotation principles. J. Cheminform., 7. | ||
889 | 21 | 11 | Krallinger M. et al. (2017) Overview of the BioCreative VI chemical-protein interaction track. In: Proceedings of the BioCreative VI Workshop, Bethesda, MD, USA, pp. 141–146. https://academic.oup.com/database/article/doi/10.1093/database/bay073/5055578. | ||
1410 | 29 | 1 | Kruglyak, L. (1999) Prospects for whole-genome linkage disequilibrium mapping of common disease genes. Nature Genet., 22, 139–144. | ||
1474 | 31 | 25 | Krzysztof Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, A. Gane, Tamás Sarlós, Peter Hawkins, J. Davis, Afroz Mohiuddin, Lukasz Kaiser, David Belanger, Lucy J. Colwell, and Adrian Weller. Rethinking attention with performers. ArXiv, abs/2009.14794, 2020. | ||
82 | 3 | 74 | Kudo, T. Subword regularization: Improving neural network translation models with multiple subword candidates. In: Gurevych, I., and Miyao, Y., eds. Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Melbourne, Australia: Association for Computational Linguistics (2018): 66–75. https://aclanthology.org/P18-1007. doi:10.18653/v1/P18-1007. | ||
1026 | 22 | 109 | Kulmanov M, Hoehndorf R. DeepGOZero: improving protein function prediction from sequence and zero-shot learning based on ontology axioms. Bioinformatics 2022;38:i238–45. 10.1093/bioinformatics/btac256. | ||
137 | 3 | 129 | Kundaje, A., Meuleman, W., Ernst, J. et al. (2015). Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330. | ||
1495 | 32 | 2 | Kuo, K.-T., Mao, T.-L., Jones, S., Veras, E., Ayhan, A., Wang, T.-L., Glas, R., Slamon, D., Velculescu, V. E., Kurman, R. J., et al. Frequent activating mutations of PIK3CA in ovarian clear cell carcinoma. The American journal of pathology 174, 1597–1601 (2009). | ||
169 | 4 | 29 | Kvon, E. Z. et al. Comprehensive in vivo interrogation reveals phenotypic impact of human enhancer variants. Cell 180, 1262–1271.e15 (2020). | ||
1785 | 35 | 58 | Kwang-Hwi Cho, Kyoung Tai No, et al. 2023. iupacGPT: IUPAC-based large-scale molecular pre-trained model for property prediction and molecule generation. (2023). | ||
1937 | 35 | 210 | Kyle Lo, Lucy Lu Wang, Mark Neumann, Rodney Kinney, and Dan S. Weld. 2020. S2ORC: The Semantic Scholar Open Research Corpus. arXiv:1911.02782 [cs.CL] | ||
227 | 5 | 40 | L. A. Hug, B. J. Baker, K. Anantharaman, C. T. Brown, A. J. Probst, C. J. Castelle, C. N. Butterfield, A. W. Hernsdorf, Y. Amano, K. Ise, Y. Suzuki, N. Dudek, D. A. Relman, K. M. Finstad, R. Amundson, B. C. Thomas, and J. F. Banfield. A new view of the tree of life. Nat. Microbiol., 1(5):16048, Apr. 2016. | ||
1382 | 28 | 34 | L. C. Paulson. Three years of experience with sledgehammer, a practical link between automatic and interactive theorem provers. In R. A. Schmidt, S. Schulz, and B. Konev, editors, Proceedings of the 2nd Workshop on Practical Aspects of Automated Reasoning, PAAR-2010, Edinburgh, Scotland, UK, July 14, 2010, volume 9 of EPiC Series in Computing, pages 1–10. EasyChair, 2010. doi: 10.29007/TNFD. URL https://doi.org/10.29007/tnfd. | ||
1360 | 28 | 12 | L. Gao, A. Madaan, S. Zhou, U. Alon, P. Liu, Y. Yang, J. Callan, and G. Neubig. PAL: program-aided language models. In A. Krause, E. Brunskill, K. Cho, B. Engelhardt, S. Sabato, and J. Scarlett, editors, International Conference on Machine Learning, ICML 2023, 23-29 July 2023, Honolulu, Hawaii, USA, volume 202 of Proceedings of Machine Learning Research, pages 10764–10799. PMLR, 2023. URL https://proceedings.mlr.press/v202/gao23f.html. | ||
550 | 11 | 8 | L. Gao, J. Schulman, and J. Hilton. Scaling laws for reward model overoptimization, 2022. URL https://arxiv.org/abs/2210.10760. | ||
217 | 5 | 30 | L. Gao, T. D. la Tour, H. Tillman, G. Goh, R. Troll, A. Radford, I. Sutskever, J. Leike, and J. Wu. Scaling and evaluating sparse autoencoders. arXiv, 2024a. URL https://arxiv.org/abs/2406.04093. | ||
249 | 5 | 62 | L. McInnes, J. Healy, and J. Melville. UMAP: Uniform manifold approximation and projection for dimension reduction. arXiv, 1802.03426, 2018. | ||
1380 | 28 | 32 | L. Ouyang, J. Wu, X. Jiang, D. Almeida, C. Wainwright, P. Mishkin, C. Zhang, S. Agarwal, K. Slama, A. Ray, et al. Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35:27730–27744, 2022. | ||
207 | 5 | 20 | L. T. Chow, R. E. Gelinas, T. R. Broker, and R. J. Roberts. An amazing sequence arrangement at the 5′ ends of adenovirus 2 messenger RNA. Cell, 12(1):1–8, 1977. | ||
1167 | 24 | 50 | L. Van der Maaten and G. Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, 9(11), 2008. | ||
1208 | 25 | 32 | L. Weidinger, M. Rauh, N. Marchal, A. Manzini, L. A. Hendricks, J. Mateos-Garcia, S. Bergman, J. Kay, C. Griffin, B. Bariach, et al., “Sociotechnical safety evaluation of generative ai systems,” arXiv preprint arXiv:2310.11986, 2023. https://arxiv.org/pdf/2310.11986 | ||
1199 | 25 | 23 | L. Weidinger, M. Rauh, N. Marchal, A. Manzini, L. A. Hendricks, J. Mateos-Garcia, S. Bergman, J. Kay, C. Griffin, B. Bariach, I. Gabriel, V. Rieser, and W. Isaac, “Sociotechnical safety evaluation of generative ai systems,” 2023. https://arxiv.org/pdf/2310.11986 | ||
1403 | 28 | 55 | L. Yu, W. Jiang, H. Shi, J. Yu, Z. Liu, Y. Zhang, J. T. Kwok, Z. Li, A. Weller, and W. Liu. Metamath: Bootstrap your own mathematical questions for large language models. CoRR, abs/2309.12284, 2023. doi: 10.48550/ARXIV.2309.12284. URL https://doi.org/10.48550/arXiv.2309.12284. | ||
190 | 5 | 4 | L. Nestler, K. Parker, M. Pieler, J. Phang, S. Purohit, H. Schoelkopf, D. Stander, T. Songz, C. Tigges, B. Thérien, P. Wang, and S. Weinbach. GPT-NeoX: Large Scale Autoregressive Language Modeling in PyTorch, September 2023. URL https://www.github.com/eleutherai/gpt-neox. | ||
1509 | 32 | 16 | Lagunin, A., Filimonov, D., Zakharov, A., Xie, W., Huang, Y., Zhu, F., Shen, T., Yao, J. & Poroikov, V. Computer-aided prediction of rodent carcinogenicity by PASS and CISOC-PSCT. QSAR & Combinatorial Science 28, 806–810 (2009). | ||
38 | 3 | 30 | Lal, A., Garfield, D., Biancalani, T., and Eraslan, G. regLM: Designing realistic regulatory DNA with autoregressive language models. In: International Conference on Research in Computational Molecular Biology. Springer (2024): 332–335. | ||
1528 | 32 | 35 | Lam, H. T., Sbodio, M. L., Galindo, M. M., Zayats, M., Fernandez-Diaz, R., Valls, V., Picco, G., Ramis, C. B. & Lopez, V. Otter-Knowledge: benchmarks of multimodal knowledge graph representation learning from different sources for drug discovery. arXiv preprint arXiv:2306.12802 (2023). | ||
1437 | 30 | 20 | Landrum M.J., Lee J.M., Benson M., Brown G.R., Chao C., Chitipiralla S., Gu B., Hart J., Hoffman D., Jang W. et al. ClinVar: improving access to variant interpretations and supporting evidence. Nucleic Acids Res. 2018; 46:D1062–D1067. | ||
162 | 4 | 22 | Landrum, M. J. et al. ClinVar: improvements to accessing data. Nucleic Acids Res. 48, D835–D844 (2020). | ||
352 | 7 | 40 | Landrum, M. J. et al. ClinVar: improving access to variant interpretations and supporting evidence. Nucleic Acids Res. 46, D1062–D1067 (2018). | ||
480 | 9 | 58 | Landrum, M. J. et al. Clinvar: improving access to variant interpretations and supporting evidence. Nucleic Acids Res. 46, D1062–D1067 (2018). | ||
540 | 10 | 58 | Landrum, M. J. et al. Clinvar: improving access to variant interpretations and supporting evidence. Nucleic Acids Res. 46, D1062–D1067 (2018). | ||
123 | 3 | 115 | Landrum, M. J., Lee, J. M., Benson, M., Brown, G. R., Chao, C., Chitipiralla, S., Gu, B., Hart, J., Hoffman, D., Jang, W. et al. (2016). ClinVar: public archive of interpretations of clinically relevant variants. Nucleic Acids Research 44, D862–D868. | ||
321 | 7 | 9 | Lanyi, J. K. Photochromism of halorhodopsin. cis/trans isomerization of the retinal around the 13–14 double bond. J. Biol. Chem. 261, 14025–14030 (1986). | ||
874 | 20 | 2 | Lappalainen I, Lopez J, Skipper L, Hefferon T, Spalding JD, Garner J, Chen C, Maguire M, Corbett M, Zhou G, et al. DbVar and DGVa: public archives for genomic structural variation. Nucleic Acids Res. 2013;41:D936–D941. | ||
463 | 9 | 41 | Lappalainen, T. et al. Transcriptome and genome sequencing uncovers functional variation in humans. Nature 501, 506–511 (2013). | ||
523 | 10 | 41 | Lappalainen, T. et al. Transcriptome and genome sequencing uncovers functional variation in humans. Nature 501, 506–511 (2013). | ||
754 | 15 | 18 | Lars Grasedyck, Daniel Kressner, and Christine Tobler. A literature survey of low-rank tensor approximation techniques. GAMM-Mitteilungen, 36(1):53–78, 2013. | ||
2007 | 35 | 280 | Lars Ruddigkeit, Ruud Van Deursen, Lorenz C Blum, and Jean-Louis Reymond. 2012. Enumeration of 166 billion organic small molecules in the chemical universe database GDB-17. Journal of chemical information and modeling 52, 11 (2012), 2864–2875. | ||
2126 | 35 | 399 | Le Zhang, Jiayang Chen, Tao Shen, Yu Li, and Siqi Sun. 2023. Enhancing the Protein Tertiary Structure Prediction by Multiple Sequence Alignment Generation. arXiv preprint arXiv:2306.01824 (2023). | ||
2158 | 35 | 431 | Le Zhuo, Zewen Chi, Minghao Xu, Heyan Huang, Heqi Zheng, Conghui He, Xian-Ling Mao, and Wentao Zhang. 2024. Protllm: An interleaved protein-language llm with protein-as-word pre-training. arXiv preprint arXiv:2403.07920 (2024). | ||
941 | 22 | 24 | Lee J, Yoon W, Kim S. et al. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 2020;36:1234–40. 10.1093/bioinformatics/btz682. | ||
638 | 12 | 55 | Lee, B. D. Python Implementation of Codon Adaptation Index. J. Open Source Softw. 3, 905 (2018). | ||
93 | 3 | 85 | Lee, K., Ippolito, D., Nystrom, A., Zhang, C., Eck, D., Callison-Burch, C., and Carlini, N. Deduplicating training data makes language models better. In: Muresan, S., Nakov, P., and Villavicencio, A., eds. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Dublin, Ireland: Association for Computational Linguistics (2022): 8424–8445. https://aclanthology.org/2022.acl-long.577. doi:10.18653/v1/2022.acl-long.577. | ||
143 | 4 | 3 | Lee, S., Abecasis, G. R., Boehnke, M. & Lin, X. Rare-variant association analysis: study designs and statistical tests. Am. J. Hum. Genet. 95, 5–23 (2014). | ||
1538 | 32 | 45 | Leenay, R. T., Aghazadeh, A., Hiatt, J., Tse, D., Roth, T. L., Apathy, R., Shifrut, E., Hultquist, J. F., Krogan, N., Wu, Z., et al. Large dataset enables prediction of repair after CRISPR–Cas9 editing in primary T cells. Nature biotechnology 37, 1034–1037 (2019). | ||
1065 | 23 | 11 | Lei Cheng, Tong Yu, Tero Aittokallio, Jukka Corander, Ruslan Khalitov, and Zhirong Yang. Self-supervised learning for DNA sequences with circular dilated convolutional networks. preprint, Bioinformatics, February 2023. URL http://biorxiv.org/lookup/doi/10.1101/2023.01.30.526193. | ||
2061 | 35 | 334 | Lei Wang, Hui Zhang, Wei Xu, Zhidong Xue, and Yan Wang. 2023. Deciphering the protein landscape with ProtFlash: a lightweight language model. Cell Reports Physical Science 4, 10 (2023), 101600. | ||
334 | 7 | 22 | Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. Nature 536, 285–291 (2016). | ||
1496 | 32 | 3 | Leontiadou, H., Galdadas, I., Athanasiou, C. & Cournia, Z. Insights into the mechanism of the PIK3CA E545K activating mutation using MD simulations. Scientific reports 8, 15544 (2018). | ||
479 | 9 | 57 | Leslie, R., O’Donnell, C. J. & Johnson, A. D. GRASP: analysis of genotype–phenotype results from 1390 genome-wide association studies and corresponding open access database. Bioinformatics 30, i185–i194 (2014). | ||
539 | 10 | 57 | Leslie, R., O’Donnell, C. J. & Johnson, A. D. GRASP: analysis of genotype–phenotype results from 1390 genome-wide association studies and corresponding open access database. Bioinformatics 30, i185–i194 (2014). | ||
732 | 14 | 31 | Lester, B., Al-Rfou, R. & Constant, N. The Power of Scale for Parameter-Efficient Prompt Tuning. Preprint at https://doi.org/10.48550/arXiv.2104.08691 (2021). | ||
70 | 3 | 62 | Levy, B., Xu, Z., Zhao, L., Kremling, K., Altman, R., Wong, P., and Tanner, C. FloraBERT: cross-species transfer learning with attention-based neural networks for gene expression prediction (2022). https://doi.org/10.21203/rs.3.rs-1927200/v1. doi:10.21203/rs.3.rs-1927200/v1. | ||
830 | 17 | 16 | Lewis P, Perez E, Piktus A et al. Retrieval-augmented generation for knowledge-intensive nlp tasks. Adv Neural Inform Process Syst 2020;33:9459–74. | ||
890 | 21 | 12 | Li J. et al. (2016) Biocreative V CDR task corpus: a resource for chemical disease relation extraction. Database, 2016. | ||
1030 | 22 | 113 | Li C, Liu B, Kang B. et al. SciBet as a portable and fast single cell type identifier. Nat Commun 2020;11:1818. 10.1038/s41467-020-15523-2. | ||
1045 | 22 | 128 | Li C, Zhang M, He Y. The stability-efficiency dilemma: investigating sequence length warmup for training GPT models. Adv Neural Inf Process Syst 2022;35:26736–50. | ||
673 | 13 | 28 | Li, B., Yan, T., Pan, Y., Luo, J., Ji, R., Ding, J., Xu, Z., Liu, S., Dong, H., Lin, Z., et al. Mmedagent: Learning to use medical tools with multi-modal agent. arXiv preprint arXiv:2407.02483, 2024a. | ||
674 | 13 | 29 | Li, C., Wong, C., Zhang, S., Usuyama, N., Liu, H., Yang, J., Naumann, T., Poon, H., and Gao, J. Llava-med: Training a large language-and-vision assistant for biomedicine in one day. Advances in Neural Information Processing Systems, 36, 2024b. | ||
456 | 9 | 34 | Li, F.-Z., Amini, A. P., Yang, K. K. & Lu, A. X. Pretrained protein language model transfer learning: is the final layer representation what we want? in Machine Learning for Structural Biology Workshop (NeurIPS, 2022). | ||
516 | 10 | 34 | Li, F.-Z., Amini, A. P., Yang, K. K. & Lu, A. X. Pretrained protein language model transfer learning: is the final layer representation what we want? in Machine Learning for Structural Biology Workshop (NeurIPS, 2022). | ||
61 | 3 | 53 | Li, F.-Z., Amini, A. P., Yue, Y., Yang, K. K., and Lu, A. X. (2024). Feature reuse and scaling: Understanding transfer learning with protein language models. bioRxiv preprint ( 202402). | ||
381 | 7 | 69 | Li, H. Toward better understanding of artifacts in variant calling from high-coverage samples. Bioinformatics 30, 2843–2851 (2014). | ||
1512 | 32 | 19 | Li, J., Cai, D. & He, X. Learning graph-level representation for drug discovery. arXiv preprint arXiv:1709.03741 (2017). | ||
1510 | 32 | 17 | Li, P., Li, Y., Hsieh, C.-Y., Zhang, S., Liu, X., Liu, H., Song, S. & Yao, X. TrimNet: learning molecular representation from triplet messages for biomedicine. Briefings in Bioinformatics 22, bbaa266 (2021). | ||
348 | 7 | 36 | Li, Y. Y. Plasminogen activator inhibitor-1 4G/5G gene polymorphism and coronary artery disease in the Chinese Han population: a meta-analysis. PLoS ONE 7, e33511 (2012). | ||
675 | 13 | 30 | Lian, J., Liu, J., Zhang, S., Gao, K., Liu, X., Zhang, D., and Yu, Y. A Structure-Aware Relation Network for Thoracic Diseases Detection and Segmentation. IEEE Transactions on Medical Imaging, 2021. doi: 10.48550/arxiv.2104.10326. | ||
1850 | 35 | 123 | Liang He, Shizhuo Zhang, Lijun Wu, Huanhuan Xia, Fusong Ju, He Zhang, Siyuan Liu, Yingce Xia, Jianwei Zhu, Pan Deng, Bin Shao, Tao Qin, and Tie-Yan Liu. 2021. Pre-training Co-evolutionary Protein Representation via A Pairwise Masked Language Model. CoRR abs/2110.15527 (2021). | ||
2037 | 35 | 310 | Liangtai Sun, Yang Han, Zihan Zhao, Da Ma, Zhennan Shen, Baocai Chen, Lu Chen, and Kai Yu. 2023. SciEval: A Multi-Level Large Language Model Evaluation Benchmark for Scientific Research. arXiv:2308.13149 [cs.CL] | ||
1337 | 26 | 100 | Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric P. Xing, Hao Zhang, Joseph E. Gonzalez, and Ion Stoica. Judging LLM-as-a-judge with MT-Bench and Chatbot Arena. In NeurIPS, 2023. | ||
1623 | 33 | 79 | Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric P. Xing, Hao Zhang, Joseph E. Gonzalez, and Ion Stoica. Judging LLM-as-a-judge with MT-Bench and Chatbot Arena. In NeurIPS, 2023. | ||
1723 | 34 | 98 | Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric P. Xing, Hao Zhang, Joseph E. Gonzalez, and Ion Stoica. Judging LLM-as-a-judge with MT-Bench and Chatbot Arena. In NeurIPS, 2023. | ||
831 | 17 | 17 | Liévin V, Hother CE, Winther O. Can large language models reason about medical questions? arXiv, arXiv:2207.08143, 2022, preprint: not peer reviewed. | ||
891 | 21 | 13 | Lim S., Kang J. (2018) Chemical–gene relation extraction using recursive neural network. Database, 2018. | ||
892 | 21 | 14 | Lin C. et al. (2019) A bert-based universal model for both within-and cross-sentence clinical temporal relation extraction. In: Proceedings of the 2nd Clinical Natural Language Processing Workshop, Minneapolis, MN, USA. pp. 65–71. Association for Computational Linguistics. https://www.aclweb.org/anthology/W19-1908. | ||
1006 | 22 | 89 | Lin Z, Akin H, Rao R. et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 2023;379:1123–30. 10.1126/science.ade2574. | ||
947 | 22 | 30 | Lin Z, Akin H, Rao R. et al. Language models of protein sequences at the scale of evolution enable accurate structure prediction. bioRxiv 500902, 2022. | ||
634 | 12 | 51 | Lin, B. C., Kaissarian, N. M. & Kimchi-Sarfaty, C. Implementing computational methods in tandem with synonymous gene recoding for therapeutic development. Trends Pharmacol. Sci. 44, 73–84 (2023). | ||
708 | 14 | 7 | Lin, Z. et al. Language models of protein sequences at the scale of evolution enable accurate structure prediction. Preprint at https://doi.org/10.1101/2022.07.20.500902 (2022). | ||
13 | 3 | 5 | Lin, Z., Akin, H., Rao, R., Hie, B., Zhu, Z., Lu, W., Smetanin, N., Verkuil, R., Kabeli, O., Shmueli, Y. et al. (2023). Evolutionary-scale prediction of atomic-level protein structure with a language model. Science 379, 1123–1130. | ||
115 | 3 | 107 | Linardatos, P., Papastefanopoulos, V., and Kotsiantis, S. (2020). Explainable AI: A review of machine learning interpretability methods. Entropy 23, 18. | ||
1536 | 32 | 43 | Lind, A. P. & Anderson, P. C. Predicting drug activity against cancer cells by random forest models based on minimal genomic information and chemical properties. PloS one 14, e0219774 (2019). | ||
99 | 3 | 91 | Linder, J., Srivastava, D., Yuan, H., Agarwal, V., and Kelley, D. R. (2023). Predicting RNA-seq coverage from DNA sequence as a unifying model of gene regulation. bioRxiv preprint. https://www.biorxiv.org/content/10.1101/2023.08.30.555582v1. | ||
1774 | 35 | 47 | Linqing Chen, Weilei Wang, Zilong Bai, Peng Xu, Yan Fang, Jie Fang, Wentao Wu, Lizhi Zhou, Ruiji Zhang, Yubin Xia, et al. 2024. PharmGPT: Domain-Specific Large Language Models for Bio-Pharmaceutical and Chemistry. arXiv preprint arXiv:2406.18045 (2024). | ||
775 | 15 | 39 | Linyong Nan, Dragomir Radev, Rui Zhang, Amrit Rau, Abhinand Sivaprasad, Chiachun Hsieh, Xiangru Tang, Aadit Vyas, Neha Verma, Pranav Krishna, et al. Dart: Open-domain structured data record to text generation. arXiv preprint arXiv:2007.02871, 2020. | ||
428 | 9 | 6 | Littmann, M., Heinzinger, M., Dallago, C., Olenyi, T. & Rost, B. Embeddings from deep learning transfer go annotations beyond homology. Sci. Rep. 11, 1–14 (2021). | ||
488 | 10 | 6 | Littmann, M., Heinzinger, M., Dallago, C., Olenyi, T. & Rost, B. Embeddings from deep learning transfer go annotations beyond homology. Sci. Rep. 11, 1–14 (2021). | ||
430 | 9 | 8 | Littmann, M., Heinzinger, M., Dallago, C., Weissenow, K. & Rost, B. Protein embeddings and deep learning predict binding residues for various ligand classes. Sci. Rep. 11, 23916 (2021). | ||
490 | 10 | 8 | Littmann, M., Heinzinger, M., Dallago, C., Weissenow, K. & Rost, B. Protein embeddings and deep learning predict binding residues for various ligand classes. Sci. Rep. 11, 23916 (2021). | ||
977 | 22 | 60 | Liu P, Yuan W, Fu J. et al. Pre-train, prompt, and predict: a systematic survey of prompting methods in natural language processing. ACM Comput Surv 2023;55:1–35. 10.1145/3560815. | ||
984 | 22 | 67 | Liu W, Zhou P, Zhao Z. et al. K-bert: enabling language representation with knowledge graph. In: Proceedings of the AAAI Conference on Artificial Intelligence 2020;34:2901–8. 10.1609/aaai.v34i03.5681. | ||
955 | 22 | 38 | Liu Y and Tian B. Protein-DNA binding sites prediction based on pre-trained protein language model and contrastive learning. arXiv preprint arXiv:2306.15912, 2023. | ||
957 | 22 | 40 | Liu Y, Tian B. Protein-DNA binding sites prediction based on pre-trained protein language model and contrastive learning. Briefings in Bioinformatics 2024;25.1:bbad488. 10.1093/bib/bbad488. | ||
720 | 14 | 19 | Liu, B., Long, R. & Chou, K.-C. iDHS-EL: identifying DNase I hypersensitive sites by fusing three different modes of pseudo nucleotide composition into an ensemble learning framework. Bioinformatics 32, 2411–2418 (2016). | ||
676 | 13 | 31 | Liu, B., Zhan, L.-M., Xu, L., Ma, L., Yang, Y., and Wu, X.-M. Slake: A semantically-labeled knowledge-enhanced dataset for medical visual question answering, 2021. | ||
457 | 9 | 35 | Liu, H. et al. Few-shot parameter-efficient fine-tuning is better and cheaper than in-context learning. in 36th Conference on Neural Information Processing Systems (NeurIPS, 2022). | ||
517 | 10 | 35 | Liu, H. et al. Few-shot parameter-efficient fine-tuning is better and cheaper than in-context learning. in 36th Conference on Neural Information Processing Systems (NeurIPS, 2022). | ||
717 | 14 | 16 | Liu, H. et al. Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning. Preprint at https://doi.org/10.48550/arXiv.2205.05638 (2022). | ||
677 | 13 | 32 | Liu, X., Yu, H., Zhang, H., Xu, Y., Lei, X., Lai, H., Gu, Y., Ding, H., Men, K., Yang, K., et al. AgentBench: Evaluating LLMs as agents. arXiv preprint arXiv:2308.03688, 2023. | ||
617 | 12 | 34 | Liu, Y. A code within the genetic code: codon usage regulates co-translational protein folding. Cell Commun. Signal. 18, 1–9 (2020). | ||
395 | 7 | 83 | Liu, Y., Sarkar, A., Kheradpour, P., Ernst, J. & Kellis, M. Evidence of reduced recombination rate in human regulatory domains. Genome Biol. 18, 193 (2017). | ||
1518 | 32 | 25 | Liu, Y., Wu, Y., Shen, X. & Xie, L. COVID-19 multi-targeted drug repurposing using few-shot learning. Frontiers in Bioinformatics 1, 693177 (2021). | ||
616 | 12 | 33 | Liu, Y., Yang, Q. & Zhao, F. Synonymous But Not Silent: the Codon Usage Code for Gene Expression and Protein Folding. Annu. Rev. Biochem. 90, 375–401 (2021). | ||
1948 | 35 | 221 | Liuzhenghao Lv, Zongying Lin, Hao Li, Yuyang Liu, Jiaxi Cui, Calvin Yu-Chian Chen, Li Yuan, and Yonghong Tian. 2024. ProLLaMA: A Protein Large Language Model for Multi-Task Protein Language Processing. arXiv preprint arXiv:2402.16445 (2024). | ||
134 | 3 | 126 | Livesey, B. J., Badonyi, M., Dias, M., Frazer, J., Kumar, S., Lindorff-Larsen, K., McCandlish, D. M., Orenbuch, R., Shearer, C. A., Muffley, L. et al. (2024). Guidelines for releasing a variant effect predictor. arXiv preprint. https://arxiv.org/abs/2404.10807. | ||
1979 | 35 | 252 | Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, and Ryan Lowe. 2022. Training language models to follow instructions with human feedback. cite arxiv:2203.02155. | ||
2186 | 36 | 26 | Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, and Ryan Lowe. Training language models to follow instructions with human feedback, 2022. | ||
1300 | 26 | 63 | Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul F. Christiano, Jan Leike, and Ryan Lowe. Training language models to follow instructions with human feedback. In NeurIPS, 2022. | ||
1686 | 34 | 61 | Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul F. Christiano, Jan Leike, and Ryan Lowe. Training language models to follow instructions with human feedback. In NeurIPS, 2022. | ||
91 | 3 | 83 | Longpre, S., Biderman, S., Albalak, A., Schoelkopf, H., McDuff, D., Kapoor, S., Klyman, K., Lo, K., Ilharco, G., San, N. et al. (2024). The responsible foundation model development cheatsheet: A review of tools & resources. arXiv preprint arXiv:2406.16746. https://arxiv.org/abs/2406.16746. | ||
1938 | 35 | 211 | Loredana Lo Conte, Bart Ailey, Tim JP Hubbard, Steven E Brenner, Alexey G Murzin, and Cyrus Chothia. 2000. SCOP: a structural classification of proteins database. Nucleic acids research 28, 1 (2000), 257–259. | ||
1753 | 35 | 26 | Lorenz C Blum and Jean-Louis Reymond. 2009. 970 million druglike small molecules for virtual screening in the chemical universe database GDB-13. Journal of the American Chemical Society 131, 25 (2009), 8732–8733. | ||
625 | 12 | 42 | Lorenz, R. et al. ViennaRNA Package 2.0. Algorithms Mol. Biol. 6, 26 (2011). | ||
79 | 3 | 71 | Lorenz, R., Bernhart, S. H., Höner zu Siederdissen, C., Tafer, H., Flamm, C., Stadler, P. F., and Hofacker, I. L. (2011). ViennaRNA Package 2.0. Algorithms for Molecular Biology 6, 1–14. | ||
893 | 21 | 15 | Lou Y. et al. (2017) A transition-based joint model for disease named entity recognition and normalization. Bioinformatics, 33, 2363–2371. | ||
350 | 7 | 38 | Lowe, G. D. et al. Tissue plasminogen activator antigen and coronary heart disease. Prospective study and meta-analysis. Eur. Heart J. 25, 252–259 (2004). | ||
1029 | 22 | 112 | Lu Y, Jiang X, Fang Y. et al. Learning to pre-train graph neural networks. Proceedings of the AAAI Conference on Artificial Intelligence 2021;35:4276–84. 10.1609/aaai.v35i5.16552. | ||
1248 | 26 | 11 | Lucas Bandarkar, Davis Liang, Benjamin Muller, Mikel Artetxe, Satya Narayan Shukla, Donald Husa, Naman Goyal, Abhinandan Krishnan, Luke Zettlemoyer, and Madian Khabsa. The Belebele benchmark: A parallel reading comprehension dataset in 122 language variants. CoRR, abs/2308.16884, 2023. | ||
1554 | 33 | 10 | Lucas Bandarkar, Davis Liang, Benjamin Muller, Mikel Artetxe, Satya Narayan Shukla, Donald Husa, Naman Goyal, Abhinandan Krishnan, Luke Zettlemoyer, and Madian Khabsa. The Belebele benchmark: A parallel reading comprehension dataset in 122 language variants. CoRR, abs/2308.16884, 2023. | ||
1636 | 34 | 11 | Lucas Bandarkar, Davis Liang, Benjamin Muller, Mikel Artetxe, Satya Narayan Shukla, Donald Husa, Naman Goyal, Abhinandan Krishnan, Luke Zettlemoyer, and Madian Khabsa. The Belebele benchmark: A parallel reading comprehension dataset in 122 language variants. CoRR, abs/2308.16884, 2023. | ||
1958 | 35 | 231 | Łukasz Maziarka, Dawid Majchrowski, Tomasz Danel, Piotr Gaiński, Jacek Tabor, Igor Podolak, Paweł Morkisz, and Stanisław Jastrzębski. 2024. Relative molecule self-attention transformer. Journal of Cheminformatics 16, 1 (2024), 3. | ||
1957 | 35 | 230 | Łukasz Maziarka, Tomasz Danel, Sławomir Mucha, Krzysztof Rataj, Jacek Tabor, and Stanisław Jastrzębski. 2020. Molecule attention transformer. arXiv preprint arXiv:2002.08264 (2020). | ||
894 | 21 | 16 | Luo L. et al. (2018) An attention-based BiLSTM-CRF approach to document-level chemical named entity recognition. Bioinformatics, 34, 1381–1388. | ||
832 | 17 | 18 | Luo R, Sun L, Xia Y et al. BioGPT: generative pre-trained transformer for biomedical text generation and mining. Brief Bioinform 2022;23. https://doi.org/10.1093/bib/bbac409. | ||
618 | 12 | 35 | Lyu, X. & Liu, Y. Nonoptimal codon usage is critical for protein structure and function of the master general amino acid control regulator CPC-1. mBio 11 (2020). | ||
715 | 14 | 14 | M. Byrska-Bishop, U. S. Evani, X. Zhao, A. O. Basile, H. J. Abel, A. A. Regier, A. Corvelo, W. E. Clarke, R. Musunuri, K. Nagulapalli, et al., “High-coverage whole-genome sequencing of the expanded 1000 genomes project cohort including 602 trios,” Cell, vol. 185, no. 18, pp. 3426– 3440, 2022. | ||
545 | 11 | 3 | M. Chen, J. Tworek, H. Jun, Q. Yuan, H. P. de Oliveira Pinto, J. Kaplan, H. Edwards, Y. Burda, N. Joseph, G. Brockman, A. Ray, R. Puri, G. Krueger, M. Petrov, H. Khlaaf, G. Sastry, P. Mishkin, B. Chan, S. Gray, N. Ryder, M. Pavlov, A. Power, L. Kaiser, M. Bavarian, C. Winter, P. Tillet, F. P. Such, D. Cummings, M. Plappert, F. Chantzis, E. Barnes, A. Herbert-Voss, W. H. Guss, A. Nichol, A. Paino, N. Tezak, J. Tang, I. Babuschkin, S. Balaji, S. Jain, W. Saunders, C. Hesse, A. N. Carr, J. Leike, J. Achiam, V. Misra, E. Morikawa, A. Radford, M. Knight, M. Brundage, | ||
1212 | 25 | 36 | M. Dubiel, A. Sergeeva, and L. A. Leiva, “Impact of voice fidelity on decision making: A potential dark pattern?,” 2024. | ||
399 | 8 | 0 | M. E. Consens, B. Li, A. R. Poetsch, and S. Gilbert. Genomic language models could transform medicine but not yet. npj Digital Medicine, 8:1–4, 12 2025. ISSN 2398-6352. doi: 10.1038/s41746-025-01603-4. URL https://www.nature.com/articles/s41746-025-01603-4. | https://qiita.com/kaizen_nagoya/items/cfd6c64b13644c04eeed |
214 | 5 | 27 | M. G. Durrant, N. T. Perry, J. J. Pai, A. R. Jangid, J. S. Athukoralage, M. Hiraizumi, J. P. McSpedon, A. Pawluk, H. Nishimasu, S. Konermann, and P. D. Hsu. Bridge RNAs direct programmable recombination of target and donor DNA. Nature, 630(8018):984–993, June 2024. | ||
1225 | 25 | 49 | M. Hutson, “How ai is being used to accelerate clinical trials,” Nature, vol. 627, pp. S2–S5, 2024. | ||
238 | 5 | 51 | M. J. Landrum, J. M. Lee, G. R. Riley, W. Jang, W. S. Rubinstein, D. M. Church, and D. R. Maglott. ClinVar: public archive of relationships among sequence variation and human phenotype. Nucleic Acids Research, 42 (D1):D980–D985, 11 2013. ISSN 0305-1048. doi: 10.1093/nar/gkt1113. URL https://doi.org/10.1093/nar/gkt1113. | ||
872 | 20 | 0 | M. J. Landrum, J. M. Lee, G. R. Riley, W. Jang, W. S. Rubinstein, D. M. Church, and D. R. Maglott. ClinVar: Public archive of relationships among sequence variation and human phenotype. Nucleic Acids Research, 42, 1 2014. ISSN 0305-1048. doi: 10.1093/nar/gkt1113. URL https://pubmed.ncbi.nlm.nih.gov/24234437/. | ||
849 | 18 | 0 | M. Kanehisa, M. Furumichi, Y. Sato, Y. Matsuura, and M. Ishiguro-Watanabe. KEGG: biological systems database as a model of the real world. Nucleic Acids Research, 53:D672–D677, 1 2025. ISSN 0305-1048. doi: 10.1093/NAR/GKAE909. URL https://dx.doi.org/10.1093/nar/gkae909. | ||
235 | 5 | 48 | M. Kozak. The scanning model for translation: an update. The Journal of cell biology, 108(2):229–241, 1989. | ||
1186 | 25 | 10 | M. Mori, K. F. MacDorman, and N. Kageki, “The uncanny valley [from the field],” IEEE Robotics & Automation Magazine, vol. 19, no. 2, pp. 98–100, 2012. | ||
546 | 11 | 4 | M. Murati, K. Mayer, P. Welinder, B. McGrew, D. Amodei, S. McCandlish, I. Sutskever, and W. Zaremba. Evaluating large language models trained on code. CoRR, abs/2107.03374, 2021. URL https://arxiv.org/abs/2107.03374. | ||
1354 | 28 | 6 | M. Murati, K. Mayer, P. Welinder, B. McGrew, D. Amodei, S. McCandlish, I. Sutskever, and W. Zaremba. Evaluating large language models trained on code. CoRR, abs/2107.03374, 2021. URL https://arxiv.org/abs/2107.03374. | ||
1151 | 24 | 34 | M. Oubounyt, Z. Louadi, H. Tayara, and K. T. Chong. DeePromoter: Robust promoter predictor using deep learning. Frontiers in Genetics, 10, 2019. | ||
274 | 5 | 89 | M. Poli, A. W. Thomas, E. Nguyen, P. Ponnusamy, B. Deiseroth, K. Kersting, T. Suzuki, B. Hie, S. Ermon, C. Ré, et al. Mechanistic design and scaling of hybrid architectures. arXiv preprint arXiv:2403.17844, 2024. | ||
1237 | 26 | 0 | M. Poli, J. Wang, S. Massaroli, J. Quesnelle, R. Carlow, E. Nguyen, and A. Thomas. StripedHyena: Moving Beyond Transformers with Hybrid Signal Processing Models, 12 2023. URL https://github.com/togethercomputer/stripedhyena. | ||
1154 | 24 | 37 | M. Poli, S. Massaroli, E. Nguyen, D. Y. Fu, T. Dao, S. Baccus, Y. Bengio, S. Ermon, and C. Ré. Hyena Hierarchy: Towards larger convolutional language models. arXiv preprint arXiv:2302.10866, 2023. | ||
273 | 5 | 88 | M. Poli, S. Massaroli, E. Nguyen, D. Y. Fu, T. Dao, S. Baccus, Y. Bengio, S. Ermon, and C. Ré. Hyena hierarchy: Towards larger convolutional language models. In International Conference on Machine Learning, pages 28043–28078. PMLR, 2023. | ||
283 | 5 | 98 | M. Sandoval-Velasco, O. Dudchenko, J. A. Rodríguez, C. P. Estrada, M. Dehasque, C. Fontsere, S. S. Mak, R. Khan, V. G. Contessoto, A. B. O. Junior, et al. Three-dimensional genome architecture persists in a 52,000-year-old woolly mammoth skin sample. Cell, 187(14):3541–3562, 2024. | ||
286 | 5 | 101 | M. Schubach, T. Maass, L. Nazaretyan, S. Röner, and M. Kircher. CADD v1.7: using protein language models, regulatory CNNs and other nucleotide-level scores to improve genome-wide variant predictions. Nucleic Acids Research, 52(D1):D1143–D1154, 01 2024. ISSN 0305-1048. doi: 10.1093/nar/gkad989. URL https://doi.org/10.1093/nar/gkad989. | ||
1390 | 28 | 42 | M. Suzgun, N. Scales, N. Schärli, S. Gehrmann, Y. Tay, H. W. Chung, A. Chowdhery, Q. V. Le, E. H. Chi, D. Zhou, et al. Challenging big-bench tasks and whether chain-of-thought can solve them. arXiv preprint arXiv:2210.09261, 2022. | ||
262 | 5 | 75 | M. W. Nirenberg and J. H. Matthaei. The dependence of cell-free protein synthesis in E. coli upon naturally occurring or synthetic polyribonucleotides. Proceedings of the National Academy of Sciences, 47(10):1588–1602, 1961. | ||
1399 | 28 | 51 | M. Wenzel, L. C. Paulson, and T. Nipkow. The Isabelle framework. In O. A. Mohamed, C. A. Muñoz, and S. Tahar, editors, Theorem Proving in Higher Order Logics, 21st International Conference, TPHOLs 2008, Montreal, Canada, August 18-21, 2008. Proceedings, volume 5170 of Lecture Notes in Computer Science, pages 33–38. Springer, 2008. doi: 10.1007/978-3-540-71067-7_7. URL https://doi.org/10.1007/978-3-540-71067-7_7. | ||
1172 | 24 | 55 | M. Zaheer, G. Guruganesh, K. A. Dubey, J. Ainslie, C. Alberti, S. Ontanon, P. Pham, A. Ravula, Q. Wang, L. Yang, et al. Big bird: Transformers for longer sequences. Advances in neural information processing systems, 33:17283–17297, 2020. | ||
310 | 5 | 125 | M. Zvyagin, A. Brace, K. Hippe, Y. Deng, B. Zhang, C. O. Bohorquez, A. Clyde, B. Kale, D. Perez-Rivera, H. Ma, C. M. Mann, M. Irvin, D. G. Ozgulbas, N. Vassilieva, J. G. Pauloski, L. Ward, V. Hayot-Sasson, M. Emani, S. Foreman, Z. Xie, D. Lin, M. Shukla, W. Nie, J. Romero, C. Dallago, A. Vahdat, C. Xiao, T. Gibbs, I. Foster, J. J. Davis, M. E. Papka, T. Brettin, R. Stevens, A. Anandkumar, V. Vishwanath, and A. Ramanathan. GenSLMs: Genome-scale language models reveal SARS-CoV-2 evolutionary dynamics. The International Journal of High Performance Computing Applications, 37:683–705, 11 2023. ISSN 1094-3420. doi: 10.1177/10943420231201154. | ||
1175 | 24 | 58 | M. Zvyagin, A. Brace, K. Hippe, Y. Deng, B. Zhang, C. O. Bohorquez, A. Clyde, B. Kale, D. Perez-Rivera, H. Ma, et al. GenSLMs: Genome-scale language models reveal SARS-CoV-2 evolutionary dynamics. bioRxiv, pages 2022–10, 2022. | ||
258 | 5 | 71 | M. Naghipourfar, S. Chen, M. Howard, C. Macdonald, A. Saberi, T. Hagen, M. Mofrad, W. Coyote-Maestas, and H. Goodarzi. A suite of foundation models captures the contextual interplay between codons. bioRxiv, Oct. 2024. doi: 10.1101/2024.10.10.617568. URL http://dx.doi.org/10.1101/2024.10.10.617568. | ||
291 | 5 | 106 | M. Steinegger and J. Söding. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat. Biotechnol., 35(11):1026–1028, Nov. 2017. | ||
678 | 13 | 33 | Ma, J., He, Y., Li, F., Han, L., You, C., and Wang, B. Segment anything in medical images. Nature Communications, 15(1), January 2024. ISSN 2041-1723. doi:10.1038/s41467-024-44824-z. | ||
679 | 13 | 34 | Ma, J., Yang, Z., Kim, S., Chen, B., Baharoon, M., Fallahpour, A., Asakereh, R., Lyu, H., and Wang, B. Medsam2: Segment anything in 3d medical images and videos, 2025. | ||
563 | 11 | 21 | MAA. American Invitational Mathematics Examination - AIME, February 2024. URL https://maa.org/math-competitions/american-invitational-mathematics-examination-aime. | ||
960 | 22 | 43 | Madani A, Krause B, Greene ER. et al. Large language models generate functional protein sequences across diverse families. Nat Biotechnol 2023;41:1099–106. 10.1038/s41587-022-01618-2. | ||
706 | 14 | 5 | Madani, A. et al. Large language models generate functional protein sequences across diverse families. Nat Biotechnol 41, 1099–1106 (2023). | ||
597 | 12 | 14 | Madani, A. et al. Large language models generate functional protein sequences across diverse families. Nat. Biotechnol. 41, 1099–1106 (2023). | ||
34 | 3 | 26 | Madani, A., Krause, B., Greene, E. R., Subramanian, S., Mohr, B. P., Holton, J. M., Olmos, J. L., Xiong, C., Sun, Z. Z., Socher, R. et al. (2023). Large language models generate functional protein sequences across diverse families. Nature Biotechnology 41, 1099–1106. | ||
927 | 22 | 10 | Mahmud M, Kaiser MS, Hussain A. et al. Applications of deep learning and reinforcement learning to biological data. IEEE Trans Neural Netw Learn Syst 2018;29:2063–79. 10.1109/TNNLS.2018.2790388. | ||
1113 | 23 | 59 | Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, Santiago Ontanon, Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, and Amr Ahmed. Big Bird: Transformers for Longer Sequences. In Advances in Neural Information Processing Systems, volume 33, pp. 17283–17297. Curran Associates, Inc., 2020. URL https://proceedings.neurips.cc/paper/2020/hash/c8512d142a2d849725f31a9a7a361ab9-Abstract.html. | ||
1545 | 33 | 1 | Marah Abdin, Jyoti Aneja, Sébastien Bubeck, Caio César Teodoro Mendes, Weizhu Chen, Allie Del Giorno, Ronen Eldan, Sivakanth Gopi, Suriya Gunasekar, Mojan Javaheripi, Piero Kauffmann, Yin Tat Lee, Yuanzhi Li, Anh Nguyen, Gustavo de Rosa, Olli Saarikivi, Adil Salim, Shital Shah, Michael Santacroce, Harkirat Singh Behl, Adam Taumann Kalai, Xin Wang, Rachel Ward, Philipp Witte, Cyril Zhang, and Yi Zhang. Phi-2: The surprising power of small language models, 2024. URL https://www.microsoft.com/en-us/research/blog/phi-2-the-surprising-power-of-small-language-models/. | ||
1238 | 26 | 1 | Marah I Abdin, Sam Ade Jacobs, Ammar Ahmad Awan, Jyoti Aneja, Ahmed Awadallah, Hany Awadalla, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Harkirat S. Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Martin Cai, Caio César Teodoro Mendes, Weizhu Chen, Vishrav Chaudhary, Parul Chopra, Allie Del Giorno, Gustavo de Rosa, Matthew Dixon, Ronen Eldan, Dan Iter, Amit Garg, Abhishek Goswami, Suriya Gunasekar, Emman Haider, Junheng Hao, Russell J. Hewett, Jamie Huynh, Mojan Javaheripi, Xin Jin, Piero Kauffmann, Nikos Karampatziakis, Dongwoo Kim, Mahoud Khademi, Lev Kurilenko, James R. Lee, Yin Tat Lee, Yuanzhi Li, Chen Liang, Weishung Liu, Eric Lin, Zeqi Lin, Piyush Madan, Arindam Mitra, Hardik Modi, Anh Nguyen, Brandon Norick, Barun Patra, Daniel Perez-Becker, Thomas Portet, Reid Pryzant, Heyang Qin, Marko Radmilac, Corby Rosset, Sambudha Roy, Olatunji Ruwase, Olli Saarikivi, Amin Saied, Adil Salim, Michael Santacroce, Shital Shah, Ning Shang, Hiteshi Sharma, Xia Song, Masahiro Tanaka, Xin Wang, Rachel Ward, Guanhua Wang, Philipp Witte, Michael Wyatt, Can Xu, Jiahang Xu, Sonali Yadav, Fan Yang, Ziyi Yang, Donghan Yu, Chengruidong Zhang, Cyril Zhang, Jianwen Zhang, Li Lyna Zhang, Yi Zhang, Yue Zhang, Yunan Zhang, and Xiren Zhou. Phi-3 technical report: A highly capable language model locally on your phone. CoRR, abs/2404.14219, 2024. | ||
1626 | 34 | 1 | Marah I Abdin, Sam Ade Jacobs, Ammar Ahmad Awan, Jyoti Aneja, Ahmed Awadallah, Hany Awadalla, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Harkirat S. Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Martin Cai, Caio César Teodoro Mendes, Weizhu Chen, Vishrav Chaudhary, Parul Chopra, Allie Del Giorno, Gustavo de Rosa, Matthew Dixon, Ronen Eldan, Dan Iter, Amit Garg, Abhishek Goswami, Suriya Gunasekar, Emman Haider, Junheng Hao, Russell J. Hewett, Jamie Huynh, Mojan Javaheripi, Xin Jin, Piero Kauffmann, Nikos Karampatziakis, Dongwoo Kim, Mahoud Khademi, Lev Kurilenko, James R. Lee, Yin Tat Lee, Yuanzhi Li, Chen Liang, Weishung Liu, Eric Lin, Zeqi Lin, Piyush Madan, Arindam Mitra, Hardik Modi, Anh Nguyen, Brandon Norick, Barun Patra, Daniel Perez-Becker, Thomas Portet, Reid Pryzant, Heyang Qin, Marko Radmilac, Corby Rosset, Sambudha Roy, Olatunji Ruwase, Olli Saarikivi, Amin Saied, Adil Salim, Michael Santacroce, Shital Shah, Ning Shang, Hiteshi Sharma, Xia Song, Masahiro Tanaka, Xin Wang, Rachel Ward, Guanhua Wang, Philipp Witte, Michael Wyatt, Can Xu, Jiahang Xu, Sonali Yadav, Fan Yang, Ziyi Yang, Donghan Yu, Chengruidong Zhang, Cyril Zhang, Jianwen Zhang, Li Lyna Zhang, Yi Zhang, Yue Zhang, Yunan Zhang, and Xiren Zhou. Phi-3 technical report: A highly capable language model locally on your phone. CoRR, abs/2404.14219, 2024. | ||
1860 | 35 | 133 | Marc Horlacher, Giulia Cantini, Julian Hesse, Patrick Schinke, Nicolas Goedert, Shubhankar Londhe, Lambert Moyon, and Annalisa Marsico. 2023. A systematic benchmark of machine learning methods for protein–RNA interaction prediction. Briefings in Bioinformatics 24, 5 (2023), bbad307. | ||
1103 | 23 | 49 | Marco Salvatore, Marc Horlacher, Annalisa Marsico, Ole Winther, and Robin Andersson. Transfer learning identifies sequence determinants of cell-type specific regulatory element accessibility. NAR Genomics and Bioinformatics, 5(2):lqad026, March 2023. ISSN 2631-9268. doi: 10.1093/nargab/lqad026. URL https://academic.oup.com/nargab/article/doi/10.1093/nargab/lqad026/7092956. | ||
411 | 8 | 12 | Marin, F. I. et al. BEND: Benchmarking DNA language models on biologically meaningful tasks. Preprint at https://doi.org/10.48550/arXiv.2311.12570 (2024). | ||
731 | 14 | 30 | Marin, F. I. et al. BEND: Benchmarking DNA Language Models on biologically meaningful tasks. Preprint at https://doi.org/10.48550/arXiv.2311.12570 (2024). | ||
59 | 3 | 51 | Marin, F. I., Teufel, F., Horlacher, M., Madsen, D., Pultz, D., Winther, O., and Boomsma, W. BEND: Benchmarking DNA Language Models on Biologically Meaningful Tasks. In: International Conference on Learning Representations (2024). | ||
1896 | 35 | 169 | Mario Krenn, Florian Häse, AkshatKumar Nigam, Pascal Friederich, and Alan Aspuru-Guzik. 2020. Self-referencing embedded strings (SELFIES): A 100% robust molecular string representation. Machine Learning: Science and Technology 1, 4 (2020), 045024. | ||
1108 | 23 | 54 | Mario Stanke and Stephan Waack. Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics (Oxford, England), 19 Suppl 2:ii215–225, October 2003. ISSN 1367-4811. doi: 10.1093/bioinformatics/btg1080. | ||
1252 | 26 | 15 | Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Pondé de Oliveira Pinto, Jared Kaplan, Harrison Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Michael Petrov, Heidy Khlaaf, Girish Sastry, Pamela Mishkin, Brooke Chan, Scott Gray, Nick Ryder, Mikhail Pavlov, Alethea Power, Lukasz Kaiser, Mohammad Bavarian, Clemens Winter, Philippe Tillet, Felipe Petroski Such, Dave Cummings, Matthias Plappert, Fotios Chantzis, Elizabeth Barnes, Ariel Herbert-Voss, William Hebgen Guss, Alex Nichol, Alex Paino, Nikolas Tezak, Jie Tang, Igor Babuschkin, Suchir Balaji, Shantanu Jain, William Saunders, Christopher Hesse, Andrew N. Carr, Jan Leike, Joshua Achiam, Vedant Misra, Evan Morikawa, Alec Radford, Matthew Knight, Miles Brundage, Mira Murati, Katie Mayer, Peter Welinder, Bob McGrew, Dario Amodei, Sam McCandlish, Ilya Sutskever, and Wojciech Zaremba. Evaluating large language models trained on code. CoRR, abs/2107.03374, 2021. | ||
1557 | 33 | 13 | Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Pondé de Oliveira Pinto, Jared Kaplan, Harrison Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Michael Petrov, Heidy Khlaaf, Girish Sastry, Pamela Mishkin, Brooke Chan, Scott Gray, Nick Ryder, Mikhail Pavlov, Alethea Power, Lukasz Kaiser, Mohammad Bavarian, Clemens Winter, Philippe Tillet, Felipe Petroski Such, Dave Cummings, Matthias Plappert, Fotios Chantzis, Elizabeth Barnes, Ariel Herbert-Voss, William Hebgen Guss, Alex Nichol, Alex Paino, Nikolas Tezak, Jie Tang, Igor Babuschkin, Suchir Balaji, Shantanu Jain, William Saunders, Christopher Hesse, Andrew N. Carr, Jan Leike, Joshua Achiam, Vedant Misra, Evan Morikawa, Alec Radford, Matthew Knight, Miles Brundage, Mira Murati, Katie Mayer, Peter Welinder, Bob McGrew, Dario Amodei, Sam McCandlish, Ilya Sutskever, and Wojciech Zaremba. Evaluating large language models trained on code. CoRR, abs/2107.03374, 2021. | ||
1640 | 34 | 15 | Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Pondé de Oliveira Pinto, Jared Kaplan, Harrison Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Michael Petrov, Heidy Khlaaf, Girish Sastry, Pamela Mishkin, Brooke Chan, Scott Gray, Nick Ryder, Mikhail Pavlov, Alethea Power, Lukasz Kaiser, Mohammad Bavarian, Clemens Winter, Philippe Tillet, Felipe Petroski Such, Dave Cummings, Matthias Plappert, Fotios Chantzis, Elizabeth Barnes, Ariel Herbert-Voss, William Hebgen Guss, Alex Nichol, Alex Paino, Nikolas Tezak, Jie Tang, Igor Babuschkin, Suchir Balaji, Shantanu Jain, William Saunders, Christopher Hesse, Andrew N. Carr, Jan Leike, Joshua Achiam, Vedant Misra, Evan Morikawa, Alec Radford, Matthew Knight, Miles Brundage, Mira Murati, Katie Mayer, Peter Welinder, Bob McGrew, Dario Amodei, Sam McCandlish, Ilya Sutskever, and Wojciech Zaremba. Evaluating large language models trained on code. CoRR, abs/2107.03374, 2021. | ||
2081 | 35 | 354 | Mark D Wilkinson, Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, Jan-Willem Boiten, Luiz Bonino da Silva Santos, Philip E Bourne, et al. 2016. The FAIR Guiding Principles for scientific data management and stewardship. Scientific data 3, 1 (2016), 1–9. | ||
88 | 3 | 80 | Markowitz, V. M., Chen, I.-M. A., Palaniappan, K., Chu, K., Szeto, E., Grechkin, Y., Ratner, A., Jacob, B., Huang, J., Williams, P. et al. (2012). IMG: the integrated microbial genomes database and comparative analysis system. Nucleic Acids Research 40, D115–D122. | ||
429 | 9 | 7 | Marquet, C. et al. Embeddings from protein language models predict conservation and variant effects. Hum. Genet. 141, 1629–1647 (2022). | ||
489 | 10 | 7 | Marquet, C. et al. Embeddings from protein language models predict conservation and variant effects. Hum. Genet. 141, 1629–1647 (2022). | ||
184 | 4 | 44 | Márquez-Luna, C. et al. Incorporating functional priors improves polygenic prediction accuracy in UK Biobank and 23andMe data sets. Nat. Commun. 12, 6052 (2021). | ||
2165 | 36 | 5 | Marta Byrska-Bishop, Uday S. Evani, Xuefang Zhao, Anna O. Basile, Haley J. Abel, Allison A. Regier, André Corvelo, Wayne E. Clarke, Rajeeva Musunuri, Kshithija Nagulapalli, Susan Fairley, Alexi Runnels, Lara Winterkorn, Ernesto Lowy-Gallego, The Human Genome Structural Variation Consortium, Paul Flicek, Soren Germer, Harrison Brand, Ira M. Hall, Michael E. Talkowski, Giuseppe Narzisi, and Michael C. Zody. High coverage whole genome sequencing of the expanded 1000 genomes project cohort including 602 trios. bioRxiv, 2021. doi: 10.1101/2021.02.06.430068. URL https://www.biorxiv.org/content/early/2021/02/07/2021.02.06.430068. | ||
1430 | 30 | 13 | Martin F.J., Amode M.R., Aneja A., Austine-Orimoloye O., Azov A.G., Barnes I., Becker A., Bennett R., Berry A., Bhai J. et al. Ensembl 2023. Nucleic Acids Res. 2023; 51:D933–D941. | ||
2010 | 35 | 283 | Martin H Schaefer, Jean-Fred Fontaine, Arunachalam Vinayagam, Pablo Porras, Erich E Wanker, and Miguel A Andrade-Navarro. 2012. HIPPIE: Integrating protein interaction networks with experiment based quality scores. PloS one 7, 2 (2012), e31826. | ||
2030 | 35 | 303 | Martin Steinegger and Johannes Söding. 2018. Clustering huge protein sequence sets in linear time. Nature communications 9, 1 (2018), 2542. | ||
2029 | 35 | 302 | Martin Steinegger, Milot Mirdita, and Johannes Söding. 2019. Protein-level assembly increases protein sequence recovery from metagenomic samples manyfold. Nature methods 16, 7 (2019), 603–606. | ||
1439 | 30 | 22 | Martincorena I., Raine K.M., Gerstung M., Dawson K.J., Haase K., Van Loo P., Davies H., Stratton M.R., Campbell P.J. Universal patterns of selection in cancer and somatic tissues. Cell. 2017; 171:1029–1041. | ||
142 | 4 | 2 | Marwaha, S., Knowles, J. W. & Ashley, E. A. A guide for the diagnosis of rare and undiagnosed disease: beyond the exome. Genome Med. 14, 23 (2022). | ||
1831 | 35 | 104 | Mary Forehand. 2010. Bloom’s taxonomy. Emerging perspectives on learning, teaching, and technology 41, 4 (2010), 47–56. | ||
1415 | 29 | 6 | Masood, E. (1999) Consortium plans free SNP map of human genome. Nature, 398, 545–546. | ||
680 | 13 | 35 | Masterman, T., Besen, S., Sawtell, M., and Chao, A. The landscape of emerging ai agent architectures for reasoning, planning, and tool calling: A survey. arXiv preprint arXiv:2404.11584, 2024. | ||
433 | 9 | 11 | Mateo, L. J., Sinnott-Armstrong, N. & Boettiger, A. N. Tracing DNA paths and RNA profiles in cultured cells and tissues with ORCA. Nat. Protoc. 16, 1647–1713 (2021). | ||
493 | 10 | 11 | Mateo, L. J., Sinnott-Armstrong, N. & Boettiger, A. N. Tracing DNA paths and RNA profiles in cultured cells and tissues with ORCA. Nat. Protoc. 16, 1647–1713 (2021). | ||
322 | 7 | 10 | Mathelier, A., Shi, W. & Wasserman, W. W. Identification of altered cis-regulatory elements in human disease. Trends Genet. 31, 67–76 (2015). | ||
1605 | 33 | 61 | Mathieu Ravaut, Bosheng Ding, Fangkai Jiao, Hailin Chen, Xingxuan Li, Ruochen Zhao, Chengwei Qin, Caiming Xiong, and Shafiq Joty. How much are LLMs contaminated? A comprehensive survey and the llmsanitize library. CoRR, abs/2404.00699, 2024. | ||
1879 | 35 | 152 | Johannes Welbl, Nelson F. Liu, and Matt Gardner. 2017. Crowdsourcing Multiple Choice Science Questions. arXiv:1707.06209v1. | ||
1489 | 31 | 40 | Matt Mahoney. Large text compression benchmark, http://www.mattmahoney.net/dc/text.html, 2006. | ||
2032 | 35 | 305 | Matt Sternke and Joel Karpiak. 2023. ProteinRL: Reinforcement learning with generative protein language models for property-directed sequence design. In NeurIPS 2023 Generative AI and Biology (GenBio) Workshop. | ||
730 | 14 | 29 | Matthews, B. W. Comparison of the predicted and observed secondary structure of T4 phage lysozyme. Biochimica et Biophysica Acta (BBA) - Protein Structure 405, 442–451 (1975). | ||
590 | 12 | 7 | Mauro, V. P. & Chappell, S. A. A critical analysis of codon optimization in human therapeutics. Trends Mol. Med 20, 604–613 (2014). | ||
589 | 12 | 6 | Mauro, V. P. Codon Optimization in the Production of Recombinant Biotherapeutics: Potential Risks and Considerations. BioDrugs 32, 69–81 (2018). | ||
759 | 15 | 23 | Max Jaderberg, Andrea Vedaldi, and Andrew Zisserman. Speeding up convolutional neural networks with low rank expansions. arXiv preprint arXiv:1405.3866, 2014. | ||
2159 | 35 | 432 | Maxim Zvyagin, Alexander Brace, Kyle Hippe, Yuntian Deng, Bin Zhang, Cindy Orozco Bohorquez, Austin Clyde, Bharat Kale, Danilo Perez-Rivera, Heng Ma, et al. 2022. GenSLMs: Genome-scale language models reveal SARS-CoV-2 evolutionary dynamics. bioRxiv (2022). | ||
1997 | 35 | 270 | Mayk Caldas Ramos, Christopher J. Collison, and Andrew D. White. 2024. A Review of Large Language Models and Autonomous Agents in Chemistry. arXiv:2407.01603 [cs.LG] | ||
895 | 21 | 17 | McCann B. et al. (2017) Learned in translation: contextualized word vectors. In: Guyon,I. et al. (eds.), Advances in Neural Information Processing Systems 30, Curran Associates, Inc., pp. 6294–6305. http://papers.nips.cc/paper/7209-learned-in-translation-contextualized-word-vectors.pdf. | ||
1018 | 22 | 101 | McDermott M, Yap B, Szolovits P. et al. Structure-inducing pre-training. Nat Mach Intell 2023;5:612–21. 10.1038/s42256-023-00647-z. | ||
4 | 1 | 2 | McKusick VA. Mendelian Inheritance in Man, A Catalog of Autosomal Dominant, Autosomal Recessive, and X-linked Phenotypes. 1st edn. Baltimore, MD: Johns Hopkins University Press; 1966. | ||
5 | 1 | 3 | McKusick VA. Mendelian Inheritance in Man, A Catalog of Human Genes and Genetic Disorders. 12th edn. Baltimore, MD: Johns Hopkins University Press; 1998. | ||
3 | 1 | 1 | McKusick VA. On the X Chromosome of Man. Quart. Rev. Biol. 1962;37:69–175. doi: 10.1086/40363 | ||
371 | 7 | 59 | McKusick, V. A. Mendelian Inheritance in Man and its online version, OMIM. Am. J. Hum. Genet. 80, 588–604 (2007). | ||
1433 | 30 | 16 | McLaren W., Gil L., Hunt S.E., Riat H.S., Ritchie G.R.S., Thormann A., Flicek P., Cunningham F. The Ensembl variant effect predictor. Genome Biol. 2016; 17:122. | ||
462 | 9 | 40 | McLaren, W. et al. The Ensembl variant effect predictor. Genome Biol. 17, 1–14 (2016). | ||
522 | 10 | 40 | McLaren, W. et al. The Ensembl variant effect predictor. Genome Biol. 17, 1–14 (2016). | ||
1451 | 31 | 2 | Md. Amirul Islam, Sen Jia, and Neil D. B. Bruce. How much position information do convolutional neural networks encode? ArXiv, abs/2001.08248, 2020. | ||
628 | 12 | 45 | Medina-Muñoz, S. G. et al. Crosstalk between codon optimality and cis-regulatory elements dictates mRNA stability. Genome Biol. 22, 14 (2021). | ||
658 | 13 | 13 | Cohen, J. P., Hashir, M., Brooks, R., and Bertrand, H. On the limits of cross-domain generalization in automated x-ray prediction. In Medical Imaging with Deep Learning, 2020. | ||
681 | 13 | 36 | Nori, H., King, N., McKinney, S. M., Carignan, D., and Horvitz, E. Capabilities of gpt-4 on medical challenge problems. arXiv preprint arXiv:2303.13375, 2023. | ||
1081 | 23 | 27 | Meenakshi S. Kagda, Bonita Lam, Casey Litton, Corinn Small, Cricket A. Sloan, Emma Spragins, Forrest Tanaka, Ian Whaling, Idan Gabdank, Ingrid Youngworth, J. Seth Strattan, Jason Hilton, Jennifer Jou, Jessica Au, Jin-Wook Lee, Kalina Andreeva, Keenan Graham, Khine Lin, Matt Simison, Otto Jolanki, Paul Sud, Pedro Assis, Philip Adenekan, Eric Douglas, Mingjie Li, Pedro Assis, Keenan Graham, Paul Sud, Stuart Miyasato, Weiwei Zhong, Yunhai Luo, Zachary Myers, J. Michael Cherry, and Benjamin C. Hitz. Data navigation on the ENCODE portal. 2023. doi: 10.48550/ARXIV.2305.00006. URL https://arxiv.org/abs/2305.00006. Publisher: arXiv Version Number: 2. | ||
356 | 7 | 44 | Mefford, H. C. et al. Recurrent reciprocal genomic rearrangements of 17q12 are associated with renal disease, diabetes, and epilepsy. Am. J. Hum. Genet. 81, 1057–1069 (2007). | ||
145 | 4 | 5 | Meier, J. et al. Language models enable zero-shot prediction of the effects of mutations on protein function. In Proc. Advances in Neural Information Processing Systems 34 (eds Ranzato, M. et al.) 29287–29303 (Curran Associates, Inc., 2021). | ||
14 | 3 | 6 | Meier, J., Rao, R., Verkuil, R., Liu, J., Sercu, T., and Rives, A. Language models enable zero-shot prediction of the effects of mutations on protein function. In: Ranzato, M., Beygelzimer, A., Dauphin, Y., Liang, P., and Vaughan, J. W., eds. Advances in Neural Information Processing Systems vol. 34. Curran Associates, Inc., 2021: 29287–29303. https://proceedings.neurips.cc/paper_files/paper/2021/file/f51338d736f95dd42427296047067694-Paper.pdf. | ||
1086 | 23 | 32 | Melissa J Landrum, Shanmuga Chitipiralla, Garth R Brown, Chao Chen, Baoshan Gu, Jennifer Hart, Douglas Hoffman, Wonhee Jang, Kuljeet Kaur, Chunlei Liu, Vitaly Lyoshin, Zenith Maddipatla, Rama Maiti, Joseph Mitchell, Nuala O’Leary, George R Riley, Wenyao Shi, George Zhou, Valerie Schneider, Donna Maglott, J Bradley Holmes, and Brandi L Kattman. ClinVar: improvements to accessing data. Nucleic Acids Research, 48(D1):D835–D844, January 2020. ISSN 0305-1048. doi: 10.1093/nar/gkz972. URL https://doi.org/10.1093/nar/gkz972. | ||
1104 | 23 | 50 | Melissa Sanabria, Jonas Hirsch, and Anna R. Poetsch. The human genome’s vocabulary as proposed by the DNA language model GROVER, September 2023. URL https://www.biorxiv.org/content/10.1101/2023.07.19.549677v2. | ||
448 | 9 | 26 | Mendoza-Revilla, J. et al. A foundational large language model for edible plant genomes. Commun. Biol. 7, 835 (2024). | ||
508 | 10 | 26 | Mendoza-Revilla, J. et al. A foundational large language model for edible plant genomes. Commun. Biol. 7, 835 (2024). | ||
22 | 3 | 14 | Mendoza-Revilla, J., Trop, E., Gonzalez, L., Roller, M., Dalla-Torre, H., de Almeida, B. P., Richard, G., Caton, J., Lopez Carranza, N., Skwark, M., Laterre, A., Beguir, K., Pierrot, T., and Lopez, M. (2024). A foundational large language model for edible plant genomes. Communications Biology 7, 835. https://doi.org/10.1038/s42003-024-06465-2. doi:10.1038/s42003-024-06465-2. | ||
2108 | 35 | 381 | Meng Yang, Haiping Huang, Lichao Huang, Nan Zhang, Jihong Wu, Huanming Yang, and Feng Mu. 2021. LOGO, a contextualized pre-trained language model of human genome flexibly adapts to various downstream tasks by fine-tuning. (2021). | ||
1112 | 23 | 58 | Meng Yang, Haiping Huang, Lichao Huang, Nan Zhang, Jihong Wu, Huanming Yang, and Feng Mu. LOGO, a contextualized pre-trained language model of human genome flexibly adapts to various downstream tasks by fine-tuning. preprint, In Review, August 2021. URL https://www.researchsquare.com/article/rs-448927/v1. | ||
455 | 9 | 33 | Meuleman, W. et al. Index and biological spectrum of human DNase I hypersensitive sites. Nature 584, 244–251 (2020). | ||
515 | 10 | 33 | Meuleman, W. et al. Index and biological spectrum of human DNase I hypersensitive sites. Nature 584, 244–251 (2020). | ||
452 | 9 | 30 | Meylan, P., Dreos, R., Ambrosini, G., Groux, R. & Bucher, P. Epd in 2020: enhanced data visualization and extension to ncRNA promoters. Nucleic Acids Res. 48, D65–D69 (2020). | ||
512 | 10 | 30 | Meylan, P., Dreos, R., Ambrosini, G., Groux, R. & Bucher, P. Epd in 2020: enhanced data visualization and extension to ncRNA promoters. Nucleic Acids Res. 48, D65–D69 (2020). | ||
979 | 22 | 62 | Mi H, Muruganujan A, Huang X. et al. Protocol update for large-scale genome and gene function analysis with the PANTHER classification system (v. 14.0). Nat Protoc 2019;14:703–21. 10.1038/s41596-019-0128-8. | ||
833 | 17 | 19 | Mialon G, Dessì R, Lomeli M et al. Augmented language models: a survey. arXiv, arXiv:2302.07842, 2023, preprint: not peer reviewed. | ||
1791 | 35 | 64 | Micaela E Consens, Cameron Dufault, Michael Wainberg, Duncan Forster, Mehran Karimzadeh, Hani Goodarzi, Fabian J Theis, Alan Moses, and Bo Wang. 2023. To Transformers and Beyond: Large Language Models for the Genome. arXiv preprint arXiv:2311.07621 (2023). | ||
1735 | 35 | 8 | Michael Ashburner, Catherine A Ball, Judith A Blake, David Botstein, Heather Butler, J Michael Cherry, Allan P Davis, Kara Dolinski, Selina S Dwight, Janan T Eppig, et al. 2000. Gene ontology: tool for the unification of biology. Nature genetics 25, 1 (2000), 25–29. | ||
1853 | 35 | 126 | Michael Heinzinger, Konstantin Weissenow, Joaquin Gomez Sanchez, Adrian Henkel, Martin Steinegger, and Burkhard Rost. 2023. ProstT5: Bilingual Language Model for Protein Sequence and Structure. bioRxiv (2023). | ||
1840 | 35 | 113 | Michael K Gilson, Tiqing Liu, Michael Baitaluk, George Nicola, Linda Hwang, and Jenny Chong. 2016. BindingDB in 2015: a public database for medicinal chemistry, computational chemistry and systems pharmacology. Nucleic acids research 44, D1 (2016), D1045–D1053. | ||
1966 | 35 | 239 | Michael M Mysinger, Michael Carchia, John J Irwin, and Brian K Shoichet. 2012. Directory of useful decoys, enhanced (DUD-E): better ligands and decoys for better benchmarking. Journal of medicinal chemistry 55, 14 (2012), 6582–6594. | ||
1098 | 23 | 44 | Michael Poli, Stefano Massaroli, Eric Nguyen, Daniel Y. Fu, Tri Dao, Stephen Baccus, Yoshua Bengio, Stefano Ermon, and Christopher Ré. Hyena Hierarchy: Towards Larger Convolutional Language Models, April 2023. URL http://arxiv.org/abs/2302.10866. arXiv:2302.10866 [cs]. | ||
2012 | 35 | 285 | Michael Schlichtkrull, Thomas N Kipf, Peter Bloem, Rianne Van Den Berg, Ivan Titov, and Max Welling. 2018. Modeling relational data with graph convolutional networks. In The Semantic Web: 15th International Conference, ESWC 2018, Heraklion, Crete, Greece, June 3–7, 2018, Proceedings 15.Springer, 593–607. | ||
2055 | 35 | 328 | Michel van Kempen, Stephanie S Kim, Charlotte Tumescheit, Milot Mirdita, Cameron LM Gilchrist, Johannes Söding, and Martin Steinegger. 2022. Foldseek: fast and accurate protein structure search. Biorxiv (2022), 2022–02. | ||
2111 | 35 | 384 | Michihiro Yasunaga, Jure Leskovec, and Percy Liang. 2022. LinkBERT: Pretraining Language Models with Document Links. arXiv:2203.15827 [cs.CL] | ||
1732 | 35 | 5 | Microsoft Research AI4Science and Microsoft Azure Quantum. 2023. The Impact of Large Language Models on Scientific Discovery: A Preliminary Study using GPT-4. arXiv preprint:2311.07361 (2023). | ||
2056 | 35 | 329 | Mihaly Varadi, Stephen Anyango, Mandar Deshpande, Sreenath Nair, Cindy Natassia, Galabina Yordanova, David Yuan, Oana Stroe, Gemma Wood, Agata Laydon, Augustin Žídek, Tim Green, Kathryn Tunyasuvunakool, Stig Petersen, John Jumper, Ellen Clancy, Richard Green, Ankur Vora, Mira Lutfi, Michael Figurnov, Andrew Cowie, Nicole Hobbs, Pushmeet Kohli, Gerard Kleywegt, Ewan Birney, Demis Hassabis, and Sameer Velankar. 2021. AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space with high-accuracy models. Nucleic Acids Research 50, D1 (11 2021), D439–D444. | ||
1907 | 35 | 180 | Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Veselin Stoyanov, and Luke Zettlemoyer. 2020. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, July 5-10, 2020. 7871–7880. | ||
760 | 15 | 24 | Mikhail Khodak, Neil Tenenholtz, Lester Mackey, and Nicolò Fusi. Initialization and regularization of factorized neural layers, 2021. | ||
896 | 21 | 18 | Mikolov T. et al. (2013) Distributed representations of words and phrases and their compositionality. In: Burges,C.J.C. (eds.), Advances in Neural Information Processing Systems 26, Curran Associates, Inc., pp. 3111–3119. http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf. | ||
2100 | 35 | 373 | Minghao Xu, Xinyu Yuan, Santiago Miret, and Jian Tang. 2023. ProtST: Multi-Modality Learning of Protein Sequences and Biomedical Texts. arXiv:2301.12040 [q-bio.BM] | ||
2101 | 35 | 374 | Minghao Xu, Zuobai Zhang, Jiarui Lu, Zhaocheng Zhu, Yangtian Zhang, Chang Ma, Runcheng Liu, and Jian Tang. 2022. PEER: A Comprehensive and Multi-Task Benchmark for Protein Sequence Understanding. arXiv:2206.02096 [cs.LG] | ||
1111 | 23 | 57 | Minghao Xu, Zuobai Zhang, Jiarui Lu, Zhaocheng Zhu, Yangtian Zhang, Ma Chang, Runcheng Liu, and Jian Tang. PEER: A Comprehensive and Multi-Task Benchmark for Protein Sequence Understanding. Advances in Neural Information Processing Systems, 35:35156–35173, December 2022. URL https://proceedings.neurips.cc/paper_files/paper/2022/hash/e467582d42d9c13fa9603df16f31de6d-Abstract-Datasets_and_Benchmarks.html. | ||
1874 | 35 | 147 | Minghui Jiang, Ying Xu, and Binhai Zhu. 2008. Protein structure–structure alignment with discrete Fréchet distance. Journal of bioinformatics and computational biology 6, 01 (2008), 51–64. | ||
1904 | 35 | 177 | Minji Lee, Luiz Felipe Vecchietti, Hyunkyu Jung, Hyunjoo Ro, Meeyoung Cha, and Ho Min Kim. 2022. Protein sequence design in a latent space via model-based reinforcement learning. (2022). | ||
1611 | 33 | 67 | Mirac Suzgun, Nathan Scales, Nathanael Schärli, Sebastian Gehrmann, Yi Tay, Hyung Won Chung, Aakanksha Chowdhery, Quoc V. Le, Ed H. Chi, Denny Zhou, and Jason Wei. Challenging BIG-Bench tasks and whether chain-of-thought can solve them. In ACL (Findings), pp. 13003–13051. Association for Computational Linguistics, 2023. | ||
1317 | 26 | 80 | Mirac Suzgun, Nathan Scales, Nathanael Schärli, Sebastian Gehrmann, Yi Tay, Hyung Won Chung, Aakanksha Chowdhery, Quoc V. Le, Ed H. Chi, Denny Zhou, and Jason Wei. Challenging BIG-Bench tasks and whether chain-of-thought can solve them. In ACL (Findings), pp. 13003–13051. Association for Computational Linguistics, 2023. | ||
1703 | 34 | 78 | Mirac Suzgun, Nathan Scales, Nathanael Schärli, Sebastian Gehrmann, Yi Tay, Hyung Won Chung, Aakanksha Chowdhery, Quoc V. Le, Ed H. Chi, Denny Zhou, and Jason Wei. Challenging BIG-Bench tasks and whether chain-of-thought can solve them. In ACL (Findings), pp. 13003–13051. Association for Computational Linguistics, 2023. | ||
747 | 15 | 11 | Misha Denil, Babak Shakibi, Laurent Dinh, Marc’Aurelio Ranzato, and Nando de Freitas. Predicting parameters in deep learning, 2014. | ||
867 | 19 | 6 | Mitchell JA, Aronson AR, Mork JG, Folk LC, Humphrey SM, Ward JM. Gene indexing: characterization and analysis of NLM's GeneRIFs. AMIA Annu Symp Proc. 2003:460-4. (PMID 14728215.) | ||
64 | 3 | 56 | Mo, S., Fu, X., Hong, C., Chen, Y., Zheng, Y., Tang, X., Lan, Y., Shen, Z., and Xing, E. Multi-modal Self-supervised Pre-training for Large-scale Genome Data. In: NeurIPS 2021 AI for Science Workshop (2021). | ||
786 | 15 | 50 | Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper, and Bryan Catanzaro. Megatron-lm: Training multi-billion parameter language models using model parallelism, 2020. | ||
897 | 21 | 19 | Mohan, S. and Li, D. (2019) MedMentions: a large biomedical corpus annotated with UMLS concepts. arXiv preprint arXiv:1902.09476. | ||
1075 | 23 | 21 | Molly Gasperini, Andrew J. Hill, José L. McFaline-Figueroa, Beth Martin, Seungsoo Kim, Melissa D. Zhang, Dana Jackson, Anh Leith, Jacob Schreiber, William S. Noble, Cole Trapnell, Nadav Ahituv, and Jay Shendure. A Genome-wide Framework for Mapping Gene Regulation via Cellular Genetic Screens. Cell, 176(1):377–390.e19, January 2019. ISSN 0092-8674. doi: 10.1016/j.cell.2018.11.029. URL https://www.sciencedirect.com/science/article/pii/S009286741831554X. | ||
627 | 12 | 44 | Montgomery, K. T., Tardiff, J., Reid, L. M. & Krauter, K. S. Negative and positive cis-acting elements control the expression of murine alpha 1-protease inhibitor genes. Mol. Cell. Biol. 10, 2625–2637 (1990). | ||
1052 | 22 | 135 | Moon I, LoPiccolo J, Baca SC. et al. Machine learning for genetics-based classification and treatment response prediction in cancer of unknown primary. Nat Med 2023;29:2057–67. 10.1038/s41591-023-02482-6. | ||
931 | 22 | 14 | Moor M, Banerjee O, Abad ZSH. et al. Foundation models for generalist medical artificial intelligence. Nature 2023;616:259–65. 10.1038/s41586-023-05881-4. | ||
2009 | 35 | 282 | Moritz Schaefer, Peter Peneder, Daniel Malzl, Anna Hakobyan, Varun S Sharma, Thomas Krausgruber, Jörg Menche, Eleni Tomazou, and Christoph Bock. [n. d.]. Joint Embedding of Transcriptomes and Text Enables Interactive Single-Cell RNA-seq Data Exploration via Natural Language. In ICLR 2024 Workshop on Machine Learning for Genomics Explorations. | ||
631 | 12 | 48 | Moss, M. J., Chamness, L. M. & Clark, P. L. The effects of codon usage on protein structure and folding. Annu. Rev. Biophys. 53, 87–108 (2024). | ||
1750 | 35 | 23 | Mostapha Benhenda. 2017. ChemGAN challenge for drug discovery: can AI reproduce natural chemical diversity? (2017). Tristan Bepler and Bonnie Berger. 2021. Learning the protein language: Evolution, structure, and function. Cell systems 12, 6 (2021), 654–669. | ||
1515 | 32 | 22 | Motmaen, A., Dauparas, J., Baek, M., Abedi, M. H., Baker, D. & Bradley, P. Peptide-binding specificity prediction using fine-tuned protein structure prediction networks. Proceedings of the National Academy of Sciences 120, e2216697120 (2023). | ||
130 | 3 | 122 | Moult, J., Fidelis, K., Kryshtafovych, A., Schwede, T., and Tramontano, A. (2018). Critical assessment of methods of protein structure prediction (CASP)—round XII. Proteins: Structure, Function, and Bioinformatics 86, 7–15. | ||
968 | 22 | 51 | Mowoe MO, Garnett S, Lennard K. et al. Pro-MAP: a robust pipeline for the pre-processing of single channel protein microarray data. BMC Bioinformatics 2022;23:534. 10.1186/s12859-022-05095-x. | ||
2018 | 35 | 291 | Murray Shanahan. 2022. Talking About Large Language Models. CoRR abs/2212.03551 (2022). | ||
1477 | 31 | 28 | Myle Ott, Sergey Edunov, Alexei Baevski, Angela Fan, Sam Gross, Nathan Ng, David Grangier, and Michael Auli. fairseq: A fast, extensible toolkit for sequence modeling. In Proceedings of NAACL-HLT 2019: Demonstrations, pages 48–53, 2019. doi:10.18653/v1/N19-4009. | ||
264 | 5 | 77 | N. A. O’Leary, M. W. Wright, J. R. Brister, S. Ciufo, D. Haddad, R. McVeigh, B. Rajput, B. Robbertse, B. Smith-White, D. Ako-Adjei, et al. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Research, 44(D1):D733–D745, 2016. | ||
1122 | 24 | 5 | N. Brandes, D. Ofer, Y. Peleg, N. Rappoport, and M. Linial. ProteinBERT: a universal deep-learning model of protein sequence and function. Bioinformatics, 38(8):2102–2110, 2022. | ||
196 | 5 | 10 | N. Brandes, G. Goldman, C. H. Wang, C. J. Ye, and V. Ntranos. Genome-wide prediction of disease variant effects with a deep protein language model. Nature Genetics, 55(9):1512–1522, Sept. 2023. ISSN 1546-1718. doi: 10.1038/s41588-023-01465-0. URL https://doi.org/10.1038/s41588-023-01465-0. | ||
307 | 5 | 122 | N. D. Youngblut, J. de la Cuesta-Zuluaga, G. H. Reischer, S. Dauser, N. Schuster, C. Walzer, G. Stalder, A. H. Farnleitner, and R. E. Ley. Large-scale metagenome assembly reveals novel animal-associated microbial genomes, biosynthetic gene clusters, and other genetic diversity, 2020. | ||
1134 | 24 | 17 | N. Ferruz, S. Schmidt, and B. Höcker. ProtGPT2 is a deep unsupervised language model for protein design. Nature communications, 13(1):4348, 2022. | ||
556 | 11 | 14 | N. Jain, K. Han, A. Gu, W. Li, F. Yan, T. Zhang, S. Wang, A. Solar-Lezama, K. Sen, and I. Stoica. Livecodebench: Holistic and contamination free evaluation of large language models for code. CoRR, abs/2403.07974, 2024. URL https://doi.org/10.48550/arXiv.2403.07974. | ||
1229 | 25 | 53 | N. Rosenberg, “Science, invention and economic growth,” The Economic Journal, vol. 84, no. 333, pp. 90–108, 1974. | ||
1163 | 24 | 46 | N. Scalzitti, A. Kress, R. Orhand, T. Weber, L. Moulinier, A. Jeannin-Girardon, P. Collet, O. Poch, and J. D. Thompson. Spliceator: multi-species splice site prediction using convolutional neural networks. BMC Bioinformatics, 22(1):1–26, 2021. | ||
303 | 5 | 118 | N. Wang, J. Bian, Y. Li, X. Li, S. Mumtaz, L. Kong, and H. Xiong. Multi-purpose RNA language modelling with motif-aware pretraining and type-guided fine-tuning. Nature Machine Intelligence, pages 1–10, 2024. | ||
1759 | 35 | 32 | Nadav Brandes, Dan Ofer, Yam Peleg, Nadav Rappoport, and Michal Linial. 2022. ProteinBERT: a universal deep-learning model of protein sequence and function. Bioinformatics 38, 8 (2022), 2102–2110. | ||
2013 | 35 | 286 | Nadine Schneider, Nikolaus Stiefl, and Gregory A Landrum. 2016. What’s what: The (nearly) definitive guide to reaction role assignment. Journal of chemical information and modeling 56, 12 (2016), 2336–2346. | ||
1115 | 23 | 61 | Naihui Zhou, Yuxiang Jiang, Timothy R. Bergquist, Alexandra J. Lee, Balint Z. Kacsoh, Alex W. Crocker, Kimberley A. Lewis, George Georghiou, Huy N. Nguyen, Md Nafiz Hamid, Larry Davis, Tunca Dogan, Volkan Atalay, Ahmet S. Rifaioglu, Alperen Dalkıran, Rengul Cetin Atalay, Chengxin Zhang, Rebecca L. Hurto, Peter L. Freddolino, Yang Zhang, Prajwal Bhat, Fran Supek, José M. Fernández, Branislava Gemovic, Vladimir R. Perovic, Radoslav S. Davidović, Neven Sumonja, Nevena Veljkovic, Ehsaneddin Asgari, Mohammad R.K. Mofrad, Giuseppe Profiti, Castrense Savojardo, Pier Luigi Martelli, Rita Casadio, Florian Boecker, Heiko Schoof, Indika Kahanda, Natalie Thurlby, Alice C. McHardy, Alexandre Renaux, Rabie Saidi, Julian Gough, Alex A. Freitas, Magdalena Antczak, Fabio Fabris, Mark N. Wass, Jie Hou, Jianlin Cheng, Zheng Wang, Alfonso E. Romero, Alberto Paccanaro, Haixuan Yang, Tatyana Goldberg, Chenguang Zhao, Liisa Holm, Petri Törönen, Alan J. Medlar, Elaine Zosa, Itamar Borukhov, Ilya Novikov, Angela Wilkins, Olivier Lichtarge, Po-Han Chi, Wei-Cheng Tseng, Michal Linial, Peter W. Rose, Christophe Dessimoz, Vedrana Vidulin, Saso Dzeroski, Ian Sillitoe, Sayoni Das, Jonathan Gill Lees, David T. Jones, Cen Wan, Domenico Cozzetto, Rui Fa, Mateo Torres, Alex Warwick Vesztrocy, Jose Manuel Rodriguez, Michael L. Tress, Marco Frasca, Marco Notaro, Giuliano Grossi, Alessandro Petrini, Matteo Re, Giorgio Valentini, Marco Mesiti, Daniel B. Roche, Jonas Reeb, David W. Ritchie, Sabeur Aridhi, Seyed Ziaeddin Alborzi, Marie-Dominique Devignes, Da Chen Emily Koo, Richard Bonneau, Vladimir Gligorijević, Meet Barot, Hai Fang, Stefano Toppo, Enrico Lavezzo, Marco Falda, Michele Berselli, Silvio C.E. Tosatto, Marco Carraro, Damiano Piovesan, Hafeez Ur Rehman, Qizhong Mao, Shanshan Zhang, Slobodan Vucetic, Gage S. Black, Dane Jo, Erica Suh, Jonathan B. Dayton, Dallas J. Larsen, Ashton R. Omdahl, Liam J. McGuffin, Danielle A. Brackenridge, Patricia C. Babbitt, Jeffrey M. Yunes, Paolo Fontana, Feng Zhang, Shanfeng Zhu, Ronghui You, Zihan Zhang, Suyang Dai, Shuwei Yao, Weidong Tian, Renzhi Cao, Caleb Chandler, Miguel Amezola, Devon Johnson, Jia-Ming Chang, Wen-Hung Liao, Yi-Wei Liu, Stefano Pascarelli, Yotam Frank, Robert Hoehndorf, Maxat Kulmanov, Imane Boudellioua, Gianfranco Politano, Stefano Di Carlo, Alfredo Benso, Kai Hakala, Filip Ginter, Farrokh Mehryary, Suwisa Kaewphan, Jari Björne, Hans Moen, Martti E.E. Tolvanen, Tapio Salakoski, Daisuke Kihara, Aashish Jain, Tomislav Smuc, Adrian Altenhoff, Asa Ben-Hur, Burkhard Rost, Steven E. Brenner, Christine A. Orengo, Constance J. Jeffery, Giovanni Bosco, Deborah A. Hogan, Maria J. Martin, Claire O’Donovan, Sean D. Mooney, Casey S. Greene, Predrag Radivojac, and Iddo Friedberg. The CAFA challenge reports improved protein function prediction and new functional annotations for hundreds of genes through experimental screens. Genome Biology, 20(1):244, November 2019. ISSN 1474-760X. doi: 10.1186/s13059-019-1835-8. URL https://doi.org/10.1186/s13059-019-1835-8. | ||
179 | 4 | 39 | Nair, S. et al. The dynseq browser track shows context-specific features at nucleotide resolution. Nat. Genet. 54, 1581–1583 (2022). | ||
1271 | 26 | 34 | Naman Goyal, Cynthia Gao, Vishrav Chaudhary, Peng-Jen Chen, Guillaume Wenzek, Da Ju, Sanjana Krishnan, Marc’Aurelio Ranzato, Francisco Guzmán, and Angela Fan. The Flores-101 evaluation benchmark for low-resource and multilingual machine translation. Trans. Assoc. Comput. Linguistics, 10:522–538, 2022. | ||
1570 | 33 | 26 | Naman Goyal, Cynthia Gao, Vishrav Chaudhary, Peng-Jen Chen, Guillaume Wenzek, Da Ju, Sanjana Krishnan, Marc’Aurelio Ranzato, Francisco Guzmán, and Angela Fan. The Flores-101 evaluation benchmark for low-resource and multilingual machine translation. Trans. Assoc. Comput. Linguistics, 10:522–538, 2022. | ||
1658 | 34 | 33 | Naman Goyal, Cynthia Gao, Vishrav Chaudhary, Peng-Jen Chen, Guillaume Wenzek, Da Ju, Sanjana Krishnan, Marc’Aurelio Ranzato, Francisco Guzmán, and Angela Fan. The Flores-101 evaluation benchmark for low-resource and multilingual machine translation. Trans. Assoc. Comput. Linguistics, 10:522–538, 2022. | ||
1279 | 26 | 42 | Naman Jain, King Han, Alex Gu, Wen-Ding Li, Fanjia Yan, Tianjun Zhang, Sida Wang, Armando Solar-Lezama, Koushik Sen, and Ion Stoica. LiveCodeBench: Holistic and contamination free evaluation of large language models for code. CoRR, abs/2403.07974, 2024. | ||
1574 | 33 | 30 | Naman Jain, King Han, Alex Gu, Wen-Ding Li, Fanjia Yan, Tianjun Zhang, Sida Wang, Armando Solar-Lezama, Koushik Sen, and Ion Stoica. LiveCodeBench: Holistic and contamination free evaluation of large language models for code. CoRR, abs/2403.07974, 2024. | ||
1666 | 34 | 41 | Naman Jain, King Han, Alex Gu, Wen-Ding Li, Fanjia Yan, Tianjun Zhang, Sida Wang, Armando Solar-Lezama, Koushik Sen, and Ion Stoica. LiveCodeBench: Holistic and contamination free evaluation of large language models for code. CoRR, abs/2403.07974, 2024. | ||
1499 | 32 | 6 | Narayan, P., Prowell, T. M., Gao, J. J., Fernandes, L. L., Li, E., Jiang, X., Qiu, J., Fan, J., Song, P., Yu, J., et al. FDA approval summary: alpelisib plus fulvestrant for patients with HR-positive, HER2-negative, PIK3CA-mutated, advanced or metastatic breast cancer. Clinical Cancer Research 27, 1842–1849 (2021). | ||
1760 | 35 | 33 | Nathan Brown, Marco Fiscato, Marwin HS Segler, and Alain C Vaucher. 2019. GuacaMol: benchmarking models for de novo molecular design. Journal of chemical information and modeling 59, 3 (2019), 1096–1108. | ||
1833 | 35 | 106 | Nathan C Frey, Daniel Berenberg, Karina Zadorozhny, Joseph Kleinhenz, Julien Lafrance-Vanasse, Isidro Hotzel, Yan Wu, Stephen Ra, Richard Bonneau, Kyunghyun Cho, et al. 2023. Protein discovery with discrete walk-jump sampling. arXiv preprint arXiv:2306.12360 (2023). | ||
1286 | 26 | 49 | Nathan Lambert, Valentina Pyatkin, Jacob Daniel Morrison, Lester James Validad Miranda, Bill Yuchen Lin, Khyathi Raghavi Chandu, Nouha Dziri, Sachin Kumar, Tom Zick, Yejin Choi, Noah A. Smith, and Hanna Hajishirzi. RewardBench: Evaluating reward models for language modeling. CoRR, abs/2403.13787, 2024. | ||
1673 | 34 | 48 | Nathan Lambert, Valentina Pyatkin, Jacob Daniel Morrison, Lester James Validad Miranda, Bill Yuchen Lin, Khyathi Raghavi Chandu, Nouha Dziri, Sachin Kumar, Tom Zick, Yejin Choi, Noah A. Smith, and Hanna Hajishirzi. RewardBench: Evaluating reward models for language modeling. CoRR, abs/2403.13787, 2024. | ||
873 | 20 | 1 | NCBI Resource Coordinators. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2013;41:D8–D20. | ||
758 | 15 | 22 | Neil Houlsby, Andrei Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin de Laroussilhe, Andrea Gesmundo, Mona Attariyan, and Sylvain Gelly. Parameter-Efficient Transfer Learning for NLP. arXiv:1902.00751 [cs, stat], June 2019. URL http://arxiv.org/abs/1902.00751. | ||
958 | 22 | 41 | Nguyen E, Poli M, Faizi M. et al. Hyenadna: long-range genomic sequence modeling at single nucleotide resolution. Advances in Neural Information Processing Systems, 2024;36. | ||
2179 | 36 | 19 | Nguyen Quoc Khanh Le, Quang-Thai Ho, Van-Nui Nguyen, and Jung-Su Chang. Bert-promoter: An improved sequence-based predictor of dna promoter using bert pre-trained model and shap feature selection. Computational Biology and Chemistry, 99:107732, 2022. | ||
421 | 8 | 22 | Nguyen, E. et al. HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution. (2023). | ||
447 | 9 | 25 | Nguyen, E. et al. Hyenadna: Long-range genomic sequence modeling at single nucleotide resolution. in 37th Conference on Neural Information Processing Systems https://openreview.net/pdf?id=ubzNoJjOKj (NeurIPS, 2023). | ||
507 | 10 | 25 | Nguyen, E. et al. Hyenadna: Long-range genomic sequence modeling at single nucleotide resolution. in 37th Conference on Neural Information Processing Systems https://openreview.net/pdf?id=ubzNoJjOKj (NeurIPS, 2023). | ||
152 | 4 | 12 | Nguyen, E. et al. HyenaDNA: long-range genomic sequence modeling at single nucleotide resolution. In Proc. 37th International Conference on Neural Information Processing Systems (eds Oh, A. et al.) 43177–43201 (Curran Associates, Inc.,2023). | ||
713 | 14 | 12 | Nguyen, E. et al. HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution. Preprint at https://doi.org/10.48550/arXiv.2306.15794 (2023). | ||
406 | 8 | 7 | Nguyen, E. et al. Sequence modeling and design from molecular to genome scale with Evo. Science 386, eado9336 (2024). | ||
39 | 3 | 31 | Nguyen, E., Poli, M., Durrant, M. G., Thomas, A. W., Kang, B., Sullivan, J., Ng, M. Y., Lewis, A., Patel, A., Lou, A. et al. (2024). Sequence modeling and design from molecular to genome scale with Evo. bioRxiv preprint (2024-02). https://www.biorxiv.org/content/10.1101/2024.02.27.582234v2. | ||
43 | 3 | 35 | Nguyen, E., Poli, M., Faizi, M., Thomas, A., Wornow, M., Birch-Sykes, C., Massaroli, S., Patel, A., Rabideau, C., Bengio, Y., Ermon, S., Ré, C., and Baccus, S. HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution. In: Oh, A., Naumann, | ||
1819 | 35 | 92 | Nicholas Evans and Stephen C Levinson. 2009. The myth of language universals: Language diversity and its importance for cognitive science. Behavioral and brain sciences 32, 5 (2009), 429–448. | ||
1105 | 23 | 51 | Nicolas Scalzitti, Anne Jeannin-Girardon, Pierre Collet, Olivier Poch, and Julie D. Thompson. A benchmark study of ab initio gene prediction methods in diverse eukaryotic organisms. BMC Genomics, 21(1):293, December 2020. ISSN 1471-2164. doi:10.1186/s12864-020-6707-9. URL https://bmcgenomics.biomedcentral.com/articles/10.1186/s12864-020-6707-9. | ||
2008 | 35 | 281 | Nicole Rusk. 2018. Sequence-based prediction of variants’ effects. Nature Methods 15 (2018), 571. | ||
1871 | 35 | 144 | Nikita Janakarajan, Tim Erdmann, Sarath Swaminathan, Teodoro Laino, and Jannis Born. 2023. Language models in molecular discovery. arXiv preprint arXiv:2309.16235 (2023). | ||
1295 | 26 | 58 | Niklas Muennighoff, Thomas Wang, Lintang Sutawika, Adam Roberts, Stella Biderman, Teven Le Scao, M. Saiful Bari, Sheng Shen, Zheng Xin Yong, Hailey Schoelkopf, Xiangru Tang, Dragomir Radev, Alham Fikri Aji, Khalid Almubarak, Samuel Albanie, Zaid Alyafeai, Albert Webson, Edward Raff, and Colin Raffel. Crosslingual generalization through multitask finetuning. In ACL (1), pp. 15991–16111. Association for Computational Linguistics, 2023. | ||
1592 | 33 | 48 | Niklas Muennighoff, Thomas Wang, Lintang Sutawika, Adam Roberts, Stella Biderman, Teven Le Scao, M. Saiful Bari, Sheng Shen, Zheng Xin Yong, Hailey Schoelkopf, Xiangru Tang, Dragomir Radev, Alham Fikri Aji, Khalid Almubarak, Samuel Albanie, Zaid Alyafeai, Albert Webson, Edward Raff, and Colin Raffel. Crosslingual generalization through multitask finetuning. In ACL (1), pp. 15991–16111. Association for Computational Linguistics, 2023. | ||
1681 | 34 | 56 | Niklas Muennighoff, Thomas Wang, Lintang Sutawika, Adam Roberts, Stella Biderman, Teven Le Scao, M. Saiful Bari, Sheng Shen, Zheng Xin Yong, Hailey Schoelkopf, Xiangru Tang, Dragomir Radev, Alham Fikri Aji, Khalid Almubarak, Samuel Albanie, Zaid Alyafeai, Albert Webson, Edward Raff, and Colin Raffel. Crosslingual generalization through multitask finetuning. In ACL (1), pp. 15991–16111. Association for Computational Linguistics, 2023. | ||
2062 | 35 | 335 | Ning Wang, Jiang Bian, Yuchen Li, Xuhong Li, Shahid Mumtaz, Linghe Kong, and Haoyi Xiong. 2024. Multi-purpose RNA language modelling with motif-aware pretraining and type-guided fine-tuning. Nature Machine Intelligence (2024), 1–10. | ||
2127 | 35 | 400 | Ningyu Zhang, Zhen Bi, Xiaozhuan Liang, Siyuan Cheng, Haosen Hong, Shumin Deng, Jiazhang Lian, Qiang Zhang, and Huajun Chen. 2022. OntoProtein: Protein Pretraining With Gene Ontology Embedding. arXiv (Jan. 2022), 2201.11147. | ||
2128 | 35 | 401 | Ningyu Zhang, Zhen Bi, Xiaozhuan Liang, Siyuan Cheng, Haosen Hong, Shumin Deng, Jiazhang Lian, Qiang Zhang, and Huajun Chen. 2022. Ontoprotein: Protein pretraining with gene ontology embedding. arXiv preprint arXiv:2201.11147 (2022). | ||
1829 | 35 | 102 | Noelia Ferruz, Steffen Schmidt, and Birte Höcker. 2022. A deep unsupervised language model for protein design. bioRxiv (2022). | ||
| | | Robert D Finn, Jaina Mistry, Benjamin Schuster-Böckler, Sam Griffiths-Jones, Volker Hollich, Timo Lassmann, Simon Moxon, Mhairi Marshall, Ajay Khanna, Richard Durbin, et al. 2006. Pfam: clans, web tools and services. Nucleic acids research 34, suppl_1 (2006), D247–D251. | ||
834 | 17 | 20 | Nori H, King N, McKinney SM et al. Capabilities of gpt-4 on medical challenge problems. arXiv, arXiv:2303.13375, 2023, preprint: not peer reviewed. | ||
166 | 4 | 26 | Notin, P. et al. ProteinGym: large-scale benchmarks for protein fitness prediction and design. In Proceedings of the Advances in Neural Information Processing Systems 37 (eds Oh, A. et al.) (NeurIPS, 2023). | ||
122 | 3 | 114 | Notin, P., Kollasch, A. W., Ritter, D., Niekerk, L. V., Paul, S., Spinner, H., Rollins, N. J., Shaw, A., Orenbuch, R., Weitzman, R., Frazer, J., Dias, M., Franceschi, D., Gal, Y., and Marks, D. S. ProteinGym: Large-Scale Benchmarks for Protein Fitness Prediction and Design. In: Thirty-seventh Conference on Neural Information Processing Systems Datasets and Benchmarks Track (2023):https://openreview.net/forum?id=URoZHqAohf. | ||
973 | 22 | 56 | Novakovsky G, Dexter N, Libbrecht MW. et al. Obtaining genetics insights from deep learning via explainable artificial intelligence. Nat Rev Genet 2023;24:125–37. 10.1038/s41576-022-00532-2. | ||
630 | 12 | 47 | Nuryana, I. et al. Codon optimization of a gene encoding DNA polymerase from Pyrococcus furiosus and its expression in Escherichia coli. J. Genet. Eng. Biotechnol. 21, 129 (2023). | ||
1155 | 24 | 38 | O. Press, N. A. Smith, and M. Lewis. Shortformer: Better language modeling using shorter inputs. arXiv preprint arXiv:2012.15832, 2020. | ||
287 | 5 | 102 | O. Schwengers, L. Jelonek, M. A. Dieckmann, S. Beyvers, J. Blom, and A. Goesmann. Bakta: rapid and standardized annotation of bacterial genomes via alignment-free sequence identification. Microb. Genom., 7 (11), Nov. 2021. | ||
2023 | 35 | 296 | Ofir Ben Shoham and Nadav Rappoport. 2023. CPLLM: Clinical Prediction with Large Language Models. arXiv:2309.11295 [cs.CL] | ||
2187 | 36 | 27 | Ofir Press, Noah A Smith, and Mike Lewis. Train short, test long: Attention with linear biases enables input length extrapolation. arXiv preprint arXiv:2108.12409, 2021. | ||
1099 | 23 | 45 | Ofir Press, Noah A. Smith, and Mike Lewis. Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation, April 2022. URL http://arxiv.org/abs/2108.12409. arXiv:2108.12409 [cs]. | ||
1475 | 31 | 26 | Ondřej Bojar, Christian Buck, Christian Federmann, Barry Haddow, Philipp Koehn, Johannes Leveling, Christof Monz, Pavel Pecina, Matt Post, Hervé Saint-Amand, Radu Soricut, Lucia Specia, and Aleš Tamchyna. Findings of the 2014 workshop on statistical machine translation. pages 12–58, 06 2014. doi:10.3115/v1/W14-3302. | ||
702 | 14 | 1 | OpenAI et al. GPT-4 Technical Report. Preprint at https://doi.org/10.48550/arXiv.2303.08774 (2024). | ||
1176 | 25 | 0 | OpenAI: A. Hurst, A. Lerer, A. P. Goucher, A. Perelman, A. Ramesh, A. Clark, A. Ostrow, A. Welihinda, A. Hayes, A. Radford, A. Mądry, A. Baker-Whitcomb, A. Beutel, A. Borzunov, A. Carney, A. Chow, A. Kirillov, A. Nichol, A. Paino, A. Renzin, A. T. Passos, A. Kirillov, A. Christakis, A. Conneau, A. Kamali, A. Jabri, A. Moyer, A. Tam, A. Crookes, A. Tootoochian, A. Tootoonchian, A. Kumar, A. Vallone, A. Karpathy, A. Braunstein, A. Cann, A. Codispoti, A. Galu, A. Kondrich, A. Tulloch, A. Mishchenko, A. Baek, A. Jiang, A. Pelisse, A. Woodford, A. Gosalia, A. Dhar, A. Pantuliano, A. Nayak, A. Oliver, B. Zoph, B. Ghorbani, B. Leimberger, B. Rossen, B. Sokolowsky, B. Wang, B. Zweig, B. Hoover, B. Samic, et al. | ||
1197 | 25 | 21 | OpenAI, “Building an early warning system for LLM-aided biological threat creation,” 2024. https://openai.com/index/building-an-early-warning-system-for-llm-aided-biological-threat-creation/. | ||
1182 | 25 | 6 | OpenAI, “GPT-4 technical report,” 2023. | ||
1183 | 25 | 7 | OpenAI, “GPT-4V(ision) system card.” https://openai.com/index/gpt-4v-system-card/, 2023. Accessed: 2024-07-22. | ||
1177 | 25 | 1 | OpenAI, “Hello GPT-4o,” 2024. | ||
1187 | 25 | 11 | OpenAI, “How the voices for ChatGPT were chosen,” 2024. | ||
1193 | 25 | 17 | OpenAI, “Moderation overview,” 2024. | ||
1184 | 25 | 8 | OpenAI, “Navigating the challenges and opportunities of synthetic voices.” https://openai.com/index/navigating-the-challenges-and-opportunities-of-synthetic-voices/, 2024. Accessed: 2024-07-22. | ||
1180 | 25 | 4 | OpenAI, “OpenAI Preparedness Framework (beta),” 2023. https://cdn.openai.com/openai-preparedness-framework-beta.pdf. | ||
1196 | 25 | 20 | OpenAI, “OpenAI usage policies,” 2023. https://openai.com/policies/usage-policies/. | ||
1224 | 25 | 48 | OpenAI, “Paradigm: Improving patient access to clinical trials.” https://openai.com/index/paradigm/, 2024. Accessed: 2024-08-07. | ||
1226 | 25 | 50 | OpenAI, “Using gpt-4o reasoning to transform cancer care.” https://openai.com/index/color-health/, 2024. Accessed: 2024-08-07. | ||
1974 | 35 | 247 | OpenAI. 2022. Introducing ChatGPT. OpenAI Blog (November 2022). | ||
1975 | 35 | 248 | OpenAI. 2023. GPT-4 Technical Report. OpenAI (2023). | ||
2185 | 36 | 25 | OpenAI. GPT-4 technical report, 2023. | ||
835 | 17 | 21 | OpenAI. GPT-4 technical report. CoRR abs/2303.08774, 2023. | ||
1379 | 28 | 31 | OpenAI. GPT-4 technical report. arXiv preprint arXiv:2303.08774, 2023. | ||
1595 | 33 | 51 | OpenAI. GPT-4 technical report. arXiv preprint arXiv:2303.08774, 2023. | ||
1297 | 26 | 60 | OpenAI. GPT-4 technical report. CoRR, abs/2303.08774, 2023. | ||
1683 | 34 | 58 | OpenAI. GPT-4 technical report. CoRR, abs/2303.08774, 2023. | ||
1596 | 33 | 52 | OpenAI. Hello GPT-4o, 2024. URL https://openai.com/index/hello-gpt-4o/. | ||
564 | 11 | 22 | OpenAI. Hello GPT-4o, 2024a. URL https://openai.com/index/hello-gpt-4o/. | ||
1298 | 26 | 61 | OpenAI. Hello GPT-4o, 2024a. URL https://openai.com/index/hello-gpt-4o/. | ||
1684 | 34 | 59 | OpenAI. Hello GPT-4o, 2024a. URL https://openai.com/index/hello-gpt-4o/. | ||
1594 | 33 | 50 | OpenAI. Introducing ChatGPT, 2022. URL https://openai.com/index/chatgpt/. | ||
566 | 11 | 24 | OpenAI. Introducing SimpleQA, 2024c. URL https://openai.com/index/introducing-simpleqa/. | ||
567 | 11 | 25 | OpenAI. Introducing SWE-bench Verified, 2024d. URL https://openai.com/index/introducing-swe-bench-verified/. | ||
565 | 11 | 23 | OpenAI. Learning to reason with LLMs, 2024b. URL https://openai.com/index/learning-to-reason-with-llms/. | ||
1299 | 26 | 62 | OpenAI. Learning to reason with LLMs, 2024b. URL https://openai.com/index/learning-to-reason-with-llms/. | ||
1685 | 34 | 60 | OpenAI. Learning to reason with LLMs, 2024b. URL https://openai.com/index/learning-to-reason-with-llms/. | ||
1597 | 33 | 53 | OpenCompass Contributors. OpenCompass: A universal evaluation platform for foundation models, 2023. URL https://github.com/open-compass/opencompass. | ||
1583 | 33 | 39 | Opher Lieber, Barak Lenz, Hofit Bata, Gal Cohen, Jhonathan Osin, Itay Dalmedigos, Erez Safahi, Shaked Meirom, Yonatan Belinkov, Shai Shalev-Shwartz, Omri Abend, Raz Alon, Tomer Asida, Amir Bergman, Roman Glozman, Michael Gokhman, Avshalom Manevich, Nir Ratner, Noam Rozen, Erez Shwartz, Mor Zusman, and Yoav Shoham. Jamba: A hybrid Transformer-Mamba language model. CoRR, abs/2403.19887, 2024. | ||
1832 | 35 | 105 | Oscar Franzén, Li-Ming Gan, and Johan LM Björkegren. 2019. PanglaoDB: a web server for exploration of mouse and human single-cell RNA sequencing data. Database 2019 (2019), baz046. | ||
1607 | 33 | 63 | Oscar Sainz, Jon Ander Campos, Iker García-Ferrero, Julen Etxaniz, Oier Lopez de Lacalle, and Eneko Agirre. NLP evaluation in trouble: On the need to measure LLM data contamination for each benchmark. In EMNLP (Findings), pp. 10776–10787. Association for Computational Linguistics, 2023. | ||
868 | 19 | 7 | Ostell JM, Wheelan SJ, Kans JA. The NCBI data model. Methods Biochem Anal. 2001. https://doi.org/10.1002/0471223921.ch2. (PMID 11449725.) | ||
682 | 13 | 37 | Ouis, M. Y. and Akhloufi, M. A. Chestbiox-gen: contextual biomedical report generation from chest x-ray images using biogpt and co-attention mechanism. Frontiers in Imaging, 3:1373420, 2024. | ||
444 | 9 | 22 | Outeiral, C. & Deane, C. M. Codon language embeddings provide strong signals for use in protein engineering. Nat. Mach. Intell. 6, 170–179 (2024). | ||
504 | 10 | 22 | Outeiral, C. & Deane, C. M. Codon language embeddings provide strong signals for use in protein engineering. Nat. Mach. Intell. 6, 170–179 (2024). | ||
633 | 12 | 50 | Outeiral, C. & Deane, C. M. Codon language embeddings provide strong signals for use in protein engineering. Nat. Mach. Intell. 6, 170–179 (2024). | ||
201 | 5 | | P. Camargo, S. Roux, F. Schulz, M. Babinski, Y. Xu, B. Hu, P. S. G. Chain, S. Nayfach, and N. C. Kyrpides. Identification of mobile genetic elements with geNomad. Nat. Biotechnol., 42(8):1303–1312, Aug. 2024. | ||
1235 | 25 | 59 | P. Clark, I. Cowhey, O. Etzioni, T. Khot, A. Sabharwal, C. Schoenick, and O. Tafjord, “Think you have solved question answering? try arc, the AI2 reasoning challenge,” CoRR, vol. abs/1803.05457, 2018. | ||
1223 | 25 | 47 | P. Garcia, S. P. Ma, S. Shah, M. Smith, Y. Jeong, A. Devon-Sand, M. Tai-Seale, K. Takazawa, D. Clutter, K. Vogt, C. Lugtu, M. Rojo, S. Lin, T. Shanafelt, M. A. Pfeffer, and C. Sharp, “Artificial Intelligence–Generated Draft Replies to Patient Inbox Messages,” JAMA Network Open, vol. 7, pp. e243201–e243201, 03 2024. | ||
293 | 5 | 108 | P. J. Sullivan, J. M. Quinn, W. Wu, M. Pinese, and M. J. Cowley. SpliceVarDB: A comprehensive database of experimentally validated human splicing variants. The American Journal of Human Genetics, 111(10): 2164–2175, Oct. 2024. ISSN 0002-9297. doi: 10.1016/j.ajhg.2024.08.002. URL https://doi.org/10.1016/j.ajhg.2024.08.002. Publisher: Elsevier. | ||
237 | 5 | 50 | P. Kunzmann, T. D. Müller, M. Greil, J. H. Krumbach, J. M. Anter, D. Bauer, F. Islam, and K. Hamacher. Biotite: new tools for a versatile python bioinformatics library. BMC Bioinformatics, 24(1):236, June 2023. | ||
1147 | 24 | 30 | P. Liu, W. Yuan, J. Fu, Z. Jiang, H. Hayashi, and G. Neubig. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Computing Surveys, 55(9):1–35, 2023. | ||
263 | 5 | 76 | P. Notin, A. W. Kollasch, D. Ritter, L. van Niekerk, S. Paul, H. Spinner, N. Rollins, A. Shaw, R. Weitzman, J. Frazer, M. Dias, D. Franceschi, R. Orenbuch, Y. Gal, and D. S. Marks. ProteinGym: Large-scale benchmarks for protein design and fitness prediction. bioRxiv, page 2023.12.07.570727, 1 2023. doi:10.1101/2023.12.07.570727. URL http://biorxiv.org/content/early/2023/12/08/2023.12.07.570727.abstract. | ||
221 | 5 | 34 | P. W. Harrison, M. R. Amode, O. Austine-Orimoloye, A. G. Azov, M. Barba, I. Barnes, A. Becker, R. Bennett, A. Berry, J. Bhai, S. K. Bhurji, S. Boddu, P. R. Branco Lins, L. Brooks, S. B. Ramaraju, L. I. Campbell, M. C. Martinez, M. Charkhchi, K. Chougule, A. Cockburn, C. Davidson, N. H. De Silva, K. Dodiya, S. Donaldson, B. El Houdaigui, T. E. Naboulsi, R. Fatima, C. G. Giron, T. Genez, D. Grigoriadis, G. S. Ghattaoraya, J. G. Martinez, T. A. Gurbich, M. Hardy, Z. Hollis, T. Hourlier, T. Hunt, M. Kay, V. Kaykala, T. Le, D. Lemos, D. Lodha, D. Marques-Coelho, G. Maslen, G. A. Merino, L. P. Mirabueno, A. Mushtaq, S. N. Hossain, D. N. Ogeh, M. P. Sakthivel, A. Parker, M. Perry, I. Piližota, D. Poppleton, I. Prosovetskaia, S. Raj, J. G. Pérez-Silva, A. I. A. Salam, S. Saraf, N. Saraiva-Agostinho, D. Sheppard, S. Sinha, B. Sipos, V. Sitnik, W. Stark, E. Steed, M.-M. Suner, L. Surapaneni, K. Sutinen, F. F. Tricomi, D. Urbina-Gómez, A. Veidenberg, T. A. Walsh, D. Ware, E. Wass, N. L. Willhoft, J. Allen, J. Alvarez-Jarreta, M. Chakiachvili, B. Flint, S. Giorgetti, L. Haggerty, G. R. Ilsley, J. Keatley, J. E. Loveland, B. Moore, J. M. Mudge, G. Naamati, J. Tate, S. J. Trevanion, A. Winterbottom, A. Frankish, S. E. Hunt, F. Cunningham, S. Dyer, R. D. Finn, F. J. Martin, and A. D. Yates. Ensembl 2024. Nucleic Acids Res., 52(D1):D891–D899, Jan. 2024. | ||
1394 | 28 | 46 | P. Wang, L. Li, L. Chen, F. Song, B. Lin, Y. Cao, T. Liu, and Z. Sui. Making large language models better reasoners with alignment. arXiv preprint arXiv:2309.02144, 2023a. | ||
577 | 11 | 35 | P. Wang, L. Li, Z. Shao, R. Xu, D. Dai, Y. Li, D. Chen, Y. Wu, and Z. Sui. Math-shepherd: A label-free step-by-step verifier for LLMs in mathematical reasoning. arXiv preprint arXiv:2312.08935, 2023. | ||
1395 | 28 | 47 | P. Wang, L. Li, Z. Shao, R. Xu, D. Dai, Y. Li, D. Chen, Y. Wu, and Z. Sui. Math-shepherd: Verify and reinforce llms step-by-step without human annotations. CoRR, abs/2312.08935, 2023b. | ||
898 | 21 | 20 | Pafilis E. et al. (2013) The species and organisms resources for fast and accurate identification of taxonomic names in text. PLoS One, 8, e65390. | ||
1940 | 35 | 213 | Pan Lu, Swaroop Mishra, Tony Xia, Liang Qiu, Kai-Wei Chang, Song-Chun Zhu, Oyvind Tafjord, Peter Clark, and Ashwin Kalyan. 2022. Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering. arXiv:2209.09513 [cs.CL] | ||
1887 | 35 | 160 | Panagiotis Katsonis and Olivier Lichtarge. 2019. CAGI5: Objective performance assessments of predictions based on the Evolutionary Action equation. Human mutation 40, 9 (2019), 1436–1454. | ||
836 | 17 | 22 | Parisi A, Zhao Y, Fiedel N. TALM: tool augmented language models. arXiv, arXiv:2205.12255, 2022, preprint: not peer reviewed. | ||
921 | 22 | 4 | Park Y S, Lek S. Artificial Neural Networks: Multilayer Perceptron for Ecological Modeling[M]. In: Jørgensen SE, (eds.), Developments in Environmental Modeling. Netherlands: Elsevier, 2016;28: 123–40, 10.1016/B978-0-444-63623-2.00007-4. | ||
402 | 8 | 3 | Park, E. G. et al. Genomic analyses of non-coding RNAs overlapping transposable elements and its implication to human diseases. Int. J. Mol. Sci. 23, 8950 (2022). | ||
683 | 13 | 38 | Park, J., Kim, S., Yoon, B., Hyun, J., and Choi, K. M4CXR: Exploring multi-task potentials of multi-modal large language models for chest x-ray interpretation, 2024. | ||
1971 | 35 | 244 | Pascal Notin, Aaron W. Kollasch, Daniel Ritter, Lood van Niekerk, Steffanie Paul, Hansen Spinner, Nathan Rollins, Ada Shaw, Ruben Weitzman, Jonathan Frazer, Mafalda Dias, Dinko Franceschi, Rose Orenbuch, Yarin Gal, and Debora S. Marks. 2023. ProteinGym: Large-Scale Benchmarks for Protein Design and Fitness Prediction. bioRxiv (2023). | ||
1970 | 35 | 243 | Pascal Notin, Mafalda Dias, Jonathan Frazer, Javier Marchena Hurtado, Aidan N Gomez, Debora Marks, and Yarin Gal. 2022. Tranception: Protein Fitness Prediction with Autoregressive Transformers and Inference-time Retrieval. In Proceedings of the 39th International Conference on Machine Learning, Vol. 162. 16990–17017. | ||
1972 | 35 | 245 | Pascal Notin, Ruben Weitzman, Debora S. Marks, and Yarin Gal. 2023. ProteinNPT: Improving Protein Property Prediction and Design with Non-Parametric Transformers. bioRxiv (2023). | ||
1500 | 32 | 7 | Passarelli, A., Carbone, V., Pignata, S., Mazzeo, R., Lorusso, D., Scambia, G., Canova, S., Di Palma, T., Tasca, G., Mantiero, M., et al. Alpelisib for PIK3CA-mutated advanced gynecological cancers: first clues of clinical activity. Gynecologic Oncology 183, 61–67 (2024). | ||
160 | 4 | 20 | Paten, B. et al. Genome-wide nucleotide-level mammalian ancestor reconstruction. Genome Res. 18, 1829–1843 (2008). | ||
999 | 22 | 82 | Pathak Y, Shukla PK, Tiwari A. et al. Deep transfer learning based classification model for COVID-19 disease. Ing Rech Biomed 2022;43:87–92. 10.1016/j.irbm.2020.05.003. | ||
1886 | 35 | 159 | Pavel Karpov, Guillaume Godin, and Igor V Tetko. 2019. A transformer model for retrosynthesis. In International Conference on Artificial Neural Networks. Springer, 817–830. | ||
1345 | 27 | 4 | Paving the way to efficient architectures: StripedHyena-7B, open source models offering a glimpse into a world beyond Transformers https://www.together.ai/blog/stripedhyena-7b | ||
584 | 12 | 1 | Pechmann, S. & Frydman, J. Evolutionary conservation of codon optimality reveals hidden signatures of cotranslational folding. Nat. Struct. Mol. Biol. 20, 237–243 (2013). | ||
620 | 12 | 37 | Pechmann, S., Chartron, J. W. & Frydman, J. Local slowdown of translation by nonoptimal codons promotes nascent-chain recognition by SRP in vivo. Nat. Struct. Mol. Biol. 21, 1100–1105 (2014). | ||
475 | 9 | 53 | Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011). | ||
535 | 10 | 53 | Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011). | ||
1534 | 32 | 41 | Pei, Q., Wu, L., Zhu, J., Xia, Y., Xie, S., Qin, T., Liu, H., Liu, T.-Y. & Yan, R. Breaking the barriers of data scarcity in drug–target affinity prediction. Briefings in Bioinformatics 24, bbad386 (2023). | ||
684 | 13 | 39 | Pellegrini, C., Keicher, M., Ozsoy, E., and Navab, N. Radrestruct: A novel vqa benchmark and method for structured radiology reporting, 2023. | ||
685 | 13 | 40 | Pellegrini, C., Ozsoy, E., Busam, B., Navab, N., and Keicher, M. Radialog: A large vision-language model for radiology report generation and conversational assistance, 2025. | ||
757 | 15 | 21 | Pengcheng He, Xiaodong Liu, Jianfeng Gao, and Weizhu Chen. DeBERTa: Decoding-enhanced BERT with disentangled attention, 2021. | ||
1466 | 31 | 17 | Pengcheng He, Xiaodong Liu, Jianfeng Gao, and Weizhu Chen. DeBERTa: Decoding-enhanced BERT with disentangled attention. ArXiv, abs/2006.03654, 2020. | ||
1924 | 35 | 197 | Pengfei Liu, Yiming Ren, and Zhixiang Ren. 2023. GIT-Mol: A Multi-modal Large Language Model for Molecular Science with Graph, Image, and Text. arXiv:2308.06911 [cs.LG] | ||
2202 | 36 | 42 | Pengyu Zhang, Hongming Zhang, and Hao Wu. iPro-WAEL: a comprehensive and robust framework for identifying promoters in multiple species. Nucleic Acids Research, 50(18):10278–10289, 2022. | ||
899 | 21 | 21 | Pennington J. et al. (2014) GloVe: Global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar. pp. 1532–1543. Association for Computational Linguistics. https://www.aclweb.org/anthology/D14-1162. | ||
1789 | 35 | 62 | Peter Clark, Isaac Cowhey, Oren Etzioni, Tushar Khot, Ashish Sabharwal, Carissa Schoenick, and Oyvind Tafjord. 2018. Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge. arXiv:1803.05457 [cs.AI] | ||
1255 | 26 | 18 | Peter Clark, Isaac Cowhey, Oren Etzioni, Tushar Khot, Ashish Sabharwal, Carissa Schoenick, and Oyvind Tafjord. Think you have solved question answering? Try ARC, the AI2 reasoning challenge. CoRR, abs/1803.05457, 2018. | ||
1562 | 33 | 18 | Peter Clark, Isaac Cowhey, Oren Etzioni, Tushar Khot, Ashish Sabharwal, Carissa Schoenick, and Oyvind Tafjord. Think you have solved question answering? Try ARC, the AI2 reasoning challenge. CoRR, abs/1803.05457, 2018. | ||
1643 | 34 | 18 | Peter Clark, Isaac Cowhey, Oren Etzioni, Tushar Khot, Ashish Sabharwal, Carissa Schoenick, and Oyvind Tafjord. Think you have solved question answering? Try ARC, the AI2 reasoning challenge. CoRR, abs/1803.05457, 2018. | ||
1460 | 31 | 11 | Peter Shaw, Jakob Uszkoreit, and Ashish Vaswani. Self-attention with relative position representations. In NAACL-HLT, 2018. | ||
2005 | 35 | 278 | Peter W Rose, Bojan Beran, Chunxiao Bi, Wolfgang F Bluhm, Dimitris Dimitropoulos, David S Goodsell, Andreas Prlić, Martha Quesada, Gregory B Quinn, John D Westbrook, et al. 2010. The RCSB Protein Data Bank: redesigned web site and web services. Nucleic acids research 39, suppl_1 (2010), D392–D401. | ||
900 | 21 | 22 | Peters M.E. et al. (2018) Deep contextualized word representations. In: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), New Orleans, LA. pp. 2227–2237. Association for Computational Linguistics. https://www.aclweb.org/anthology/N18-1202. | ||
318 | 7 | 6 | Petrovski, S., Wang, Q., Heinzen, E. L., Allen, A. S. & Goldstein, D. B. Genic intolerance to functional variation and the interpretation of personal genomes. PLoS Genet. 9, e1003709 (2013). | ||
686 | 13 | 41 | Pham, H. H., Nguyen, H. Q., Nguyen, H. T., Le, L. T., and Khanh, L. An accurate and explainable deep learning system improves interobserver agreement in the interpretation of chest radiograph. IEEE Access, 10:104512–104531, 2022. | ||
114 | 3 | 106 | Phan, M. H., Zehnder, T. M., Puntieri, F., Lo, B.-W., Lenhard, B., Mueller, F., Vingron, M., and Ibrahim, D. M. (2024). Conservation of regulatory elements with highly diverged sequences across large evolutionary distances. bioRxiv preprint (2024-05). https://www.biorxiv.org/content/10.1101/2024.05.13.590087v1. | ||
1901 | 35 | 174 | Philippe Lamesch, Tanya Z Berardini, Donghui Li, David Swarbreck, Christopher Wilks, Rajkumar Sasidharan, Robert Muller, Kate Dreher, Debbie L Alexander, Margarita Garcia-Hernandez, et al. 2012. The Arabidopsis Information Resource (TAIR): improved gene annotation and new tools. Nucleic acids research 40, D1 (2012), D1202–D1210. | ||
2015 | 35 | 288 | Philippe Schwaller, Daniel Probst, Alain C Vaucher, Vishnu H Nair, David Kreutter, Teodoro Laino, and Jean-Louis Reymond. 2021. Mapping the space of chemical reactions using attention-based neural networks. Nature machine intelligence 3, 2 (2021), 144–152. | ||
2014 | 35 | 287 | Philippe Schwaller, Teodoro Laino, Théophile Gaudin, Peter Bolgar, Christopher A Hunter, Costas Bekas, and Alpha A Lee. 2019. Molecular transformer: a model for uncertainty-calibrated chemical reaction prediction. ACS central science 5, 9 (2019), 1572–1583. | ||
1506 | 32 | 13 | Plonka, W., Stork, C., Šícho, M. & Kirchmair, J. CYPlebrity: Machine learning models for the prediction of inhibitors of cytochrome P450 enzymes. Bioorganic & medicinal chemistry 46, 116388 (2021). | ||
587 | 12 | 4 | Plotkin, J. B. & Kudla, G. Synonymous but not the same: the causes and consequences of codon bias. Nat. Rev. Genet. 12, 32–42 (2011). | ||
417 | 8 | 18 | Poetsch, A. R. AI model analyses and generates DNA structures [KI-Modell analysiert und generiert DNA-Strukturen] (SMC, 2024). | ||
1039 | 22 | 122 | Poli M, Massaroli S, Nguyen E. et al. Hyena hierarchy: towards larger convolutional language models. In: International Conference on Machine Learning. PMLR, 2023; 28043–28078. | ||
104 | 3 | 96 | Poli, M., Massaroli, S., Nguyen, E., Fu, D. Y., Dao, T., Baccus, S., Bengio, Y., Ermon, S., and Ré, C. Hyena Hierarchy: Towards larger convolutional language models. In: International Conference on Machine Learning. PMLR (2023): 28043–28078. | ||
154 | 4 | 14 | Pollard, K. S., Hubisz, M. J., Rosenbloom, K. R. & Siepel, A. Detection of nonneutral substitution rates on mammalian phylogenies. Genome Res. 20, 110–121 (2010). | ||
333 | 7 | 21 | Pollard, K. S., Hubisz, M. J., Rosenbloom, K. R. & Siepel, A. Detection of nonneutral substitution rates on mammalian phylogenies. Genome Res. 20, 110–121 (2010). | ||
29 | 3 | 21 | Pollard, K. S., Hubisz, M. J., Rosenbloom, K. R., and Siepel, A. (2010). Detection of nonneutral substitution rates on mammalian phylogenies. Genome Research 20, 110–121. | ||
341 | 7 | 29 | Pott, S. & Lieb, J. D. What are super-enhancers? Nat. Genet. 47, 8–12 (2015). | ||
1484 | 31 | 35 | Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. SQuAD: 100,000+ questions for machine comprehension of text. pages 2383–2392, 01 2016. doi:10.18653/v1/D16-1264. | ||
782 | 15 | 46 | Pranav Rajpurkar, Robin Jia, and Percy Liang. Know what you don’t know: Unanswerable questions for squad. CoRR, abs/1806.03822, 2018. URL http://arxiv.org/abs/1806.03822. | ||
2021 | 35 | 294 | Pranav Shetty, Arunkumar Chitteth Rajan, Chris Kuenneth, Sonakshi Gupta, Lakshmi Prerana Panchumarti, Lauren Holm, Chao Zhang, and Rampi Ramprasad. 2023. A general-purpose material property data extraction pipeline from large polymer corpora using natural language processing. npj Computational Materials 9, 1 (2023), 52. | ||
1044 | 22 | 127 | Press O, Smith NA, Lewis M. Shortformer: better language modeling using shorter inputs. arXiv preprint arXiv:2012.15832, 2020. | ||
1540 | 32 | 47 | Preuer, K., Lewis, R. P., Hochreiter, S., Bender, A., Bulusu, K. C. & Klambauer, G. DeepSynergy: predicting anti-cancer drug synergy with Deep Learning. Bioinformatics 34, 1538–1546 (2018). | ||
126 | 3 | 118 | Pritchard, J. K., and Cox, N. (2002). The allelic architecture of human disease genes: common disease–common variant...or not? Human Molecular Genetics 11, 2417–2423. doi:10.1093/hmg/11.20.2417. | ||
1532 | 32 | 39 | Probst, D., Schwaller, P. & Reymond, J.-L. Reaction classification and yield prediction using the differential reaction fingerprint DRFP. Digital discovery 1, 91–97 (2022). | ||
877 | 20 | 5 | Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, Farrell C, Hart J, Landrum MJ, McGarvey KM, et al. RefSeq: an update on mammalian reference sequences. Nucleic Acids Res. 2014;42:D756–D763. | ||
1096 | 23 | 42 | Stephen Merity, Nitish Shirish Keskar, and Richard Socher. Regularizing and Optimizing LSTM Language Models, 2017. URL http://arxiv.org/abs/1708.02182. arXiv:1708.02182 [cs]. | ||
1420 | 30 | 3 | Pugh T.J., Bell J.L., Bruce J.P., Doherty G.J., Galvin M., Green M.F., Hunter-Zinck H., Kumari P., Lenoue-Newton M.L., Li M.M., et al. AACR Project GENIE: 100,000 cases and beyond. Cancer Discov. 2022; 12:2044–2057. | ||
901 | 21 | 23 | Pyysalo S. et al. (2013) Distributional semantics resources for biomedical text processing. In: Proceedings of the 5th International Symposium on Languages in Biology and Medicine, Tokyo, Japan, pp. 39–43. https://academic.oup.com/bioinformatics/article/33/14/i37/3953940. | ||
1135 | 24 | 18 | Q. Fournier, G. M. Caron, and D. Aloise. A practical survey on faster and lighter transformers. ACM Computing Surveys, 2021. | ||
1138 | 24 | 21 | Q. Geng, R. Yang, and L. Zhang. A deep learning framework for enhancer prediction using word embedding and sequence generation. Biophysical Chemistry, 286:106822, 2022. | ||
814 | 17 | 0 | Q. Jin, Y. Yang, Q. Chen, and Z. Lu. GeneGPT: augmenting large language models with domain tools for improved access to biomedical information. Bioinformatics, 40, 2 2024. ISSN 13674811. doi: 10.1093/BIOINFORMATICS/BTAE075. URL https://dx.doi.org/10.1093/bioinformatics/btae075. | ||
917 | 22 | 0 | Q. Li, Z. Hu, Y. Wang, L. Li, Y. Fan, I. King, G. Jia, S. Wang, L. Song, and Y. Li. Progress and opportunities of foundation models in bioinformatics. Briefings in Bioinformatics, 25:548, 9 2024. ISSN 14774054. doi: 10.1093/BIB/BBAE548. URL https://dx.doi.org/10.1093/bib/bbae548. | ||
1727 | 35 | 0 | Q. Zhang, K. Ding, T. Lyv, X. Wang, Q. Yin, Y. Zhang, J. Yu, Y. Wang, X. Li, Z. Xiang, K. Feng, X. Zhuang, Z. Wang, M. Qin, M. Zhang, J. Zhang, J. Cui, T. Huang, P. Yan, R. Xu, H. Chen, X. Li, X. Fan, H. Xing, and H. Chen. Scientific large language models: A survey on biological and chemical domains. arXiv, Jan. 2024. URL https://arxiv.org/pdf/2401.14656v2. | ||
2090 | 35 | 363 | Qianqian Xie, Qingyu Chen, Aokun Chen, Cheng Peng, Yan Hu, Fongci Lin, Xueqing Peng, Jimin Huang, Jeffrey Zhang, Vipina Keloth, Xinyu Zhou, Huan He, Lucila Ohno-Machado, Yonghui Wu, Hua Xu, and Jiang Bian. 2024. Me LLaMA: Foundation Large Language Models for Medical Applications. arXiv:2402.12749 | ||
1877 | 35 | 150 | Qiao Jin, Bhuwan Dhingra, William W. Cohen, and Xinghua Lu. 2019. Probing Biomedical Embeddings from Language Models. arXiv:1904.02181 [cs.CL] | ||
1878 | 35 | 151 | Qiao Jin, Bhuwan Dhingra, Zhengping Liu, William W. Cohen, and Xinghua Lu. 2019. PubMedQA: A Dataset for Biomedical Research Question Answering. arXiv:1909.06146 [cs.CL] | ||
2112 | 35 | 385 | Qichen Ye, Junling Liu, Dading Chong, Peilin Zhou, Yining Hua, and Andrew Liu. 2023. Qilin-Med: Multi-stage Knowledge Injection Advanced Medical Large Language Model. arXiv:2310.09089 [cs.CL] | ||
837 | 17 | 23 | Qin Y, Hu S, Lin Y et al. Tool learning with foundation models. arXiv, arXiv:2304.08354, 2023, preprint: not peer reviewed. http://arxiv.org/pdf/2304.08354.pdf. | ||
1775 | 35 | 48 | Qiyuan Chen and Cheng Deng. 2023. Bioinfo-Bench: A Simple Benchmark Framework for LLM Bioinformatics Skills Evaluation. bioRxiv (2023), 2023–10. | ||
1985 | 35 | 258 | Qizhi Pei, Lijun Wu, Kaiyuan Gao, Jinhua Zhu, and Rui Yan. 2024. 3D-MolT5: Towards Unified 3D Molecule-Text Modeling with 3D Molecular Tokenization. arXiv preprint arXiv:2406.05797 (2024). | ||
1984 | 35 | 257 | Qizhi Pei, Lijun Wu, Kaiyuan Gao, Jinhua Zhu, Yue Wang, Zun Wang, Tao Qin, and Rui Yan. 2024. Leveraging Biomolecule and Natural Language through Multi-Modal Learning: A Survey. arXiv preprint arXiv:2403.01528 (2024). | ||
1983 | 35 | 256 | Qizhi Pei, Lijun Wu, Kaiyuan Gao, Xiaozhuan Liang, Yin Fang, Jinhua Zhu, Shufang Xie, Tao Qin, and Rui Yan. 2024. Biot5+: Towards generalized biological understanding with iupac integration and multi-task tuning. arXiv preprint arXiv:2402.17810 (2024). | ||
1986 | 35 | 259 | Qizhi Pei, Wei Zhang, Jinhua Zhu, Kehan Wu, Kaiyuan Gao, Lijun Wu, Yingce Xia, and Rui Yan. 2023. BioT5: Enriching Cross-modal Integration in Biology with Chemical Knowledge and Natural Language Associations. arXiv:2310.07276 [cs.CL] | ||
592 | 12 | 9 | Quax, T. E. F., Claassens, N. J., Söll, D. & van der Oost, J. Codon bias as a means to fine-tune gene expression. Mol. Cell 59, 149–161 (2015). | ||
385 | 7 | 73 | Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010). | ||
1305 | 26 | 68 | Qwen Team. Code with CodeQwen1.5, 2024a. URL https://qwenlm.github.io/blog/codeqwen1.5/. | ||
1690 | 34 | 65 | Qwen Team. Code with CodeQwen1.5, 2024a. URL https://qwenlm.github.io/blog/codeqwen1.5/. | ||
1600 | 33 | 56 | Qwen Team. Introducing Qwen1.5, 2024a. URL https://qwenlm.github.io/blog/qwen1.5/. | ||
1691 | 34 | 66 | Qwen Team. Introducing Qwen1.5, 2024b. URL https://qwenlm.github.io/blog/qwen1.5/. | ||
1306 | 26 | 69 | Qwen Team. Introducing Qwen1.5, 2024b. URL https://qwenlm.github.io/blog/qwen1.5/. | ||
| | | Qwen Team. Introducing Qwen2-Math, 2024c. URL https://qwenlm.github.io/blog/qwen2-math/. | ||
1692 | 34 | 67 | Qwen Team. Introducing Qwen2-Math, 2024c. URL https://qwenlm.github.io/blog/qwen2-math/. | ||
1601 | 33 | 57 | Qwen Team. Qwen1.5-110B: The first 100B+ model of the Qwen1.5 series, 2024b. URL https://qwenlm.github.io/blog/qwen1.5-110b/. | ||
1602 | 33 | 58 | Qwen Team. Qwen1.5-MoE: Matching 7B model performance with 1/3 activated parameters, 2024c. URL https://qwenlm.github.io/blog/qwen-moe/. | ||
1307 | 26 | 70 | Qwen Team. QwQ: Reflect deeply on the boundaries of the unknown, 2024d. URL https://qwenlm.github.io/blog/qwq-32b-preview/. | ||
1693 | 34 | 68 | Qwen Team. QwQ: Reflect deeply on the boundaries of the unknown, 2024d. URL https://qwenlm.github.io/blog/qwq-32b-preview/. | ||
1341 | 27 | 0 | Qwen Team: A. Yang, B. Yang, B. Zhang, B. Hui, B. Zheng, B. Yu, C. Li, D. Liu, F. Huang, H. Wei, H. Lin, J. Yang, J. Tu, J. Zhang, J. Yang, J. Yang, J. Zhou, J. Lin, K. Dang, K. Lu, K. Bao, K. Yang, L. Yu, M. Li, M. Xue, P. Zhang, Q. Zhu, R. Men, R. Lin, T. Li, T. Tang, T. Xia, X. Ren, X. Ren, Y. Fan, Y. Su, Y. Zhang, Y. Wan, Y. Liu, Z. Cui, Z. Zhang, and Z. Qiu. Qwen2.5 technical report. December 2024. URL https://arxiv.org/pdf/2412.15115. | ||
569 | 11 | 27 | Qwen. Qwen2.5: A party of foundation models, 2024b. URL https://qwenlm.github.io/blog/qwen2.5. | ||
568 | 11 | 26 | Qwen. QwQ: Reflect deeply on the boundaries of the unknown, 2024a. URL https://qwenlm.github.io/blog/qwq-32b-preview/. | ||
807 | 16 | 8 | R Core Team, editor. R: A Language and environment for statistical computing. 2018. | ||
1349 | 28 | 1 | R. Anil, S. Borgeaud, Y. Wu, J. Alayrac, J. Yu, R. Soricut, J. Schalkwyk, A. M. Dai, A. Hauth, K. Millican, D. Silver, S. Petrov, M. Johnson, I. Antonoglou, J. Schrittwieser, A. Glaese, J. Chen, E. Pitler, T. P. Lillicrap, A. Lazaridou, O. Firat, J. Molloy, M. Isard, P. R. Barham, T. Hennigan, B. Lee, F. Viola, M. Reynolds, Y. Xu, R. Doherty, E. Collins, C. Meyer, E. Rutherford, E. Moreira, K. Ayoub, M. Goel, G. Tucker, E. Piqueras, M. Krikun, I. Barr, N. Savinov, I. Danihelka, B. Roelofs, A. White, A. Andreassen, T. von Glehn, L. Yagati, M. Kazemi, L. Gonzalez, M. Khalman, J. Sygnowski, and et al. Gemini: A family of highly capable multimodal models. CoRR, abs/2312.11805, 2023. doi: 10.48550/ARXIV.2312.11805. URL https://doi.org/10.48550/arXiv.2312.11805. | ||
1121 | 24 | 4 | R. Bommasani, D. A. Hudson, E. Adeli, R. Altman, S. Arora, S. von Arx, M. S. Bernstein, J. Bohg, A. Bosselut, E. Brunskill, et al. On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258, 2021. | ||
271 | 5 | 86 | R. J. Penić, T. Vlašić, R. G. Huber, Y. Wan, and M. Šikić. RiNALMo: General-purpose RNA language models can generalize well on structure prediction tasks. arXiv preprint arXiv:2403.00043, 2024. | ||
1230 | 25 | 54 | R. M. Atlas and M. Dando, “The dual-use dilemma for the life sciences: Perspectives, conundrums, and global solutions,” Biosecurity and Bioterrorism: Biodefense Strategy, Practice, and Science, vol. 4, no. 3, pp. 276–286, 2006. PMID: 16999588. | ||
266 | 5 | 81 | R. Overbeek, M. Fonstein, M. D’souza, G. D. Pusch, and N. Maltsev. The use of gene clusters to infer functional coupling. Proceedings of the National Academy of Sciences, 96(6):2896–2901, 1999. | ||
1384 | 28 | 36 | R. Rafailov, A. Sharma, E. Mitchell, S. Ermon, C. D. Manning, and C. Finn. Direct preference optimization: Your language model is secretly a reward model. 2023. | ||
1158 | 24 | 41 | R. Rao, J. Meier, T. Sercu, S. Ovchinnikov, and A. Rives. Transformer protein language models are unsupervised structure learners. bioRxiv, 2020. | ||
1189 | 25 | 13 | R. Shelby, S. Rismani, K. Henne, A. Moon, N. Rostamzadeh, P. Nicholas, N. Yilla, J. Gallegos, A. Smart, E. Garcia, and G. Virk, “Sociotechnical harms of algorithmic systems: Scoping a taxonomy for harm reduction,” 2023. | ||
308 | 5 | 123 | R. Zhang. DEG: a database of essential genes. Nucleic Acids Research, 32:D271–D272, 2004. ISSN 1362-4962. doi: 10.1093/nar/gkh024. | ||
774 | 15 | 38 | Rabeeh Karimi Mahabadi, James Henderson, and Sebastian Ruder. Compacter: Efficient low-rank hypercomplex adapter layers, 2021. | ||
1947 | 35 | 220 | Rachel K. Luu and Markus J. Buehler. 2023. BioinspiredLLM: Conversational Large Language Model for the Mechanics of Biological and Bio-inspired Materials. arXiv:2309.08788 [cond-mat.mtrl-sci] | ||
838 | 17 | 24 | Radford A, Narasimhan K, Salimans T et al. Improving language understanding by generative pre-training, OpenAI Blog, 2018. | ||
839 | 17 | 25 | Radford A, Wu J, Child R et al. Language models are unsupervised multitask learners. OpenAI Blog 2019;1:9. | ||
945 | 22 | 28 | Radford A, Wu J, Child R. et al. Language models are unsupervised multitask learners. OpenAI Blog 2019;1:9. | ||
449 | 9 | 27 | Rae, J. W. et al. Scaling language models: methods, analysis & insights from training gopher. Preprint at https://arxiv.org/abs/2112.11446 (2021). | ||
509 | 10 | 27 | Rae, J. W. et al. Scaling language models: methods, analysis & insights from training gopher. Preprint at https://arxiv.org/abs/2112.11446 (2021). | ||
1989 | 35 | 262 | Rafael Josip Penić, Tin Vlašić, Roland G Huber, Yue Wan, and Mile Šikić. 2024. RiNALMo: General-purpose RNA language models can generalize well on structure prediction tasks. arXiv preprint arXiv:2403.00043 (2024). | ||
1309 | 26 | 72 | Rafael Rafailov, Archit Sharma, Eric Mitchell, Christopher D. Manning, Stefano Ermon, and Chelsea Finn. Direct preference optimization: Your language model is secretly a reward model. In NeurIPS, 2023. | ||
1603 | 33 | 59 | Rafael Rafailov, Archit Sharma, Eric Mitchell, Christopher D. Manning, Stefano Ermon, and Chelsea Finn. Direct preference optimization: Your language model is secretly a reward model. In NeurIPS, 2023. | ||
1695 | 34 | 70 | Rafael Rafailov, Archit Sharma, Eric Mitchell, Christopher D. Manning, Stefano Ermon, and Chelsea Finn. Direct preference optimization: Your language model is secretly a reward model. In NeurIPS, 2023. | ||
949 | 22 | 32 | Raffel C, Shazeer N, Roberts A. et al. Exploring the limits of transfer learning with a unified text-to-text transformer. J Mach Learn Res 2020;21:1–67. | ||
81 | 3 | 73 | Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., Zhou, Y., Li, W., and Liu, P. J. (2020). Exploring the limits of transfer learning with a unified text-to-text transformer. Journal of Machine Learning Research 21, 1–67. http://jmlr.org/papers/v21/20-074.html. | ||
1513 | 32 | 20 | Raimondi, D., Simm, J., Arany, A. & Moreau, Y. A novel method for data fusion over entity-relation graphs and its application to protein–protein interaction prediction. Bioinformatics 37, 2275–2281 (2021). | ||
902 | 21 | 24 | Rajpurkar P. et al. (2016) SQuAD: 100,000+ questions for machine comprehension of text. In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, TX. pp. 2383–2392. Association for Computational Linguistics. https://www.aclweb.org/anthology/D16-1264. | ||
996 | 22 | 79 | Rajpurkar P, Zhang J, Lopyrev K. et al. SQuAD: 100,000+ questions for machine comprehension of text. arXiv preprint arXiv:1606.05250, 2016. | ||
586 | 12 | 3 | Ran, W., Kristensen, D. M. & Koonin, E. V. Coupling between protein level selection and codon usage optimization in the evolution of bacteria and archaea. MBio 5, e00956–14 (2014). | ||
614 | 12 | 31 | Ranaghan, M. J., Li, J. J., Laprise, D. M. & Garvie, C. W. Assessing optimal: inequalities in codon optimization algorithms. BMC Biol. 19, 36 (2021). | ||
932 | 22 | 15 | Rao R M, Liu J, Verkuil R. et al. MSA transformer. International Conference on Machine Learning PMLR 2021;139:8844–56. | ||
159 | 4 | 19 | Rao, R. M. et al. MSA Transformer. In Proceedings of the 38th International Conference on Machine Learning (eds Meila, M. & Zhang, T.) (PMLR, 2021). | ||
46 | 3 | 38 | Ratcliff, J. D. (2024). Transformer model generated bacteriophage genomes are compositionally distinct from natural sequences. bioRxiv preprint. https://www.biorxiv.org/content/10.1101/2024.03.19.585716v1. | ||
626 | 12 | 43 | Real, R. & Vargas, J. M. The probabilistic basis of Jaccard's index of similarity. Syst. Biol. 45, 380–385 (1996). | ||
369 | 7 | 57 | Rehm, H. L. et al. ClinGen—the Clinical Genome Resource. N. Engl. J. Med. 372, 2235–2242 (2015). | ||
805 | 16 | 6 | Reitz K. requests. Computer software. Pypi; 2013. | ||
2091 | 35 | 364 | Renchunzi Xie, Hongxin Wei, Lei Feng, and Bo An. 2022. Gearnet: Stepwise dual learning for weakly supervised domain adaptation. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 36. 8717–8725. | ||
1809 | 35 | 82 | René Dreos, Giovanna Ambrosini, Rouayda Cavin Périer, and Philipp Bucher. 2013. EPD and EPDnew, high-quality promoter resources in the next-generation sequencing era. Nucleic acids research 41, D1 (2013), D157–D164. | ||
2171 | 36 | 11 | René Dreos, Giovanna Ambrosini, Rouayda Cavin Périer, and Philipp Bucher. EPD and EPDnew, high-quality promoter resources in the next-generation sequencing era. Nucleic acids research, 41(D1):D157–D164, 2013. | ||
1944 | 35 | 217 | Renqian Luo, Liai Sun, Yingce Xia, Tao Qin, Sheng Zhang, Hoifung Poon, and Tie-Yan Liu. 2022. BioGPT: generative pre-trained transformer for biomedical text generation and mining. Briefings in Bioinformatics 23, 6 (Sept. 2022). | ||
978 | 22 | 61 | Rentzsch P, Schubach M, Shendure J. et al. CADD-splice—improving genome-wide variant effect prediction using deep learning-derived splice scores. Genome Med 2021;13:1–12. 10.1186/s13073-021-00835-9. | ||
153 | 4 | 13 | Rentzsch, P., Schubach, M., Shendure, J. & Kircher, M. CADD-Splice-improving genome-wide variant effect prediction using deep learning-derived splice scores. Genome Med. 13, 31 (2021). | ||
164 | 4 | 24 | Rentzsch, P., Witten, D., Cooper, G. M., Shendure, J. & Kircher, M. CADD: predicting the deleteriousness of variants throughout the human genome. Nucleic Acids Res. 47, D886–D894 (2019). | ||
2063 | 35 | 336 | Renxiao Wang, Xueliang Fang, Yipin Lu, Chao-Yie Yang, and Shaomeng Wang. 2005. The PDBbind database: methodologies and updates. Journal of medicinal chemistry 48, 12 (2005), 4111–4119. | ||
1003 | 22 | 86 | Repecka D, Jauniskis V, Karpus L. et al. Expanding functional protein sequence spaces using generative adversarial networks. Nat Mach Intell 2021;3:324–33. 10.1038/s42256-021-00310-5. | ||
280 | 5 | 95 | Responsible AI x Biodesign. Community values, guiding principles, and commitments for the responsible development of AI for protein design, March 8 2024. URL https://responsiblebiodesign.ai/#values-and-principles. | ||
1087 | 23 | 33 | Richard Leslie, Christopher J. O’Donnell, and Andrew D. Johnson. GRASP: analysis of genotype–phenotype results from 1390 genome-wide association studies and corresponding open access database. Bioinformatics, 30(12):i185–i194, June 2014. ISSN 1367-4803. doi: 10.1093/bioinformatics/btu273. URL https://doi.org/10.1093/bioinformatics/btu273. | ||
1483 | 31 | 34 | Richard Socher, A. Perelygin, J.Y. Wu, J. Chuang, C.D. Manning, A.Y. Ng, and C. Potts. Recursive deep models for semantic compositionality over a sentiment treebank. EMNLP, 1631:1631–1642, 01 2013. | ||
787 | 15 | 51 | Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D. Manning, Andrew Ng, and Christopher Potts. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pp. 1631–1642, Seattle, Washington, USA, October 2013. Association for Computational Linguistics. URL https://aclanthology.org/D13-1170. | ||
2024 | 35 | 297 | Richard W Shuai, Jeffrey A Ruffolo, and Jeffrey J Gray. 2021. Generative language modeling for antibody design. bioRxiv (2021), 2021–12. | ||
83 | 3 | 75 | Richard, G., de Almeida, B. P., Dalla-Torre, H., Blum, C., Hexemer, L., Pandey, P., Laurent, S., Lopez, M. P., Laterre, A., Lang, M. et al. (2024). ChatNT: A Multimodal Conversational Agent for DNA, RNA and Protein Tasks. bioRxiv preprint. https://www.biorxiv.org/content/10.1101/2024.04.30.591835v1. | ||
176 | 4 | 36 | Richards, S. et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet. Med. 17, 405–424 (2015). | ||
89 | 3 | 81 | Richardson, L., Allen, B., Baldi, G., Beracochea, M., Bileschi, M. L., Burdett, T., Burgin, J., Caballero-Pérez, J., Cochrane, G., Colwell, L. J. et al. (2023). MGnify: the microbiome sequence data analysis resource in 2023. Nucleic Acids Research 51, D753–D759. | ||
1476 | 31 | 27 | Rico Sennrich, Barry Haddow, and Alexandra Birch. Neural machine translation of rare words with subword units. arXiv preprint arXiv:1508.07909, 2015. | ||
1313 | 26 | 76 | Rico Sennrich, Barry Haddow, and Alexandra Birch. Neural machine translation of rare words with subword units. In ACL (1). The Association for Computer Linguistics, 2016. | ||
1699 | 34 | 74 | Rico Sennrich, Barry Haddow, and Alexandra Birch. Neural machine translation of rare words with subword units. In ACL (1). The Association for Computer Linguistics, 2016. | ||
2191 | 36 | 31 | Rico Sennrich, Barry Haddow, and Alexandra Birch. Neural machine translation of rare words with subword units. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1715–1725, Berlin, Germany, August 2016. Association for Computational Linguistics. doi: 10.18653/v1/P16-1162. URL https://aclanthology.org/P16-1162. | ||
 | 36 | | Noam Shazeer. GLU variants improve transformer, 2020. | ||
416 | 8 | 17 | Riedemann, L., Labonne, M. & Gilbert, S. The path forward for large language models in medicine is open. Npj Digit. Med. 7, 1–5 (2024). | ||
18 | 3 | 10 | Riesselman, A. J., Ingraham, J. B., and Marks, D. S. (2018). Deep generative models of genetic variation capture the effects of mutations. Nature Methods 15, 816–822. | ||
2163 | 36 | 3 | Rishi Bommasani, Drew A. Hudson, Ehsan Adeli, Russ Altman, Simran Arora, Sydney von Arx, Michael S. Bernstein, Jeannette Bohg, Antoine Bosselut, Emma Brunskill, Erik Brynjolfsson, Shyamal Buch, Dallas Card, Rodrigo Castellon, Niladri Chatterji, Annie Chen, Kathleen Creel, Jared Quincy Davis, Dora Demszky, Chris Donahue, Moussa Doumbouya, Esin Durmus, et al. On the opportunities and risks of foundation models. arXiv preprint arXiv:2108.07258, 2021. | ||
1107 | 23 | 53 | Ritambhara Singh, Jack Lanchantin, Gabriel Robins, and Yanjun Qi. DeepChrome: deep-learning for predicting gene expression from histone modifications. Bioinformatics, 32(17):i639–i648, September 2016. ISSN 1367-4803, 1367-4811. doi: 10.1093/bioinformatics/btw427. URL https://academic.oup.com/bioinformatics/article/32/17/i639/2450757. | ||
330 | 7 | 18 | Ritchie, G. et al. Functional annotation of noncoding sequence variants. Nat. Methods 11, 294–296 (2014). | ||
1533 | 32 | 40 | Rivera, Z. A., Tayo, L., Chen, B.-Y. & Tsai, P.-W. In silico Evaluation of the Feasibility of Magnolia officinalis Electron-shuttling Compounds as Parkinson’s Disease Remedy. Letters in Drug Design & Discovery 21, 3039–3048 (2024). | ||
1011 | 22 | 94 | Rives A, Meier J, Sercu T. et al. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proc Natl Acad Sci 2021;118:e2016239118. 10.1073/pnas.2016239118. | ||
156 | 4 | 16 | Rives, A. et al. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proc. Natl Acad. Sci. USA 118, e2016239118 (2021). | ||
426 | 9 | 4 | Rives, A. et al. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proc. Natl Acad. Sci. USA 118, e2016239118 (2021). | ||
486 | 10 | 4 | Rives, A. et al. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proc. Natl Acad. Sci. USA 118, e2016239118 (2021). | ||
106 | 3 | 98 | Rives, A., Meier, J., Sercu, T., Goyal, S., Lin, Z., Liu, J., Guo, D., Ott, M., Zitnick, C. L., Ma, J., and Ferguson, A. L. (2021). Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proceedings of the National Academy of Sciences 118, e2016239118. | ||
368 | 7 | 56 | Roadmap Epigenomics Consortium. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015). | ||
1159 | 24 | 42 | Roadmap Epigenomics Consortium. Integrative analysis of 111 reference human epigenomes. Nature, 518(7539):317–330, 2015. | ||
1973 | 35 | 246 | Robert D Olson, Rida Assaf, Thomas Brettin, Neal Conrad, Clark Cucinell, James J Davis, Donald M Dempsey, Allan Dickerman, Emily M Dietrich, Ronald W Kenyon, et al. 2023. Introducing the bacterial and viral bioinformatics resource center (BV-BRC): a resource combining PATRIC, IRD and ViPR. Nucleic acids research 51, D1 (2023), D678–D689. | ||
2033 | 35 | 306 | Robert L Strausberg, Elise A Feingold, Richard D Klausner, and Francis S Collins. 1999. The mammalian gene collection. Science 286, 5439 (1999), 455–457. | ||
396 | 7 | 84 | Robin, X. et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics 12, 1–8 (2011). | ||
80 | 3 | 72 | Robson, E. S., and Ioannidis, N. M. (2023). GUANinE v1.0: Benchmark Datasets for Genomic AI Sequence-to-Function Models. bioRxiv preprint. https://www.biorxiv.org/content/10.1101/2023.10.12.562113v3. | ||
585 | 12 | 2 | Rocha, E. P. C. Codon usage bias from tRNA’s point of view: redundancy, specialization, and efficient decoding for translation optimization. Genome Res 14, 2279–2286 (2004). | ||
468 | 9 | 46 | Rogers, A., Kovaleva, O. & Rumshisky, A. A primer in BERTology: what we know about how BERT works. Trans. Assoc. Comput. Linguist. 8, 842–866 (2020). | ||
528 | 10 | 46 | Rogers, A., Kovaleva, O. & Rumshisky, A. A primer in BERTology: what we know about how BERT works. Trans. Assoc. Comput. Linguist. 8, 842–866 (2020). | ||
746 | 15 | 10 | Ronan Collobert and Jason Weston. A unified architecture for natural language processing: deep neural networks with multitask learning. In Proceedings of the 25th international conference on Machine learning, ICML ’08, pp. 160–167, New York, NY, USA, July 2008. Association for Computing Machinery. ISBN 978-1-60558-205-4. doi: 10.1145/1390156.1390177. URL https://doi.org/10.1145/1390156.1390177. | ||
1977 | 35 | 250 | Rose Oughtred, Chris Stark, Bobby-Joe Breitkreutz, Jennifer Rust, Lorrie Boucher, Christie Chang, Nadine Kolas, Lara O’Donnell, Genie Leung, Rochelle McAdam, et al. 2019. The BioGRID interaction database: 2019 update. Nucleic acids research 47, D1 (2019), D529–D541. | ||
2000 | 35 | 273 | Roshan M Rao, Jason Liu, Robert Verkuil, Joshua Meier, John Canny, Pieter Abbeel, Tom Sercu, and Alexander Rives. 2021. MSA transformer. In International Conference on Machine Learning. PMLR, 8844–8856. | ||
1101 | 23 | 47 | Roshan Rao, Joshua Meier, Tom Sercu, Sergey Ovchinnikov, and Alexander Rives. Transformer protein language models are unsupervised structure learners. preprint, Synthetic Biology, December 2020. URL http://biorxiv.org/lookup/doi/10.1101/2020.12.15.422761. | ||
1999 | 35 | 272 | Roshan Rao, Nicholas Bhattacharya, Neil Thomas, Yan Duan, Xi Chen, John Canny, Pieter Abbeel, and Yun S Song. 2019. Evaluating Protein Transfer Learning with TAPE. In Advances in Neural Information Processing Systems. | ||
1100 | 23 | 46 | Roshan Rao, Nicholas Bhattacharya, Neil Thomas, Yan Duan, Xi Chen, John Canny, Pieter Abbeel, and Yun S. Song. Evaluating Protein Transfer Learning with TAPE. Advances in neural information processing systems, 32:9689–9701, December 2019. ISSN 1049-5258. URL https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7774645/. | ||
1870 | 35 | 143 | Ross Irwin, Spyridon Dimitriadis, Jiazhen He, and Esben Jannik Bjerrum. 2022. Chemformer: a pre-trained transformer for computational chemistry. Machine Learning: Science and Technology 3, 1 (2022), 015022. | ||
2042 | 35 | 315 | Ross Taylor, Marcin Kardas, Guillem Cucurull, Thomas Scialom, Anthony Hartshorn, Elvis Saravia, Andrew Poulton, Viktor Kerkez, and Robert Stojnic. 2022. Galactica: A Large Language Model for Science. arXiv:2211.09085 [cs.CL] | ||
2041 | 35 | 314 | Ross Taylor, Marcin Kardas, Guillem Cucurull, Thomas Scialom, Anthony Hartshorn, Elvis Saravia, Andrew Poulton, Viktor Kerkez, and Robert Stojnic. 2022. Galactica: A Large Language Model for Science. CoRR abs/2211.09085 (2022). | ||
379 | 7 | 67 | Ross, D. A., Lim, J., Lin, R.-S. & Yang, M.-H. Incremental learning for robust visual tracking. Int. J. Comput. Vision 77, 125–141 (2008). | ||
808 | 16 | 9 | Rossum GV, Drake FL. Python 3 Reference Manual. CreateSpace; 2009. | ||
729 | 14 | 28 | Rousseeuw, P. J. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics 20, 53–65 (1987). | ||
1335 | 26 | 98 | Rowan Zellers, Ari Holtzman, Yonatan Bisk, Ali Farhadi, and Yejin Choi. HellaSwag: Can a machine really finish your sentence? In ACL (1), pp. 4791–4800. Association for Computational Linguistics, 2019. | ||
1620 | 33 | 76 | Rowan Zellers, Ari Holtzman, Yonatan Bisk, Ali Farhadi, and Yejin Choi. HellaSwag: Can a machine really finish your sentence? In ACL (1), pp. 4791–4800. Association for Computational Linguistics, 2019. | ||
1721 | 34 | 96 | Rowan Zellers, Ari Holtzman, Yonatan Bisk, Ali Farhadi, and Yejin Choi. HellaSwag: Can a machine really finish your sentence? In ACL (1), pp. 4791–4800. Association for Computational Linguistics, 2019. | ||
875 | 20 | 3 | Rubinstein WS, Maglott DR, Lee JM, Kattman BL, Malheiro AJ, Ovetsky M, Hem V, Gorelenkov V, Song G, Wallin C, et al. The NIH genetic testing registry: a new, centralized database of genetic tests to enable access to comprehensive information and improve transparency. Nucleic Acids Res. 2013;41:D925–D935. | ||
1007 | 22 | 90 | Ruffolo JA, Chu LS, Mahajan SP. et al. Fast, accurate antibody structure prediction from deep learning on massive set of natural antibodies. Nat Commun 2023;14:2389. 10.1038/s41467-023-38063-x. | ||
17 | 3 | 9 | Ruffolo, J. A., and Madani, A. (2024). Designing proteins with language models. Nature Biotechnology 42, 200–202. | ||
1037 | 22 | 120 | Ruiz C, Zitnik M, Leskovec J. Identification of disease treatment mechanisms through the multiscale interactome. Nat Commun 2021;12:1796. 10.1038/s41467-021-21770-8. | ||
2199 | 36 | 39 | Ruohan Wang, Zishuai Wang, Jianping Wang, and Shuaicheng Li. Splicefinder: ab initio prediction of splice sites using convolutional neural network. BMC bioinformatics, 20:1–13, 2019. | ||
129 | 3 | 121 | Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., Huang, Z., Karpathy, A., Khosla, A., Bernstein, M., Berg, A. C., and Fei-Fei, L. (2015). ImageNet Large Scale Visual Recognition Challenge. International Journal of Computer Vision (IJCV) 115, 211–252. doi:10.1007/s11263-015-0816-y. | ||
1094 | 23 | 40 | The 1000 Genomes Project Consortium. An integrated map of genetic variation from 1,092 human genomes. Nature, 491(7422):56–65, November 2012. ISSN 1476-4687. doi: 10.1038/nature11632. URL https://doi.org/10.1038/nature11632. | ||
2157 | 35 | 430 | Rustam Zhumagambetov, Ferdinand Molnár, Vsevolod A Peshkov, and Siamac Fazli. 2021. Transmol: repurposing a language model for molecular generation. RSC advances 11, 42 (2021), 25921–25932. | ||
1964 | 35 | 237 | S. Möller, Ulf Leser, Wolfgang Fleischmann, and Rolf Apweiler. 1999. EDITtoTrEMBL: a distributed approach to high-quality automated protein sequence annotation. Bioinformatics (Oxford, England) 15 (1999), 219–227. | ||
1210 | 25 | 34 | S. A. Athaluri, S. V. Manthena, V. S. R. K. M. Kesapragada, V. Yarlagadda, T. Dave, and R. T. S. Duddumpudi, “Exploring the boundaries of reality: Investigating the phenomenon of artificial intelligence hallucination in scientific writing through chatgpt references,” Cureus, vol. 15, no. 4, p. e37432, 2023. | ||
1206 | 25 | 30 | S. Altman, “Planning for agi and beyond,” OpenAI, 2023. | ||
224 | 5 | 37 | S. B. Hedges. The origin and evolution of model organisms. Nat. Rev. Genet., 3(11):838–849, Nov. 2002. | ||
1204 | 25 | 28 | S. B. Johnson, J. R. Clark, M. C. Luetke, N. M. Butala, A. T. Pearson, J. M. Shapiro, D. M. Aleman, J. M. Lee, M. M. Beil, C. V. Winkle, M. C. Boudreaux, R. C. D’Cunha, H. J. Krouse, and C. Li, “Chatgpt in medical education: a workshop-based large language model-powered intervention for evidence-based clinical decision making in medical students,” Nature Medicine, vol. 29, pp. 1534–1542, 2023. | ||
194 | 5 | 8 | S. Biderman, H. Schoelkopf, Q. G. Anthony, H. Bradley, K. O’Brien, E. Hallahan, M. A. Khan, S. Purohit, U. S. Prashanth, E. Raff, et al. Pythia: A suite for analyzing large language models across training and scaling. In International Conference on Machine Learning, pages 2397–2430. PMLR, 2023. | ||
197 | 5 | 11 | S. Brenner, A. Stretton, and S. Kaplan. Genetic code: the ‘nonsense’ triplets for chain termination and their suppression. Nature, 206(4988):994–998, 1965. | ||
312 | 7 | 0 | S. Chen, L. C. Francioli, J. K. Goodrich, R. L. Collins, M. Kanai, Q. Wang, J. Alföldi, N. A. Watts, C. Vittal, L. D. Gauthier, et al. A genomic mutational constraint map using variation in 76,156 human genomes. Nature, 625:92–100, 2023. ISSN 1476-4687. doi: 10.1038/s41586-023-06045-0. URL https://www.nature.com/articles/s41586-023-06045-0. | https://qiita.com/kaizen_nagoya/items/e799ad85ee98bb2a8cf6 | |
205 | 5 | 18 | S. Chen, S. Wong, L. Chen, and Y. Tian. Extending context window of large language models via positional interpolation. URL https://arxiv.org/abs/2306.15595, 2023. | ||
1209 | 25 | 33 | S. Cox, M. Hammerling, J. Lála, J. Laurent, S. Rodriques, M. Rubashkin, and A. White, “Wikicrow: Automating synthesis of human scientific knowledge,” Future House, 2023. | ||
220 | 5 | 33 | S. Gupta, J. A. Stamatoyannopoulos, T. L. Bailey, and W. S. Noble. Quantifying similarity between motifs. Genome Biology, 8(2):R24, Feb. 2007. ISSN 1474-760X. doi: 10.1186/gb-2007-8-2-r24. URL https://doi.org/10.1186/gb-2007-8-2-r24. | ||
222 | 5 | 35 | S. Hartmann, D. Lu, J. Phillips, and T. J. Vision. Phytome: a platform for plant comparative genomics. Nucleic Acids Res., 34(Database issue):D724–30, Jan. 2006. | ||
234 | 5 | 47 | S. Knudsen. Promoter2.0: for the recognition of PolII promoter sequences. Bioinformatics (Oxford, England), 15(5):356–361, 1999. | ||
557 | 11 | 15 | S. Krishna, K. Krishna, A. Mohananey, S. Schwarcz, A. Stambler, S. Upadhyay, and M. Faruqui. Fact, fetch, and reason: A unified evaluation of retrieval-augmented generation. CoRR, abs/2409.12941, 2024. doi: 10.48550/ARXIV.2409.12941. URL https://doi.org/10.48550/arXiv.2409.12941. | ||
1190 | 25 | 14 | S. L. Blodgett, Q. V. Liao, A. Olteanu, R. Mihalcea, M. Muller, M. K. Scheuerman, C. Tan, and Q. Yang, “Responsible language technologies: Foreseeing and mitigating harms,” in Extended Abstracts of the 2022 CHI Conference on Human Factors in Computing Systems, CHI EA ’22, (New York, NY, USA), Association for Computing Machinery, 2022. | ||
239 | 5 | 52 | S. Li, S. Moayedpour, R. Li, M. Bailey, S. Riahi, M. Miladi, J. Miner, D. Zheng, J. Wang, A. Balsubramani, K. Tran, M. Zacharia, M. Wu, X. Gu, R. Clinton, C. Asquith, J. Skalesk, L. Boeglin, S. Chivukula, A. Dias, F. U. Montoya, V. Agarwal, Z. Bar-Joseph, and S. Jager. CodonBERT: Large language models for mRNA design and optimization. bioRxiv, 2023. doi: 10.1101/2023.09.09.556981. URL https://www.biorxiv.org/content/early/2023/09/12/2023.09.09.556981. | ||
1236 | 25 | 60 | S. Lin, J. Hilton, and O. Evans, “TruthfulQA: Measuring how models mimic human falsehoods,” CoRR, vol. abs/2109.07958, 2021. | ||
1377 | 28 | 29 | S. Mishra, M. Finlayson, P. Lu, L. Tang, S. Welleck, C. Baral, T. Rajpurohit, O. Tafjord, A. Sabharwal, P. Clark, and A. Kalyan. LILA: A unified benchmark for mathematical reasoning. In Y. Goldberg, Z. Kozareva, and Y. Zhang, editors, Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022, Abu Dhabi, United Arab Emirates, December 7-11, 2022, pages 5807–5832. Association for Computational Linguistics, 2022. doi: 10.18653/V1/2022.EMNLP-MAIN.392. URL https://doi.org/10.18653/v1/2022.emnlp-main.392. | ||
1383 | 28 | 35 | S. Polu and I. Sutskever. Generative language modeling for automated theorem proving. CoRR, abs/2009.03393, 2020. URL https://arxiv.org/abs/2009.03393. | ||
279 | 5 | 94 | S. Rajbhandari, J. Rasley, O. Ruwase, and Y. He. ZeRO: Memory optimizations toward training trillion parameter models. In SC20: International Conference for High Performance Computing, Networking, Storage and Analysis, pages 1–16. IEEE, 2020. | ||
1192 | 25 | 16 | S. Shahriar, S. Allana, S. M. Hazratifard, and R. Dara, “A survey of privacy risks and mitigation strategies in the artificial intelligence life cycle,” IEEE Access, vol. 11, pp. 61829–61854, 2023. | ||
294 | 5 | 109 | S. Sunagawa, L. P. Coelho, S. Chaffron, J. R. Kultima, K. Labadie, G. Salazar, B. Djahanschiri, G. Zeller, D. R. Mende, A. Alberti, F. M. Cornejo-Castillo, P. I. Costea, C. Cruaud, F. d’Ovidio, S. Engelen, I. Ferrera, J. M. Gasol, L. Guidi, F. Hildebrand, F. Kokoszka, C. Lepoivre, G. Lima-Mendez, J. Poulain, B. T. Poulos, M. RoyoLlonch, H. Sarmento, S. Vieira-Silva, C. Dimier, M. Picheral, S. Searson, S. Kandels-Lewis, Tara Oceans coordinators, C. Bowler, C. de Vargas, G. Gorsky, N. Grimsley, P. Hingamp, D. Iudicone, O. Jaillon, F. Not, H. Ogata, S. Pesant, S. Speich, L. Stemmann, M. B. Sullivan, J. Weissenbach, P. Wincker, E. Karsenti, J. Raes, S. G. Acinas, and P. Bork. Structure and function of the global ocean microbiome. Science, 348(6237):1261359, May 2015. | ||
1409 | 29 | 0 | S. T. Sherry, M. H. Ward, M. Kholodov, J. Baker, L. Phan, E. M. Smigielski, and K. Sirotkin. dbSNP: the NCBI database of genetic variation. Nucleic Acids Research, 29:308–311, 1 2001. ISSN 0305-1048. doi: 10.1093/NAR/29.1.308. URL https://dx.doi.org/10.1093/nar/29.1.308. | ||
1402 | 28 | 54 | S. Yao, D. Yu, J. Zhao, I. Shafran, T. L. Griffiths, Y. Cao, and K. Narasimhan. Tree of thoughts: Deliberate problem solving with large language models. arXiv preprint arXiv:2305.10601, 2023. | ||
1173 | 24 | 56 | S. Zaina, E. L. Pérez-Luque, and G. Lund. Genetics talks to epigenetics? the interplay between sequence variants and chromatin structure. Current genomics, 11(5):359–367, 2010. | ||
608 | 12 | 25 | Sabath, N., Wagner, A. & Karlin, D. Evolution of viral proteins originated de novo by overprinting. Mol. Biol. Evol. 29, 3767–3780 (2012). | ||
903 | 21 | 25 | Sachan D.S. et al. (2018) Effective use of bidirectional language modeling for transfer learning in biomedical named entity recognition. In: Finale, D.-V. et al. (eds.), Proceedings of Machine Learning Research, Palo Alto, CA, Vol. 85, pp. 383–402. PMLR. http://proceedings.mlr.press/v85/sachan18a.html. | ||
982 | 22 | 65 | Saelens W, Cannoodt R, Todorov H. et al. A comparison of single-cell trajectory inference methods. Nat Biotechnol 2019;37:547–54. 10.1038/s41587-019-0071-9. | ||
1034 | 22 | 117 | Saharia C, Chan W, Saxena S. et al. Photorealistic text-to-image diffusion models with deep language understanding. Adv Neural Inf Process Syst 2022;35:36479–94. | ||
640 | 12 | 57 | Sakoe, H. & Chiba, S. Dynamic programming algorithm optimization for spoken word recognition. IEEE Trans. Acoust. 26, 43–49 (1978). | ||
777 | 15 | 41 | Samet Oymak, Zalan Fabian, Mingchen Li, and Mahdi Soltanolkotabi. Generalization guarantees for neural networks via harnessing the low-rank structure of the jacobian. arXiv preprint arXiv:1906.05392, 2019. | ||
319 | 7 | 7 | Samocha, K. E. et al. A framework for the interpretation of de novo mutation in human disease. Nat. Genet. 46, 944–950 (2014). | ||
108 | 3 | 100 | Samuel, D. (2024). BERTs are Generative In-Context Learners. arXiv preprint arXiv:2406.04823. https://arxiv.org/abs/2406.04823. | ||
1310 | 26 | 73 | Samyam Rajbhandari, Conglong Li, Zhewei Yao, Minjia Zhang, Reza Yazdani Aminabadi, Ammar Ahmad Awan, Jeff Rasley, and Yuxiong He. DeepSpeed-MoE: Advancing mixture-of-experts inference and training to power next-generation AI scale. In ICML, volume 162 of Proceedings of Machine Learning Research, pp. 18332–18346. PMLR, 2022. | ||
1604 | 33 | 60 | Samyam Rajbhandari, Conglong Li, Zhewei Yao, Minjia Zhang, Reza Yazdani Aminabadi, Ammar Ahmad Awan, Jeff Rasley, and Yuxiong He. DeepSpeed-MoE: Advancing mixture-of-experts inference and training to power next-generation AI scale. In ICML, volume 162 of Proceedings of Machine Learning Research, pp. 18332–18346. PMLR, 2022. | ||
1696 | 34 | 71 | Samyam Rajbhandari, Conglong Li, Zhewei Yao, Minjia Zhang, Reza Yazdani Aminabadi, Ammar Ahmad Awan, Jeff Rasley, and Yuxiong He. DeepSpeed-MoE: Advancing mixture-of-experts inference and training to power next-generation AI scale. In ICML, volume 162 of Proceedings of Machine Learning Research, pp. 18332–18346. PMLR, 2022. | ||
1953 | 35 | 226 | Sanaa Mansoor, Minkyung Baek, Umesh Madan, and Eric Horvitz. 2021. Toward more general embeddings for protein design: Harnessing joint representations of sequence and structure. bioRxiv (2021), 2021–09. | ||
413 | 8 | 14 | Sanabria, M., Hirsch, J. & Poetsch, A. R. Distinguishing word identity and sequence context in DNA language models. BMC Bioinformatics 25, 301 (2024). | ||
76 | 3 | 68 | Sanabria, M., Hirsch, J., and Poetsch, A. R. (2023). The human genome’s vocabulary as proposed by the DNA language model GROVER. bioRxiv preprint. https://www.biorxiv.org/content/10.1101/2023.07.19.549677v2. | ||
414 | 8 | 15 | Sanabria, M., Hirsch, J., Joubert, P. M. & Poetsch, A. R. DNA language model GROVER learns sequence context in the human genome. Nat. Mach. Intell. 6, 911–923 (2024). | ||
1730 | 35 | 3 | Sanjar Adilov. 2021. Generative pre-training from molecules. | ||
933 | 22 | 16 | Sapoval N, Aghazadeh A, Nute MG. et al. Current progress and open challenges for applying deep learning across the biosciences. Nat Commun 2022;13:1728. 10.1038/s41467-022-29268-7. | ||
1990 | 35 | 263 | Sara Pieri, Sahal Shaji Mullappilly, Fahad Shahbaz Khan, Rao Muhammad Anwer, Salman Khan, Timothy Baldwin, and Hisham Cholakkal. 2024. BiMediX: Bilingual Medical Mixture of Experts LLM. arXiv:2402.13253 | ||
1739 | 35 | 12 | Sarp Aykent and Tian Xia. 2022. GBPNet: Universal geometric representation learning on protein structures. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining. 4–14. | ||
314 | 7 | 2 | Satterstrom, F. K. et al. Large-scale exome sequencing study implicates both developmental and functional changes in the neurobiology of autism. Cell 180, 568–584.e523 (2020). | ||
840 | 17 | 26 | Sayers EW, Agarwala R, Bolton EE et al. Database resources of the national center for biotechnology information. Nucleic Acids Res 2019;47:D23–D28. | ||
347 | 7 | 35 | Schaefer, A. S. et al. Genetic evidence for PLASMINOGEN as a shared genetic risk factor of coronary artery disease and periodontitis. Circ. Cardiovasc. Genet. 8, 159–167 (2015). | ||
841 | 17 | 27 | Schick T, Dwivedi-Yu J, Dessì R et al. Toolformer: language models can teach themselves to use tools. arXiv, arXiv:2302.04761, 2023, preprint: not peer reviewed. | ||
32 | 3 | 24 | Schiff, Y., Kao, C.-H., Gokaslan, A., Dao, T., Gu, A., and Kuleshov, V. (2024). Caduceus: Bi-directional equivariant long-range DNA sequence modeling. arXiv preprint arXiv:2403.03234. https://arxiv.org/abs/2403.03234. | ||
687 | 13 | 42 | Schmidgall, S., Ziaei, R., Harris, C., Reis, E., Jopling, J., and Moor, M. AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments, 2024. | ||
177 | 4 | 37 | Schneider, T. D. & Stephens, R. M. Sequence logos: a new way to display consensus sequences. Nucleic Acids Res. 18, 6097–6100 (1990). | ||
859 | 18 | 10 | Schoch, C. L., Ciufo, S., Domrachev, M., Hotton, C. L., Kannan, S., Khovanskaya, R., Leipe, D., Mcveigh, R., O’Neill, K., Robbertse, B., et al. (2020) NCBI Taxonomy: a comprehensive update on curation, resources and tools. Database, 2020, baaa062. | ||
94 | 3 | 86 | Schoenfelder, S., and Fraser, P. (2019). Long-range enhancer–promoter contacts in gene expression control. Nature Reviews Genetics 20, 437–455. | ||
842 | 17 | 28 | Schuler G, Epstein J, Ohkawa H et al. Entrez: molecular biology database and retrieval system. Methods Enzymol 1996;266:141–62. | ||
869 | 19 | 8 | Schuler GD, Epstein JA, Ohkawa H, Kans JA. Entrez: molecular biology database and retrieval system. Methods Enzymol. 1996. https://doi.org/10.1016/s0076-6879(96)66012-1. (PMID 8743683.) | ||
1427 | 30 | 10 | Seal R.L., Braschi B., Gray K., Jones T.E.M., Tweedie S., Haim-Vilmovsky L., Bruford E.A. Genenames.Org: the HGNC resources in 2023. Nucleic Acids Res. 2023; 51:D1003–D1009. | ||
1762 | 35 | 35 | Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, Johannes Gehrke, Eric Horvitz, Ece Kamar, Peter Lee, Yin Tat Lee, Yuanzhi Li, Scott Lundberg, et al. 2023. Sparks of artificial general intelligence: Early experiments with GPT-4. arXiv preprint arXiv:2303.12712 (2023). | ||
357 | 7 | 45 | Sebat, J. et al. Strong association of de novo copy number mutations with autism. Science 316, 445–449 (2007). | ||
961 | 22 | 44 | Senior AW, Evans R, Jumper J. et al. Improved protein structure prediction using potentials from deep learning. Nature 2020;577:706–10. 10.1038/s41586-019-1923-7. | ||
110 | 3 | 102 | Sennrich, R., Haddow, B., and Birch, A. Neural machine translation of rare words with subword units. In: Erk, K., and Smith, N. A., eds. Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Berlin, Germany: Association for Computational Linguistics (2016): 1715–1725. https://aclanthology.org/P16-1162. doi:10.18653/v1/P16-1162. | ||
325 | 7 | 13 | Seplyarskiy, V. B. & Sunyaev, S. The origin of human mutation in light of genomic data. Nat. Rev. Genet. 22, 672–686 (2021). | ||
326 | 7 | 14 | Seplyarskiy, V. B. et al. Population sequencing data reveal a compendium of mutational processes in the human germ line. Science 373, 1030–1035 (2021). | ||
1344 | 27 | 3 | Sequence modeling and design from molecular to genome scale with Evo. E. Nguyen, M. Poli, M. G. Durrant, A. W. Thomas, B. Kang, J. Sullivan, M. Y. Ng, A. Lewis, A. Patel, A. Lou, S. Ermon, S. A. Baccus, T. Hernandez-Boussard, C. Ré, P. D. Hsu, and B. L. Hie. bioRxiv, 2024. https://www.biorxiv.org/content/10.1101/2024.02.27.582234v1.full.pdf | ||
2054 | 35 | 327 | Serbulent Unsal, Heval Atas, Muammer Albayrak, Kemal Turhan, Aybar C Acar, and Tunca Doğan. 2022. Learning functional properties of proteins with language models. Nature Machine Intelligence 4, 3 (2022), 227–245. | ||
1784 | 35 | 57 | Seyone Chithrananda, Gabriel Grand, and Bharath Ramsundar. 2020. ChemBERTa: large-scale self-supervised pretraining for molecular property prediction. arXiv preprint arXiv:2010.09885 (2020). | ||
629 | 12 | 46 | Shabalina, S. A., Spiridonov, N. A. & Kashina, A. Sounds of silence: synonymous nucleotides as a key to biological regulation and complexity. Nucleic Acids Res. 41, 2073–2094 (2013). | ||
1569 | 33 | 25 | Shahriar Golchin and Mihai Surdeanu. Time travel in LLMs: Tracing data contamination in large language models. In ICLR. OpenReview.net, 2024. | ||
1304 | 26 | 67 | Shanghaoran Quan, Tianyi Tang, Bowen Yu, An Yang, Dayiheng Liu, Bofei Gao, Jianhong Tu, Yichang Zhang, Jingren Zhou, and Junyang Lin. Language models can self-lengthen to generate long texts. CoRR, abs/2410.23933, 2024. | ||
1689 | 34 | 64 | Shanghaoran Quan, Tianyi Tang, Bowen Yu, An Yang, Dayiheng Liu, Bofei Gao, Jianhong Tu, Yichang Zhang, Jingren Zhou, and Junyang Lin. Language models can self-lengthen to generate long texts. CoRR, abs/2410.23933, 2024. | ||
1851 | 35 | 124 | Shanshan He, Ruchir Bhatt, Brian Birditt, Carl Brown, Emily Brown, Kan Chantranuvatana, Patrick Danaher, Dwayne Dunaway, Brian Filanoski, Ryan G Garrison, et al. 2021. High-plex multiomic analysis in FFPE tissue at single-cellular and subcellular resolution by spatial molecular imaging. bioRxiv (2021), 2021–11. | ||
45 | 3 | 37 | Shao, B. (2023). A long-context language model for deciphering and generating bacteriophage genomes. bioRxiv preprint. https://www.biorxiv.org/content/10.1101/2023.12.18.572218v3. | ||
591 | 12 | 8 | Sharp, P. M. & Li, W. H. The codon Adaptation Index-a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res. 15, 1281–1295 (1987). | ||
923 | 22 | 6 | Shen J, Liu F, Tu Y. et al. Finding gene network topologies for given biological function with recurrent neural network. Nat Commun 2021;12:3125. 10.1038/s41467-021-23420-5. | ||
2064 | 35 | 337 | Sheng Wang, Yuzhi Guo, Yuhong Wang, Hongmao Sun, and Junzhou Huang. 2019. SMILES-BERT: large scale unsupervised pre-training for molecular property prediction. In Proceedings of the 10th ACM international conference on bioinformatics, computational biology and health informatics. 429–436. | ||
2130 | 35 | 403 | Sheng Zhang, Xin Zhang, Hui Wang, Lixiang Guo, and Shanshan Liu. 2018. Multi-scale attentive interaction networks for Chinese medical question answer selection. IEEE Access 6 (2018), 74061–74071. | ||
1926 | 35 | 199 | Shengchao Liu, Hanchen Wang, Weiyang Liu, Joan Lasenby, Hongyu Guo, and Jian Tang. 2021. Pre-training molecular graph representation with 3d geometry. arXiv preprint arXiv:2110.07728 (2021). | ||
1927 | 35 | 200 | Shengchao Liu, Jiongxiao Wang, Yijin Yang, Chengpeng Wang, Ling Liu, Hongyu Guo, and Chaowei Xiao. 2023. ChatGPT-powered Conversational Drug Editing Using Retrieval and Domain Feedback. arXiv:2305.18090 [q-bio.BM] | ||
1925 | 35 | 198 | Shengchao Liu, Weili Nie, Chengpeng Wang, Jiarui Lu, Zhuoran Qiao, Ling Liu, Jian Tang, Chaowei Xiao, and Anima Anandkumar. 2023. Multi-modal Molecule Structure-text Model for Text-based Retrieval and Editing. arXiv:2212.1078 [cs.LG] | ||
1928 | 35 | 201 | Shengchao Liu, Yutao Zhu, Jiarui Lu, Zhao Xu, Weili Nie, Anthony Gitter, Chaowei Xiao, Jian Tang, Hongyu Guo, and Anima Anandkumar. 2023. A Text-guided Protein Design Framework. arXiv:2302.04611 [cs.LG] | ||
1277 | 26 | 40 | Shengding Hu, Yuge Tu, Xu Han, Chaoqun He, Ganqu Cui, Xiang Long, Zhi Zheng, Yewei Fang, Yuxiang Huang, Weilin Zhao, Xinrong Zhang, Zhen Leng Thai, Kai Zhang, Chongyi Wang, Yuan Yao, Chenyang Zhao, Jie Zhou, Jie Cai, Zhongwu Zhai, Ning Ding, Chao Jia, Guoyang Zeng, Dahai Li, Zhiyuan Liu, and Maosong Sun. MiniCPM: Unveiling the potential of small language models with scalable training strategies. CoRR, abs/2404.06395, 2024. | ||
1664 | 34 | 39 | Shengding Hu, Yuge Tu, Xu Han, Chaoqun He, Ganqu Cui, Xiang Long, Zhi Zheng, Yewei Fang, Yuxiang Huang, Weilin Zhao, Xinrong Zhang, Zhen Leng Thai, Kai Zhang, Chongyi Wang, Yuan Yao, Chenyang Zhao, Jie Zhou, Jie Cai, Zhongwu Zhai, Ning Ding, Chao Jia, Guoyang Zeng, Dahai Li, Zhiyuan Liu, and Maosong Sun. MiniCPM: Unveiling the potential of small language models with scalable training strategies. CoRR, abs/2404.06395, 2024. | ||
1521 | 32 | 28 | Shermukhamedov, S., Mamurjonova, D. & Probst, M. Structure to Property: Chemical Element Embeddings and a Deep Learning Approach for Accurate Prediction of Chemical Properties. arXiv preprint arXiv:2309.09355 (2023). | ||
1260 | 26 | 23 | Shihan Dou, Jiazheng Zhang, Jianxiang Zang, Yunbo Tao, Haoxiang Jia, Shichun Liu, Yuming Yang, Shenxi Wu, Shaoqing Zhang, Muling Wu, et al. Multi-programming language sandbox for LLMs. CoRR, abs/2410.23074, 2024. | ||
1648 | 34 | 23 | Shihan Dou, Jiazheng Zhang, Jianxiang Zang, Yunbo Tao, Haoxiang Jia, Shichun Liu, Yuming Yang, Shenxi Wu, Shaoqing Zhang, Muling Wu, et al. Multi-programming language sandbox for LLMs. CoRR, abs/2410.23074, 2024. | ||
688 | 13 | 43 | Shin, H. J., Han, K., Ryu, L., and Kim, E.-K. The impact of artificial intelligence on the reading times of radiologists for chest radiographs. NPJ Digital Medicine, 6(1):82, 2023. | ||
37 | 3 | 29 | Shin, J.-E., Riesselman, A. J., Kollasch, A. W., McMahon, C., Simon, E., Sander, C., Manglik, A., Kruse, A. C., and Marks, D. S. (2021). Protein design and variant prediction using autoregressive generative models. Nature Communications 12, 2403. | ||
1858 | 35 | 131 | Shion Honda, Shoi Shi, and Hiroki R Ueda. 2019. SMILES Transformer: Pre-trained molecular fingerprint for low data drug discovery. arXiv preprint arXiv:1911.04738 (2019). | ||
313 | 7 | 1 | Short, P. J. et al. De novo mutations in regulatory elements in neurodevelopmental disorders. Nature 555, 611–616 (2018). | ||
118 | 3 | 110 | Shrikumar, A., Tian, K., Avsec, Z., Shcherbina, A., Banerjee, A., Sharmin, M., Nair, S., and Kundaje, A. (2018). Technical note on transcription factor motif discovery from importance scores (TF-MoDISco) version 0.5. 6.5. arXiv preprint arXiv:1811.00416. https://arxiv.org/abs/1811.00416. | ||
2177 | 36 | 17 | Shruti Khare, Céline Gurry, Lucas Freitas, Mark B Schultz, Gunter Bach, Amadou Diallo, Nancy Akite, Joses Ho, Raphael TC Lee, Winston Yeo, et al. GISAID’s role in pandemic response. China CDC Weekly, 3(49):1049, 2021. | ||
2146 | 35 | 419 | Shuangjia Zheng, Jiahua Rao, Zhongyue Zhang, Jun Xu, and Yuedong Yang. 2019. Predicting retrosynthetic reactions using self-corrected transformer neural networks. Journal of chemical information and modeling 60 (2019), 47–55. | ||
1181 | 25 | 5 | Shutterstock, “Shutterstock press release,” 2023. | ||
860 | 18 | 11 | Siddell, S. G., Smith, D. B., Adriaenssens, E., Alfenas-Zerbini, P., Dutilh, B. E., Garcia, M. L., Junglen, S., Krupovic, M., Kuhn, J. H., Lambert, A. J., et al. (2023) Virus taxonomy and the role of the International Committee on Taxonomy of Viruses (ICTV). J. Gen. Virol., 104, 001840. | ||
161 | 4 | 21 | Siepel, A. et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 15, 1034–1050 (2005). | ||
332 | 7 | 20 | Siepel, A. et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 15, 1034–1050 (2005). | ||
28 | 3 | 20 | Siepel, A., Bejerano, G., Pedersen, J. S., Hinrichs, A. S., Hou, M., Rosenbloom, K., Clawson, H., Spieth, J., Hillier, L. W., Richards, S. et al. (2005). Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Research 15, 1034–1050. | ||
1738 | 35 | 11 | Simon Axelrod and Rafael Gomez-Bombarelli. 2022. GEOM, energy-annotated molecular conformations for property prediction and molecular generation. Scientific Data 9, 1 (2022), 185. | ||
1788 | 35 | 61 | Simon Chu and Kathy Wei. 2023. Generative Antibody Design for Complementary Chain Pairing Sequences through Encoder-Decoder Language Model. In NeurIPS 2023 Generative AI and Biology (GenBio) Workshop. | ||
| | | Kevin Clark, Minh-Thang Luong, Quoc V Le, and Christopher D Manning. 2020. ELECTRA: Pre-training text encoders as discriminators rather than generators. arXiv preprint arXiv:2003.10555 (2020). | ||
1019 | 22 | 102 | Singh J, Hanson J, Paliwal K. et al. RNA secondary structure prediction using an ensemble of two-dimensional deep neural networks and transfer learning. Nat Commun 2019;10:5407. 10.1038/s41467-019-13395-9. | ||
315 | 7 | 3 | Singh, T. et al. The contribution of rare variants to risk of schizophrenia in individuals with and without intellectual disability. Nat. Genet. 49, 1167–1173 (2017). | ||
843 | 17 | 29 | Singhal K, Azizi S, Tu T et al. Large language models encode clinical knowledge. arXiv, arXiv:2212.13138, 2022, preprint: not peer reviewed. | ||
992 | 22 | 75 | Singhal K, Azizi S, Tu T. et al. Large language models encode clinical knowledge. Nature 2023;620:172–80. 10.1038/s41586-023-06291-2. | ||
1431 | 30 | 14 | Sioutos N., Coronado S.d., Haber M.W., Hartel F.W., Shaiu W.-L., Wright L.W. NCI Thesaurus: a semantic model integrating cancer-related clinical and molecular information. J. Biomed. Inform. 2007; 40:30–43. | ||
1516 | 32 | 23 | Siramshetty, V., Williams, J., Nguyen, Ð., Neyra, J., Southall, N., Mathé, E., Xu, X. & Shah, P. Validating ADME QSAR models using marketed drugs. SLAS DISCOVERY: Advancing the Science of Drug Discovery 26, 1326–1336 (2021). | ||
1009 | 22 | 92 | Skinnider M, Johnston C, Gunabalasingam M. et al. Comprehensive prediction of secondary metabolite structure and biological activity from microbial genome sequences. Nat Commun 2020;11:6058. 10.1038/s41467-020-19986-1. | ||
167 | 4 | 27 | Smedley, D. et al. A whole-genome analysis framework for effective identification of pathogenic regulatory variants in mendelian disease. Am. J. Hum. Genet. 99, 595–606 (2016). | ||
904 | 21 | 26 | Smith L. et al. (2008) Overview of biocreative ii gene mention recognition. Genome Biol., 9, S2. | ||
858 | 18 | 9 | Smith, T. F. and Waterman, M. S. (1981) Identification of common molecular subsequences. J. Mol. Biol., 147, 195–197. | ||
1424 | 30 | 7 | Sondka Z., Bamford S., Cole C.G., Ward S.A., Dunham I., Forbes S.A. The COSMIC Cancer Gene Census: describing genetic dysfunction across all human cancers. Nat. Rev. Cancer. 2018; 18:696–705. | ||
950 | 22 | 33 | Song K, Tan X, Qin T. et al. MPNet: masked and permuted pre-training for language understanding. Adv Neural Inf Process Syst 2020;33:16857–67. | ||
113 | 3 | 105 | Song, B., Buckler, E. S., and Stitzer, M. C. (2024). New whole-genome alignment tools are needed for tapping into plant diversity. Trends in Plant Science 29, 355–369. | ||
346 | 7 | 34 | Song, C., Burgess, S., Eicher, J. D., O’Donnell, C. J. & Johnson, A. D. Causal effect of plasminogen activator inhibitor type 1 on coronary heart disease. J. Am. Heart Assoc. 6, e004918 (2017). | ||
2109 | 35 | 382 | Songhua Yang, Hanjie Zhao, Senbin Zhu, Guangyu Zhou, Hongfei Xu, Yuxiang Jia, and Hongying Zan. 2023. Zhongjing: Enhancing the Chinese Medical Capabilities of Large Language Model through Expert Feedback and Real-world Multi-turn Dialogue. arXiv:2308.03549 [cs.CL] | ||
1996 | 35 | 269 | Soumya Ram and Tristan Bepler. 2022. Few Shot Protein Generation. arXiv preprint arXiv:2204.01168 (2022). | ||
2019 | 35 | 292 | Soumya Sharma, Bishal Santra, Abhik Jana, TYSS Santosh, Niloy Ganguly, and Pawan Goyal. 2019. Incorporating domain knowledge into medical NLI using knowledge graphs. arXiv preprint arXiv:1909.00160 (2019). | ||
905 | 21 | 27 | Sousa D. et al. (2019) A silver standard corpus of human phenotype-gene relations. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, MN. pp. 1487–1492. Association for Computational Linguistics. https://www.aclweb.org/anthology/N19-1152. | ||
989 | 22 | 72 | Speer R, Chin J, Havasi C. ConceptNet 5.5: an open multilingual graph of general knowledge. In Proceedings of the AAAI Conference on Artificial Intelligence 2017;31:4444–4451. 10.1609/aaai.v31i1.11164. | ||
323 | 7 | 11 | Spielmann, M. & Mundlos, S. Looking beyond the genes: the role of non-coding variants in human disease. Hum. Mol. Genet. 25, R157–R165 (2016). | ||
362 | 7 | 50 | Spielmann, M. & Mundlos, S. Structural variations, the regulatory landscape of the genome and their alteration in human disease. Bioessays 35, 533–543 (2013). | ||
361 | 7 | 49 | Spielmann, M., Lupianez, D. G. & Mundlos, S. Structural variation in the 3D genome. Nat. Rev. Genet. 19, 453–467 (2018). | ||
469 | 9 | 47 | Stärk, H., Dallago, C., Heinzinger, M. & Rost, B. Light attention predicts protein location from the language of life. Bioinform. Adv. 1, vbab035 (2021). | ||
529 | 10 | 47 | Stärk, H., Dallago, C., Heinzinger, M. & Rost, B. Light attention predicts protein location from the language of life. Bioinform. Adv. 1, vbab035 (2021). | ||
971 | 22 | 54 | Stecher G, Tamura K, Kumar S. Molecular evolutionary genetics analysis (MEGA) for macOS. Mol Biol Evol 2020;37:1237–9. 10.1093/molbev/msz312. | ||
358 | 7 | 46 | Stefansson, H. et al. Large recurrent microdeletions associated with schizophrenia. Nature 455, 232–236 (2008). | ||
970 | 22 | 53 | Steinegger M, Meier M, Mirdita M. et al. HH-suite3 for fast remote homology detection and deep protein annotation. BMC Bioinformatics 2019;20:1–15. 10.1186/s12859-019-3019-7. | ||
353 | 7 | 41 | Stenson, P. D. et al. Human Gene Mutation Database (HGMD): 2003 update. Hum. Mutat. 21, 577–581 (2003). | ||
481 | 9 | 59 | Stenson, P. D. et al. The Human Gene Mutation Database (HGMD®): optimizing its use in a clinical diagnostic or research setting. Hum. Genet. 139, 1197–1207 (2020). | ||
541 | 10 | 59 | Stenson, P. D. et al. The Human Gene Mutation Database (HGMD®): optimizing its use in a clinical diagnostic or research setting. Hum. Genet. 139, 1197–1207 (2020). | ||
124 | 3 | 116 | Stenson, P. D., Mort, M., Ball, E. V., Evans, K., Hayden, M., Heywood, S., Hussain, M., Phillips, A. D., and Cooper, D. N. (2017). The Human Gene Mutation Database: towards a comprehensive repository of inherited mutation data for medical research, genetic diagnosis and next-generation sequencing studies. Human Genetics 136, 665–677. doi:10.1007/s00439-017-1779-6. | ||
1289 | 26 | 52 | Stephanie Lin, Jacob Hilton, and Owain Evans. TruthfulQA: Measuring how models mimic human falsehoods. In ACL (1), pp. 3214–3252. Association for Computational Linguistics, 2022a. | ||
1584 | 33 | 40 | Stephanie Lin, Jacob Hilton, and Owain Evans. TruthfulQA: Measuring how models mimic human falsehoods. In ACL (1), pp. 3214–3252. Association for Computational Linguistics, 2022a. | ||
1676 | 34 | 51 | Stephanie Lin, Jacob Hilton, and Owain Evans. TruthfulQA: Measuring how models mimic human falsehoods. In ACL (1), pp. 3214–3252. Association for Computational Linguistics, 2022a. | ||
1854 | 35 | 127 | Stephen Heller, Alan McNaught, Stephen Stein, Dmitrii Tchekhovskoi, and Igor Pletnev. 2013. InChI- the worldwide chemical structure identifier standard. Journal of Cheminformatics 5, 1 (2013), 1–9. | ||
1095 | 23 | 41 | Stephen Merity, Nitish Shirish Keskar, and Richard Socher. Regularizing and Optimizing LSTM Language Models. 2017. doi: 10.48550/ARXIV.1708.02182. URL https://arxiv.org/ | ||
1085 | 23 | 31 | Structure, Function, and Bioinformatics, 89(12):1607–1617, 2021. ISSN 1097-0134. doi: 10.1002/prot.26237. URL https://onlinelibrary.wiley.com/doi/abs/10.1002/prot.26237. eprint: https://onlinelibrary.wiley.com/doi/pdf/10.1002/prot.26237. | ||
1023 | 22 | 106 | Su H, Liu M, Sun S. et al. Improving the prediction of protein-nucleic acids binding residues via multiple sequence profiles and the consensus of complementary methods. Bioinformatics 2019;35:930–6. 10.1093/bioinformatics/bty756. | ||
473 | 9 | 51 | Su, J. et al. Roformer: enhanced transformer with rotary position embedding. Neurocomputing 568, 127063 (2024). | ||
533 | 10 | 51 | Su, J. et al. Roformer: enhanced transformer with rotary position embedding. Neurocomputing 568, 127063 (2024). | ||
100 | 3 | 92 | Su, J., Ahmed, M., Lu, Y., Pan, S., Bo, W., and Liu, Y. (2024). Roformer: Enhanced transformer with rotary position embedding. Neurocomputing 568, 127063. | ||
1847 | 35 | 120 | Suchin Gururangan, Ana Marasović, Swabha Swayamdipta, Kyle Lo, Iz Beltagy, Doug Downey, and Noah A. Smith. 2020. Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks. In Proceedings of ACL. | ||
155 | 4 | 15 | Sullivan, P. F. et al. Leveraging base-pair mammalian constraint to understand genetic variation and human disease. Science 380, eabn2937 (2023). | ||
92 | 3 | 84 | Sullivan, P. F., Meadows, J. R., Gazal, S., Phan, B. N., Li, X., Genereux, D. P., Dong, M. X., Bianchi, M., Andrews, G., Sakthikumar, S. et al. (2023). Leveraging base-pair mammalian constraint to understand genetic variation and human disease. Science 380, eabn2937. | ||
1733 | 35 | 6 | Sultan Alrowili and Vijay Shanker. 2021. BioM-Transformers: Building Large Biomedical Language Models with BERT, ALBERT and ELECTRA. In Proceedings of the 20th Workshop on Biomedical Language Processing. 221–227. | ||
906 | 21 | 28 | Sung N. et al. (2017) NSML: A machine learning platform that enables you to focus on your models. arXiv preprint arXiv: 1712.05902. | ||
1891 | 35 | 164 | Sunghwan Kim, Jie Chen, Tiejun Cheng, Asta Gindulyte, Jia He, Siqian He, Qingliang Li, Benjamin A Shoemaker, Paul A Thiessen, Bo Yu, et al. 2019. PubChem 2019 update: improved access to chemical data. Nucleic acids research 47, D1 (2019), D1102–D1109. | ||
1892 | 35 | 165 | Sunghwan Kim, Jie Chen, Tiejun Cheng, Asta Gindulyte, Jia He, Siqian He, Qingliang Li, Benjamin A Shoemaker, Paul A Thiessen, Bo Yu, et al. 2021. PubChem in 2021: new data content and improved web interfaces. Nucleic acids research 49, D1 (2021), D1388–D1395. | ||
1893 | 35 | 166 | Sunghwan Kim, Jie Chen, Tiejun Cheng, Asta Gindulyte, Jia He, Siqian He, Qingliang Li, Benjamin A Shoemaker, Paul A Thiessen, Bo Yu, et al. 2023. PubChem 2023 update. Nucleic acids research 51, D1 (2023), D1373–D1380. | ||
1894 | 35 | 167 | Sunghwan Kim, Paul A Thiessen, Evan E Bolton, Jie Chen, Gang Fu, Asta Gindulyte, Lianyi Han, Jane He, Siqian He, Benjamin A Shoemaker, et al. 2016. PubChem substance and compound databases. Nucleic acids research 44, D1 (2016), D1202–D1213. | ||
1743 | 35 | 16 | Suryanarayanan Balaji, Rishikesh Magar, Yayati Jadhav, and Amir Barati Farimani. 2023. GPT-MolBERTa: GPT Molecular Features Language Model for molecular property prediction. arXiv:2310.03030 [physics.chem-ph] | ||
2129 | 35 | 402 | Susan Zhang, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, Mona Diab, Xian Li, Xi Victoria Lin, et al. 2022. Opt: Open pre-trained transformer language models. arXiv preprint arXiv:2205.01068 (2022). | ||
2142 | 35 | 415 | Suyuan Zhao, Jiahuan Zhang, Yizhen Luo, Yushuai Wu, and Zaiqing Nie. 2024. LangCell: Language-Cell Pre-training for Cell Identity Understanding. arXiv preprint arXiv:2405.06708 (2024). | ||
783 | 15 | 47 | Sylvestre-Alvise Rebuffi, Hakan Bilen, and Andrea Vedaldi. Learning multiple visual domains with residual adapters. arXiv:1705.08045 [cs, stat], November 2017. URL http://arxiv.org/abs/1705.08045. arXiv: 1705.08045. | ||
198 | 5 | 12 | T. Bricken, A. Templeton, J. Batson, B. Chen, A. Jermyn, T. Conerly, N. Turner, C. Anil, C. Denison, A. Askell, R. Lasenby, Y. Wu, S. Kravec, N. Schiefer, T. Maxwell, N. Joseph, Z. Hatfield-Dodds, A. Tamkin, K. Nguyen, B. McLean, J. E. Burke, T. Hume, S. Carter, T. Henighan, and C. Olah. Towards monosemanticity: Decomposing language models with dictionary learning. Transformer Circuits Thread, 2023. https://transformer-circuits.pub/2023/monosemantic-features/index.html. | ||
1123 | 24 | 6 | T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, et al. Language models are few-shot learners. Advances in neural information processing systems, 33:1877–1901, 2020. | ||
1124 | 24 | 7 | T. Chen and C. Guestrin. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd acm sigkdd international conference on knowledge discovery and data mining, pages 785–794, 2016. | ||
1357 | 28 | 9 | T. Computer. Redpajama: an open dataset for training large language models, Oct. 2023. URL https://github.com/togethercomputer/RedPajama-Data. | ||
1129 | 24 | 12 | T. Dao, D. Y. Fu, K. K. Saab, A. W. Thomas, A. Rudra, and C. Ré. Hungry hungry hippos: Towards language modeling with state space models. arXiv preprint arXiv:2212.14052, 2022b. | ||
1128 | 24 | 11 | T. Dao, D. Y. Fu, S. Ermon, A. Rudra, and C. Ré. FlashAttention: Fast and memory-efficient exact attention with IO-awareness. In Advances in Neural Information Processing Systems, 2022a. | ||
212 | 5 | 25 | T. Dobzhansky. Genetics and the Origin of Species. Columbia University Press, 1951. | ||
1207 | 25 | 31 | T. Eloundou, S. Manning, P. Mishkin, and D. Rock, “Gpts are gpts: An early look at the labor market impact potential of large language models,” arXiv preprint arXiv:2303.10130, 2023. | ||
1152 | 24 | 35 | T. H. Pham, D. H. Tran, T. B. H. Ho, K. Satou, and G. Valiente. Qualitatively predicting acetylation and methylation areas in DNA sequences. Genome Informatics, 16(2):3–11, 2005. | ||
1393 | 28 | 45 | T. H. Trinh, Y. Wu, Q. V. Le, H. He, and T. Luong. Solving olympiad geometry without human demonstrations. Nature, 625(7995):476–482, 2024. | ||
223 | 5 | 36 | T. Hayes, R. Rao, H. Akin, N. J. Sofroniew, D. Oktay, Z. Lin, R. Verkuil, V. Q. Tran, J. Deaton, M. Wiggert, et al. Simulating 500 million years of evolution with a language model. Science, eads0018, 2025. | ||
560 | 11 | 18 | T. Li, W.-L. Chiang, E. Frick, L. Dunlap, T. Wu, B. Zhu, J. E. Gonzalez, and I. Stoica. From crowdsourced data to high-quality benchmarks: Arena-hard and benchbuilder pipeline. arXiv preprint arXiv:2406.11939, 2024. | ||
288 | 5 | 103 | T. Shen, Z. Hu, S. Sun, D. Liu, F. Wong, J. Wang, J. Chen, Y. Wang, L. Hong, J. Xiao, et al. Accurate RNA 3d structure prediction using a language model-based deep learning approach. Nature Methods, pages 1–12, 2024. | ||
1178 | 25 | 2 | T. Stivers, N. J. Enfield, P. Brown, C. Englert, M. Hayashi, T. Heinemann, G. Hoymann, F. Rossano, J. P. de Ruiter, K. E. Yoon, and S. C. Levinson, “Universals and cultural variation in turn-taking in conversation,” Proceedings of the National Academy of Sciences, vol. 106, no. 26, pp. 10587–10592, 2009. | ||
1391 | 28 | 43 | T. Tao. Embracing change and resetting expectations, 2023. URL https://unlocked.microsoft.com/ai-anthology/terence-tao/. | ||
576 | 11 | 34 | T. Trinh, Y. Wu, Q. Le, H. He, and T. Luong. Solving olympiad geometry without human demonstrations. Nature, 2024. doi: 10.1038/s41586-023-06747-5. | ||
1398 | 28 | 50 | T. Wei, J. Luan, W. Liu, S. Dong, and B. Wang. Cmath: Can your language model pass chinese elementary school math test?, 2023. | ||
44 | 3 | 36 | Oh, A., Naumann, T., Globerson, A., Saenko, K., Hardt, M., and Levine, S., eds. Advances in Neural Information Processing Systems vol. 36. Curran Associates, Inc. (2023): 43177–43201. | ||
218 | 5 | 31 | T. Gao, A. Wettig, H. Yen, and D. Chen. How to train long-context language models (effectively). arXiv preprint arXiv:2410.02660, 2024b. | ||
930 | 22 | 13 | Tack A, Piech C. The AI teacher test: measuring the pedagogical ability of blender and GPT-3 in educational dialogues. arXiv preprint arXiv:2205.07540, 2022. | ||
2178 | 36 | 18 | Taku Kudo and John Richardson. Sentencepiece: A simple and language independent subword tokenizer and detokenizer for neural text processing, 2018. | ||
391 | 7 | 79 | Taliun, D. et al. Sequencing of 53,831 diverse genomes from the NHLBI TOPMed Program. Nature 590, 290–299 (2021). | ||
117 | 3 | 109 | Talukder, A., Barham, C., Li, X., and Hu, H. (2021). Interpretation of deep learning in genomics and epigenomics. Briefings in Bioinformatics 22, bbaa177. | ||
981 | 22 | 64 | Tang Z, Li C, Kang B. et al. GEPIA: a web server for cancer and normal gene expression profiling and interactive analyses. Nucleic Acids Res 2017;45:W98–102. 10.1093/nar/gkx247. | ||
378 | 7 | 66 | Tang, W. W. et al. A unique gene regulatory network resets the human germline epigenome for development. Cell 161, 1453–1467 (2015). | ||
60 | 3 | 52 | Tang, Z., and Koo, P. K. (2024). Evaluating the representational power of pre-trained DNA language models for regulatory genomics. bioRxiv preprint. https://www.biorxiv.org/content/10.1101/2024.02.29.582810v1. | ||
1846 | 35 | 119 | Tanishq Gupta, Mohd Zaki, N. M. Anoop Krishnan, and Mausam. 2022. MatSciBERT: A Materials Domain Language Model for Text Mining and Information Extraction. NPJ Computational Materials 8, (May 2022), 102. | ||
689 | 13 | 44 | Tanno, R., Barrett, D. G., Sellergren, A., Ghaisas, S., Dathathri, S., See, A., Welbl, J., Lau, C., Tu, T., Azizi, S., et al. Collaboration between clinicians and vision–language models in radiology report generation. Nature Medicine, pp. 1–10, 2024. | ||
1333 | 26 | 96 | Tao Yuan, Xuefei Ning, Dong Zhou, Zhijie Yang, Shiyao Li, Minghui Zhuang, Zheyue Tan, Zhuyu Yao, Dahua Lin, Boxun Li, Guohao Dai, Shengen Yan, and Yu Wang. LV-Eval: A balanced long-context benchmark with 5 length levels up to 256K. CoRR, abs/2402.05136, 2024. | ||
1618 | 33 | 74 | Tao Yuan, Xuefei Ning, Dong Zhou, Zhijie Yang, Shiyao Li, Minghui Zhuang, Zheyue Tan, Zhuyu Yao, Dahua Lin, Boxun Li, Guohao Dai, Shengen Yan, and Yu Wang. LV-Eval: A balanced long-context benchmark with 5 length levels up to 256K. CoRR, abs/2402.05136, 2024. | ||
1719 | 34 | 94 | Tao Yuan, Xuefei Ning, Dong Zhou, Zhijie Yang, Shiyao Li, Minghui Zhuang, Zheyue Tan, Zhuyu Yao, Dahua Lin, Boxun Li, Guohao Dai, Shengen Yan, and Yu Wang. LV-Eval: A balanced long-context benchmark with 5 length levels up to 256K. CoRR, abs/2402.05136, 2024. | ||
785 | 15 | 49 | Tara N Sainath, Brian Kingsbury, Vikas Sindhwani, Ebru Arisoy, and Bhuvana Ramabhadran. Low-rank matrix factorization for deep neural network training with high-dimensional output targets. In 2013 IEEE international conference on acoustics, speech and signal processing, pp. 6655–6659. IEEE, 2013. | ||
1422 | 30 | 5 | Tate J.G., Bamford S., Jubb H.C., Sondka Z., Beare D.M., Bindal N., Boutselakis H., Cole C.G., Creatore C., Dawson E. et al. COSMIC: the catalogue of somatic mutations in cancer. Nucleic Acids Res. 2019; 47:D941–D947. | ||
165 | 4 | 25 | Tate, J. G. et al. COSMIC: the catalogue of somatic mutations in cancer. Nucleic Acids Res. 47, D941–D947 (2019). | ||
97 | 3 | 89 | Tay, Y., Dehghani, M., Gupta, J. P., Aribandi, V., Bahri, D., Qin, Z., and Metzler, D. Are pretrained convolutions better than pretrained transformers? In: Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (2021): 4349–4359. https://aclanthology.org/2021.acl-long.335/. | ||
2031 | 35 | 304 | Teague Sterling and John J Irwin. 2015. ZINC 15–ligand discovery for everyone. Journal of chemical information and modeling 55, 11 (2015), 2324–2337. | ||
297 | 5 | 112 | Team OLMo, P. Walsh, L. Soldaini, D. Groeneveld, K. Lo, S. Arora, A. Bhagia, Y. Gu, S. Huang, M. Jordan, et al. 2 OLMo 2 furious. arXiv preprint arXiv:2501.00656, 2024. | ||
810 | 16 | 11 | Tenenbaum D, Volkening J. KEGGREST. Computer software. Bioconductor Package Maintainer; 2022. | ||
398 | 7 | 86 | The 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526, 68–74 (2015). | ||
172 | 4 | 32 | The Dependency Map Consortium. DepMap 23Q4 public. figshare https://doi.org/10.25452/figshare.plus.24667905.v2 (2023). | ||
338 | 7 | 26 | The ENCODE Project Consortium. Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature 583, 699–710 (2020). | ||
454 | 9 | 32 | The ENCODE Project Consortium. Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature 583, 699–710 (2020). | ||
514 | 10 | 32 | The ENCODE Project Consortium. Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature 583, 699–710 (2020). | ||
803 | 16 | 4 | The Kyoto Encyclopedia of Genes and Genomes—KEGG. Yeast. 2000;1:48–55. | ||
2196 | 36 | 36 | The Mosaic ML Team. composer. https://github.com/mosaicml/composer/, 2021. | ||
862 | 19 | 1 | The Smithsonian Online Collections Databases are provided by the National Museum of Natural History, Smithsonian Institution, 10th and Constitution Ave. N.W., Washington, DC 20560-0193. https://collections.nmnh.si.edu/. | ||
1793 | 35 | 66 | The UniProt Consortium. 2021. UniProt: the universal protein knowledgebase in 2021. Nucleic acids research 49, D1 (2021), D480–D489. | ||
1794 | 35 | 67 | The UniProt Consortium. 2023. UniProt: the universal protein knowledgebase in 2023. Nucleic Acids Research 51, D1 (2023), D523–D531. | ||
1179 | 25 | 3 | The White House, “Fact sheet: Biden-Harris administration secures voluntary commitments from leading artificial intelligence companies to manage the risks posed by AI,” 2023. | ||
934 | 22 | 17 | Theodoris CV, Xiao L, Chopra A. et al. Transfer learning enables predictions in network biology. Nature 2023;618:616–24. 10.1038/s41586-023-06139-9. | ||
1501 | 32 | 8 | Thibault, B., Thole, A., D’Angelo, R., Basset, C. & Guillermet-Guibert, J. PI3Kα-specific inhibitor BYL-719 synergizes with cisplatin in vitro in PIK3CA-mutated ovarian cancer cells. Scientific Reports 15, 6265 (2025). | ||
1849 | 35 | 122 | Thomas Hayes, Roshan Rao, Halil Akin, Nicholas J. Sofroniew, Deniz Oktay, Zeming Lin, Robert Verkuil, Vincent Q. Tran, Jonathan Deaton, Marius Wiggert, Rohil Badkundri, Irhum Shafkat, Jun Gong, Alexander Derry, Raul S. Molina, Neil Thomas, Yousuf Khan, Chetan Mishra, Carolyn Kim, Liam J. Bartie, Matthew Nemeth, Patrick D. Hsu, Tom Sercu, Salvatore Candido, and Alexander Rives. 2024. Simulating 500 million years of evolution with a language model. bioRxiv: 2024.07.01.600583 (2024). | ||
1591 | 33 | 47 | Thomas Mesnard, Cassidy Hardin, Robert Dadashi, Surya Bhupatiraju, Shreya Pathak, Laurent Sifre, Morgane Rivière, Mihir Sanjay Kale, Juliette Love, Pouya Tafti, Léonard Hussenot, Pier Giuseppe Sessa, Aakanksha Chowdhery, Adam Roberts, Aditya Barua, Alex Botev, Alex Castro-Ros, Ambrose Slone, Amélie Héliou, Andrea Tacchetti, Anna Bulanova, Antonia Paterson, Beth Tsai, Bobak Shahriari, Charline Le Lan, Christopher A. Choquette-Choo, Clément Crepy, Daniel Cer, Daphne Ippolito, David Reid, Elena Buchatskaya, Eric Ni, Eric Noland, Geng Yan, George Tucker, George-Christian Muraru, Grigory Rozhdestvenskiy, Henryk Michalewski, Ian Tenney, Ivan Grishchenko, Jacob Austin, James Keeling, Jane Labanowski, Jean-Baptiste Lespiau, Jeff Stanway, Jenny Brennan, Jeremy Chen, Johan Ferret, Justin Chiu, Justin Mao-Jones, Katherine Lee, Kathy Yu, Katie Millican, Lars Lowe Sjoesund, Lisa Lee, Lucas Dixon, Machel Reid, Maciej Mikuła, Mateo Wirth, Michael Sharman, Nikolai Chinaev, Nithum Thain, Olivier Bachem, Oscar Chang, Oscar Wahltinez, Paige Bailey, Paul Michel, Petko Yotov, Rahma Chaabouni, Ramona Comanescu, Reena Jana, Rohan Anil, Ross McIlroy, Ruibo Liu, Ryan Mullins, Samuel L Smith, Sebastian Borgeaud, Sertan Girgin, Sholto Douglas, Shree Pandya, Siamak Shakeri, Soham De, Ted Klimenko, Tom Hennigan, Vlad Feinberg, Wojciech Stokowiec, Yu hui Chen, Zafarali Ahmed, Zhitao Gong, Tris Warkentin, Ludovic Peran, Minh Giang, Clément Farabet, Oriol Vinyals, Jeff Dean, Koray Kavukcuoglu, Demis Hassabis, Zoubin Ghahramani, Douglas Eck, Joelle Barral, Fernando Pereira, Eli Collins, Armand Joulin, Noah Fiedel, Evan Senter, Alek Andreev, and Kathleen Kenealy. Gemma: Open models based on Gemini research and technology. CoRR, abs/2403.08295, 2024. | ||
793 | 15 | 57 | Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, and Alexander M. Rush. Transformers: State-of-the-art natural language processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pp. 38–45, Online, October 2020. Association for Computational Linguistics. URL https://www.aclweb.org/anthology/2020.emnlp-demos.6. | ||
1488 | 31 | 39 | Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, and Alexander M. Rush. Transformers: State-of-the-art natural language processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 38–45, Online, October 2020. Association for Computational Linguistics. URL https://www.aclweb.org/anthology/2020.emnlp-demos.6. | ||
2201 | 36 | 41 | Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, and Alexander M. Rush. Transformers: State-of-the-art natural language processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pp. 38–45, Online, October 2020. Association for Computational Linguistics. URL https://www.aclweb.org/anthology/2020.emnlp-demos.6. | ||
1469 | 31 | 20 | Tian Qi Chen, Yulia Rubanova, Jesse Bettencourt, and David Duvenaud. Neural ordinary differential equations. In Samy Bengio, Hanna M. Wallach, Hugo Larochelle, Kristen Grauman, Nicolò Cesa-Bianchi, and Roman Garnett, editors, Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems 2018, NeurIPS 2018, December 3-8, 2018, Montréal, Canada, pages 6572–6583, 2018a. URL https://proceedings.neurips.cc/paper/2018/hash/69386f6bb1dfed68692a24c8686939b9-Abstract.html. | ||
844 | 17 | 30 | Tian S, Jin Q, Yeganova L et al. Opportunities and challenges for chatgpt and large language models in biomedicine and health. Brief Bioinform 2024;25(1). https://doi.org/10.1093/bib/bbad493. | ||
375 | 7 | 63 | Tian, Y. et al. MicroRNA-200 (miR-200) cluster regulation by achaete scute-like 2 (Ascl2): impact on the epithelial-mesenchymal transition in colon cancer cells. J. Biol. Chem. 289, 36101–36115 (2014). | ||
1288 | 26 | 51 | Tianle Li, Wei-Lin Chiang, Evan Frick, Lisa Dunlap, Tianhao Wu, Banghua Zhu, Joseph E. Gonzalez, and Ion Stoica. From crowdsourced data to high-quality benchmarks: Arena-Hard and BenchBuilder pipeline. CoRR, abs/2406.11939, 2024. | ||
1582 | 33 | 38 | Tianle Li, Wei-Lin Chiang, Evan Frick, Lisa Dunlap, Tianhao Wu, Banghua Zhu, Joseph E. Gonzalez, and Ion Stoica. From crowdsourced data to high-quality benchmarks: Arena-Hard and BenchBuilder pipeline. CoRR, abs/2406.11939, 2024. | ||
1675 | 34 | 50 | Tianle Li, Wei-Lin Chiang, Evan Frick, Lisa Dunlap, Tianhao Wu, Banghua Zhu, Joseph E. Gonzalez, and Ion Stoica. From crowdsourced data to high-quality benchmarks: Arena-Hard and BenchBuilder pipeline. CoRR, abs/2406.11939, 2024. | ||
2131 | 35 | 404 | Tianyi Zhang, Varsha Kishore, Felix Wu, Kilian Q Weinberger, and Yoav Artzi. 2019. BERTScore: Evaluating text generation with BERT. arXiv preprint arXiv:1904.09675 (2019). | ||
1076 | 23 | 22 | Timnit Gebru, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna Wallach, Hal Daumé, and Kate Crawford. Datasheets for Datasets. 2018. doi: 10.48550/ARXIV.1803.09010. URL https://arxiv.org/abs/1803.09010. | ||
1736 | 35 | 9 | Timothy F. Truong Jr. and Tristan Bepler. 2023. PoET: A generative model of protein families as sequences-of-sequences. arXiv:2306.06156 [q-bio.QM] | ||
743 | 15 | 7 | Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. Language Models are Few-Shot Learners. arXiv:2005.14165[cs], July 2020. URL http://arxiv.org/abs/2005.14165. | ||
1249 | 26 | 12 | Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. Language models are few-shot learners. In NeurIPS, 2020. | ||
1637 | 34 | 12 | Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. Language models are few-shot learners. In NeurIPS, 2020. | ||
1761 | 35 | 34 | Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. 2020. Language models are few-shot learners. Advances in neural information processing systems 33 (2020), 1877–1901. | ||
1991 | 35 | 264 | Tom J Pollard, Alistair EW Johnson, Jesse D Raffa, Leo A Celi, Roger G Mark, and Omar Badawi. 2018. The eICU Collaborative Research Database, a freely available multi-center database for critical care research. Scientific data 5, 1 (2018), 1–13. | ||
27 | 3 | 19 | Tomaz da Silva, P., Karollus, A., Hingerl, J., Galindez, G., Wagner, N., Hernandez-Alias, X., Incarnato, D., and Gagneur, J. (2024). Nucleotide dependency analysis of DNA language models reveals genomic functional elements. bioRxiv preprint (2024–07). https://www.biorxiv.org/content/10.1101/2024.07.27.605418v1. | ||
2092 | 35 | 365 | Tong Xie, Yuwei Wan, Wei Huang, Zhenyu Yin, Yixuan Liu, Shaozhou Wang, Qingyuan Linghu, Chunyu Kit, Clara Grazian, Wenjie Zhang, Imran Razzak, and Bram Hoex. 2023. DARWIN Series: Domain Specific Large Language Models for Natural Science. arXiv:2308.13565 [cs.CL] | ||
920 | 22 | 3 | Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat Med 2019;25:44–56. 10.1038/s41591-018-0300-7. | ||
703 | 14 | 2 | Touvron, H. et al. Llama 2: Open Foundation and Fine-Tuned Chat Models. Preprint at https://doi.org/10.48550/arXiv.2307.09288 (2023). | ||
144 | 4 | 4 | Trajanoska, K. et al. From target discovery to clinical drug development with human genetics. Nature 620, 737–745 (2023). | ||
966 | 22 | 49 | Tran HTN, Ang KS, Chevrier M. et al. A benchmark of batch-effect correction methods for single-cell RNA sequencing data. Genome Biol 2020;21:1–32. 10.1186/s13059-019-1850-9. | ||
2169 | 36 | 9 | Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, and Christopher Ré. Flashattention: Fast and memory-efficient exact attention with io-awareness, 2022. | ||
1063 | 23 | 9 | Tristan Bepler and Bonnie Berger. Learning the protein language: Evolution, structure, and function. Cell Systems, 12(6):654–669.e3, June 2021. ISSN 24054712. doi: 10.1016/j.cels.2021.05.017. URL https://linkinghub.elsevier.com/retrieve/pii/S2405471221002039. | ||
65 | 3 | 57 | Trotter, M. V., Nguyen, C. Q., Young, S., Woodruff, R. T., and Branson, K. M. (2021). Epigenomic language models powered by Cerebras. arXiv preprint arXiv:2112.07571. https://arxiv.org/abs/2112.07571. | ||
15 | 3 | 7 | Truong Jr, T., and Bepler, T. PoET: A generative model of protein families as sequences-of-sequences. In: Oh, A., Naumann, T., Globerson, A., Saenko, K., Hardt, M., and Levine, S., eds. Advances in Neural Information Processing Systems vol. 36. Curran Associates, Inc. (2023): 77379–77415. https://proceedings.neurips.cc/paper_files/paper/2023/file/f4366126eba252699b280e8f93c0ab2f-Paper-Conference.pdf. | ||
907 | 21 | 29 | Tsatsaronis G. et al. (2015) An overview of the BIOASQ large-scale biomedical semantic indexing and question answering competition. BMC Bioinformatics, 16, 138. | ||
690 | 13 | 45 | Tu, T., Azizi, S., Driess, D., Schaekermann, M., Amin, M., Chang, P.-C., Carroll, A., Lau, C., Tanno, R., Ktena, I., Mustafa, B., Chowdhery, A., Liu, Y., Kornblith, S., Fleet, D., Mansfield, P., Prakash, S., Wong, R., Virmani, S., Semturs, C., Mahdavi, S. S., Green, B., Dominowska, E., y Arcas, B. A., Barral, J., Webster, D., Corrado, G. S., Matias, Y., Singhal, K., Florence, P., Karthikesalingam, A., and Natarajan, V. Towards generalist biomedical ai, 2023. | ||
2050 | 35 | 323 | Tuan Tran and Chinwe Ekenna. 2023. Molecular Descriptors Property Prediction Using Transformer-Based Approach. International Journal of Molecular Sciences 24, 15 (2023), 11948. | ||
1503 | 32 | 10 | Turon, G., Hlozek, J., Woodland, J. G., Kumar, A., Chibale, K. & Duran-Frigola, M. First fully-automated AI/ML virtual screening cascade implemented at a drug discovery centre in Africa. Nature Communications 14, 5736 (2023). | ||
1982 | 35 | 255 | Typhaine Paysan-Lafosse, Matthias Blum, Sara Chuguransky, Tiago Grego, Beatriz Lázaro Pinto, Gustavo A Salazar, Maxwell L Bileschi, Peer Bork, Alan Bridge, Lucy Colwell, et al. 2023. InterPro in 2022. Nucleic acids research 51, D1 (2023), D418–D427. | ||
936 | 22 | 19 | Uhlmann V, Donati L, Sage D. A practical guide to supervised deep learning for bioimage analysis: challenges and good practices. IEEE Signal Process Mag 2022;39:73–86. 10.1109/MSP.2021.3123589. | ||
2052 | 35 | 325 | Umit V Ucak, Islambek Ashyrmamatov, Junsu Ko, and Juyong Lee. 2022. Retrosynthetic reaction pathway prediction through neural machine translation of atomic environments. Nature communications 13, 1 (2022), 1186. | ||
1795 | 35 | 68 | UniProt Consortium. 2019. UniProt: a worldwide hub of protein knowledge. Nucleic acids research 47, D1 (2019), D506–D515. | ||
600 | 12 | 17 | UniProtKB/Swiss-Prot Release 2024_04 statistics. https://web.expasy.org/docs/relnotes/relstat.html. | ||
691 | 13 | 46 | United Nations Scientific Committee on the Effects of Atomic Radiation. Sources, Effects and Risks of Ionizing Radiation: UNSCEAR 2020/2021 Report, Volume I. United Nations, New York, 2022. ISBN 978-92-1- 139206-7. | ||
1903 | 35 | 176 | Ursula K Le Guin. 2004. The Wave in the Mind: Talks and Essays on the Writer, the Reader, and the Imagination. Shambhala Publications. | ||
| | | Jinhyuk Lee, Wonjin Yoon, Sungdong Kim, Donghyeon Kim, Sunkyu Kim, Chan Ho So, and Jaewoo Kang. 2020. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 36, 4 (2020), 1234–1240. | ||
908 | 21 | 30 | Uzuner Ö. et al. (2011) 2010 i2b2/VA challenge on concepts, assertions, and relations in clinical text. J. Am. Med. Inform. Assoc., 18, 552–556. | ||
285 | 5 | 100 | V. A. Schneider, T. Graves-Lindsay, K. Howe, N. Bouk, H.-C. Chen, P. A. Kitts, T. D. Murphy, K. D. Pruitt, F. Thibaud-Nissen, D. Albracht, R. S. Fulton, M. Kremitzki, V. Magrini, C. Markovic, S. McGrath, K. M. Steinberg, K. Auger, W. Chow, J. Collins, G. Harden, T. Hubbard, S. Pelan, J. T. Simpson, G. Threadgold, J. Torrance, J. M. Wood, L. Clarke, S. Koren, M. Boitano, P. Peluso, H. Li, C.-S. Chin, A. M. Phillippy, R. Durbin, R. K. Wilson, P. Flicek, E. E. Eichler, and D. M. Church. Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly. Genome Research, 27(5):849–864, May 2017. ISSN 1088-9051, 1549-5469. doi: 10.1101/gr.213611.116. URL https://genome.cshlp.org/content/27/5/849. | ||
1106 | 23 | 52 | Valerie A. Schneider, Tina Graves-Lindsay, Kerstin Howe, Nathan Bouk, Hsiu-Chuan Chen, Paul A. Kitts, Terence D. Murphy, Kim D. Pruitt, Franc¸oise Thibaud-Nissen, Derek Albracht, Robert S. Fulton, Milinn Kremitzki, Vincent Magrini, Chris Markovic, Sean McGrath, Karyn Meltz Steinberg, Kate Auger, William Chow, Joanna Collins, Glenn Harden, Timothy Hubbard, Sarah Pelan, Jared T. Simpson, Glen Threadgold, James Torrance, Jonathan M. Wood, Laura Clarke, Sergey Koren, Matthew Boitano, Paul Peluso, Heng Li, Chen-Shan Chin, Adam M. Phillippy, Richard Durbin, Richard K. Wilson, Paul Flicek, Evan E. Eichler, and Deanna M. Church. Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly. Genome Research, 27(5):849–864, May 2017. ISSN 1549-5469. doi:10.1101/gr.213611.116. | ||
1955 | 35 | 228 | Valerio Mariani, Marco Biasini, Alessandro Barbato, and Torsten Schwede. 2013. lDDT: a local superposition-free score for comparing protein structures and models using distance difference tests. Bioinformatics 29, 21 (2013), 2722–2728. | ||
909 | 21 | 31 | Van Mulligen E.M. et al. (2012) The EU-ADR corpus: annotated drugs, diseases, targets, and their relationships. J. Biomed. Inform., 45, 879–884. | ||
128 | 3 | 120 | Vapnik, V. N. The Nature of Statistical Learning Theory. New York: Springer (1999). | ||
1441 | 30 | 24 | Vaser R., Adusumalli S., Leng S.N., Sikic M., Ng P.C. SIFT missense predictions for genomes. Nat. Protoc. 2016; 11:1–9. | ||
910 | 21 | 32 | Vaswani A. et al. (2017) Attention is all you need. In: Guyon,I. et al. (eds.), Advances in Neural Information Processing Systems, pp. 5998–6008. Curran Associates, Inc. http://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf. | ||
610 | 12 | 27 | Vaswani, A. et al. Attention is all you need. https://doi.org/10.48550/ARXIV.1706.03762 (2017). | ||
150 | 4 | 10 | Vaswani, A. et al. Attention is all you need. In Proc. Advances in Neural Information Processing Systems 30 (eds Guyon, S. et al.) 6000–6010 (Curran Associates, Inc., 2017). | ||
9 | 3 | 1 | Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., and Polosukhin, I. Attention is all you need. In: Guyon, I., Luxburg, U. V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., and Garnett, R., eds. Advances in Neural Information Processing Systems vol. 30. Curran Associates, Inc. (2017). | ||
1830 | 35 | 103 | Veniamin Fishman, Yuri Kuratov, Maxim Petrov, Aleksei Shmelev, Denis Shepelin, Nikolay Chekanov, Olga Kardymon, and Mikhail Burtsev. 2023. GENA-LM: A Family of Open-Source Foundational Models for Long DNA Sequences. bioRxiv (2023), 2023–06. | ||
1070 | 23 | 16 | Veniamin Fishman, Yuri Kuratov, Maxim Petrov, Aleksei Shmelev, Denis Shepelin, Nikolay Chekanov, Olga Kardymon, and Mikhail Burtsev. GENA-LM: A Family of Open-Source Foundational Models for Long DNA Sequences, June 2023. URL https://www.biorxiv.org/content/10.1101/2023.06.12.544594v1. | ||
1491 | 31 | 42 | Victor Junqiu Wei, Xiaozhe Ren, Xiaoguang Li, Wenyong Huang, Yi Liao, Yasheng Wang, Jiashu Lin, Xin Jiang, Xiao Chen, and Qun Liu. Nezha: Neural contextualized representation for Chinese language understanding, August 2019. | ||
798 | 15 | 62 | Victor Zhong, Caiming Xiong, and Richard Socher. Seq2sql: Generating structured queries from natural language using reinforcement learning. CoRR, abs/1709.00103, 2017. URL http://arxiv.org/abs/1709.00103. | ||
460 | 9 | 38 | Vig, J. et al. BERTology meets biology: interpreting attention in protein language models. in Proceedings of the International Conference on Learning Representations 2021 https://openreview.net/pdf?id=YWtLZvLmud7 (ICLR, 2021). | ||
520 | 10 | 38 | Vig, J. et al. BERTology meets biology: interpreting attention in protein language models. in Proceedings of the International Conference on Learning Representations 2021 https://openreview.net/pdf?id=YWtLZvLmud7 (ICLR, 2021). | ||
728 | 14 | 27 | Vinh, N. X., Epps, J. & Bailey, J. Information Theoretic Measures for Clusterings Comparison: Variants, Properties, Normalization and Correction for Chance. Journal of Machine Learning Research 11, 2837–2854 (2010). | ||
1952 | 35 | 225 | Vipul Mann and Venkat Venkatasubramanian. 2021. Predicting chemical reaction outcomes: A grammar ontology-based transformer framework. AIChE Journal 67, 3 (2021), e17190. | ||
1740 | 35 | 13 | Viraj Bagal, Rishal Aggarwal, PK Vinod, and U Deva Priyakumar. 2021. MolGPT: molecular generation using a transformer-decoder model. Journal of Chemical Information and Modeling 62, 9 (2021), 2064–2076. | ||
331 | 7 | 19 | Vitsios, D., Dhindsa, R. S., Middleton, L., Gussow, A. B. & Petrovski, S. Prioritizing non-coding regions based on human genomic constraint and sequence context with deep learning. Nat. Commun. 12, 1504 (2021). | ||
465 | 9 | 43 | Võsa, U. et al. Large-scale cis- and trans-eQTL analyses identify thousands of genetic loci and polygenic scores that regulate blood gene expression. Nat. Genetics 53, 1300–1310 (2021). | ||
525 | 10 | 43 | Võsa, U. et al. Large-scale cis- and trans-eQTL analyses identify thousands of genetic loci and polygenic scores that regulate blood gene expression. Nat. Genetics 53, 1300–1310 (2021). | ||
1522 | 32 | 29 | Vu, O., Mendenhall, J., Altarawy, D. & Meiler, J. BCL::Mol2D—a robust atom environment descriptor for QSAR modeling and lead optimization. Journal of computer-aided molecular design 33, 477–486 (2019). | ||
311 | | | W. Brown. Granular format rewards for eliciting mathematical reasoning capabilities in small language models. GitHub Gist. https://gist.github.com/willccbb/4676755236bb08cab5f4e54a0475d6fb. | https://qiita.com/kaizen_nagoya/items/98ef27a02a07c4c03d6e |
1355 | 28 | 7 | W. Chen, X. Ma, X. Wang, and W. W. Cohen. Program of thoughts prompting: Disentangling computation from reasoning for numerical reasoning tasks. CoRR, abs/2211.12588, 2022. doi: 10.48550/ARXIV.2211.12588. URL https://doi.org/10.48550/arXiv.2211.12588. | ||
1143 | 24 | 26 | W. J. Kent, C. W. Sugnet, T. S. Furey, K. M. Roskin, T. H. Pringle, A. M. Zahler, and D. Haussler. The human genome browser at UCSC. Genome research, 12(6):996–1006, 2002. | ||
231 | 5 | 44 | W. Kabsch and C. Sander. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers: Original Research on Biomolecules, 22(12):2577–2637, 1983. | ||
1370 | 28 | 22 | W. Kwon, Z. Li, S. Zhuang, Y. Sheng, L. Zheng, C. H. Yu, J. E. Gonzalez, H. Zhang, and I. Stoica. Efficient memory management for large language model serving with pagedattention. In Proceedings of the ACM SIGOPS 29th Symposium on Operating Systems Principles, 2023. | ||
1408 | 28 | 60 | W. Zhong, R. Cui, Y. Guo, Y. Liang, S. Lu, Y. Wang, A. Saied, W. Chen, and N. Duan. AGIEval: A human-centric benchmark for evaluating foundation models. CoRR, abs/2304.06364, 2023. doi: 10.48550/arXiv.2304.06364. URL https://doi.org/10.48550/arXiv.2304.06364. | ||
240 | 5 | 53 | W.-W. Liang, S. Müller, S. K. Hart, H.-H. Wessels, A. Méndez-Mancilla, A. Sookdeo, O. Choi, C. M. Caragine, A. Corman, L. Lu, O. Kolumba, B. Williams, and N. E. Sanjana. Transcriptome-scale RNA-targeting CRISPR screens reveal essential lncRNAs in human cells. Cell, 187(26):7637–7654.e29, Dec. 2024. | ||
305 | 5 | 120 | W. Xiong, J. Liu, I. Molybog, H. Zhang, P. Bhargava, R. Hou, L. Martin, R. Rungta, K. A. Sankararaman, B. Oguz, et al. Effective long-context scaling of foundation models. arXiv preprint arXiv:2309.16039, 2023. | ||
1731 | 35 | 4 | Walid Ahmad, Elana Simon, Seyone Chithrananda, Gabriel Grand, and Bharath Ramsundar. 2022. ChemBERTa-2: Towards chemical foundation models. arXiv preprint arXiv:2209.01712 (2022). | ||
962 | 22 | 45 | Walsh B, Mohamed SK, Nováček V. BioKG: A knowledge graph for relational learning on biological data. In: d'Aquin M, Dietze S (eds.), Proceedings of the 29th ACM International Conference on Information & Knowledge Management. ACM (Association for Computing Machinery), New York, NY, USA, 2020; 3173–3180. | ||
619 | 12 | 36 | Walsh, I. M., Bowman, M. A., Soto Santarriaga, I. F., Rodriguez, A. & Clark, P. L. Synonymous codon substitutions perturb cotranslational protein folding in vivo and impair cell fitness. Proc. Natl Acad. Sci. USA 117, 3528–3534 (2020). | ||
359 | 7 | 47 | Walsh, T. et al. Rare structural variants disrupt multiple genes in neurodevelopmental pathways in schizophrenia. Science 320, 539–543 (2008). | ||
911 | 21 | 33 | Wang X. et al. (2018) Cross-type biomedical named entity recognition with deep multi-task learning. Bioinformatics, 35, 1745–1752. | ||
1042 | 22 | 125 | Wang H, Kaddour J, Liu S. et al. Evaluating self-supervised learning for molecular graph embeddings. Advances in Neural Information Processing Systems, 2024;36. | ||
922 | 22 | 5 | Wang M, Tai CEW, Wei L. DeFine: deep convolutional neural networks accurately quantify intensities of transcription factor-DNA binding and facilitate evaluation of functional non-coding variants. Nucleic Acids Res 2018;46:e69. 10.1093/nar/gky215. | ||
1002 | 22 | 85 | Wang R, Wang Z, Wang J. et al. SpliceFinder: ab initio prediction of splice sites using convolutional neural network. BMC Bioinformatics 2019;20:1–13. 10.1186/s12859-019-3306-3. | ||
1008 | 22 | 91 | Wang Y, Gong X, Li S, Yang B, Sun Y, Shi C, Wang Y, Yang C, Li H, Song L. xTrimoABFold: de novo antibody structure prediction without MSA. arXiv preprint arXiv:2212.00735, 2022. | ||
1041 | 22 | 124 | Wang Z, Dai Z, Póczos B. et al. Characterizing and avoiding negative transfer. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2019; 11293–11302. | ||
471 | 9 | 49 | Wang, A. et al. SuperGLUE: a stickier benchmark for general-purpose language understanding systems. in 33rd Conference on Neural Information Processing Systems https://papers.nips.cc/paper_files/paper/2019/file/4496bf24afe7fab6f046bf4923da8de6-Paper.pdf (NeurIPS, 2019). | ||
531 | 10 | 49 | Wang, A. et al. SuperGLUE: a stickier benchmark for general-purpose language understanding systems. in 33rd Conference on Neural Information Processing Systems https://papers.nips.cc/paper_files/paper/2019/file/4496bf24afe7fab6f046bf4923da8de6-Paper.pdf (NeurIPS, 2019). | ||
710 | 14 | 9 | Wang, G. et al. Understanding Transcription Factor Regulation by Integrating Gene Expression and DNase I Hypersensitive Sites. Biomed Res Int 2015, 757530 (2015). | ||
390 | 7 | 78 | Wang, G., Sarkar, A., Carbonetto, P. & Stephens, M. A simple new approach to variable selection in regression, with application to genetic fine-mapping. J. R. Stat. Soc. B 82, 1273–1300 (2020). | ||
351 | 7 | 39 | Wang, Q. S. et al. Leveraging supervised learning for functionally informed fine-mapping of cis-eQTLs identifies an additional 20,913 putative causal eQTLs. Nat. Commun. 12, 3394 (2021). | ||
40 | 3 | 32 | Wang, Y., Wang, H., Wei, L., Li, S., Liu, L., and Wang, X. (2020). Synthetic promoter design in Escherichia coli based on a deep generative network. Nucleic Acids Research 48, 6403–6412. | ||
2149 | 35 | 422 | Wanjun Zhong, Ruixiang Cui, Yiduo Guo, Yaobo Liang, Shuai Lu, Yanlin Wang, Amin Saied, Weizhu Chen, and Nan Duan. 2023. AGIEval: A Human-Centric Benchmark for Evaluating Foundation Models. arXiv:2304.06364 [cs.CL] | ||
937 | 22 | 20 | Wasserman WW, Sandelin A. Applied bioinformatics for the identification of regulatory elements. Nat Rev Genet 2004;5:276–87. 10.1038/nrg1315. | ||
595 | 12 | 12 | Watson, J. L. et al. De novo design of protein structure and function with RFdiffusion. Nature 620, 1089–1100 (2023). | ||
2143 | 35 | 416 | Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, et al. 2023. A survey of large language models. arXiv preprint arXiv:2303.18223 (2023). | ||
1527 | 32 | 34 | Weber, A., Born, J. & Rodriguez Martínez, M. TITAN: T-cell receptor specificity prediction with bimodal attention networks. Bioinformatics 37, i237–i244 (2021). | ||
870 | 19 | 9 | Wei C-H, Allot A, Leaman R, Lu Z. PubTator central: automated concept annotation for biomedical full text articles. Nucleic Acids Res. 2019. https://doi.org/10.1093/nar/gkz389. (PMID 31114887.) | ||
845 | 17 | 31 | Wei J, Tay Y, Bommasani R et al. Emergent abilities of large language models. arXiv, arXiv:2206.07682, 2022a, preprint: not peer reviewed. | ||
846 | 17 | 32 | Wei J, Wang X, Schuurmans D et al. Chain of thought prompting elicits reasoning in large language models. arXiv, arXiv:2201.11903, 2022b, preprint: not peer reviewed. | ||
1560 | 33 | 16 | Wei-Lin Chiang, Lianmin Zheng, Ying Sheng, Anastasios Nikolas Angelopoulos, Tianle Li, Dacheng Li, Hao Zhang, Banghua Zhu, Michael I. Jordan, Joseph E. Gonzalez, and Ion Stoica. Chatbot arena: An open platform for evaluating LLMs by human preference. CoRR, abs/2403.04132, 2024. | ||
1782 | 35 | 55 | Wei-Lin Chiang, Zhuohan Li, Zi Lin, Ying Sheng, Zhanghao Wu, Hao Zhang, Lianmin Zheng, Siyuan Zhuang, Yonghao Zhuang, Joseph E. Gonzalez, Ion Stoica, and Eric P. Xing. 2023. Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality. https://vicuna.lmsys.org | ||
1531 | 32 | 38 | Wei, B. & Gong, X. DeepPLA: a novel deep learning-based model for protein-ligand binding affinity prediction (2021). | ||
1863 | 35 | 136 | Weihua Hu, Matthias Fey, Hongyu Ren, Maho Nakata, Yuxiao Dong, and Jure Leskovec. 2023. OGB-LSC: A large-scale challenge for machine learning on graphs. arXiv preprint arXiv:2103.09430 (2023). | ||
1864 | 35 | 137 | Weihua Hu, Matthias Fey, Marinka Zitnik, Yuxiao Dong, Hongyu Ren, Bowen Liu, Michele Catasta, and Jure Leskovec. 2020. Open graph benchmark: Datasets for machine learning on graphs. Advances in neural information processing systems 33 (2020), 22118–22133. | ||
182 | 4 | 42 | Weiner, D. J. et al. Polygenic architecture of rare coding variation across 394,783 exomes. Nature 614, 492–499 (2023). | ||
183 | 4 | 43 | Weissbrod, O. et al. Functionally informed fine-mapping and polygenic localization of complex trait heritability. Nat. Genet. 52, 1355–1363 (2020). | ||
1734 | 35 | 7 | Weizhi An, Yuzhi Guo, Yatao Bian, Hehuan Ma, Jinyu Yang, Chunyuan Li, and Junzhou Huang. 2022. MoDNA: motif-oriented pre-training for DNA language model. In Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics. 1–5. | ||
1056 | 23 | 2 | Weizhi An, Yuzhi Guo, Yatao Bian, Hehuan Ma, Jinyu Yang, Chunyuan Li, and Junzhou Huang. MoDNA: motif-oriented pre-training for DNA language model. In Proceedings of the 13th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, pp. 1–5, Northbrook Illinois, August 2022. ACM. ISBN 978-1-4503-9386-7. doi: 10.1145/3535508.3545512. URL https://dl.acm.org/doi/10.1145/3535508.3545512. | ||
343 | 7 | 31 | Welter, D. et al. The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Res. 42, D1001–D1006 (2014). | ||
1327 | 26 | 90 | Wenhan Xiong, Jingyu Liu, Igor Molybog, Hejia Zhang, Prajjwal Bhargava, Rui Hou, Louis Martin, Rashi Rungta, Karthik Abinav Sankararaman, Barlas Oguz, Madian Khabsa, Han Fang, Yashar Mehdad, Sharan Narang, Kshitiz Malik, Angela Fan, Shruti Bhosale, Sergey Edunov, Mike Lewis, Sinong Wang, and Hao Ma. Effective long-context scaling of foundation models. CoRR, abs/2309.16039, 2023. | ||
1615 | 33 | 71 | Wenhan Xiong, Jingyu Liu, Igor Molybog, Hejia Zhang, Prajjwal Bhargava, Rui Hou, Louis Martin, Rashi Rungta, Karthik Abinav Sankararaman, Barlas Oguz, Madian Khabsa, Han Fang, Yashar Mehdad, Sharan Narang, Kshitiz Malik, Angela Fan, Shruti Bhosale, Sergey Edunov, Mike Lewis, Sinong Wang, and Hao Ma. Effective long-context scaling of foundation models. CoRR, abs/2309.16039, 2023. | ||
1713 | 34 | 88 | Wenhan Xiong, Jingyu Liu, Igor Molybog, Hejia Zhang, Prajjwal Bhargava, Rui Hou, Louis Martin, Rashi Rungta, Karthik Abinav Sankararaman, Barlas Oguz, Madian Khabsa, Han Fang, Yashar Mehdad, Sharan Narang, Kshitiz Malik, Angela Fan, Shruti Bhosale, Sergey Edunov, Mike Lewis, Sinong Wang, and Hao Ma. Effective long-context scaling of foundation models. CoRR, abs/2309.16039, 2023. | ||
1253 | 26 | 16 | Wenhu Chen, Ming Yin, Max Ku, Pan Lu, Yixin Wan, Xueguang Ma, Jianyu Xu, Xinyi Wang, and Tony Xia. TheoremQA: A theorem-driven question answering dataset. In EMNLP, pp. 7889–7901. Association for Computational Linguistics, 2023a. | ||
1558 | 33 | 14 | Wenhu Chen, Ming Yin, Max Ku, Pan Lu, Yixin Wan, Xueguang Ma, Jianyu Xu, Xinyi Wang, and Tony Xia. TheoremQA: A theorem-driven question answering dataset. In EMNLP, pp. 7889–7901. Association for Computational Linguistics, 2023a. | ||
1641 | 34 | 16 | Wenhu Chen, Ming Yin, Max Ku, Pan Lu, Yixin Wan, Xueguang Ma, Jianyu Xu, Xinyi Wang, and Tony Xia. TheoremQA: A theorem-driven question answering dataset. In EMNLP, pp. 7889–7901. Association for Computational Linguistics, 2023a. | ||
2065 | 35 | 338 | Wenlu Wang, Ye Wang, Honggang Zhao, and Simone Sciabola. 2022. A pre-trained conditional transformer for Target-specific De Novo Molecular Generation. (2022). | ||
1861 | 35 | 134 | Wenpin Hou and Zhicheng Ji. 2024. Assessing GPT-4 for cell type annotation in single-cell RNA-seq analysis. Nature Methods (2024), 1–4. | ||
54 | 3 | 46 | West-Roberts, J., Kravitz, J., Jha, N., Cornman, A., and Hwang, Y. (2024). Diverse genomic embedding benchmark for functional evaluation across the tree of life. bioRxiv (2024-07). https://www.biorxiv.org/content/10.1101/2024.07.10.602933v1. | ||
924 | 22 | 7 | Whalen S, Truty RM, Pollard KS. Enhancer–promoter interactions are encoded by complex genomic signatures on looping chromatin. Nat Genet 2016;48:488–96. 10.1038/ng.3539. | ||
912 | 21 | 34 | Wiese G. et al. (2017) Neural domain adaptation for biomedical question answering. In: Proceedings of the 21st Conference on Computational Natural Language Learning (CoNLL 2017), Vancouver, Canada. pp. 281–289. Association for Computational Linguistics. https://www.aclweb.org/anthology/K17-1029. | ||
928 | 22 | 11 | Wiggins WF, Tejani AS. On the opportunities and risks of foundation models for natural language processing in radiology. Radiol Artif Intell 2022;4:e220119. 10.1148/ryai.220119. | ||
1480 | 31 | 31 | Wikimedia Foundation. Wikimedia downloads, https://dumps.wikimedia.org, 2021. | ||
1425 | 30 | 8 | Wilkinson M.D., Dumontier M., Aalbersberg I.J., Appleton G., Axton M., Baak A., Blomberg N., Boiten J.-W., da Silva Santos L.B., Bourne P.E. et al. The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data. 2016; 3:160018. | ||
809 | 16 | 10 | Wilkinson MD, Dumontier M, Aalbersberg IJJ, Appleton G, Axton M, Baak A, et al. The FAIR guiding principles for scientific data management and stewardship. Sci Data. 2016;3:160018. | ||
367 | 7 | 55 | Will, A. J. et al. Composition and dosage of a multipartite enhancer cluster control developmental expression of Ihh (Indian hedgehog). Nat. Genet. 49, 1539–1545 (2017). | ||
750 | 15 | 14 | William B. Dolan and Chris Brockett. Automatically constructing a corpus of sentential paraphrases. In Proceedings of the Third International Workshop on Paraphrasing (IWP2005), 2005. URL https://aclanthology.org/I05-5002. | ||
1482 | 31 | 33 | William B. Dolan and Chris Brockett. Automatically constructing a corpus of sentential paraphrases. In Proceedings of the Third International Workshop on Paraphrasing (IWP2005), 2005. URL https://www.aclweb.org/anthology/I05-5002. | ||
1264 | 26 | 27 | William Fedus, Barret Zoph, and Noam Shazeer. Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity. J. Mach. Learn. Res., 23:120:1–120:39, 2022. | ||
1651 | 34 | 26 | William Fedus, Barret Zoph, and Noam Shazeer. Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity. J. Mach. Learn. Res., 23:120:1–120:39, 2022. | ||
1092 | 23 | 38 | William McLaren, Laurent Gil, Sarah E. Hunt, Harpreet Singh Riat, Graham R. S. Ritchie, Anja Thormann, Paul Flicek, and Fiona Cunningham. The Ensembl Variant Effect Predictor. Genome Biology, 17(1):122, June 2016. ISSN 1474-760X. doi: 10.1186/s13059-016-0974-4. URL https://doi.org/10.1186/s13059-016-0974-4. | ||
637 | 12 | 54 | Wolf, T. et al. HuggingFace’s transformers: State-of-the-art natural language processing. https://doi.org/10.48550/ARXIV.1910.03771 (2019). | ||
847 | 17 | 33 | Wong C, Zheng S, Gu Y et al. Scaling clinical trial matching using large language models: a case study in oncology. arXiv, arXiv:2308.02180, 2023, preprint: not peer reviewed. | ||
1525 | 32 | 32 | Wong, L., You, Z.-H., Guo, Z.-H., Yi, H.-C., Chen, Z.-H. & Cao, M.-Y. MIPDH: a novel computational model for predicting microRNA–mRNA interactions by DeepWalk on a heterogeneous network. ACS omega 5, 17022–17032 (2020). | ||
1053 | 22 | 136 | Wornow M, Xu Y, Thapa R. et al. The shaky foundations of large language models and foundation models for electronic health records. npj Digit Med 2023;6:135. 10.1038/s41746-023-00879-8. | ||
360 | 7 | 48 | Wright, C. F. et al. Genetic diagnosis of developmental disorders in the DDD study: a scalable analysis of genome-wide research data. Lancet 385, 1305–1314 (2015). | ||
913 | 21 | 35 | Wu Y. et al. (2016) Google’s neural machine translation system: Bridging the gap between human and machine translation. arXiv preprint arXiv:1609.08144. | ||
871 | 19 | 10 | Wu C, Macleod I, Su AI. BioGPS and MyGene.info: organizing online, gene-centric information. Nucleic Acids Res. 2013. https://doi.org/10.1093/nar/gks1114. (PMID 23175613.) | ||
998 | 22 | 81 | Wu J, Fu R, Fang H. et al. Medical SAM Adapter: adapting Segment Anything Model for medical image segmentation. arXiv preprint arXiv:2304.12620, 2023. | ||
1005 | 22 | 88 | Wu R, Ding F, Wang R. et al. High-resolution de novo structure prediction from primary sequence. bioRxiv 2022; 2022-07. | ||
946 | 22 | 29 | Wu Y, Wang S, Yang H. et al. An early evaluation of GPT-4V(ision). arXiv preprint arXiv:2310.16534, 2023. | ||
692 | 13 | 47 | Wu, C., Zhang, X., Zhang, Y., Wang, Y., and Xie, W. Towards generalist foundation model for radiology by leveraging web-scale 2D and 3D medical data, 2023. | ||
2087 | 35 | 360 | wwPDB consortium. 2018. Protein Data Bank: the single global archive for 3D macromolecular structure data. Nucleic Acids Research 47, D1 (10 2018), D520–D528. | ||
549 | 11 | 7 | X. Feng, Z. Wan, M. Wen, S. M. McAleer, Y. Wen, W. Zhang, and J. Wang. Alphazero-like tree-search can guide large language model decoding and training, 2024. URL https://arxiv.org/abs/2309.17179. | ||
1231 | 25 | 55 | X. Gu and M. Krenn, “Generation and human-expert evaluation of interesting research ideas using knowledge graphs and large language models,” 2024. | ||
1378 | 28 | 30 | X. Nguyen, W. Zhang, X. Li, M. M. Aljunied, Q. Tan, L. Cheng, G. Chen, Y. Deng, S. Yang, C. Liu, H. Zhang, and L. Bing. Seallms - large language models for southeast asia. CoRR, abs/2312.00738, 2023. doi: 10.48550/ARXIV.2312.00738. URL https://doi.org/10.48550/arXiv.2312.00738. | ||
578 | 11 | 36 | X. Wang, J. Wei, D. Schuurmans, Q. Le, E. Chi, S. Narang, A. Chowdhery, and D. Zhou. Self-consistency improves chain of thought reasoning in language models. arXiv preprint arXiv:2203.11171, 2022. | ||
1406 | 28 | 58 | X. Yue, X. Qu, G. Zhang, Y. Fu, W. Huang, H. Sun, Y. Su, and W. Chen. MAmmoTH: Building math generalist models through hybrid instruction tuning. CoRR, abs/2309.05653, 2023. doi: 10.48550/ARXIV.2309.05653. URL https://doi.org/10.48550/arXiv.2309.05653. | ||
1290 | 26 | 53 | Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal, Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer, Punit Singh Koura, Vishrav Chaudhary, Brian O’Horo, Jeff Wang, Luke Zettlemoyer, Zornitsa Kozareva, Mona T. Diab, Veselin Stoyanov, and Xian Li. Few-shot learning with multilingual generative language models. In EMNLP, pp. 9019–9052. Association for Computational Linguistics, 2022b. | ||
1585 | 33 | 41 | Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal, Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer, Punit Singh Koura, Vishrav Chaudhary, Brian O’Horo, Jeff Wang, Luke Zettlemoyer, Zornitsa Kozareva, Mona T. Diab, Veselin Stoyanov, and Xian Li. Few-shot learning with multilingual generative language models. In EMNLP, pp. 9019–9052. Association for Computational Linguistics, 2022b. | ||
1677 | 34 | 52 | Xi Victoria Lin, Todor Mihaylov, Mikel Artetxe, Tianlu Wang, Shuohui Chen, Daniel Simig, Myle Ott, Naman Goyal, Shruti Bhosale, Jingfei Du, Ramakanth Pasunuru, Sam Shleifer, Punit Singh Koura, Vishrav Chaudhary, Brian O’Horo, Jeff Wang, Luke Zettlemoyer, Zornitsa Kozareva, Mona T. Diab, Veselin Stoyanov, and Xian Li. Few-shot learning with multilingual generative language models. In EMNLP, pp. 9019–9052. Association for Computational Linguistics, 2022b. | ||
2067 | 35 | 340 | Xi Wang, Ruichu Gu, Zhiyuan Chen, Yongge Li, Xiaohong Ji, Guolin Ke, and Han Wen. 2023. UNI-RNA: universal pre-trained models revolutionize RNA research. bioRxiv (2023), 2023–07. | ||
2110 | 35 | 383 | Xi Yang, Aokun Chen, Nima PourNejatian, Hoo Chang Shin, Kaleb E Smith, Christopher Parisien, Colin Compas, Cheryl Martin, Mona G Flores, Ying Zhang, Tanja Magoc, Christopher A Harle, Gloria Lipori, Duane A Mitchell, William R Hogan, Elizabeth A Shenkman, Jiang Bian, and Yonghui Wu. 2022. GatorTron: A Large Clinical Language Model to Unlock Patient Information from Unstructured Electronic Health Records. arXiv:2203.03540 [cs.CL] | ||
693 | 13 | 48 | Xi, Z., Chen, W., Guo, X., He, W., Ding, Y., Hong, B., Zhang, M., Wang, J., Jin, S., Zhou, E., et al. The rise and potential of large language model based agents: A survey. Science China Information Sciences, 68(2):121101, 2025. | ||
1535 | 32 | 42 | Xia, F., Shukla, M., Brettin, T., Garcia-Cardona, C., Cohn, J., Allen, J. E., Maslov, S., Holbeck, S. L., Doroshow, J. H., Evrard, Y. A., et al. Predicting tumor cell line response to drug pairs with deep learning. BMC bioinformatics 19, 71–79 (2018). | ||
765 | 15 | 29 | Xiang Lisa Li and Percy Liang. Prefix-Tuning: Optimizing Continuous Prompts for Generation. arXiv:2101.00190 [cs], January 2021. URL http://arxiv.org/abs/2101.00190. | ||
2156 | 35 | 429 | Xiang Zhuang, Qiang Zhang, Bin Wu, Keyan Ding, Yin Fang, and Huajun Chen. 2023. Graph Sampling-based Meta-Learning for Molecular Property Prediction. | ||
2155 | 35 | 428 | Xiang Zhuang, Qiang Zhang, Keyan Ding, Yatao Bian, Xiao Wang, Jingsong Lv, Hongyang Chen, and Huajun Chen. 2023. Learning Invariant Molecular Representation in Latent Discrete Space. | ||
1929 | 35 | 202 | Xianggen Liu, Yan Guo, Haoran Li, Jin Liu, Shudong Huang, Bowen Ke, and Jiancheng Lv. 2024. DrugLLM: Open Large Language Model for Few-shot Molecule Generation. arXiv preprint arXiv:2405.06690 (2024). | ||
1587 | 33 | 43 | Xiao Liu, Xuanyu Lei, Shengyuan Wang, Yue Huang, Zhuoer Feng, Bosi Wen, Jiale Cheng, Pei Ke, Yifan Xu, Weng Lam Tam, Xiaohan Zhang, Lichao Sun, Hongning Wang, Jing Zhang, Minlie Huang, Yuxiao Dong, and Jie Tang. AlignBench: Benchmarking Chinese alignment of large language models. CoRR, abs/2311.18743, 2023b. | ||
770 | 15 | 34 | Xiao Liu, Yanan Zheng, Zhengxiao Du, Ming Ding, Yujie Qian, Zhilin Yang, and Jie Tang. GPT Understands, Too. arXiv:2103.10385 [cs], March 2021. URL http://arxiv.org/abs/2103.10385. arXiv: 2103.10385. | ||
2134 | 35 | 407 | Xiao-Chen Zhang, Cheng-Kun Wu, Jia-Cai Yi, Xiang-Xiang Zeng, Can-Qun Yang, Ai-Ping Lu, Ting-Jun Hou, and Dong-Sheng Cao. 2022. Pushing the Boundaries of Molecular Property Prediction for Drug Discovery with Multitask Learning BERT Enhanced by SMILES Enumeration. Research 2022 (2022), 0004. | ||
2133 | 35 | 406 | Xiao-Chen Zhang, Cheng-Kun Wu, Zhi-Jiang Yang, Zhen-Xing Wu, Jia-Cai Yi, Chang-Yu Hsieh, Ting-Jun Hou, and Dong-Sheng Cao. 2021. MG-BERT: leveraging unsupervised atomic representation learning for molecular property prediction. Briefings in bioinformatics 22, 6 (2021), bbab152. | ||
1821 | 35 | 94 | Xiaomin Fang, Fan Wang, Lihang Liu, Jingzhou He, Dayong Lin, Yingfei Xiang, Kunrui Zhu, Xiaonan Zhang, Hua Wu, Hui Li, et al. 2023. A method for multiple-sequence-alignment-free protein structure prediction using a protein language model. Nature Machine Intelligence (2023), 1–10. | ||
2132 | 35 | 405 | Xiaotian Zhang, Chunyang Li, Yi Zong, Zhengyu Ying, Liang He, and Xipeng Qiu. 2023. Evaluating the Performance of Large Language Models on GAOKAO Benchmark. | ||
2068 | 35 | 341 | Xiaoxuan Wang, Ziniu Hu, Pan Lu, Yanqiao Zhu, Jieyu Zhang, Satyen Subramaniam, Arjun R. Loomba, Shichang Zhang, Yizhou Sun, and Wei Wang. 2023. SciBench: Evaluating College-Level Scientific Problem-Solving Abilities of Large Language Models. arXiv:2307.10635 [cs.CL] | ||
2066 | 35 | 339 | Xin Wang, Xin Gao, Guohua Wang, and Dan Li. 2023. miProBERT: identification of microRNA promoters based on the pre-trained model BERT. Briefings in bioinformatics 24 (2023), bbad093. | ||
1941 | 35 | 214 | Xingyu Lu, He Cao, Zijing Liu, Shengyuan Bai, Leqing Chen, Yuan Yao, Hai-Tao Zheng, and Yu Li. 2024. MoleculeQA: A Dataset to Evaluate Factual Accuracy in Molecular Comprehension. arXiv preprint arXiv:2403.08192 (2024). | ||
914 | 21 | 36 | Xu K. et al. (2019) Document-level attention-based BiLSTM-CRF incorporating disease dictionary for disease named entity recognition. Comput. Biol. Med., 108, 122–132. | ||
952 | 22 | 35 | Xu M, Yuan X, Miret S. et al. ProtST: multi-modality learning of protein sequences and biomedical texts. arXiv preprint arXiv:2301.12040, 2023. | ||
373 | 7 | 61 | Xu, H. et al. Elevated ASCL2 expression in breast cancer is associated with the poor prognosis of patients. Am. J. Cancer Res. 7, 955–961 (2017). | ||
719 | 14 | 18 | Xu, H., Jia, P. & Zhao, Z. Deep4mC: systematic assessment and computational prediction for DNA N4-methylcytosine sites by deep learning. Briefings in Bioinformatics 22, bbaa099 (2021). | ||
1468 | 31 | 19 | Xuanqing Liu, Hsiang-Fu Yu, Inderjit S. Dhillon, and Cho-Jui Hsieh. Learning to encode position for transformer with continuous dynamical model. In Proceedings of the 37th International Conference on Machine Learning, ICML 2020, 13-18 July 2020, Virtual Event, volume 119 of Proceedings of Machine Learning Research, pages 6327–6335. PMLR, 2020. URL http://proceedings.mlr.press/v119/liu20n.html. | ||
1852 | 35 | 125 | Xuehai He, Shu Chen, Zeqian Ju, Xiangyu Dong, Hongchao Fang, Sicheng Wang, Yue Yang, Jiaqi Zeng, Ruisi Zhang, Ruoyu Zhang, Meng Zhou, Penghui Zhu, and Pengtao Xie. 2020. MedDialog: Two Large-scale Medical Dialogue Datasets. arXiv:2004.03329 [cs.LG] | ||
1203 | 25 | 27 | Y. Bengio, G. Hinton, A. Yao, D. Song, P. Abbeel, T. Darrell, Y. N. Harari, Y.-Q. Zhang, L. Xue, S. Shalev-Shwartz, G. Hadfield, J. Clune, T. Maharaj, F. Hutter, A. G. Baydin, S. McIlraith, Q. Gao, A. Acharya, D. Krueger, A. Dragan, P. Torr, S. Russell, D. Kahneman, J. Brauner, and S. Mindermann, “Managing extreme ai risks amid rapid progress,” Science, vol. 384, no. 6698, pp. 842–845, 2024. | ||
548 | 11 | 6 | Y. Dubois, B. Galambosi, P. Liang, and T. B. Hashimoto. Length-controlled alpacaeval: A simple way to debias automatic evaluators. arXiv preprint arXiv:2404.04475, 2024. | ||
553 | 11 | 11 | Y. He, S. Li, J. Liu, Y. Tan, W. Wang, H. Huang, X. Bu, H. Guo, C. Hu, B. Zheng, et al. Chinese simpleqa: A chinese factuality evaluation for large language models. arXiv preprint arXiv:2411.07140, 2024. | ||
555 | 11 | 13 | Y. Huang, Y. Bai, Z. Zhu, J. Zhang, J. Zhang, T. Su, J. Liu, C. Lv, Y. Zhang, J. Lei, et al. C-Eval: A multi-level multi-discipline chinese evaluation suite for foundation models. arXiv preprint arXiv:2305.08322, 2023. | ||
230 | 5 | 43 | Y. Ji, Z. Zhou, H. Liu, and R. V. Davuluri. DNABERT: pre-trained bidirectional encoder representations from transformers model for DNA-language in genome. Bioinformatics, 37(15):2112–2120, 2021. ISSN 1367-4803. doi: 10.1093/bioinformatics/btab083. URL https://doi.org/10.1093/bioinformatics/btab083. | ||
1142 | 24 | 25 | Y. Ji, Z. Zhou, H. Liu, and R. V. Davuluri. DNABERT: pre-trained bidirectional encoder representations from transformers model for DNA-language in genome. Bioinformatics, 37(15):2112–2120, 2021. | ||
1371 | 28 | 23 | Y. Leviathan, M. Kalman, and Y. Matias. Fast inference from transformers via speculative decoding. In International Conference on Machine Learning, pages 19274–19286. PMLR, 2023. | ||
1165 | 24 | 48 | Y. Tay, M. Dehghani, S. Abnar, Y. Shen, D. Bahri, P. Pham, J. Rao, L. Yang, S. Ruder, and D. Metzler. Long range arena: A benchmark for efficient transformers. arXiv preprint arXiv:2011.04006, 2020. | ||
1166 | 24 | 49 | Y. Tay, V. Q. Tran, S. Ruder, J. Gupta, H. W. Chung, D. Bahri, Z. Qin, S. Baumgartner, C. Yu, and D. Metzler. Charformer: Fast character transformers via gradient-based subword tokenization. arXiv preprint arXiv:2106.12672, 2021. | ||
579 | 11 | 37 | Y. Wang, X. Ma, G. Zhang, Y. Ni, A. Chandra, S. Guo, W. Ren, A. Arulraj, X. He, Z. Jiang, T. Li, M. Ku, K. Wang, A. Zhuang, R. Fan, X. Yue, and W. Chen. Mmlu-pro: A more robust and challenging multi-task language understanding benchmark. CoRR, abs/2406.01574, 2024. URL https://doi.org/10.48550/arXiv.2406.01574. | ||
2011 | 35 | 284 | Yair Schiff, Chia-Hsiang Kao, Aaron Gokaslan, Tri Dao, Albert Gu, and Volodymyr Kuleshov. 2024. Caduceus: Bi-directional equivariant long-range DNA sequence modeling. arXiv preprint arXiv:2403.03234 (2024). | ||
2016 | 35 | 289 | Yaiza Serrano, Sergi Roda, Victor Guallar, and Alexis Molina. 2023. Efficient and accurate sequence generation with small-scale protein language models. bioRxiv (2023), 2023–08. | ||
1844 | 35 | 117 | Yan Guo, Yulin Dai, Hui Yu, Shilin Zhao, David C Samuels, and Yu Shyr. 2017. Improvements and impacts of GRCh38 human reference on high throughput sequencing data analysis. Genomics 109, 2 (2017), 83–90. | ||
694 | 13 | 49 | Yan, Z., Zhang, K., Zhou, R., He, L., Li, X., and Sun, L. Multimodal ChatGPT for medical applications: an experimental study of GPT-4V. arXiv preprint arXiv:2310.19061, 2023. | ||
1027 | 22 | 110 | Yang F, Wang W, Wang F. et al. scBERT as a large-scale pre-trained deep language model for cell type annotation of single-cell RNA-seq data. Nat Mach Intell 2022;4:852–66. 10.1038/s42256-022-00534-z. | ||
1032 | 22 | 115 | Yang X, Mann KK, Wu H. et al. scCross: a deep generative model for unifying single-cell multi-omics with seamless integration, cross-modal generation, and in silico exploration. Genome Biol 2024;25:198. 10.1186/s13059-024-03338-z. | ||
2137 | 35 | 410 | Yang Zhang and Jeffrey Skolnick. 2007. Scoring function for automated assessment of protein structure template quality. Proteins 68, 4 (2007), 1020. | ||
695 | 13 | 50 | Yang, H. M., Duan, T., Ding, D., Bagul, A., Langlotz, C., Shpanskaya, K., et al. CheXNet: radiologist-level pneumonia detection on chest x-rays with deep learning. arXiv preprint arXiv:1711.05225, 2017. | ||
98 | 3 | 90 | Yang, K. K., Fusi, N., and Lu, A. X. (2024). Convolutions are competitive with transformers for protein sequence pretraining. Cell Systems 15, 286–294. | ||
1539 | 32 | 46 | Yang, K., Swanson, K., Jin, W., Coley, C., Eiden, P., Gao, H., Guzman-Perez, A., Hopper, T., Kelley, B., Mathea, M., et al. Analyzing learned molecular representations for property prediction. Journal of chemical information and modeling 59, 3370–3388 (2019). | ||
68 | 3 | 60 | Yang, M., Huang, L., Huang, H., Tang, H., Zhang, N., Yang, H., Wu, J., and Mu, F. (2022). Integrating convolution and self-attention improves language model of human genome for interpreting non-coding regions at base-resolution. Nucleic Acids Research 50, e81–e81. | ||
602 | 12 | 19 | Yang, Q. et al. eRF1 mediates codon usage effects on mRNA translation efficiency through premature termination at rare codons. Nucleic Acids Res. 47, 9243–9258 (2019). | ||
726 | 14 | 25 | Yang, X., Huang, J. Y., Zhou, W. & Chen, M. Parameter-Efficient Tuning with Special Token Adaptation. Preprint at https://doi.org/10.48550/arXiv.2210.04382 (2023). | ||
1776 | 35 | 49 | Yangyang Chen, Zixu Wang, Lei Wang, Jianmin Wang, Pengyong Li, Dongsheng Cao, Xiangxiang Zeng, Xiucai Ye, and Tetsuya Sakurai. 2023. Deep generative model for drug design from protein target sequence. Journal of Cheminformatics 15, 1 (2023), 38. | ||
1899 | 35 | 172 | Yanis Labrak, Adrien Bazoge, Emmanuel Morin, Pierre-Antoine Gourraud, Mickael Rouvier, and Richard Dufour. 2024. Biomistral: A collection of open-source pretrained large language models for medical domains. arXiv preprint arXiv:2402.10373 (2024). | ||
2069 | 35 | 342 | Yanli Wang, Jewen Xiao, Tugba O Suzek, Jian Zhang, Jiyao Wang, and Stephen H Bryant. 2009. PubChem: a public information system for analyzing bioactivities of small molecules. Nucleic acids research 37, suppl_2 (2009), W623–W633. | ||
2070 | 35 | 343 | Yanli Wang, Jewen Xiao, Tugba O Suzek, Jian Zhang, Jiyao Wang, Zhigang Zhou, Lianyi Han, Karen Karapetyan, Svetlana Dracheva, Benjamin A Shoemaker, et al. 2012. PubChem’s BioAssay database. Nucleic acids research 40, D1 (2012), D400–D412. | ||
1258 | 26 | 21 | Yann N. Dauphin, Angela Fan, Michael Auli, and David Grangier. Language modeling with gated convolutional networks. In ICML, volume 70 of Proceedings of Machine Learning Research, pp. 933–941. PMLR, 2017. | ||
1565 | 33 | 21 | Yann N. Dauphin, Angela Fan, Michael Auli, and David Grangier. Language modeling with gated convolutional networks. In ICML, volume 70 of Proceedings of Machine Learning Research, pp. 933–941. PMLR, 2017. | ||
1646 | 34 | 21 | Yann N. Dauphin, Angela Fan, Michael Auli, and David Grangier. Language modeling with gated convolutional networks. In ICML, volume 70 of Proceedings of Machine Learning Research, pp. 933–941. PMLR, 2017. | ||
2170 | 36 | 10 | Yann N. Dauphin, Angela Fan, Michael Auli, and David Grangier. Language modeling with gated convolutional networks. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, ICML’17, pp. 933–941. JMLR.org, 2017. | ||
2154 | 35 | 427 | Yanqiao Zhu, Jeehyun Hwang, Keir Adams, Zhen Liu, Bozhao Nan, Brock Stenfors, Yuanqi Du, Jatin Chauhan, Olaf Wiest, Olexandr Isayev, et al. 2023. Learning Over Molecular Conformer Ensembles: Datasets and Benchmarks. arXiv preprint arXiv:2310.00115 (2023). | ||
1873 | 35 | 146 | Yanrong Ji, Zhihan Zhou, Han Liu, and Ramana V Davuluri. 2021. DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome. Bioinformatics 37, 15 (2021), 2112–2120. | ||
2174 | 36 | 14 | Yanrong Ji, Zhihan Zhou, Han Liu, and Ramana V Davuluri. Dnabert: pre-trained bidirectional encoder representations from transformers model for dna-language in genome. Bioinformatics, 37(15):2112–2120, 2021. | ||
1080 | 23 | 26 | Yanrong Ji, Zhihan Zhou, Han Liu, and Ramana V Davuluri. DNABERT: pre-trained Bidirectional Encoder Representations from Transformers model for DNA-language in genome. Bioinformatics, 37(15):2112–2120, August 2021. ISSN 1367-4803. doi: 10.1093/bioinformatics/btab083. URL https://doi.org/10.1093/bioinformatics/btab083. | ||
848 | 17 | 34 | Yao S, Zhao J, Yu D et al. React: synergizing reasoning and acting in language models. arXiv, arXiv:2210.03629, 2022, preprint: not peer reviewed. | ||
696 | 13 | 51 | Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., and Cao, Y. React: Synergizing reasoning and acting in language models, 2023. | ||
1816 | 35 | 89 | Yasha Ektefaie, Andrew Shen, Daria Bykova, Maximillian Marin, Marinka Zitnik, and Maha R Farhat. 2024. Evaluating generalizability of artificial intelligence models for molecular datasets. bioRxiv (2024). | ||
986 | 22 | 69 | Yasunaga M, Bosselut A, Ren H. et al. Deep bidirectional language-knowledge graph pretraining. Adv Neural Inf Process Syst 2022;35:37309–23. | ||
2072 | 35 | 345 | Ye Wang, Honggang Zhao, Simone Sciabola, and Wenlu Wang. 2023. cMolGPT: A Conditional Generative Pre-Trained Transformer for Target-Specific De Novo Molecular Generation. Molecules 28, 11 (2023), 4430. | ||
1336 | 26 | 99 | Yidan Zhang, Boyi Deng, Yu Wan, Baosong Yang, Haoran Wei, Fei Huang, Bowen Yu, Junyang Lin, and Jingren Zhou. P-MMEval: A parallel multilingual multitask benchmark for consistent evaluation of LLMs. CoRR, abs/2411.09116, 2024. | ||
1722 | 34 | 97 | Yidan Zhang, Boyi Deng, Yu Wan, Baosong Yang, Haoran Wei, Fei Huang, Bowen Yu, Junyang Lin, and Jingren Zhou. P-MMEval: A parallel multilingual multitask benchmark for consistent evaluation of LLMs. CoRR, abs/2411.09116, 2024. | ||
1988 | 35 | 261 | Yifan Peng, Qingyu Chen, and Zhiyong Lu. 2020. An Empirical Study of Multi-Task Learning on BERT for Biomedical Text Mining. arXiv:2005.02799 [cs.CL] | ||
2138 | 35 | 411 | Yikun Zhang, Geyan Ye, Chaohao Yuan, Bo Han, Long-Kai Huang, Jianhua Yao, Wei Liu, and Yu Rong. 2024. Atomas: Hierarchical Alignment on Molecule-Text for Unified Molecule Understanding and Generation. arXiv preprint arXiv:2404.16880 (2024). | ||
2136 | 35 | 409 | Yikun Zhang, Mei Lang, Jiuhong Jiang, Zhiqiang Gao, Fan Xu, Thomas Litfin, Ke Chen, Jaswinder Singh, Xiansong Huang, Guoli Song, et al. 2023. Multiple sequence-alignment-based RNA language model and its application to structural inference. bioRxiv (2023), 2023–03. | ||
1823 | 35 | 96 | Yin Fang, Ningyu Zhang, Zhuo Chen, Xiaohui Fan, and Huajun Chen. 2023. Domain-agnostic molecular generation with self-feedback. CoRR, abs/2301.11259 (2023). | ||
1824 | 35 | 97 | Yin Fang, Qiang Zhang, Haihong Yang, Xiang Zhuang, Shumin Deng, Wen Zhang, Ming Qin, Zhuo Chen, Xiaohui Fan, and Huajun Chen. 2022. Molecular contrastive learning with chemical element knowledge graph. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 36. 3968–3976. | ||
1825 | 35 | 98 | Yin Fang, Qiang Zhang, Ningyu Zhang, Zhuo Chen, Xiang Zhuang, Xin Shao, Xiaohui Fan, and Huajun Chen. 2023. Knowledge graph-enhanced molecular contrastive learning with functional prompt. Nature Machine Intelligence (2023), 1–12. | ||
1822 | 35 | 95 | Yin Fang, Xiaozhuan Liang, Ningyu Zhang, Kangwei Liu, Rui Huang, Zhuo Chen, Xiaohui Fan, and Huajun Chen. 2023. Mol-Instructions: A Large-Scale Biomolecular Instruction Dataset for Large Language Models. arXiv:2306.08018 [q-bio.QM] | ||
697 | 13 | 52 | Yin, G., Bai, H., Ma, S., Nan, F., Sun, Y., Xu, Z., Ma, S., Lu, J., Kong, X., Zhang, A., et al. Mmau: A holistic benchmark of agent capabilities across diverse domains. arXiv preprint arXiv:2407.18961, 2024. | ||
1331 | 26 | 94 | Yinfei Yang, Yuan Zhang, Chris Tar, and Jason Baldridge. PAWS-X: A cross-lingual adversarial dataset for paraphrase identification. In EMNLP/IJCNLP (1), pp. 3685–3690. Association for Computational Linguistics, 2019. | ||
1616 | 33 | 72 | Yinfei Yang, Yuan Zhang, Chris Tar, and Jason Baldridge. PAWS-X: A cross-lingual adversarial dataset for paraphrase identification. In EMNLP/IJCNLP (1), pp. 3685–3690. Association for Computational Linguistics, 2019. | ||
1717 | 34 | 92 | Yinfei Yang, Yuan Zhang, Chris Tar, and Jason Baldridge. PAWS-X: A cross-lingual adversarial dataset for paraphrase identification. In EMNLP/IJCNLP (1), pp. 3685–3690. Association for Computational Linguistics, 2019. | ||
2135 | 35 | 408 | Ying Zhang, Fang Ge, Fuyi Li, Xibei Yang, Jiangning Song, and Dong-Jun Yu. 2023. Prediction of multiple types of RNA modifications via biological language model. IEEE/ACM Transactions on Computational Biology and Bioinformatics (2023). | ||
1622 | 33 | 78 | Yingxiu Zhao, Bowen Yu, Binyuan Hui, Haiyang Yu, Minghao Li, Fei Huang, Nevin L. Zhang, and Yongbin Li. Tree-Instruct: A preliminary study of the intrinsic relationship between complexity and alignment. In LREC/COLING, pp. 16776–16789. ELRA and ICCL, 2024. | ||
1931 | 35 | 204 | Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692 (2019). | ||
771 | 15 | 35 | Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. Roberta: A robustly optimized bert pretraining approach, 2019. | ||
2020 | 35 | 293 | Yiqing Shen, Zan Chen, Michail Mamalakis, Luhan He, Haiyang Xia, Tianbin Li, Yanzhou Su, Junjun He, and Yu Guang Wang. 2024. A Fine-tuning Dataset and Benchmark for Large Language Models for Protein Understanding. arXiv preprint arXiv:2406.05540 (2024). | ||
1778 | 35 | 51 | Yiqun Chen and James Zou. 2023. GenePT: A Simple But Effective Foundation Model for Genes and Cells Built From ChatGPT. bioRxiv (2023). | ||
1777 | 35 | 50 | Yirong Chen, Zhenyu Wang, Xiaofen Xing, huimin zheng, Zhipei Xu, Kai Fang, Junhong Wang, Sihang Li, Jieling Wu, Qi Liu, and Xiangmin Xu. 2023. BianQue: Balancing the Questioning and Suggestion Ability of Health LLMs with Multi-turn Health Conversations Polished by ChatGPT. arXiv:2310.15896 [cs.CL] | ||
1946 | 35 | 219 | Yizhen Luo, Jiahuan Zhang, Siqi Fan, Kai Yang, Yushuai Wu, Mu Qiao, and Zaiqing Nie. 2023. Biomedgpt: Open multimodal generative pre-trained transformer for biomedicine. arXiv preprint arXiv:2308.09442 (2023). | ||
1945 | 35 | 218 | Yizhen Luo, Kai Yang, Massimo Hong, Xing Yi Liu, and Zaiqing Nie. 2023. MolFM: A Multimodal Molecular Foundation Model. arXiv:2307.09484 [q-bio.BM] | ||
797 | 15 | 61 | Yong Zhao, Jinyu Li, and Yifan Gong. Low-rank plus diagonal adaptation for deep neural networks. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5005–5009. IEEE, 2016. | ||
915 | 21 | 37 | Yoon W. et al. (2019) Collabonet: collaboration of deep neural networks for biomedical named entity recognition. BMC Bioinformatics, 20, 249. | ||
1905 | 35 | 178 | Youhan Lee, Hasun Yu, Jaemyung Lee, and Jaehoon Kim. 2023. Pre-training Sequence, Structure, and Surface Features for Comprehensive Protein Representation Learning. In The Twelfth International Conference on Learning Representations. | ||
1919 | 35 | 192 | Youwei Liang, Ruiyi Zhang, Li Zhang, and Pengtao Xie. 2023. DrugChat: towards enabling ChatGPT-like capabilities on drug molecule graphs. arXiv preprint arXiv:2309.03907 (2023). | ||
1841 | 35 | 114 | Yu Gu, Robert Tinn, Hao Cheng, Michael Lucas, Naoto Usuyama, Xiaodong Liu, Tristan Naumann, Jianfeng Gao, and Hoifung Poon. 2021. Domain-specific language model pretraining for biomedical natural language processing. ACM Transactions on Computing for Healthcare (HEALTH) 3, (2021), 1–23. | ||
2184 | 36 | 24 | Yu Ni, Linqi Fan, Miao Wang, Ning Zhang, Yongchun Zuo, and Mingzhi Liao. EPI-Mind: Identifying enhancer-promoter interactions based on transformer mechanism. Interdiscip. Sci., 14(3):786–794, September 2022. | ||
2004 | 35 | 277 | Yu Rong, Yatao Bian, Tingyang Xu, Weiyang Xie, Ying Wei, Wenbing Huang, and Junzhou Huang. 2020. Self-supervised graph transformer on large-scale molecular data. Advances in Neural Information Processing Systems 33 (2020), 12559–12571. | ||
796 | 15 | 60 | Yu Zhang, Ekapol Chuangsuwanich, and James Glass. Extracting deep neural network bottleneck features using low-rank matrix factorization. In 2014 IEEE international conference on acoustics, speech and signal processing (ICASSP), pp. 185–189. IEEE, 2014. | ||
102 | 3 | 94 | Yu, L., Simig, D., Flaherty, C., Aghajanyan, A., Zettlemoyer, L., and Lewis, M. MEGABYTE: Predicting million-byte sequences with multiscale transformers. In: Oh, A., Naumann, T., Globerson, A., Saenko, K., Hardt, M., and Levine, S., eds. Advances in Neural Information Processing Systems vol. 36. Curran Associates, Inc. (2023):(78808–78823). https://proceedings.neurips.cc/paper_files/paper/2023/file/f8f78f8043f35890181a824e53a57134-Paper-Conference.pdf. | ||
995 | 22 | 78 | Yuan H, Yuan Z, Gan R. et al. BioBART: Pretraining and evaluation of a biomedical generative language model. arXiv preprint arXiv:2204.03905, 2022. | ||
939 | 22 | 22 | Yuan L, Chen D, Chen YL. et al. Florence: a new foundation model for computer vision. arXiv preprint arXiv:2111.11432, 2021. | ||
1810 | 35 | 83 | Yuanqi Du, Tianfan Fu, Jimeng Sun, and Shengchao Liu. 2022. Molgensurvey: A systematic survey in machine learning models for molecule design. arXiv preprint arXiv:2203.14500 (2022). | ||
766 | 15 | 30 | Yuanzhi Li and Yingyu Liang. Learning overparameterized neural networks via stochastic gradient descent on structured data. In Advances in Neural Information Processing Systems, 2018. | ||
768 | 15 | 32 | Yuanzhi Li, Tengyu Ma, and Hongyang Zhang. Algorithmic regularization in over-parameterized matrix sensing and neural networks with quadratic activations. In Conference On Learning Theory, pp. 2–47. PMLR, 2018b. | ||
767 | 15 | 31 | Yuanzhi Li, Yingyu Liang, and Andrej Risteski. Recovery guarantee of weighted low-rank approximation via alternating minimization. In International Conference on Machine Learning, pp. 2358–2367. PMLR, 2016. | ||
1323 | 26 | 86 | Yubo Wang, Xueguang Ma, Ge Zhang, Yuansheng Ni, Abhranil Chandra, Shiguang Guo, Weiming Ren, Aaran Arulraj, Xuan He, Ziyan Jiang, Tianle Li, Max Ku, Kai Wang, Alex Zhuang, Rongqi Fan, Xiang Yue, and Wenhu Chen. MMLU-Pro: A more robust and challenging multi-task language understanding benchmark. CoRR, abs/2406.01574, 2024b. | ||
1614 | 33 | 70 | Yubo Wang, Xueguang Ma, Ge Zhang, Yuansheng Ni, Abhranil Chandra, Shiguang Guo, Weiming Ren, Aaran Arulraj, Xuan He, Ziyan Jiang, Tianle Li, Max Ku, Kai Wang, Alex Zhuang, Rongqi Fan, Xiang Yue, and Wenhu Chen. MMLU-Pro: A more robust and challenging multi-task language understanding benchmark. CoRR, abs/2406.01574, 2024. | ||
1709 | 34 | 84 | Yubo Wang, Xueguang Ma, Ge Zhang, Yuansheng Ni, Abhranil Chandra, Shiguang Guo, Weiming Ren, Aaran Arulraj, Xuan He, Ziyan Jiang, Tianle Li, Max Ku, Kai Wang, Alex Zhuang, Rongqi Fan, Xiang Yue, and Wenhu Chen. MMLU-Pro: A more robust and challenging multi-task language understanding benchmark. CoRR, abs/2406.01574, 2024b. | ||
2002 | 35 | 275 | Yuchen Ren, Zhiyuan Chen, Lifeng Qiao, Hongtai Jing, Yuchen Cai, Sheng Xu, Peng Ye, Xinzhu Ma, Siqi Sun, Hongliang Yan, et al. 2024. BEACON: Benchmark for Comprehensive RNA Tasks and Language Models. arXiv preprint arXiv:2406.10391 (2024). | ||
1767 | 35 | 40 | Yue Cao, Payel Das, Vijil Chenthamarakshan, Pin-Yu Chen, Igor Melnyk, and Yang Shen. 2021. Fold2seq: A joint sequence (1d)-fold (3d) embedding-based generative model for protein design. In International Conference on Machine Learning. PMLR, 1261–1271. | ||
1916 | 35 | 189 | Yuesen Li, Chengyi Gao, Xin Song, Xiangyu Wang, Yungang Xu, and Suxia Han. 2023. Druggpt: A gpt-based strategy for designing potential ligands targeting specific proteins. bioRxiv (2023), 2023–06. | ||
2071 | 35 | 344 | Yuhao Wang, Qiang Zhang, Ming Qin, Xiang Zhuang, Xiaotong Li, Zhichen Gong, Zeyuan Wang, Yu Zhao, Jianhua Yao, Keyan Ding, et al. 2024. Knowledge-aware Reinforced Language Models for Protein Directed Evolution. In Forty-first International Conference on Machine Learning. | ||
1479 | 31 | 30 | Yukun Zhu, Ryan Kiros, Richard Zemel, Ruslan Salakhutdinov, Raquel Urtasun, Antonio Torralba, and Sanja Fidler. Aligning books and movies: Towards story-like visual explanations by watching movies and reading books. arXiv preprint arXiv:1506.06724, 2015. | ||
1561 | 33 | 17 | Yunfei Chu, Jin Xu, Xiaohuan Zhou, Qian Yang, Shiliang Zhang, Zhijie Yan, Chang Zhou, and Jingren Zhou. Qwen-Audio: Advancing universal audio understanding via unified large-scale audio-language models. CoRR, abs/2311.07919, 2023. | ||
1867 | 35 | 140 | Yunha Hwang, Andre L Cornman, Elizabeth H Kellogg, Sergey Ovchinnikov, and Peter R Girguis. 2024. Genomic language model predicts protein co-regulation and function. Nature communications 15, 1 (2024), 2880. | ||
1090 | 23 | 36 | Yunhai Luo, Benjamin C. Hitz, Idan Gabdank, Jason A. Hilton, Meenakshi S. Kagda, Bonita Lam, Zachary Myers, Paul Sud, Jennifer Jou, Khine Lin, Ulugbek K. Baymuradov, Keenan Graham, Casey Litton, Stuart R. Miyasato, J. Seth Strattan, Otto Jolanki, Jin-Wook Lee, Forrest Y. Tanaka, Philip Adenekan, Emma O’Neill, and J. Michael Cherry. New developments on the Encyclopedia of DNA Elements (ENCODE) data portal. Nucleic Acids Research, 48(D1):D882–D889, January 2020. ISSN 1362-4962. doi: 10.1093/nar/gkz1062. | ||
1553 | 33 | 9 | Yuntao Bai, Saurav Kadavath, Sandipan Kundu, Amanda Askell, Jackson Kernion, Andy Jones, Anna Chen, Anna Goldie, Azalia Mirhoseini, Cameron McKinnon, Carol Chen, Catherine Olsson, Christopher Olah, Danny Hernandez, Dawn Drain, Deep Ganguli, Dustin Li, Eli Tran-Johnson, Ethan Perez, Jamie Kerr, Jared Mueller, Jeffrey Ladish, Joshua Landau, Kamal Ndousse, Kamile Lukosiute, Liane Lovitt, Michael Sellitto, Nelson Elhage, Nicholas Schiefer, Noemí Mercado, Nova DasSarma, Robert Lasenby, Robin Larson, Sam Ringer, Scott Johnston, Shauna Kravec, Sheer El Showk, Stanislav Fort, Tamera Lanham, Timothy Telleen-Lawton, Tom Conerly, Tom Henighan, Tristan Hume, Samuel R. Bowman, Zac Hatfield-Dodds, Ben Mann, Dario Amodei, Nicholas Joseph, Sam McCandlish, Tom Brown, and Jared Kaplan. Constitutional AI: Harmlessness from AI feedback. CoRR, abs/2212.08073, 2022. | ||
1932 | 35 | 205 | Yunwu Liu, Ruisheng Zhang, Tongfeng Li, Jing Jiang, Jun Ma, and Ping Wang. 2023. MolRoPE-BERT: An enhanced molecular representation with Rotary Position Embedding for molecular property prediction. Journal of Molecular Graphics and Modelling 118 (2023), 108344. | ||
1917 | 35 | 190 | Yunxiang Li, Zihan Li, Kai Zhang, Ruilong Dan, Steve Jiang, and You Zhang. 2023. ChatDoctor: A Medical Chat Model Fine-Tuned on a Large Language Model Meta-AI (LLaMA) Using Medical Domain Knowledge. arXiv:2303.14070 [cs.CL] | ||
1247 | 26 | 10 | Yushi Bai, Xin Lv, Jiajie Zhang, Yuze He, Ji Qi, Lei Hou, Jie Tang, Yuxiao Dong, and Juanzi Li. LongAlign: A recipe for long context alignment of large language models. In EMNLP (Findings), pp. 1376–1395. Association for Computational Linguistics, 2024. | ||
1635 | 34 | 10 | Yushi Bai, Xin Lv, Jiajie Zhang, Yuze He, Ji Qi, Lei Hou, Jie Tang, Yuxiao Dong, and Juanzi Li. LongAlign: A recipe for long context alignment of large language models. In EMNLP (Findings), pp. 1376–1395. Association for Computational Linguistics, 2024. | ||
1930 | 35 | 203 | Yuyan Liu, Sirui Ding, Sheng Zhou, Wenqi Fan, and Qiaoyu Tan. 2024. MolecularGPT: Open Large Language Model (LLM) for Few-Shot Molecular Property Prediction. arXiv preprint arXiv:2406.12950 (2024). | ||
1866 | 35 | 139 | Yuzhen Huang, Yuzhuo Bai, Zhihao Zhu, Junlei Zhang, Jinghan Zhang, Tangjun Su, Junteng Liu, Chuancheng Lv, Yikai Zhang, Jiayi Lei, Yao Fu, Maosong Sun, and Junxian He. 2023. C-Eval: A Multi-Level Multi-Discipline Chinese Evaluation Suite for Foundation Models. In Advances in Neural Information Processing Systems. | ||
1573 | 33 | 29 | Yuzhen Huang, Yuzhuo Bai, Zhihao Zhu, Junlei Zhang, Jinghan Zhang, Tangjun Su, Junteng Liu, Chuancheng Lv, Yikai Zhang, Jiayi Lei, Yao Fu, Maosong Sun, and Junxian He. C-Eval: A multi-level multi-discipline chinese evaluation suite for foundation models. In NeurIPS, 2023. | ||
191 | 5 | 5 | Ž. Avsec, V. Agarwal, D. Visentin, J. R. Ledsam, A. Grabska-Barwinska, K. R. Taylor, Y. Assael, J. Jumper, P. Kohli, and D. R. Kelley. Effective gene expression prediction from sequence by integrating long-range interactions. Nature Methods, 18(10):1196–1203, 2021. | ||
1118 | 24 | 1 | Ž. Avsec, V. Agarwal, D. Visentin, J. R. Ledsam, A. Grabska-Barwinska, K. R. Taylor, Y. Assael, J. Jumper, P. Kohli, and D. R. Kelley. Effective gene expression prediction from sequence by integrating long-range interactions. Nature methods, 18(10):1196–1203, 2021. | ||
1351 | 28 | 3 | Z. Azerbayev, H. Schoelkopf, K. Paster, M. D. Santos, S. McAleer, A. Q. Jiang, J. Deng, S. Biderman, and S. Welleck. Llemma: An open language model for mathematics. arXiv preprint arXiv:2310.10631, 2023. | ||
1486 | 31 | 37 | Z. Chen, H. Zhang, X. Zhang, and L. Zhao. Quora question pairs, 2018b. URL https://www.kaggle.com/c/quora-question-pairs. | ||
1359 | 28 | 11 | Z. Du, Y. Qian, X. Liu, M. Ding, J. Qiu, Z. Yang, and J. Tang. Glm: General language model pretraining with autoregressive blank infilling. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 320–335, 2022. | ||
1361 | 28 | 13 | Z. Gou, Z. Shao, Y. Gong, Y. Shen, Y. Yang, M. Huang, N. Duan, and W. Chen. Tora: A tool-integrated reasoning agent for mathematical problem solving. CoRR, abs/2309.17452, 2023. doi: 10.48550/ARXIV.2309.17452. URL https://doi.org/10.48550/arXiv.2309.17452. | ||
1211 | 25 | 35 | Z. Li, “The dark side of chatgpt: Legal and ethical challenges from stochastic parrots and hallucination,” 2023. | ||
1146 | 24 | 29 | Z. Lin, H. Akin, R. Rao, B. Hie, Z. Zhu, W. Lu, A. dos Santos Costa, M. Fazel-Zarandi, T. Sercu, S. Candido, et al. Language models of protein sequences at the scale of evolution enable accurate structure prediction. BioRxiv, 2022. | ||
241 | 5 | 54 | Z. Lin, H. Akin, R. Rao, B. Hie, Z. Zhu, W. Lu, N. Smetanin, R. Verkuil, O. Kabeli, Y. Shmueli, A. dos Santos Costa, M. Fazel-Zarandi, T. Sercu, S. Candido, and A. Rives. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science, 379:1123–1130, 3 2023. ISSN 0036-8075. doi:10.1126/science.ade2574. | ||
571 | 11 | 29 | Z. Shao, P. Wang, Q. Zhu, R. Xu, J. Song, M. Zhang, Y. Li, Y. Wu, and D. Guo. Deepseekmath: Pushing the limits of mathematical reasoning in open language models. arXiv preprint arXiv:2402.03300, 2024. | ||
1348 | 28 | 0 | Z. Shao, P. Wang, Q. Zhu, R. Xu, J. Song, X. Bi, H. Zhang, M. Zhang, Y. K. Li, Y. Wu, and D. Guo. Deepseekmath: Pushing the limits of mathematical reasoning in open language models, 2024. URL https://arxiv.org/abs/2402.03300. | ||
1417 | 30 | 0 | Z. Sondka, N. B. Dhir, D. Carvalho-Silva, S. Jupe, Madhumita, K. McLaren, M. Starkey, S. Ward, J. Wilding, M. Ahmed, J. Argasinska, D. Beare, M. S. Chawla, S. Duke, I. Fasanella, A. G. Neogi, S. Haller, B. Hetenyi, L. Hodges, A. Holmes, R. Lyne, T. Maurel, S. Nair, H. Pedro, A. Sangrador-Vegas, H. Schuilenburg, Z. Sheard, S. Y. Yong, and J. Teague. Cosmic: a curated database of somatic variants and clinical data for cancer. Nucleic Acids Research, 52:D1210–D1217, 1 2024. ISSN 0305-1048. doi: 10.1093/NAR/GKAD986. URL https://dx.doi.org/10.1093/nar/gkad986. | ||
1396 | 28 | 48 | Z. Wang, R. Xia, and P. Liu. Generative AI for math: Part I - mathpile: A billion-token-scale pretraining corpus for math. CoRR, abs/2312.17120, 2023c. doi: 10.48550/ARXIV.2312.17120. URL https://doi.org/10.48550/arXiv.2312.17120. | ||
1463 | 31 | 14 | Z. Yang, Zihang Dai, Yiming Yang, J. Carbonell, R. Salakhutdinov, and Quoc V. Le. Xlnet: Generalized autoregressive pretraining for language understanding. In NeurIPS, 2019. | ||
1404 | 28 | 56 | Z. Yuan, H. Yuan, C. Li, G. Dong, C. Tan, and C. Zhou. Scaling relationship on learning mathematical reasoning with large language models. arXiv preprint arXiv:2308.01825, 2023a. | ||
1405 | 28 | 57 | Z. Yuan, H. Yuan, C. Tan, W. Wang, S. Huang, and F. Huang. Rrhf: Rank responses to align language models with human feedback without tears. arXiv preprint arXiv:2304.05302, 2023b. | ||
309 | 5 | 124 | Z. Zhang, H. K. Wayment-Steele, G. Brixi, H. Wang, D. Kern, and S. Ovchinnikov. Protein language models learn evolutionary statistics of interacting sequence motifs. Proceedings of the National Academy of Sciences, 121(45):e2406285121, 2024. | ||
2160 | 36 | 0 | Z. Zhou, Y. Ji, W. Li, P. Dutta, R. V. Davuluri, and H. Liu. Dnabert-2: Efficient foundation model and benchmark for multi-species genome. 12th International Conference on Learning Representations, ICLR 2024, 6 2023. URL https://arxiv.org/pdf/2306.15006. | ||
612 | 12 | 29 | Zaheer, M. et al. Big bird: Transformers for longer sequences. https://doi.org/10.48550/ARXIV.2007.14062 (2020). | ||
62 | 3 | 54 | Zaheer, M., Guruganesh, G., Dubey, K. A., Ainslie, J., Alberti, C., Ontanon, S., Pham, P., Ravula, A., Wang, Q., Yang, L., and Ahmed, A. Big Bird: Transformers for Longer Sequences. In: Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., and Lin, H., eds. Advances in Neural Information Processing Systems vol. 33. Curran Associates, Inc. (2020):(17283–17297). | ||
2147 | 35 | 420 | Zaixiang Zheng, Yifan Deng, Dongyu Xue, Yi Zhou, Fei Ye, and Quanquan Gu. 2023. Structure-informed Language Models Are Protein Designers. bioRxiv (2023). | ||
698 | 13 | 53 | Zambrano Chaves, J., Huang, S.-C., Xu, Y., Xu, H., Usuyama, N., Zhang, S., et al. Towards a clinically accessible radiology foundation model: open-access and lightweight, with automated evaluation. arXiv preprint arXiv:2403.08002, 2024. | ||
1934 | 35 | 207 | Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. 2021. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF international conference on computer vision. 10012–10022. | ||
1779 | 35 | 52 | Zeming Chen, Alejandro Hernández Cano, Angelika Romanou, Antoine Bonnet, Kyle Matoba, Francesco Salvi, Matteo Pagliardini, Simin Fan, Andreas Köpf, Amirkeivan Mohtashami, Alexandre Sallinen, Alireza Sakhaeirad, Vinitra Swamy, Igor Krawczuk, Deniz Bayazit, Axel Marmet, Syrielle Montariol, Mary-Anne Hartley, Martin Jaggi, and Antoine Bosselut. 2023. MEDITRON-70B: Scaling Medical Pretraining for Large Language Models. arXiv:2311.16079 [cs.CL] | ||
1921 | 35 | 194 | Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Sal Candido, et al. 2022. Language models of protein sequences at the scale of evolution enable accurate structure prediction. BioRxiv 2022 (2022), 500902. | ||
1089 | 23 | 35 | Zeming Lin, Halil Akin, Roshan Rao, Brian Hie, Zhongkai Zhu, Wenting Lu, Nikita Smetanin, Robert Verkuil, Ori Kabeli, Yaniv Shmueli, Allan dos Santos Costa, Maryam Fazel-Zarandi, Tom Sercu, Salvatore Candido, and Alexander Rives. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science, 379(6637):1123–1130, March 2023. ISSN 0036-8075, 1095-9203. doi: 10.1126/science.ade2574. URL https://www.science.org/doi/10.1126/science.ade2574. | ||
51 | 3 | 43 | Zeng, T., and Li, Y. I. (2022). Predicting RNA splicing from DNA sequence using Pangolin. Genome Biology 23, 103. https://genomebiology.biomedcentral.com/articles/10.1186/s13059-022-02664-4. doi:10.1186/s13059-022-02664-4. | ||
175 | 4 | 35 | Zeng, T., Spence, J. P., Mostafavi, H. & Pritchard, J. K. Bayesian estimation of gene constraint from an evolutionary model with gene features. Nat. Genet. 56, 1632–1643 (2024). | ||
1936 | 35 | 209 | Zequn Liu, Wei Zhang, Yingce Xia, Lijun Wu, Shufang Xie, Tao Qin, Ming Zhang, and Tie-Yan Liu. 2023. MolXPT: Wrapping Molecules with Text for Generative Pre-training. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics. 1606–1616. | ||
739 | 15 | 3 | Zeyuan Allen-Zhu and Yuanzhi Li. Backward feature correction: How deep learning performs deep learning. arXiv preprint arXiv:2001.04413, 2020a. | ||
740 | 15 | 4 | Zeyuan Allen-Zhu and Yuanzhi Li. Feature purification: How adversarial training performs robust deep learning. arXiv preprint arXiv:2005.10190, 2020b. | ||
738 | 15 | 2 | Zeyuan Allen-Zhu and Yuanzhi Li. What Can ResNet Learn Efficiently, Going Beyond Kernels? In NeurIPS, 2019. Full version available at http://arxiv.org/abs/1905.10337. | ||
741 | 15 | 5 | Zeyuan Allen-Zhu, Yuanzhi Li, and Zhao Song. A convergence theory for deep learning via overparameterization. In ICML, 2019. Full version available at http://arxiv.org/abs/1811.03962. | ||
2075 | 35 | 348 | Zeyuan Wang, Qiang Zhang, Keyan Ding, Ming Qin, Xiang Zhuang, Xiaotong Li, and Huajun Chen. 2023. InstructProtein: Aligning Human and Protein Language via Knowledge Instruction. arXiv preprint arXiv:2310.03269 (2023). | ||
2076 | 35 | 349 | Zeyuan Wang, Qiang Zhang, Shuang-Wei HU, Haoran Yu, Xurui Jin, Zhichen Gong, and Huajun Chen. 2023. Multi-level Protein Structure Pre-training via Prompt Learning. In The Eleventh International Conference on Learning Representations. | ||
23 | 3 | 15 | Zhai, J., Gokaslan, A., Schiff, Y., Berthel, A., Liu, Z.-Y., Miller, Z. R., Scheben, A., Stitzer, M. C., Romay, C., Buckler, E. S., and Kuleshov, V. (2024). Cross-species plant genomes modeling at single nucleotide resolution using a pre-trained DNA language model. bioRxiv preprint. https://www.biorxiv.org/content/early/2024/06/05/2024.06.04.596709. doi:10.1101/2024.06.04.596709. | ||
1801 | 35 | 74 | Zhanbei Cui, Yu Liao, Tongda Xu, and Yan Wang. 2022. Geneformer: Learned gene compression using transformer-based context modeling. arXiv preprint arXiv:2212.08379 (2022). | ||
1022 | 22 | 105 | Zhang J, Chen Q, Liu B. NCBRPred: predicting nucleic acid binding residues in proteins based on multilabel learning. Brief Bioinform 2021;22:bbaa397. doi: 10.1093/bib/bbaa397. | ||
1752 | 35 | 25 | Zhang, Bharath Dandala, Rahul Ramachandran, Manil Maskey, et al. 2024. INDUS: Effective and Efficient Language Models for Scientific Applications. arXiv preprint arXiv:2405.10725 (2024). | ||
77 | 3 | 69 | Zhang, D., Zhang, W., He, B., Zhang, J., Qin, C., and Yao, J. (2023). DNAGPT: A generalized pretrained tool for multiple DNA sequence analysis tasks. arXiv preprint. https://arxiv.org/abs/2307.05628. | ||
324 | 7 | 12 | Zhang, F. & Lupski, J. R. Non-coding genetic variants in human disease. Hum. Mol. Genet. 24, R102–R110 (2015). | ||
725 | 14 | 24 | Zhang, H. & Shafiq, M. O. Survey of transformers and towards ensemble learning using transformers for natural language processing. Journal of Big Data 11, 25 (2024). | ||
722 | 14 | 21 | Zhang, P., Zhang, H. & Wu, H. iPro-WAEL: a comprehensive and robust framework for identifying promoters in multiple species. Nucleic Acids Research 50, 10278–10289 (2022). | ||
66 | 3 | 58 | Zhang, Y., An, L., Yue, F., and Hardison, R. C. (2016). Jointly characterizing epigenetic dynamics across multiple human cell types. Nucleic Acids Research 44, 6721–6731. | ||
116 | 3 | 108 | Zhang, Y., Tiňo, P., Leonardis, A., and Tang, K. (2021). A survey on neural network interpretability. IEEE Transactions on Emerging Topics in Computational Intelligence 5, 726–742. | ||
2102 | 35 | 375 | Zhao Xu, Youzhi Luo, Xuan Zhang, Xinyi Xu, Yaochen Xie, Meng Liu, Kaleb Dickerson, Cheng Deng, Maho Nakata, and Shuiwang Ji. 2021. Molecule3d: A benchmark for predicting 3d geometries from molecular graphs. arXiv preprint arXiv:2110.01717 (2021). | ||
384 | 7 | 72 | Zhao, H. et al. CrossMap: a versatile tool for coordinate conversion between genome assemblies. Bioinformatics 30, 1006–1007 (2014). | ||
699 | 13 | 54 | Zhao, H., Shi, J., Qi, X., Wang, X., and Jia, J. Pyramid scene parsing network, 2017. | ||
700 | 13 | 55 | Zhao, P., Jin, Z., and Cheng, N. An in-depth survey of large language model-based artificial intelligence agents. arXiv preprint arXiv:2309.14365, 2023. | ||
769 | 15 | 33 | Zhaojiang Lin, Andrea Madotto, and Pascale Fung. Exploring versatile generative language model via parameter-efficient transfer learning. In Findings of the Association for Computational Linguistics: EMNLP 2020, pp. 441–459, Online, November 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.findings-emnlp.41. URL https://aclanthology.org/2020.findings-emnlp.41. | ||
1765 | 35 | 38 | Zheng Cai, Maosong Cao, Haojiong Chen, Kai Chen, Keyu Chen, Xin Chen, Xun Chen, Zehui Chen, Zhi Chen, Pei Chu, et al. 2024. Internlm2 technical report. arXiv preprint arXiv:2403.17297 (2024). | ||
1334 | 26 | 97 | Zheng Yuan, Hongyi Yuan, Chengpeng Li, Guanting Dong, Chuanqi Tan, and Chang Zhou. Scaling relationship on learning mathematical reasoning with large language models. CoRR, abs/2308.01825, 2023. | ||
1619 | 33 | 75 | Zheng Yuan, Hongyi Yuan, Chengpeng Li, Guanting Dong, Chuanqi Tan, and Chang Zhou. Scaling relationship on learning mathematical reasoning with large language models. CoRR, abs/2308.01825, 2023. | ||
1720 | 34 | 95 | Zheng Yuan, Hongyi Yuan, Chengpeng Li, Guanting Dong, Chuanqi Tan, and Chang Zhou. Scaling relationship on learning mathematical reasoning with large language models. CoRR, abs/2308.01825, 2023. | ||
1541 | 32 | 48 | Zheng, S., Rao, J., Zhang, Z., Xu, J. & Yang, Y. Predicting retrosynthetic reactions using self-corrected transformer neural networks. Journal of chemical information and modeling 60, 47–55 (2019). | ||
2051 | 35 | 324 | Zhengkai Tu and Connor W Coley. 2022. Permutation invariant graph-to-sequence model for template-free retrosynthesis and reaction prediction. Journal of chemical information and modeling 62, 15 (2022), 3503–3513. | ||
1811 | 35 | 84 | Zhengxiao Du, Yujie Qian, Xiao Liu, Ming Ding, Jiezhong Qiu, Zhilin Yang, and Jie Tang. 2021. Glm: General language model pretraining with autoregressive blank infilling. arXiv preprint arXiv:2103.10360 (2021). | ||
2121 | 35 | 394 | Zheni Zeng, Bangchen Yin, Shipeng Wang, Jiarui Liu, Cheng Yang, Haishen Yao, Xingzhi Sun, Maosong Sun, Guotong Xie, and Zhiyuan Liu. 2023. Interactive Molecular Discovery with Natural Language. arXiv:2306.11976 [cs.CL] | ||
2120 | 35 | 393 | Zheni Zeng, Yuan Yao, Zhiyuan Liu, and Maosong Sun. 2022. A deep-learning system bridging molecule structure and biomedical text with comprehension comparable to human professionals. Nature communications 13, 1 (2022), 862. | ||
2086 | 35 | 359 | Zhenqin Wu, Bharath Ramsundar, Evan N Feinberg, Joseph Gomes, Caleb Geniesse, Aneesh S Pappu, Karl Leswing, and Vijay Pande. 2018. MoleculeNet: a benchmark for molecular machine learning. Chemical science 9, 2 (2018), 513–530. | ||
1902 | 35 | 175 | Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. 2019. Albert: A lite bert for self-supervised learning of language representations. arXiv preprint arXiv:1909.11942 (2019). | ||
1456 | 31 | 7 | Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. Albert: A lite bert for self-supervised learning of language representations. In International Conference on Learning Representations, 2020. URL https://openreview.net/forum?id=H1eA7AEtvS. | ||
1859 | 35 | 132 | Zhi Hong, Aswathy Ajith, Gregory Pauloski, Eamon Duede, Kyle Chard, and Ian Foster. 2023. The Diminishing Returns of Masked Language Models to Science. arXiv:2205.11342 [cs.CL] | ||
1845 | 35 | 118 | Zhichun Guo, Kehan Guo, Bozhao Nan, Yijun Tian, Roshni G Iyer, Yihong Ma, Olaf Wiest, Xiangliang Zhang, Wei Wang, Chuxu Zhang, et al. 2023. Graph-based molecular representation learning. IJCAI (2023). | ||
2153 | 35 | 426 | Zhihan Zhou, Yanrong Ji, Weijian Li, Pratik Dutta, Ramana Davuluri, and Han Liu. 2023. Dnabert-2: Efficient foundation model and benchmark for multi-species genome. arXiv preprint arXiv:2306.15006 (2023). | ||
1116 | 23 | 62 | Zhihan Zhou, Yanrong Ji, Weijian Li, Pratik Dutta, Ramana Davuluri, and Han Liu. DNABERT-2: Efficient Foundation Model and Benchmark For Multi-Species Genome, June 2023. URL http://arxiv.org/abs/2306.15006. arXiv:2306.15006 [cs, q-bio]. | ||
1467 | 31 | 18 | Zhiheng Huang, Davis Liang, Peng Xu, and Bing Xiang. Improve transformer models with better relative position embeddings. In Findings of the Association for Computational Linguistics: EMNLP 2020, pages 3327–3335, Online, November 2020. Association for Computational Linguistics. doi:10.18653/v1/2020.findings-emnlp.298. URL https://www.aclweb.org/anthology/2020.findings-emnlp.298. | ||
1559 | 33 | 15 | Zhihong Chen, Shuo Yan, Juhao Liang, Feng Jiang, Xiangbo Wu, Fei Yu, Guiming Hardy Chen, Junying Chen, Hongbo Zhang, Li Jianquan, Wan Xiang, and Benyou Wang. MultilingualSIFT: Multilingual supervised instruction fine-tuning, 2023b. URL https://github.com/FreedomIntelligence/MultilingualSIFT. | ||
1254 | 26 | 17 | Zhihong Chen, Shuo Yan, Juhao Liang, Feng Jiang, Xiangbo Wu, Fei Yu, Guiming Hardy Chen, Junying Chen, Hongbo Zhang, Li Jianquan, Wan Xiang, and Benyou Wang. MultilingualSIFT: Multilingual supervised instruction fine-tuning, 2023b. URL https://github.com/FreedomIntelligence/MultilingualSIFT. | ||
1642 | 34 | 17 | Zhihong Chen, Shuo Yan, Juhao Liang, Feng Jiang, Xiangbo Wu, Fei Yu, Guiming Hardy Chen, Junying Chen, Hongbo Zhang, Li Jianquan, Wan Xiang, and Benyou Wang. MultilingualSIFT: Multilingual supervised instruction fine-tuning, 2023b. URL https://github.com/FreedomIntelligence/MultilingualSIFT. | ||
1314 | 26 | 77 | Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Mingchuan Zhang, Y. K. Li, Y. Wu, and Daya Guo. Deepseekmath: Pushing the limits of mathematical reasoning in open language models. CoRR, abs/2402.03300, 2024. | ||
1700 | 34 | 75 | Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Mingchuan Zhang, Y. K. Li, Y. Wu, and Daya Guo. Deepseekmath: Pushing the limits of mathematical reasoning in open language models. CoRR, abs/2402.03300, 2024. | ||
1324 | 26 | 87 | Zhilin Wang, Alexander Bukharin, Olivier Delalleau, Daniel Egert, Gerald Shen, Jiaqi Zeng, Oleksii Kuchaiev, and Yi Dong. HelpSteer2-Preference: Complementing ratings with preferences. CoRR, abs/2410.01257, 2024c. | ||
1710 | 34 | 85 | Zhilin Wang, Alexander Bukharin, Olivier Delalleau, Daniel Egert, Gerald Shen, Jiaqi Zeng, Oleksii Kuchaiev, and Yi Dong. HelpSteer2-Preference: Complementing ratings with preferences. CoRR, abs/2410.01257, 2024c. | ||
2148 | 35 | 421 | Zhiling Zheng, Oufan Zhang, Christian Borgs, Jennifer T. Chayes, and Omar M. Yaghi. 2023. ChatGPT Chemistry Assistant for Text Mining and the Prediction of MOF Synthesis. Journal of the American Chemical Society 145, 32 (Aug 2023), 18048–18062. | ||
1935 | 35 | 208 | Zhiyuan Liu, An Zhang, Hao Fei, Enzhi Zhang, Xiang Wang, Kenji Kawaguchi, and Tat-Seng Chua. 2024. ProtT3: Protein-to-Text Generation for Text-based Protein Understanding. arXiv preprint arXiv:2405.12564 (2024). | ||
1918 | 35 | 191 | Zhongshen Li, Junru Jin, Wentao Long, and Leyi Wei. 2023. PLPMpro: Enhancing promoter sequence prediction with prompt-learning based pre-trained language model. Computers in Biology and Medicine 164 (2023), 107260. | ||
2181 | 36 | 21 | Zhongxiao Li, Elva Gao, Juexiao Zhou, Wenkai Han, Xiaopeng Xu, and Xin Gao. Applications of deep learning in understanding gene regulation. Cell Reports Methods, 3(1):100384, 2023. ISSN 2667-2375. doi: https://doi.org/10.1016/j.crmeth.2022.100384. URL https://www.sciencedirect.com/science/article/pii/S2667237522002892. | ||
1014 | 22 | 97 | Zhou G, Gao Z, Ding Q. et al. Uni-Mol: a universal 3D molecular representation learning framework. ChemRxiv, 2023. | ||
1043 | 22 | 126 | Zhou H, Zhang S, Peng J. et al. Informer: beyond efficient transformer for long sequence time-series forecasting. Proceedings of the AAAI Conference on Artificial Intelligence 2021;35:11106–15. 10.1609/aaai.v35i12.17325. | ||
1001 | 22 | 84 | Zhou Z, Ji Y, Li W. et al. DNABERT-2: efficient foundation model and benchmark for multi-species genome. arXiv preprint arXiv:2306.15006, 2023. | ||
432 | 9 | 10 | Zhou, J. & Troyanskaya, O. G. Predicting effects of noncoding variants with deep learning–based sequence model. Nat. Methods 12, 931–934 (2015). | ||
492 | 10 | 10 | Zhou, J. & Troyanskaya, O. G. Predicting effects of noncoding variants with deep learning–based sequence model. Nat. Methods 12, 931–934 (2015). | ||
436 | 9 | 14 | Zhou, J. et al. Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk. Nat. Genet. 50, 1171–1179 (2018). | ||
496 | 10 | 14 | Zhou, J. et al. Deep learning sequence-based ab initio prediction of variant effects on expression and disease risk. Nat. Genet. 50, 1171–1179 (2018). | ||
48 | 3 | 40 | Zhou, J., and Troyanskaya, O. G. (2015). Predicting effects of noncoding variants with deep learning–based sequence model. Nature Methods 12, 931–934. | ||
623 | 12 | 40 | Zhou, M. et al. Non-optimal codon usage affects expression, structure and function of clock protein FRQ. Nature 495, 111–115 (2013). | ||
622 | 12 | 39 | Zhou, M., Wang, T., Fu, J., Xiao, G. & Liu, Y. Nonoptimal codon usage influences protein structure in intrinsically disordered regions. Mol. Microbiol. 97, 974–987 (2015). | ||
621 | 12 | 38 | Zhou, T., Weems, M. & Wilke, C. O. Translationally optimal codons associate with structurally sensitive sites in proteins. Mol. Biol. Evol. 26, 1571–1580 (2009). | ||
420 | 8 | 21 | Zhou, Z. et al. DNABERT-2: Efficient Foundation Model and Benchmark For Multi-Species Genome. (2023). | ||
445 | 9 | 23 | Zhou, Z. et al. Dnabert-2: efficient foundation model and benchmark for multi-species genome. in Proceedings of the Twelfth International Conference on Learning Representations https://openreview.net/pdf?id=oMLQB4EZE1 (ICLR, 2024). | ||
505 | 10 | 23 | Zhou, Z. et al. Dnabert-2: efficient foundation model and benchmark for multi-species genome. in Proceedings of the Twelfth International Conference on Learning Representations https://openreview.net/pdf?id=oMLQB4EZE1 (ICLR, 2024). | ||
711 | 14 | 10 | Zhou, Z. et al. DNABERT-2: Efficient Foundation Model and Benchmark For Multi-Species Genome. Preprint at https://doi.org/10.48550/arXiv.2306.15006 (2024). | ||
57 | 3 | 49 | Zhou, Z., Ji, Y., Li, W., Dutta, P., Davuluri, R., and Liu, H. (2023). DNABERT-2: Efficient foundation model and benchmark for multi-species genome. arXiv preprint arXiv:2306.15006. https://arxiv.org/abs/2306.15006. | ||
56 | 3 | 48 | Zhou, Z., Wu, W., Ho, H., Wang, J., Shi, L., Davuluri, R. V., Wang, Z., and Liu, H. (2024). DNABERT-S: Learning species-aware dna embedding with genome foundation models. arXiv preprint. https://arxiv.org/abs/2402.08777. | ||
1842 | 35 | 115 | Zhouhong Gu, Xiaoxuan Zhu, Haoning Ye, Lin Zhang, Jianchen Wang, Sihang Jiang, Zhuozhi Xiong, Zihan Li, Qianyu He, Rui Xu, Wenhao Huang, Zili Wang, Shusen Wang, Weiguo Zheng, Hongwei Feng, and Yanghua Xiao. 2023. Xiezhi: An Ever-Updating Benchmark for Holistic Domain Knowledge Evaluation. arXiv:2306.05783 [cs.CL] | ||
916 | 21 | 38 | Zhu H. et al. (2018) Clinical concept extraction with contextual word embedding. NIPS Machine Learning for Health Workshop. http://par.nsf.gov/biblio/10098080. | ||
1021 | 22 | 104 | Zhu H, Hu J, Song XN. et al. DNAPred: accurate identification of DNA-binding sites from protein sequence by ensembled hyperplane-distance-based support vector machines. J Chem Inf Model 2019;59:3057–71. 10.1021/acs.jcim.8b00749. | ||
988 | 22 | 71 | Zhu Y, Kiros R, Zemel R. et al. Aligning books and movies: towards story-like visual explanations by watching movies and reading books. arXiv preprint arXiv:1506.06724, 2015. | ||
377 | 7 | 65 | Zhu, P. et al. Single-cell DNA methylome sequencing of human preimplantation embryos. Nat. Genet. 50, 12–19 (2018). | ||
86 | 3 | 78 | Zhu, X., Qin, C., Wang, F., Yang, F., He, B., Zhao, Y., and Yao, J. (2024). CD-GPT: A Biological Foundation Model Bridging the Gap between Molecular Sequences Through Central Dogma. bioRxiv preprint. https://www.biorxiv.org/content/10.1101/2024.06.24.600337v1. | ||
1472 | 31 | 23 | Zhuoran Shen, Mingyuan Zhang, Haiyu Zhao, Shuai Yi, and Hongsheng Li. Efficient attention: Attention with linear complexities. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 3531–3539, 2021. | ||
2073 | 35 | 346 | Zichen Wang, Steven A Combs, Ryan Brand, Miguel Romero Calvo, Panpan Xu, George Price, Nataliya Golovach, Emmanuel O Salawu, Colby J Wise, Sri Priya Ponnapalli, et al. 2022. Lm-gvp: an extensible sequence and structure informed deep learning framework for protein property prediction. Scientific reports 12, 1 (2022), 6832. | ||
1933 | 35 | 206 | Zicheng Liu, Jiahui Li, Siyuan Li, Zelin Zang, Cheng Tan, Yufei Huang, Yajing Bai, and Stan Z Li. 2024. GenBench: A Benchmarking Suite for Systematic Evaluation of Genomic Foundation Models. arXiv preprint arXiv:2406.01627 (2024). | ||
2074 | 35 | 347 | Zifeng Wang, Zichen Wang, Balasubramaniam Srinivasan, Vassilis N Ioannidis, Huzefa Rangwala, and Rishita Anubhai. 2023. BioBridge: Bridging Biomedical Foundation Models via Knowledge Graph. arXiv preprint arXiv:2310.03320 (2023). | ||
1737 | 35 | 10 | Žiga Avsec, Vikram Agarwal, Daniel Visentin, Joseph R Ledsam, Agnieszka Grabska-Barwinska, Kyle R Taylor, Yannis Assael, John Jumper, Pushmeet Kohli, and David R Kelley. 2021. Effective gene expression prediction from sequence by integrating long-range interactions. Nature methods 18, 10 (2021), 1196–1203. | ||
2161 | 36 | 1 | Žiga Avsec, Vikram Agarwal, Daniel Visentin, Joseph R Ledsam, Agnieszka Grabska-Barwinska, Kyle R Taylor, Yannis Assael, John Jumper, Pushmeet Kohli, and David R Kelley. Effective gene expression prediction from sequence by integrating long-range interactions. Nature methods, 18 (10):1196–1203, 2021. | ||
1060 | 23 | 6 | Ziga Avsec, Vikram Agarwal, Daniel Visentin, Joseph R. Ledsam, Agnieszka Grabska-Barwinska, Kyle R. Taylor, Yannis Assael, John Jumper, Pushmeet Kohli, and David R. Kelley. Effective gene expression prediction from sequence by integrating long-range interactions. Nature Methods, 18(10):1196–1203, October 2021. ISSN 1548-7105. doi: 10.1038/s41592-021-01252-x. URL https://www.nature.com/articles/s41592-021-01252-x. Number: 10 Publisher: Nature Publishing Group. | ||
2144 | 35 | 417 | Zihan Zhao, Da Ma, Lu Chen, Liangtai Sun, Zihao Li, Hongshen Xu, Zichen Zhu, Su Zhu, Shuai Fan, Guodong Shen, et al. 2024. Chemdfm: Dialogue foundation model for chemistry. arXiv preprint arXiv:2401.14818 (2024). | ||
1462 | 31 | 13 | Zihang Dai, Z. Yang, Yiming Yang, J. Carbonell, Quoc V. Le, and R. Salakhutdinov. Transformer-xl: Attentive language models beyond a fixed-length context. In ACL, 2019. | ||
1836 | 35 | 109 | Zijing Gao, Qiao Liu, Wanwen Zeng, Wing H Wong, and Rui Jiang. 2023. EpiGePT: a Pretrained Transformer model for epigenomics. bioRxiv (2023), 2023–07. | ||
1283 | 26 | 46 | Zixuan Jiang, Jiaqi Gu, Hanqing Zhu, and David Z. Pan. Pre-RMSNorm and Pre-CRMSNorm Transformers: Equivalent and efficient pre-LN Transformers. CoRR, abs/2305.14858, 2023b. | ||
1577 | 33 | 33 | Zixuan Jiang, Jiaqi Gu, Hanqing Zhu, and David Z. Pan. Pre-RMSNorm and Pre-CRMSNorm Transformers: Equivalent and efficient pre-LN Transformers. CoRR, abs/2305.14858, 2023b. | ||
1670 | 34 | 45 | Zixuan Jiang, Jiaqi Gu, Hanqing Zhu, and David Z. Pan. Pre-RMSNorm and Pre-CRMSNorm Transformers: Equivalent and efficient pre-LN Transformers. CoRR, abs/2305.14858, 2023b. | ||
2200 | 36 | 40 | Zixuan Wang, Yongqing Zhang, Yuhang Liu, Shuwen Xiong, Maocheng Wang, Jiliu Zhou, and Meiqin Gong. Towards a better understanding of TF-DNA binding prediction from genomic features. Computers in Biology and Medicine, pp. 105993, 2022. | ||
935 | 22 | 18 | Zou J, Huss M, Abid A. et al. A primer on deep learning in genomics. Nat Genet 2019;51:12–8. 10.1038/s41588-018-0295-5. | ||
470 | 9 | 48 | Zou, J. et al. A primer on deep learning in genomics. Nat. Genetics 51, 12–18 (2019). | ||
530 | 10 | 48 | Zou, J. et al. A primer on deep learning in genomics. Nat. Genetics 51, 12–18 (2019). | ||
2139 | 35 | 412 | Zuobai Zhang, Chuanrui Wang, Minghao Xu, Vijil Chenthamarakshan, Aurélie Lozano, Payel Das, and Jian Tang. 2023. A Systematic Study of Joint Representation Learning on Protein Sequences and Structures. arXiv:2303.06275 [q-bio.QM] | ||
2140 | 35 | 413 | Zuobai Zhang, Minghao Xu, Vijil | ||