UI
Within the evaluation of generative AI, I would like to examine the following:
language evaluation that does not take the context of use into account;
language evaluation that assumes there is a single correct answer;
evaluation methods whose direction differs according to three or more distinct standpoints;
and evaluation methods, one per standpoint, based on the different outputs produced for three or more distinct standpoints.
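As a toy sketch of standpoint-dependent evaluation, the code below scores one generated answer against a separate reference answer per standpoint, so that each standpoint yields its own score instead of a single global one. All names and texts here are hypothetical, and a simple token-overlap F1 stands in for a real metric such as MoverScore.

```python
# Toy multi-standpoint evaluation: one generated answer is scored
# against a separate reference answer per standpoint, rather than
# against a single "correct" answer. Token-overlap F1 is only a
# stand-in for a real metric; all texts below are hypothetical.
from collections import Counter

def f1_overlap(candidate: str, reference: str) -> float:
    """Harmonic mean of token precision and recall between two strings."""
    cand, ref = candidate.split(), reference.split()
    common = sum((Counter(cand) & Counter(ref)).values())
    if common == 0:
        return 0.0
    precision = common / len(cand)
    recall = common / len(ref)
    return 2 * precision * recall / (precision + recall)

def evaluate_by_standpoint(candidate: str, references: dict[str, str]) -> dict[str, float]:
    """Return one score per standpoint instead of a single global score."""
    return {name: f1_overlap(candidate, ref) for name, ref in references.items()}

references = {
    "end user":  "press the back icon to return to the previous screen",
    "developer": "log the error and restore the previous application state",
    "support":   "contact the help desk and describe what you pressed",
}
candidate = "press the back icon or contact the help desk"

scores = evaluate_by_standpoint(candidate, references)
for standpoint, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{standpoint}: {score:.2f}")
```

The point of the sketch is only that the same output receives different scores under different standpoints, which is invisible if evaluation presupposes one correct answer.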
"Something seems to have gone wrong.
The cause of the problem has not been identified. Please return to the previous screen, or press the Help Desk button if you need assistance."
This message does not explain how to return to the previous screen.
A more desirable message would be:
"Try one of the following to return to the previous screen:
Press the browser's back icon ("<").
Enter the URL the browser displayed immediately before.
If you are in an app other than a browser, press that app's back icon."
Keywords
BERT: Bidirectional Encoder Representations from Transformers
Implementing a Japanese text sentiment analysis program with a BERT model
https://qiita.com/kiyotaman/items/736d5d0e47dbfd419244
Sorting out the relationships among Transformer, GPT, BERT, Encoder, and Decoder
https://qiita.com/munaita_/items/bd5513c75e18ae04c1e0
MoverScore: Text Generation Evaluating with Contextualized Embeddings and Earth Mover Distance
https://arxiv.org/abs/1909.02622
The evolution and latest trends of large language models (ver. 20240723)
https://qiita.com/compassinai/items/0be2b1139e63a4fa6f11
Notes on proofreading Japanese text with LLMs
https://qiita.com/syoyo/items/ab518927a51dcf03b071
My own collection of LLM notes
https://qiita.com/KomatsunaKinako/items/8a42c487ddcc49243b35
LLM Code 3
https://qiita.com/output_Tarou_dl/items/a41f78aa02d281981a77
Introduction to LLMs: Code 1
https://qiita.com/output_Tarou_dl/items/51fbfb13975d2cd8a0f1
[Paper-reading notes] LLaVA (Large Language and Vision Assistant)
https://qiita.com/LiberalArts/items/40107ba1855de509c6e3
https://qiita.com/aokikenichi/items/14ad95aacf2092aacbdd
LLMs and genome bioinformatics
https://qiita.com/Yh_Taguchi/items/1f8b975182d9a33a00f7
I took part in my first Kaggle LLM competition
https://qiita.com/xxyc/items/154250c01a98aed66944
What Elixir can do at the end of 2023 ⑤: AI & LLMs, part 1 [Nx/Bumblebee] (useful for catching up on the latest Elixir or finding Advent Calendar topics)
https://qiita.com/piacerex/items/f6a73a8252a3f6093b1a
arXiv
BERT or FastText? A Comparative Analysis of Contextual as well as Non-Contextual Embeddings
https://arxiv.org/pdf/2411.17661