
Transformer summary


Last updated at 2023-12-30. Posted at 2023-12-30.

Embedding

# pip install sentence-transformers

from sentence_transformers import SentenceTransformer

model_name = "roberta-large-nli-stsb-mean-tokens"
model = SentenceTransformer(model_name)
sentence = "Pythonはプログラミング言語です"  # "Python is a programming language"
sentence_embedding = model.encode(sentence)

print(f"Embedding size: {len(sentence_embedding)}")
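
A common next step is to compare two embeddings. The following is a minimal sketch of cosine similarity in plain Python; the three toy vectors are made up for illustration and stand in for vectors returned by `model.encode`. The sentence-transformers library also ships `util.cos_sim` for the same purpose.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy 3-dimensional vectors (hypothetical values, not real embeddings).
vec_a = [1.0, 2.0, 3.0]
vec_b = [2.0, 4.0, 6.0]   # same direction as vec_a, so similarity is close to 1
vec_c = [3.0, 0.0, -1.0]  # orthogonal to vec_a, so similarity is 0

print(cosine_similarity(vec_a, vec_b))
print(cosine_similarity(vec_a, vec_c))
```

Because the function only iterates over its inputs, it should also accept the NumPy arrays that `model.encode` returns, as long as both vectors come from the same model.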

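About the dimensionality
The dimensionality of the embedding vector depends on the model used. My impression is that more dimensions allow finer-grained distinctions between texts, although a higher dimensionality alone does not guarantee better embeddings.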

Models I tried and their dimensionality
bert-base-nli-mean-tokens: 768 dimensions
roberta-large-nli-stsb-mean-tokens: 1024 dimensions
bert-large-nli-mean-tokens: 1024 dimensions

For comparison, OpenAI's embedding API (text-embedding-ada-002) returns 1536-dimensional vectors.
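
One practical consequence of the dimensionality is storage cost. The sketch below estimates the raw size of a set of vectors for each dimensionality listed above. It assumes every value is stored as a 32-bit float, and the corpus size of 100,000 vectors is a hypothetical figure chosen only for illustration.

```python
# Dimensionalities reported in this article.
EMBEDDING_DIMS = {
    "bert-base-nli-mean-tokens": 768,
    "roberta-large-nli-stsb-mean-tokens": 1024,
    "bert-large-nli-mean-tokens": 1024,
    "OpenAI embedding API": 1536,
}

BYTES_PER_FLOAT32 = 4  # assumption: each value is stored as a 32-bit float


def storage_bytes(dim, num_vectors):
    """Raw size in bytes of num_vectors embeddings with dim values each."""
    return dim * BYTES_PER_FLOAT32 * num_vectors


NUM_VECTORS = 100_000  # hypothetical corpus size

for name, dim in EMBEDDING_DIMS.items():
    size_mb = storage_bytes(dim, NUM_VECTORS) / 1_000_000
    print(f"{name}: {dim} dims -> {size_mb:.1f} MB")
```

Vectors from different models also cannot be compared with each other directly, even when the dimensionality happens to match, because each model defines its own vector space.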

Model names that can be specified, according to ChatGPT (I have not verified these)

BERT-based models
- bert-base-nli-mean-tokens
- bert-large-nli-mean-tokens
RoBERTa-based models
- roberta-base-nli-stsb-mean-tokens
- roberta-large-nli-stsb-mean-tokens
DistilBERT-based models
- distilbert-base-nli-mean-tokens
XLM-RoBERTa-based models
- xlm-r-base-en-ko-nli-ststb
- xlm-r-large-en-ko-nli-ststb
MiniLM-based models
- paraphrase-MiniLM-L6-v2
Other models
- paraphrase-distilroberta-base-v1
- paraphrase-multilingual-MiniLM-L12-v2

Sentiment analysis

from transformers import pipeline

model_name = "lxyuan/distilbert-base-multilingual-cased-sentiments-student"
classifier = pipeline("sentiment-analysis", model=model_name)
sentences = [
    "Pythonはプログラミング言語です",  # "Python is a programming language"
    "ハムスターは可愛いです",  # "Hamsters are cute"
]
results = classifier(sentences)
for result in results:
    print(result)

# {"label": "positive", "score": 0.4850502908229828}
# {"label": "positive", "score": 0.8274434208869934}
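
The first sentence is a neutral statement, and its score of about 0.49 reflects that the model is not very sure about the "positive" label. One way to handle such cases is to treat low-scoring predictions as uncertain. This is a minimal sketch using the two results shown above as sample data; the 0.6 cutoff and the `confident_label` helper are hypothetical and would need tuning on real data.

```python
# Sample results copied from the output shown above.
results = [
    {"label": "positive", "score": 0.4850502908229828},
    {"label": "positive", "score": 0.8274434208869934},
]

CONFIDENCE_THRESHOLD = 0.6  # hypothetical cutoff, tune for your data


def confident_label(result, threshold=CONFIDENCE_THRESHOLD):
    """Return the predicted label, or "uncertain" if the score is below threshold."""
    if result["score"] >= threshold:
        return result["label"]
    return "uncertain"


for result in results:
    print(confident_label(result))

# prints:
# uncertain
# positive
```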

Model names that can be specified, according to ChatGPT (I have not verified these)

lxyuan/distilbert-base-multilingual-cased-sentiments-student
- A multilingual DistilBERT-based model.
mrm8488/distilroberta-finetuned-financial-news-sentiment-analysis
- A DistilRoBERTa model fine-tuned for sentiment analysis of financial news.
nlptown/bert-base-multilingual-uncased-sentiment
- A multilingual BERT-based model.
cardiffnlp/twitter-xlm-roberta-base-sentiment
- An XLM-RoBERTa-based model for Twitter data.
edumunozsala/bertin_base_sentiment_analysis_es
- A BERT model for sentiment analysis of Spanish text.
