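What is SNLI?
- Short for Stanford Natural Language Inference [1]
- An annotated corpus for learning natural language inference
- Each example is a pair of texts, a premise and a hypothesis, with a corresponding label (assigned by hand)
  - neutral: the hypothesis can be neither confirmed nor refuted from the premise
  - contradiction: the hypothesis contradicts the premise
  - entailment: the hypothesis follows from the premise
  - -: no gold label (no majority among the annotators)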
Text | Judgments | Hypothesis |
---|---|---|
A man inspects the uniform of a figure in some East Asian country. | contradiction | The man is sleeping |
An older and younger man smiling. | neutral | Two men are smiling and laughing at the cats playing on the floor. |
A black race car starts up in front of a crowd of people. | contradiction | A man is driving down a lonely road. |
A soccer game with multiple males playing. | entailment | Some men are playing a sport. |
A smiling costumed woman is holding an umbrella. | neutral | A happy woman in a fairy costume holds an umbrella. |
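To make the label scheme concrete, here is a minimal sketch of how the labels might be prepared for training: pairs whose gold label is `-` are dropped and the three remaining labels are mapped to integers. The field names `sentence1`, `sentence2`, and `gold_label` follow the JSON record shown later in this post; the integer ids, the `encode_labels` helper, and the third example pair are assumptions made purely for illustration.

```python
# Integer ids are an arbitrary choice for this sketch, not part of SNLI itself.
LABEL_TO_ID = {"entailment": 0, "neutral": 1, "contradiction": 2}


def encode_labels(examples):
    """Drop pairs without a gold label ("-") and attach an integer label id."""
    encoded = []
    for ex in examples:
        label = ex["gold_label"]
        if label not in LABEL_TO_ID:  # "-" means no gold label
            continue
        # Build a new dict so the input records are left untouched.
        encoded.append({**ex, "label_id": LABEL_TO_ID[label]})
    return encoded


examples = [
    # Two pairs taken from the table above.
    {"sentence1": "A soccer game with multiple males playing.",
     "sentence2": "Some men are playing a sport.",
     "gold_label": "entailment"},
    {"sentence1": "A black race car starts up in front of a crowd of people.",
     "sentence2": "A man is driving down a lonely road.",
     "gold_label": "contradiction"},
    # Hypothetical placeholder pair without a gold label.
    {"sentence1": "premise text", "sentence2": "hypothesis text", "gold_label": "-"},
]

encoded = encode_labels(examples)
print([ex["label_id"] for ex in encoded])
```

Skipping the `-` pairs rather than treating them as a fourth class is a common choice, since they carry no agreed-upon label.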
- Dataset size: about 570k pairs in total
  - Training: 550k
  - Validation: 10k
  - Test: 10k
- Each record also includes syntactic parses, in the following format.
{
"annotator_labels": ["neutral"],
"captionID": "3416050480.jpg#4",
"gold_label": "neutral",
"pairID": "3416050480.jpg#4r1n",
"sentence1": "A person on a horse jumps over a broken down airplane.",
"sentence1_binary_parse": "( ( ( A person ) ( on ( a horse ) ) ) ( ( jumps ( over ( a ( broken ( down airplane ) ) ) ) ) . ) )",
"sentence1_parse": "(ROOT (S (NP (NP (DT A) (NN person)) (PP (IN on) (NP (DT a) (NN horse)))) (VP (VBZ jumps) (PP (IN over) (NP (DT a) (JJ broken) (JJ down) (NN airplane)))) (. .)))",
"sentence2": "A person is training his horse for a competition.",
"sentence2_binary_parse": "( ( A person ) ( ( is ( ( training ( his horse ) ) ( for ( a competition ) ) ) ) . ) )",
"sentence2_parse": "(ROOT (S (NP (DT A) (NN person)) (VP (VBZ is) (VP (VBG training) (NP (PRP$ his) (NN horse)) (PP (IN for) (NP (DT a) (NN competition))))) (. .)))"
}
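The `sentence*_binary_parse` fields are bracketed strings, so they can be turned into a nested structure with a small stack-based reader. The sketch below assumes that every parenthesis and every word is separated by whitespace, as in the record above; `parse_binary` and `leaves` are hypothetical helper names, not part of any SNLI tooling.

```python
def parse_binary(parse_str):
    """Convert a binary parse string into nested Python lists."""
    stack = [[]]
    for token in parse_str.split():
        if token == "(":
            stack.append([])          # open a new subtree
        elif token == ")":
            node = stack.pop()        # close the current subtree
            stack[-1].append(node)    # and attach it to its parent
        else:
            stack[-1].append(token)   # a word (leaf)
    return stack[0][0]


def leaves(tree):
    """Return the words of a parsed tree in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    return [word for child in tree for word in leaves(child)]


binary_parse = (
    "( ( A person ) ( ( is ( ( training ( his horse ) ) "
    "( for ( a competition ) ) ) ) . ) )"
)
tree = parse_binary(binary_parse)
print(tree[0])             # ['A', 'person']
print(" ".join(leaves(tree)))
```

The nested lists are convenient for tree-structured models, while `leaves` recovers the tokenized sentence without needing a separate tokenizer.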
Download
The dataset can be downloaded from the The Stanford Natural Language Inference (SNLI) Corpus page.
wget https://nlp.stanford.edu/projects/snli/snli_1.0.zip
unzip snli_1.0.zip
Loading the data
The data is stored in JSON Lines (.jsonl) and TSV (.txt) formats.
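import csv
import pandas as pd
# Treat double quotes inside sentences as literal text rather than field quoting
df = pd.read_csv("snli_1.0/snli_1.0_train.txt", sep="\t", quoting=csv.QUOTE_NONE)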
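The .jsonl files can also be read with the standard library alone, since each line is one JSON object. The following is a rough sketch: `read_snli_jsonl` is a hypothetical helper, and the path in the usage comment assumes the archive was unzipped into the current directory and that the training file is named by analogy with the .txt file above.

```python
import json


def read_snli_jsonl(lines):
    """Yield (premise, hypothesis, label) tuples from an iterable of JSON lines."""
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        record = json.loads(line)
        yield record["sentence1"], record["sentence2"], record["gold_label"]


# Usage (path is an assumption, see above):
# with open("snli_1.0/snli_1.0_train.jsonl", encoding="utf-8") as f:
#     pairs = list(read_snli_jsonl(f))
```

Taking an iterable of lines instead of a file name keeps the function easy to try on a few hand-written records before pointing it at the full file.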
References
1. Bowman et al., A large annotated corpus for learning natural language inference, 2015.