
More than 5 years have passed since last update.

How to load the SNLI dataset


What is SNLI?

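  • Short for Stanford Natural Language Inference [1]
  • An annotated corpus for training natural language inference models
  • Pairs of two sentences, a premise and a hypothesis, each with a manually assigned label
    • neutral: cannot be decided either way
    • contradiction: the hypothesis contradicts the premise
    • entailment: the hypothesis is true given the premise
    • -: no label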
Text | Judgments | Hypothesis
A man inspects the uniform of a figure in some East Asian country. | contradiction | The man is sleeping
An older and younger man smiling. | neutral | Two men are smiling and laughing at the cats playing on the floor.
A black race car starts up in front of a crowd of people. | contradiction | A man is driving down a lonely road.
A soccer game with multiple males playing. | entailment | Some men are playing a sport.
A smiling costumed woman is holding an umbrella. | neutral | A happy woman in a fairy costume holds an umbrella.
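  • Number of examples: about 570,000 in total
    • Training: about 550,000
    • Validation: about 10,000
    • Test: about 10,000
  • Syntactic parse data is also included, in the following format: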

{
    "annotator_labels": ["neutral"], 
    "captionID": "3416050480.jpg#4", 
    "gold_label": "neutral", 
    "pairID": "3416050480.jpg#4r1n", 
    "sentence1": "A person on a horse jumps over a broken down airplane.",
    "sentence1_binary_parse": "( ( ( A person ) ( on ( a horse ) ) ) ( ( jumps ( over ( a ( broken ( down airplane ) ) ) ) ) . ) )",
    "sentence1_parse": "(ROOT (S (NP (NP (DT A) (NN person)) (PP (IN on) (NP (DT a) (NN horse)))) (VP (VBZ jumps) (PP (IN over) (NP (DT a) (JJ broken) (JJ down) (NN airplane)))) (. .)))", 
    "sentence2": "A person is training his horse for a competition.", 
    "sentence2_binary_parse": "( ( A person ) ( ( is ( ( training ( his horse ) ) ( for ( a competition ) ) ) ) . ) )", 
    "sentence2_parse": "(ROOT (S (NP (DT A) (NN person)) (VP (VBZ is) (VP (VBG training) (NP (PRP$ his) (NN horse)) (PP (IN for) (NP (DT a) (NN competition))))) (. .)))"
}
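
As a rough illustration of how such a record can be consumed, the sketch below parses one record with Python's standard json module and recovers the tokens from the binary parse by dropping the brackets. The helper name tokens_from_binary_parse is made up for this example, and the approach assumes that brackets are always separated by whitespace, as they are in the sample above.

```python
import json

# One SNLI record as a JSON string (fields abridged from the sample above).
line = json.dumps({
    "gold_label": "neutral",
    "sentence1": "A person on a horse jumps over a broken down airplane.",
    "sentence1_binary_parse": "( ( ( A person ) ( on ( a horse ) ) ) "
                              "( ( jumps ( over ( a ( broken ( down airplane ) ) ) ) ) . ) )",
})


def tokens_from_binary_parse(parse):
    """Hypothetical helper: return the leaf tokens of a binary parse.

    Assumes every bracket is a separate whitespace-delimited item.
    """
    return [tok for tok in parse.split() if tok not in ("(", ")")]


record = json.loads(line)
tokens = tokens_from_binary_parse(record["sentence1_binary_parse"])
print(record["gold_label"], tokens)
```

The tokens come out in sentence order with the final period as its own token, which is often enough for simple tokenized baselines.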

Download

The dataset can be downloaded from the Stanford Natural Language Inference (SNLI) Corpus page.

wget https://nlp.stanford.edu/projects/snli/snli_1.0.zip
unzip snli_1.0.zip
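
If wget and unzip are not available, roughly the same steps can be sketched with the Python standard library. The function names download and extract are invented for this example; the URL is the one used in the commands above.

```python
import urllib.request
import zipfile
from pathlib import Path

# Same URL as in the wget command above.
SNLI_URL = "https://nlp.stanford.edu/projects/snli/snli_1.0.zip"


def download(url, dest):
    """Hypothetical helper: fetch url into dest unless dest already exists."""
    dest = Path(dest)
    if not dest.exists():
        urllib.request.urlretrieve(url, str(dest))
    return dest


def extract(zip_path, out_dir):
    """Hypothetical helper: unpack zip_path into out_dir, return member names."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(out_dir)
        return zf.namelist()
```

A typical call would be extract(download(SNLI_URL, "snli_1.0.zip"), "."), which skips the download when the archive is already on disk.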

Loading the data

The data is stored in both JSON Lines format (.jsonl) and TSV format (.txt).

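import csv
import pandas as pd
# Tab-separated file; QUOTE_NONE keeps double quotes inside sentences as literal text
df = pd.read_csv("snli_1.0/snli_1.0_train.txt", sep="\t", quoting=csv.QUOTE_NONE)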
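
The .jsonl files can be read without pandas as well. The following is a minimal sketch that collects (premise, hypothesis, label) tuples and optionally drops pairs labeled "-". The function names load_snli_jsonl and label_counts are made up for this example, and the file name snli_1.0_train.jsonl is assumed by analogy with the .txt file above.

```python
import json
from collections import Counter


def load_snli_jsonl(path, drop_unlabeled=True):
    """Hypothetical helper: read an SNLI .jsonl file into a list of
    (premise, hypothesis, label) tuples, one per non-empty line."""
    examples = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            rec = json.loads(line)
            label = rec["gold_label"]
            # "-" marks pairs without a label; skip them if requested.
            if drop_unlabeled and label == "-":
                continue
            examples.append((rec["sentence1"], rec["sentence2"], label))
    return examples


def label_counts(examples):
    """Count how many examples carry each label."""
    return Counter(label for _, _, label in examples)
```

For example, label_counts(load_snli_jsonl("snli_1.0/snli_1.0_train.jsonl")) gives the label distribution of the training split, assuming that file name is correct.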

References
