Indexing parent-child data (formerly Parent-Child) with the join datatype in Elasticsearch 6.2.4

Elasticsearch

Posted at 2018-04-23

This post indexes and searches parent-child data in Elasticsearch.
It summarizes the following documentation:
join datatype

My local environment is as follows.

Ubuntu 16.04 LTS
Elasticsearch 6.2.4
ICU Analysis Plugin, Japanese (kuromoji) Analysis Plugin

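Starting with Elasticsearch 6.x, an index can contain only one type (what used to be specified with _type).
When the same field was defined under different _type values, it apparently became inefficient once the data was handled in the underlying Lucene index.
As a result, the way to model parent-child data has also changed.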

First, define the index settings and mappings. (The output is formatted with the jq command.)
In this example, each goods document has multiple review documents.

$ curl -X PUT 'localhost:9200/shop' -H 'Content-Type: application/json' -d'
{
    "settings": {
        "index": {
            "number_of_shards" : 1,
            "number_of_replicas" : 0,
            "refresh_interval" : "-1",
            "analysis": {
                "tokenizer": {
                    "shop_tokenizer": {
                        "type": "kuromoji_tokenizer",
                        "mode": "search",
                        "discard_punctuation": "true",
                        "user_dictionary": "userdict_ja.txt"
                    }
                },
                "analyzer": {
                    "my_analyzer": {
                        "type": "custom",
                        "tokenizer": "shop_tokenizer",
                        "char_filter": [
                            "icu_normalizer",
                            "kuromoji_iteration_mark"
                        ],
                        "filter": [
                            "kuromoji_baseform",
                            "kuromoji_part_of_speech"
                        ]
                    }
                }
            }
        }
    },
    "mappings": {
        "_doc": {
            "_all": {"enabled" : false},
            "properties": {
                "type": {"type": "keyword"},
                "name": {"type": "text", "analyzer":"my_analyzer"},
                "text": {"type": "text", "analyzer":"my_analyzer"},
                "user": {"type": "keyword"},
                "my_join_field": {
                    "type": "join",
                    "relations": {
                        "goods": "review"
                    }
                }
            }
        }
    }
}
' | jq

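The key points are as follows.

The type field (not _type) is set to goods or review so that the kind of document can be identified.
A dedicated join field is created, and relations defines the parent-child relationship. (Apparently an array can be given to express multiple children, and multiple relations can also be defined.)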
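
As a rough sketch of the array form mentioned in the second point, the helper below builds a join field mapping from a parent-to-children table. Python is used here only to assemble the JSON. `build_join_mapping` and the `question` and `reply` relation names are hypothetical and are not part of the index created in this post.

```python
import json


def build_join_mapping(relations):
    """Build a join field mapping from {parent: [child, ...]}.

    A single child is written as a plain string and several children as an
    array, the two shapes the join datatype accepts in `relations`.
    """
    normalized = {}
    for parent, children in relations.items():
        children = list(children)
        if not children:
            raise ValueError(f"relation {parent!r} needs at least one child")
        normalized[parent] = children[0] if len(children) == 1 else children
    return {"type": "join", "relations": normalized}


# Hypothetical example: goods has two kinds of children (review, question),
# and question is in turn the parent of reply (a second relation level).
mapping = build_join_mapping({
    "goods": ["review", "question"],
    "question": ["reply"],
})
print(json.dumps({"my_join_field": mapping}, indent=4))
```

Passing `{"goods": ["review"]}` produces the same `relations` value as the mapping used in this post.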

Next, create a JSON file (request_bulk.json in the command below) with the following data.

{"index": {"_index": "shop", "_type": "_doc","_id": "goods_id.1"}}
{"type": "goods","name": "三岳","text": "もののけ姫のモデルにもなった、鹿児島県屋久島の焼酎です。屋久島は水が美味しく、その水を使って作られています。ロックで飲むことをオススメします。","my_join_field": "goods"}
{"index": {"_index": "shop", "_type": "_doc","_id": "review_id.1", "routing": "goods_id.1"}}
{"type": "review","user": "nettle0010","text": "何度も購入させていただいてます。以前はレア物でなかなか飲めませんでしたが、こちらで購入できるようになって晩酌で楽しんでます。","my_join_field": {"name": "review", "parent": "goods_id.1"}}
{"index": {"_index": "shop", "_type": "_doc","_id": "review_id.2", "routing": "goods_id.1"}}
{"type": "review","user": "udfhsudadb","text": "注文してから２〜３日で届きました。ありがとうございます！","my_join_field": {"name": "review", "parent": "goods_id.1"}}
{"index": {"_index": "shop", "_type": "_doc","_id": "goods_id.2"}}
{"type": "goods","name": "屋久の島","text": "もののけ姫のモデルにもなった、鹿児島県屋久島の焼酎です。三岳と違ってあまりメジャーではありませんが、なかなか美味しいです。ロックがオススメです。","my_join_field": "goods"}
{"index": {"_index": "shop", "_type": "_doc","_id": "review_id.3", "routing": "goods_id.2"}}
{"type": "review","user": "okodshywegfej","text": "昨年購入しました。両親からも喜ばれています。","my_join_field": {"name": "review", "parent": "goods_id.2"}}

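The key points are as follows.

For child documents (review here), set routing to the _id of the parent document (goods here).
Set the join field (my_join_field here) in both parent and child documents.
In the child document's join field, specify the _id of the parent document.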
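
The three points above can be sketched as a small generator of bulk lines. This is an illustration under assumptions, not the way the file in this post was produced: `bulk_lines` and its arguments are hypothetical names, and only the `shop` index, the `_doc` type, and `my_join_field` come from the mapping defined earlier.

```python
import json


def bulk_lines(goods_id, goods, reviews, index="shop", join_field="my_join_field"):
    """Yield NDJSON lines for one goods document and its review documents.

    reviews is a list of (review_id, source_dict) pairs.
    """
    def dump(obj):
        # ensure_ascii=False keeps Japanese text readable in the file.
        return json.dumps(obj, ensure_ascii=False)

    # Parent: no routing given; the join field only names the relation.
    yield dump({"index": {"_index": index, "_type": "_doc", "_id": goods_id}})
    yield dump({**goods, "type": "goods", join_field: "goods"})

    for review_id, review in reviews:
        # Child: route by the parent _id so both land on the same shard,
        # and point the join field at that same parent _id.
        yield dump({"index": {"_index": index, "_type": "_doc",
                              "_id": review_id, "routing": goods_id}})
        yield dump({**review, "type": "review",
                    join_field: {"name": "review", "parent": goods_id}})


lines = list(bulk_lines(
    "goods_id.1",
    {"name": "三岳"},
    [("review_id.1", {"user": "nettle0010", "text": "晩酌で楽しんでます。"})],
))
# The bulk API expects every line, including the last, to end with a newline.
body = "\n".join(lines) + "\n"
print(body, end="")
```

Generating the action line and the source line from the same `goods_id` keeps `routing` and `parent` from drifting apart, which is an easy mistake when the file is written by hand.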

Now index the documents using the JSON file.

$ curl -H "Content-Type: application/x-ndjson" -X POST 'http://localhost:9200/_bulk?refresh=false' --data-binary @request_bulk.json | jq
$ curl -X POST 'localhost:9200/shop/_refresh' | jq

Now search the indexed documents.
First, use the Parent Id Query to fetch the child documents from the parent's _id.

$ curl -X POST 'localhost:9200/shop/_search' -H 'Content-Type: application/json' -d'
{
    "query": {
        "parent_id": {
            "type": "review",
            "id": "goods_id.1"
        }
    }
}
' | jq

This returns the following result.

{
  "took": 8,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.5389965,
    "hits": [
      {
        "_index": "shop",
        "_type": "_doc",
        "_id": "review_id.1",
        "_score": 0.5389965,
        "_routing": "goods_id.1",
        "_source": {
          "type": "review",
          "user": "nettle0010",
          "text": "何度も購入させていただいてます。以前はレア物でなかなか飲めませんでしたが、こちらで購入できるようになって晩酌で楽しんでます。",
          "my_join_field": {
            "name": "review",
            "parent": "goods_id.1"
          }
        }
      },
      {
        "_index": "shop",
        "_type": "_doc",
        "_id": "review_id.2",
        "_score": 0.5389965,
        "_routing": "goods_id.1",
        "_source": {
          "type": "review",
          "user": "udfhsudadb",
          "text": "注文してから２〜３日で届きました。ありがとうございます！",
          "my_join_field": {
            "name": "review",
            "parent": "goods_id.1"
          }
        }
      }
    ]
  }
}

Next, use the Has Child Query to fetch parent documents based on their children.
Setting inner_hits also returns the matching child documents along with each parent.

$ curl -X POST 'localhost:9200/shop/_search' -H 'Content-Type: application/json' -d'
{
    "query": {
        "has_child": {
            "type": "review",
            "query": {
                "match": {
                    "text": "購入"
                }
            },
            "inner_hits": {}
        }
    }
}
' | jq

In this case, the following result is returned.

{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "shop",
        "_type": "_doc",
        "_id": "goods_id.1",
        "_score": 1,
        "_source": {
          "type": "goods",
          "name": "三岳",
          "text": "もののけ姫のモデルにもなった、鹿児島県屋久島の焼酎です。屋久島は水が美味しく、その水を使って作られています。ロックで飲むことをオススメします。",
          "my_join_field": "goods"
        },
        "inner_hits": {
          "review": {
            "hits": {
              "total": 1,
              "max_score": 1.1297731,
              "hits": [
                {
                  "_index": "shop",
                  "_type": "_doc",
                  "_id": "review_id.1",
                  "_score": 1.1297731,
                  "_routing": "goods_id.1",
                  "_source": {
                    "type": "review",
                    "user": "nettle0010",
                    "text": "何度も購入させていただいてます。以前はレア物でなかなか飲めませんでしたが、こちらで購入できるようになって晩酌で楽しんでます。",
                    "my_join_field": {
                      "name": "review",
                      "parent": "goods_id.1"
                    }
                  }
                }
              ]
            }
          }
        }
      },
      {
        "_index": "shop",
        "_type": "_doc",
        "_id": "goods_id.2",
        "_score": 1,
        "_source": {
          "type": "goods",
          "name": "屋久の島",
          "text": "もののけ姫のモデルにもなった、鹿児島県屋久島の焼酎です。三岳と違ってあまりメジャーではありませんが、なかなか美味しいです。ロックがオススメです。",
          "my_join_field": "goods"
        },
        "inner_hits": {
          "review": {
            "hits": {
              "total": 1,
              "max_score": 1.112344,
              "hits": [
                {
                  "_index": "shop",
                  "_type": "_doc",
                  "_id": "review_id.3",
                  "_score": 1.112344,
                  "_routing": "goods_id.2",
                  "_source": {
                    "type": "review",
                    "user": "okodshywegfej",
                    "text": "昨年購入しました。両親からも喜ばれています。",
                    "my_join_field": {
                      "name": "review",
                      "parent": "goods_id.2"
                    }
                  }
                }
              ]
            }
          }
        }
      }
    ]
  }
}
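
To show how the response above might be consumed, the sketch below pairs each parent hit with the children returned under `inner_hits`. The function name `reviews_by_goods` is hypothetical, and the sample `response` is a trimmed-down copy of the result above that keeps only the keys the function reads.

```python
def reviews_by_goods(response, child_type="review"):
    """Map each parent _id to the _id list of its matching children.

    Expects the search response of a has_child query with inner_hits enabled;
    the inner hits are keyed by the child relation name.
    """
    result = {}
    for parent in response["hits"]["hits"]:
        inner = parent.get("inner_hits", {}).get(child_type, {})
        children = inner.get("hits", {}).get("hits", [])
        result[parent["_id"]] = [child["_id"] for child in children]
    return result


# Trimmed-down copy of the response shown above.
response = {
    "hits": {
        "hits": [
            {"_id": "goods_id.1", "inner_hits": {"review": {"hits": {"hits": [
                {"_id": "review_id.1"}]}}}},
            {"_id": "goods_id.2", "inner_hits": {"review": {"hits": {"hits": [
                {"_id": "review_id.3"}]}}}},
        ]
    }
}
print(reviews_by_goods(response))
# {'goods_id.1': ['review_id.1'], 'goods_id.2': ['review_id.3']}
```

Note that review_id.2 is absent because it does not match the query. inner_hits also returns only the top 3 children per parent by default, so a `size` value inside `inner_hits` is needed when more matching reviews should come back.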

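Note that queries such as the Has Child Query can perform poorly depending on how they are used. Reportedly, the join field is only worthwhile when one side of the relationship greatly outnumbers the other, for example when a single product has a very large number of orders.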

Helpful references

Removal of mapping types
_parent field
Parent Id Query
Has Child Query
Inner hits
