
mapping in elasticsearch

Last updated at 2016-05-06 / Posted at 2016-05-06

Seven technologies surrounding search engines (prerequisite knowledge)

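Tokenizer
- The component that splits the sentences of an input document into words so that the document can be indexed by the search engine
  1. Word-level tokenizer (morphological analyzer)
  2. (Character) N-gram tokenizer
  3. Word-level + (character) N-gram tokenizer
Language identifier
- Provides the ability to identify the language of an input string
Word normalizer & character normalizer
Stop words
Content extraction
Ranking adjustment
Crawler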

* From 「ビッグデータ処理の常識をJavaで身につける」 ("Learning the fundamentals of big data processing with Java")
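To make the first two tokenizer kinds in the list above concrete, here is a minimal sketch in Python. The function names `word_tokens` and `char_ngrams` are hypothetical and written only for this illustration. The whitespace split stands in for a real morphological analyzer, which it is not.

```python
def word_tokens(text):
    """Naive word-level tokenization: split on whitespace only.

    A real morphological analyzer does far more; this is a stand-in.
    """
    return text.split()


def char_ngrams(text, n=2):
    """Return every run of n consecutive characters in text."""
    if n <= 0:
        raise ValueError("n must be positive")
    # When the text is shorter than n, the range is empty and [] is returned.
    return [text[i:i + n] for i in range(len(text) - n + 1)]


if __name__ == "__main__":
    print(word_tokens("quick brown fox"))
    print(char_ngrams("東京都", 2))
```

The character N-gram approach needs no dictionary, which is why it is often paired with word-level tokenization for languages such as Japanese that do not separate words with spaces.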

what is mapping in elasticsearch?

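Mapping tells the search engine (ES) which tokenizer (analysis) to apply to the strings being searched, and so defines how they become searchable.

* This is my own loose paraphrase...

In short, it is a schema definition.

Also, mapping is basically not required: when none is given, Elasticsearch infers field types from the indexed documents (dynamic mapping).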

steps

Define the mappings property
Define the analysis property

mapping property

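In an Elasticsearch schema definition, the mappings property holds, for each type, the property names it handles (column names in MySQL terms), their data types, and their analyzers (described later). mappings is close to specifying column names and their data types in a MySQL create table statement.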
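As a rough illustration of that analogy, the sketch below builds a mappings body as a Python dict and serializes it to JSON. The type name `tweet` and field `message` echo the examples later in this article, while `posted_at` and the choice of the built-in `standard` analyzer are assumptions added for illustration. The `string` type matches the Elasticsearch version this article was written against and may not apply to later versions.

```python
import json

# Hypothetical schema, chosen only for illustration.
mappings = {
    "mappings": {
        "tweet": {                      # type name (roughly a MySQL table)
            "properties": {
                "message": {            # property name (roughly a column)
                    "type": "string",   # data type, as used in this article's era
                    "analyzer": "standard",
                },
                "posted_at": {"type": "date"},  # assumed extra field
            }
        }
    }
}

body = json.dumps(mappings, indent=2)
print(body)
```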

analysis property

filter specifies things such as stop words; tokenizer specifies which tokenizer to use, for example N-gram or morphological analysis; analyzer combines filters and a tokenizer into a custom analyzer. Multiple filters, tokenizers, and analyzers can each be defined.

json

{...
  "analysis" : {
    "filter" : ... ,
    "tokenizer" : ... ,
    "analyzer" : ... } }

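One possible way to fill in the elided analysis skeleton is sketched below as a Python dict. Every name starting with `my_` is hypothetical, and the specific stop words and gram sizes are assumptions. The built-in names used (`stop`, `ngram`, `standard`, `lowercase`, `custom`) are real Elasticsearch analysis components, though the exact spelling accepted for the N-gram tokenizer type has varied between versions.

```python
import json

# Sketch of an "analysis" section with two custom analyzers.
analysis = {
    "analysis": {
        "filter": {
            # Custom stop-word filter with an assumed word list.
            "my_stop": {"type": "stop", "stopwords": ["a", "the"]},
        },
        "tokenizer": {
            # Character bigram tokenizer.
            "my_bigram": {"type": "ngram", "min_gram": 2, "max_gram": 2},
        },
        "analyzer": {
            # Word-level analyzer: built-in tokenizer plus the custom filter.
            "my_word_analyzer": {
                "type": "custom",
                "tokenizer": "standard",
                "filter": ["lowercase", "my_stop"],
            },
            # N-gram analyzer: custom tokenizer plus a built-in filter.
            "my_bigram_analyzer": {
                "type": "custom",
                "tokenizer": "my_bigram",
                "filter": ["lowercase"],
            },
        },
    }
}

print(json.dumps(analysis, indent=2))
```

A field in the mappings section would then refer to one of these analyzers by name.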

when/how should mapping be defined?

create index
put mapping API

Below are excerpts of the parts of the official documentation that seemed important.

create index

yaml

$ curl -XPUT 'http://localhost:9200/twitter/' -d '
index :
    number_of_shards : 3 
    number_of_replicas : 2 
'

json

$ curl -XPUT 'http://localhost:9200/twitter/' -d '{
    "settings" : {
        "index" : {
            "number_of_shards" : 3,
            "number_of_replicas" : 2
        }
    }
}'

* In both cases the index property can be omitted
* Preparing a yaml/json file and feeding it to the request also works
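For readers who prefer to issue the same create index request from a script, here is a minimal sketch using only the Python standard library. The helper name `build_create_index_request` is hypothetical. The body leaves out the `index` wrapper, following the note above that it is optional. The request is only constructed, not sent, so no running Elasticsearch is needed to try it.

```python
import json
import urllib.request


def build_create_index_request(base_url, index, shards, replicas):
    """Build (but do not send) a PUT request that creates an index."""
    body = {
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        }
    }
    return urllib.request.Request(
        url=f"{base_url}/{index}/",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


req = build_create_index_request("http://localhost:9200", "twitter", 3, 2)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))

# To actually send it (requires a running Elasticsearch):
# urllib.request.urlopen(req)
```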

put mapping API

$ curl -XPUT 'http://localhost:9200/twitter/_mapping/tweet' -d '
{
    "tweet" : {
        "properties" : {
            "message" : {"type" : "string", "store" : true }
        }
    }
}
'

multi index

$ curl -XPUT 'http://localhost:9200/kimchy,elasticsearch/_mapping/tweet' -d '
{
    "tweet" : {
        "properties" : {
            "message" : {"type" : "string", "store" : true }
        }
    }
}
'

all option

PUT /{index}/_mapping/{type}

{index}
- blank | * | _all | glob pattern | name1, name2, …
{type}
- Name of the type to add. Must be the name of the type defined in the body.

Instead of _mapping you can also use the plural _mappings

=> Either _mapping or _mappings is OK
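The path rules above can be sketched as a small helper that assembles the put mapping URL for one or more indices. The function name `put_mapping_url` is hypothetical. Falling back to `_all` when no index is given is a choice made for this sketch, picked from the allowed `{index}` values listed above. The check that the body's top-level key equals the type name follows the `{type}` rule quoted from the documentation.

```python
def put_mapping_url(base_url, indices, type_name, body):
    """Return the put mapping URL for the given indices and type.

    indices is a list of index names or glob patterns; an empty list
    falls back to "_all" (an assumption made for this sketch).
    """
    # The type in the path must be the type defined in the body.
    if list(body.keys()) != [type_name]:
        raise ValueError("body must define exactly the type " + repr(type_name))
    index_part = ",".join(indices) if indices else "_all"
    return f"{base_url}/{index_part}/_mapping/{type_name}"


body = {
    "tweet": {
        "properties": {
            "message": {"type": "string", "store": True},
        }
    }
}

url = put_mapping_url(
    "http://localhost:9200", ["kimchy", "elasticsearch"], "tweet", body
)
print(url)
```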
