Overview
Steps for uploading a CSV file to Elasticsearch using elasticsearch-loader.
It is quicker to get running than Logstash.
What is elasticsearch-loader?
A Python tool for bulk loading data files (JSON, Parquet, CSV, TSV) into Elasticsearch.
Supported environments
python/es | 5.6.16 | 6.8.0 | 7.1.1 |
---|---|---|---|
2.7 | V | V | V |
3.7 | V | V | V |
Installation
$ sudo pip install elasticsearch-loader
Usage
We will use a CSV file like this:
$ cat test.csv
id,name,age,address
01,taro,12,tokyo
02,hanako,13,kyoto
03,ichiro,16,osaka
Run the following command to load the CSV file into Elasticsearch:
$ elasticsearch_loader --es-host <host:port> --index <IndexName> --type <TypeName> csv <FileName>
$ elasticsearch_loader --es-host 192.168.1.1:9200 --index student --type type csv test.csv
{'index': u'student', 'bulk_size': 500, 'http_auth': None, 'es_conn': <Elasticsearch([{u'host': u'192.168.1.1', u'port': 9200}])>, 'encoding': u'utf-8', 'keys': [], 'use_ssl': False, 'update': False, 'id_field': None, 'as_child': False, 'index_settings_file': None, 'timeout': 10.0, 'progress': False, 'ca_certs': None, 'with_retry': False, 'verify_certs': False, 'type': u'type', 'es_host': (u'192.168.1.1:9200',), 'delete': False}
[####################################]
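Note that the _id values in the result below are auto-generated by Elasticsearch. The help output at the end lists an --id-field option that instead takes the _id from a CSV column. Conceptually, the bulk actions it builds look like this sketch (pure Python, no Elasticsearch connection; the helper name is my own):

```python
def to_bulk_actions(rows, index, doc_type, id_field=None):
    """Build bulk-API (action, source) pairs from a list of row dicts.

    When id_field is given, that column supplies the document _id
    (mirroring elasticsearch-loader's --id-field option); otherwise
    Elasticsearch assigns a random _id on insert.
    """
    actions = []
    for row in rows:
        meta = {"index": {"_index": index, "_type": doc_type}}
        if id_field is not None:
            meta["index"]["_id"] = row[id_field]
        actions.append((meta, row))
    return actions

rows = [
    {"id": "01", "name": "taro", "age": "12", "address": "tokyo"},
    {"id": "02", "name": "hanako", "age": "13", "address": "kyoto"},
]
actions = to_bulk_actions(rows, "student", "type", id_field="id")
```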
Results
Since the specified index and type did not exist, they were created automatically, and the documents were registered.
$ curl -H "Content-Type: application/json" -XGET 'http://192.168.1.1:9200/student/type/_search?pretty'
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : 1.0,
"hits" : [
{
"_index" : "student",
"_type" : "type",
"_id" : "gECMZXABW66WYIIZTexw",
"_score" : 1.0,
"_source" : {
"age" : "12",
"address" : "tokyo",
"id" : "01",
"name" : "taro"
}
},
{
"_index" : "student",
"_type" : "type",
"_id" : "gkCMZXABW66WYIIZTexw",
"_score" : 1.0,
"_source" : {
"age" : "16",
"address" : "osaka",
"id" : "03",
"name" : "ichiro"
}
},
{
"_index" : "student",
"_type" : "type",
"_id" : "gUCMZXABW66WYIIZTexw",
"_score" : 1.0,
"_source" : {
"age" : "13",
"address" : "kyoto",
"id" : "02",
"name" : "hanako"
}
}
]
}
}
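Each hit's _source holds the original CSV row verbatim. A small sketch of pulling the rows back out of such a response, using a trimmed copy of the JSON above (note the hits come back in arbitrary order, so we sort by the id column):

```python
# A trimmed subset of the search response shown above.
response = {
    "hits": {
        "total": 3,
        "hits": [
            {"_id": "gECMZXABW66WYIIZTexw",
             "_source": {"age": "12", "address": "tokyo", "id": "01", "name": "taro"}},
            {"_id": "gkCMZXABW66WYIIZTexw",
             "_source": {"age": "16", "address": "osaka", "id": "03", "name": "ichiro"}},
            {"_id": "gUCMZXABW66WYIIZTexw",
             "_source": {"age": "13", "address": "kyoto", "id": "02", "name": "hanako"}},
        ],
    }
}

# Recover the original rows from _source, sorted by the "id" column.
docs = sorted((h["_source"] for h in response["hits"]["hits"]),
              key=lambda d: d["id"])
names = [d["name"] for d in docs]
```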
Help
$ elasticsearch_loader -h
Usage: elasticsearch_loader [OPTIONS] COMMAND [ARGS]...
Options:
-c, --config-file TEXT Load default configuration file from esl.yml
--bulk-size INTEGER How many docs to collect before writing to
Elasticsearch (default 500)
--es-host TEXT Elasticsearch cluster entry point. (default
http://localhost:9200)
--verify-certs Make sure we verify SSL certificates
(default false)
--use-ssl Turn on SSL (default false)
--ca-certs TEXT Provide a path to CA certs on disk
--http-auth TEXT Provide username and password for basic auth
in the format of username:password
--index TEXT Destination index name [required]
--delete Delete index before import? (default false)
--update Merge and update existing doc instead of
overwrite
--progress Enable progress bar - NOTICE: in order to
show progress the entire input should be
collected and can consume more memory than
without progress bar
--type TEXT Docs type. TYPES WILL BE DEPRECATED IN APIS
IN ELASTICSEARCH 7, AND COMPLETELY REMOVED
IN 8. [required]
--id-field TEXT Specify field name that be used as document
id
--as-child Insert _parent, _routing field, the value is
same as _id. Note: must specify --id-field
explicitly
--with-retry Retry if ES bulk insertion failed
--index-settings-file FILENAME Specify path to json file containing index
mapping and settings, creates index if
missing
--timeout FLOAT Specify request timeout in seconds for
Elasticsearch client
--encoding TEXT Specify content encoding for input files
--keys TEXT Comma separated keys to pick from each
document
-h, --help Show this message and exit.
Commands:
csv
json FILES with the format of [{"a": "1"}, {"b": "2"}]
parquet
That's all.