Overview
Steps for uploading a CSV file to Elasticsearch using elasticsearch-loader.
It is quicker to get running than Logstash.
What is elasticsearch-loader?
A Python tool for bulk loading data files (JSON, Parquet, CSV, TSV) into Elasticsearch.
Supported environments
python/es | 5.6.16 | 6.8.0 | 7.1.1 |
---|---|---|---|
2.7 | V | V | V |
3.7 | V | V | V |
Installation
$ sudo pip install elasticsearch-loader
Usage
We will use a CSV file like this:
$ cat test.csv
id,name,age,address
01,taro,12,tokyo
02,hanako,13,kyoto
03,ichiro,16,osaka
Run the following command to load the CSV file into Elasticsearch:
$ elasticsearch_loader --es-host <host:port> --index <IndexName> --type <TypeName> csv <FileName>
$ elasticsearch_loader --es-host 192.168.1.1:9200 --index student --type type csv test.csv
{'index': u'student', 'bulk_size': 500, 'http_auth': None, 'es_conn': <Elasticsearch([{u'host': u'192.168.1.1', u'port': 9200}])>, 'encoding': u'utf-8', 'keys': [], 'use_ssl': False, 'update': False, 'id_field': None, 'as_child': False, 'index_settings_file': None, 'timeout': 10.0, 'progress': False, 'ca_certs': None, 'with_retry': False, 'verify_certs': False, 'type': u'type', 'es_host': (u'192.168.1.1:9200',), 'delete': False}
[####################################]
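Note that the _id values in the result below are auto-generated by Elasticsearch. The help output at the end lists an --id-field option that instead takes the _id from a CSV column. Conceptually, the bulk actions it builds look like this sketch (pure Python, no Elasticsearch connection; the helper name is my own):

```python
def to_bulk_actions(rows, index, doc_type, id_field=None):
    """Build bulk-API (action, source) pairs from a list of row dicts.

    When id_field is given, that column supplies the document _id
    (mirroring elasticsearch-loader's --id-field option); otherwise
    Elasticsearch assigns a random _id on insert.
    """
    actions = []
    for row in rows:
        meta = {"index": {"_index": index, "_type": doc_type}}
        if id_field is not None:
            meta["index"]["_id"] = row[id_field]
        actions.append((meta, row))
    return actions

rows = [
    {"id": "01", "name": "taro", "age": "12", "address": "tokyo"},
    {"id": "02", "name": "hanako", "age": "13", "address": "kyoto"},
]
actions = to_bulk_actions(rows, "student", "type", id_field="id")
```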
Results
Since the specified index and type did not exist, they were created automatically, and the documents were registered.
$ curl -H "Content-Type: application/json" -XGET 'http://192.168.1.1:9200/student/type/_search?pretty'
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : 3,
"max_score" : 1.0,
"hits" : [
{
"_index" : "student",
"_type" : "type",
"_id" : "gECMZXABW66WYIIZTexw",
"_score" : 1.0,
"_source" : {
"age" : "12",
"address" : "tokyo",
"id" : "01",
"name" : "taro"
}
},
{
"_index" : "student",
"_type" : "type",
"_id" : "gkCMZXABW66WYIIZTexw",
"_score" : 1.0,
"_source" : {
"age" : "16",
"address" : "osaka",
"id" : "03",
"name" : "ichiro"
}
},
{
"_index" : "student",
"_type" : "type",
"_id" : "gUCMZXABW66WYIIZTexw",
"_score" : 1.0,
"_source" : {
"age" : "13",
"address" : "kyoto",
"id" : "02",
"name" : "hanako"
}
}
]
}
}
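Each hit's _source holds the original CSV row verbatim. A small sketch of pulling the rows back out of such a response, using a trimmed copy of the JSON above (note the hits come back in arbitrary order, so we sort by the id column):

```python
# A trimmed subset of the search response shown above.
response = {
    "hits": {
        "total": 3,
        "hits": [
            {"_id": "gECMZXABW66WYIIZTexw",
             "_source": {"age": "12", "address": "tokyo", "id": "01", "name": "taro"}},
            {"_id": "gkCMZXABW66WYIIZTexw",
             "_source": {"age": "16", "address": "osaka", "id": "03", "name": "ichiro"}},
            {"_id": "gUCMZXABW66WYIIZTexw",
             "_source": {"age": "13", "address": "kyoto", "id": "02", "name": "hanako"}},
        ],
    }
}

# Recover the original rows from _source, sorted by the "id" column.
docs = sorted((h["_source"] for h in response["hits"]["hits"]),
              key=lambda d: d["id"])
names = [d["name"] for d in docs]
```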
Help
$ elasticsearch_loader -h
Usage: elasticsearch_loader [OPTIONS] COMMAND [ARGS]...
Options:
-c, --config-file TEXT Load default configuration file from esl.yml
--bulk-size INTEGER How many docs to collect before writing to
Elasticsearch (default 500)
--es-host TEXT Elasticsearch cluster entry point. (default
http://localhost:9200)
--verify-certs Make sure we verify SSL certificates
(default false)
--use-ssl Turn on SSL (default false)
--ca-certs TEXT Provide a path to CA certs on disk
--http-auth TEXT Provide username and password for basic auth
in the format of username:password
--index TEXT Destination index name [required]
--delete Delete index before import? (default false)
--update Merge and update existing doc instead of
overwrite
--progress Enable progress bar - NOTICE: in order to
show progress the entire input should be
collected and can consume more memory than
without progress bar
--type TEXT Docs type. TYPES WILL BE DEPRECATED IN APIS
IN ELASTICSEARCH 7, AND COMPLETELY REMOVED
IN 8. [required]
--id-field TEXT Specify field name that be used as document
id
--as-child Insert _parent, _routing field, the value is
same as _id. Note: must specify --id-field
explicitly
--with-retry Retry if ES bulk insertion failed
--index-settings-file FILENAME Specify path to json file containing index
mapping and settings, creates index if
missing
--timeout FLOAT Specify request timeout in seconds for
Elasticsearch client
--encoding TEXT Specify content encoding for input files
--keys TEXT Comma separated keys to pick from each
document
-h, --help Show this message and exit.
Commands:
csv
json FILES with the format of [{"a": "1"}, {"b": "2"}]
parquet
That's all.