
More than 3 years have passed since last update.

[elasticsearch-loader] Importing a CSV file into Elasticsearch


Overview

This article walks through uploading a CSV file to Elasticsearch with elasticsearch-loader.
It is easier to run than Logstash.

What is elasticsearch-loader?

A Python tool for batch loading data files (json, parquet, csv, tsv) into Elasticsearch.

GitHub

Supported environments

Python \ Elasticsearch   5.6.16   6.8.0   7.1.1
2.7                      V        V       V
3.7                      V        V       V

Installation

$ sudo pip install elasticsearch-loader

Usage

The example uses the following CSV file:

$ cat test.csv 
id,name,age,address
01,taro,12,tokyo
02,hanako,13,kyoto
03,ichiro,16,osaka
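
Each data row becomes one document whose field names come from the header line. As a rough illustration, here is a sketch using Python's standard csv module. It is not elasticsearch-loader's own code, and `rows_as_documents` is a hypothetical helper. Every value, including `id` and `age`, is read as a string, which matches the `_source` fields in the result shown later in this article.

```python
import csv
import io

# Same content as test.csv, inlined so the sketch is self-contained.
CSV_TEXT = """id,name,age,address
01,taro,12,tokyo
02,hanako,13,kyoto
03,ichiro,16,osaka
"""


def rows_as_documents(text):
    """Return one dict per data row, keyed by the header line (hypothetical helper)."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


documents = rows_as_documents(CSV_TEXT)
for doc in documents:
    print(doc)
```

If numeric fields such as `age` should be indexed as numbers, one option suggested by the help output is to supply a mapping through `--index-settings-file`.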

Run the following command to register the CSV file in Elasticsearch. The first line shows the general form, the second an actual run:

$ elasticsearch_loader --es-host <host:port> --index <IndexName> --type <TypeName> csv <FileName>
$ elasticsearch_loader --es-host 192.168.1.1:9200 --index student --type type csv test.csv
{'index': u'student', 'bulk_size': 500, 'http_auth': None, 'es_conn': <Elasticsearch([{u'host': u'192.168.1.1', u'port': 9200}])>, 'encoding': u'utf-8', 'keys': [], 'use_ssl': False, 'update': False, 'id_field': None, 'as_child': False, 'index_settings_file': None, 'timeout': 10.0, 'progress': False, 'ca_certs': None, 'with_retry': False, 'verify_certs': False, 'type': u'type', 'es_host': (u'192.168.1.1:9200',), 'delete': False}
  [####################################]
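
When the import needs to be scripted, the same command line can be assembled and launched from Python. This is only a sketch: `build_loader_command` and `run_import` are hypothetical helpers, and it uses just the flags shown in the command above.

```python
import subprocess


def build_loader_command(es_host, index, doc_type, csv_path):
    """Assemble the argument list for the elasticsearch_loader CLI (hypothetical helper)."""
    return [
        "elasticsearch_loader",
        "--es-host", es_host,
        "--index", index,
        "--type", doc_type,
        "csv", csv_path,
    ]


def run_import(es_host, index, doc_type, csv_path):
    """Run the import; raises CalledProcessError if the CLI exits with a non-zero status."""
    subprocess.run(build_loader_command(es_host, index, doc_type, csv_path), check=True)


# Only build and print the command here; call run_import() against a reachable cluster.
command = build_loader_command("192.168.1.1:9200", "student", "type", "test.csv")
print(" ".join(command))
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.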

Result

If the specified index and type do not exist, they are created automatically. The import succeeded, as the following search confirms.

$ curl -H "Content-Type: application/json" -XGET 'http://192.168.1.1:9200/student/type/_search?pretty'

{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "student",
        "_type" : "type",
        "_id" : "gECMZXABW66WYIIZTexw",
        "_score" : 1.0,
        "_source" : {
          "age" : "12",
          "address" : "tokyo",
          "id" : "01",
          "name" : "taro"
        }
      },
      {
        "_index" : "student",
        "_type" : "type",
        "_id" : "gkCMZXABW66WYIIZTexw",
        "_score" : 1.0,
        "_source" : {
          "age" : "16",
          "address" : "osaka",
          "id" : "03",
          "name" : "ichiro"
        }
      },
      {
        "_index" : "student",
        "_type" : "type",
        "_id" : "gUCMZXABW66WYIIZTexw",
        "_score" : 1.0,
        "_source" : {
          "age" : "13",
          "address" : "kyoto",
          "id" : "02",
          "name" : "hanako"
        }
      }
    ]
  }
}
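
To check the import programmatically, the response can be parsed and the stored documents pulled out of `hits.hits`. The sketch below works on a trimmed copy of the response above, so it runs without a cluster. `total_hits` and `sources` are hypothetical helper names. In Elasticsearch 7, `hits.total` is an object with a `value` field rather than a plain integer, so the helper accepts both shapes.

```python
import json

# Trimmed copy of the response shown above (one hit kept for brevity).
RESPONSE_TEXT = """
{
  "hits": {
    "total": 3,
    "hits": [
      {"_index": "student", "_type": "type", "_id": "gECMZXABW66WYIIZTexw",
       "_source": {"age": "12", "address": "tokyo", "id": "01", "name": "taro"}}
    ]
  }
}
"""


def total_hits(response):
    """Return the hit count for both the integer and the object form of hits.total."""
    total = response["hits"]["total"]
    return total["value"] if isinstance(total, dict) else total


def sources(response):
    """Return the _source document of every hit in the response."""
    return [hit["_source"] for hit in response["hits"]["hits"]]


response = json.loads(RESPONSE_TEXT)
print(total_hits(response))
print(sources(response))
```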

Help

$ elasticsearch_loader -h
Usage: elasticsearch_loader [OPTIONS] COMMAND [ARGS]...

Options:
  -c, --config-file TEXT          Load default configuration file from esl.yml
  --bulk-size INTEGER             How many docs to collect before writing to
                                  Elasticsearch (default 500)
  --es-host TEXT                  Elasticsearch cluster entry point. (default
                                  http://localhost:9200)
  --verify-certs                  Make sure we verify SSL certificates
                                  (default false)
  --use-ssl                       Turn on SSL (default false)
  --ca-certs TEXT                 Provide a path to CA certs on disk
  --http-auth TEXT                Provide username and password for basic auth
                                  in the format of username:password
  --index TEXT                    Destination index name  [required]
  --delete                        Delete index before import? (default false)
  --update                        Merge and update existing doc instead of
                                  overwrite
  --progress                      Enable progress bar - NOTICE: in order to
                                  show progress the entire input should be
                                  collected and can consume more memory than
                                  without progress bar
  --type TEXT                     Docs type. TYPES WILL BE DEPRECATED IN APIS
                                  IN ELASTICSEARCH 7, AND COMPLETELY REMOVED
                                  IN 8.  [required]
  --id-field TEXT                 Specify field name that be used as document
                                  id
  --as-child                      Insert _parent, _routing field, the value is
                                  same as _id. Note: must specify --id-field
                                  explicitly
  --with-retry                    Retry if ES bulk insertion failed
  --index-settings-file FILENAME  Specify path to json file containing index
                                  mapping and settings, creates index if
                                  missing
  --timeout FLOAT                 Specify request timeout in seconds for
                                  Elasticsearch client
  --encoding TEXT                 Specify content encoding for input files
  --keys TEXT                     Comma separated keys to pick from each
                                  document
  -h, --help                      Show this message and exit.

Commands:
  csv
  json     FILES with the format of [{"a": "1"}, {"b": "2"}]
  parquet

That's all.
