LoginSignup
10
7

More than 3 years have passed since last update.

jqコマンドでCSVをJSONに変換してElasticSearchに登録する

Last updated at Posted at 2020-02-26

1.概要

jqコマンドでCSVをJSONに変換してElasticSearchに登録した際の手順メモです。

2.環境

ElasticSearch Linux
6.5.3 Amazon Linux 2

3.手順

3.1.使用するCSV

$ cat test.csv 
id,name,age,address
01,taro,12,tokyo
02,hanako,13,kyoto
03,ichiro,16,osaka

3.2.Mappingファイル作成

上記のCSVに対応したMaapingファイルを作成します。

$ cat mapping.jq 
split("\n")|map(split(","))|
   map({"id":.[0],
        "name":.[1],
        "age":.[2],
        "address":.[3],
})

3.3.CSVをJSONに変換

結論から言うと以下のコマンドになります。
3.3.x.でコマンドの詳細を説明します。

$ jq -R -s -f mapping.jq test.csv |jq 'del(.[][] | nulls)'\
|head -n -2 | sed -e 1d -e 's/},/}/g' | jq . -c|\
sed "s/^/{ \"index\" :{} }\n/g" > test.json

3.3.1.jqコマンドを使用してCSVをJSONに変換

jq -R -s -f <mappingファイル> <CSVファイル>

--raw-input / -R
入力値は、JSONとしてパースしないでください。
その代わりに、(このオプションを用いる事で)テキストの各行は
文字列としてフィルタに渡されます。
もし --slurpオプションと組み合わせた場合
入力全体が単一の長い文字列として フィルタに渡されます。

--slurp/ -s
入力内の各JSONオブジェクトに対してフィルターを実行する代わりに、
入力ストリーム全体を大きな配列に読み取り、フィルターを1回だけ実行します。

-f filename/ --from-file filename:
awkの-fオプションのように、コマンドラインからではなくファイルからフィルターを読み取ります。
「#」を使用してコメントを作成することもできます。

--compact-output/ -c:
デフォルトでは、jqはJSON出力をプリティプリントします。
このオプションを使用すると、各JSONオブジェクトが1行に配置されるため
出力がよりコンパクトになります。

$ jq -R -s -f mapping.jq test.csv 
[
  {
    "id": "id",
    "name": "name",
    "age": "age",
    "address": "address"
  },
  {
    "id": "01",
    "name": "taro",
    "age": "12",
    "address": "tokyo"
  },
  {
    "id": "02",
    "name": "hanako",
    "age": "13",
    "address": "kyoto"
  },
  {
    "id": "03",
    "name": "ichiro",
    "age": "16",
    "address": "osaka"
  },
  {
    "id": null,
    "name": null,
    "age": null,
    "address": null
  }
]

3.3.2.なぜかnullが入ってしまうので削除

$ jq -R -s -f mapping.jq test.csv |jq 'del(.[][] | nulls)'
[
  {
    "id": "id",
    "name": "name",
    "age": "age",
    "address": "address"
  },
  {
    "id": "01",
    "name": "taro",
    "age": "12",
    "address": "tokyo"
  },
  {
    "id": "02",
    "name": "hanako",
    "age": "13",
    "address": "kyoto"
  },
  {
    "id": "03",
    "name": "ichiro",
    "age": "16",
    "address": "osaka"
  },
  {}
]

3.3.3.余計な部分を削除

$ jq -R -s -f mapping.jq test.csv |jq 'del(.[][] | nulls)'\
|head -n -2 | sed -e 1d -e 's/},/}/g' | jq . -c

{"id":"id","name":"name","age":"age","address":"address"}
{"id":"01","name":"taro","age":"12","address":"tokyo"}
{"id":"02","name":"hanako","age":"13","address":"kyoto"}
{"id":"03","name":"ichiro","age":"16","address":"osaka"}

3.3.4.ElasticSearchでBulkできるようにIndexを付与

$ jq -R -s -f mapping.jq test.csv |jq 'del(.[][] | nulls)'\
|head -n -2 | sed -e 1d -e 's/},/}/g' | jq . -c|\
sed "s/^/{ \"index\" :{} }\n/g"

{ "index" :{} }
{"id":"id","name":"name","age":"age","address":"address"}
{ "index" :{} }
{"id":"01","name":"taro","age":"12","address":"tokyo"}
{ "index" :{} }
{"id":"02","name":"hanako","age":"13","address":"kyoto"}
{ "index" :{} }
{"id":"03","name":"ichiro","age":"16","address":"osaka"}

3.3.5.上記の結果をファイルに保存


$ jq -R -s -f mapping.jq test.csv |jq 'del(.[][] | nulls)'\
|head -n -2 | sed -e 1d -e 's/},/}/g' |\ jq . -c|\
sed "s/^/{ \"index\" :{} }\n/g" > test.json

$ cat test.json
{ "index" :{} }
{"id":"id","name":"name","age":"age","address":"address"}
{ "index" :{} }
{"id":"01","name":"taro","age":"12","address":"tokyo"}
{ "index" :{} }
{"id":"02","name":"hanako","age":"13","address":"kyoto"}
{ "index" :{} }
{"id":"03","name":"ichiro","age":"16","address":"osaka"}

3.4. ElasticSearchへのBulkInsert


curl -H "Content-Type: application/json" "<ホスト名>/<Index名>/<Type名>/_bulk?pretty" --data-binary @<ファイル名>

$ curl -H "Content-Type: application/json" "localhost:9200/index/type/_bulk?pretty" --data-binary @test.json
{
  "took" : 1583,
  "errors" : false,
  "items" : [
    {
      "index" : {
        "_index" : "index",
        "_type" : "type",
        "_id" : "XWxcf3ABW0y3WGzTST46",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 0,
        "_primary_term" : 1,
        "status" : 201
      }
    },
    {
      "index" : {
        "_index" : "index",
        "_type" : "type",
        "_id" : "Xmxcf3ABW0y3WGzTST46",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 0,
        "_primary_term" : 1,
        "status" : 201
      }
    },
    {
      "index" : {
        "_index" : "index",
        "_type" : "type",
        "_id" : "X2xcf3ABW0y3WGzTST46",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 0,
        "_primary_term" : 1,
        "status" : 201
      }
    },
    {
      "index" : {
        "_index" : "index",
        "_type" : "type",
        "_id" : "YGxcf3ABW0y3WGzTST46",
        "_version" : 1,
        "result" : "created",
        "_shards" : {
          "total" : 2,
          "successful" : 1,
          "failed" : 0
        },
        "_seq_no" : 1,
        "_primary_term" : 1,
        "status" : 201
      }
    }
  ]
}

3.4.1.結果確認

$ curl -H "Content-Type: application/json" -XGET 'localhost:9200/index/type/_search?pretty'

{
  "took" : 5,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 4,
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "index",
        "_type" : "type",
        "_id" : "X2xcf3ABW0y3WGzTST46",
        "_score" : 1.0,
        "_source" : {
          "id" : "02",
          "name" : "hanako",
          "age" : "13",
          "address" : "kyoto"
        }
      },
      {
        "_index" : "index",
        "_type" : "type",
        "_id" : "YGxcf3ABW0y3WGzTST46",
        "_score" : 1.0,
        "_source" : {
          "id" : "03",
          "name" : "ichiro",
          "age" : "16",
          "address" : "osaka"
        }
      },
      {
        "_index" : "index",
        "_type" : "type",
        "_id" : "XWxcf3ABW0y3WGzTST46",
        "_score" : 1.0,
        "_source" : {
          "id" : "id",
          "name" : "name",
          "age" : "age",
          "address" : "address"
        }
      },
      {
        "_index" : "index",
        "_type" : "type",
        "_id" : "Xmxcf3ABW0y3WGzTST46",
        "_score" : 1.0,
        "_source" : {
          "id" : "01",
          "name" : "taro",
          "age" : "12",
          "address" : "tokyo"
        }
      }
    ]
  }
}

以上

10
7
0

Register as a new user and use Qiita more conveniently

  1. You get articles that match your needs
  2. You can efficiently read back useful information
  3. You can use dark theme
What you can do with signing up
10
7