I wrote this article a while back:
https://qiita.com/arc279/items/2515272a7050c0a13cbf
I've since found a slightly slicker way to do it, so here it is for your consideration.
Code
tsv2jsonl.jq
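#!/usr/bin/env -S jq -n -R -f
# -S makes env split "jq -n -R -f" into separate words; Linux would otherwise
# pass it as a single argument. Needs GNU coreutils 8.30+ or a BSD/macOS env.
# The first line is the header; each remaining line becomes one object.
input | split("\t") as $header
| inputs
| [$header, split("\t")]
| transpose
| map({(.[0]): .[1]})
| add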
Then make it executable:
$ chmod +x tsv2jsonl.jq
That's it.
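To see what the filter does to each row, here is a minimal sketch that runs the same stages on one hard-coded header and row instead of reading standard input. The column names (id, name), the values, and the function names pairs and merged are made up for illustration; it assumes jq is on your PATH.

```shell
# Hypothetical header and row, hard-coded instead of being read from stdin.
header='["id","name"]'
row='["1","alice"]'

# Stage 1: pair each column name with its value.
pairs() {
  jq -n -c --argjson h "$header" --argjson r "$row" '[$h, $r] | transpose'
}

# Stages 2 and 3: one single-key object per pair, then merge them with add.
merged() {
  jq -n -c --argjson h "$header" --argjson r "$row" \
    '[$h, $r] | transpose | map({(.[0]): .[1]}) | add'
}

pairs    # [["id","1"],["name","alice"]]
merged   # {"id":"1","name":"alice"}
```

So transpose does the zipping, and add folds the per-column objects into one object per row.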
Usage example
$ { seq -f 'c%g' 5; seq -f '%04g' inf; } | paste - - - - - | ./tsv2jsonl.jq -c | head -n 10
{"c1":"0001","c2":"0002","c3":"0003","c4":"0004","c5":"0005"}
{"c1":"0006","c2":"0007","c3":"0008","c4":"0009","c5":"0010"}
{"c1":"0011","c2":"0012","c3":"0013","c4":"0014","c5":"0015"}
{"c1":"0016","c2":"0017","c3":"0018","c4":"0019","c5":"0020"}
{"c1":"0021","c2":"0022","c3":"0023","c4":"0024","c5":"0025"}
{"c1":"0026","c2":"0027","c3":"0028","c4":"0029","c5":"0030"}
{"c1":"0031","c2":"0032","c3":"0033","c4":"0034","c5":"0035"}
{"c1":"0036","c2":"0037","c3":"0038","c4":"0039","c5":"0040"}
{"c1":"0041","c2":"0042","c3":"0043","c4":"0044","c5":"0045"}
{"c1":"0046","c2":"0047","c3":"0048","c4":"0049","c5":"0050"}
As the seq -f '%04g' inf in that pipeline shows (the input never ends), rows are processed one line at a time, so it's easy on memory.
Key points
With the -n option, jq stops reading standard input on its own (the filter runs as if it had been handed null):
$ jq -n .
null
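The difference is easier to see side by side. A small sketch, assuming jq is on your PATH; the input lines a to d and the function names without_n and with_n are made up for illustration. Without -n, jq itself reads a line into . before each run of the filter, so input only gets every other line. With -n, the filter runs once with . set to null, and input gets the first line.

```shell
# Without -n: jq feeds one line to the filter as "." and input grabs the next.
without_n() {
  printf 'a\nb\nc\nd\n' | jq -R -c '[., input]'
}

# With -n: "." is null, the filter runs once, and input reads the first line.
with_n() {
  printf 'a\nb\nc\nd\n' | jq -n -R -c '[., input]'
}

without_n   # ["a","b"] then ["c","d"]
with_n      # [null,"a"]
```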
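On top of that, use
- input: reads one line from standard input yourself (a bit like bash's read)
- inputs: reads everything that is left on standard input, one line after another (a bit like cat)
to process the data line by line. (They hand you lines rather than JSON values because of -R.) And that's the whole trick.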
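As a quick illustration of how the two split the stream, here is a sketch with a made-up three-line input (header, row1, row2) and hypothetical function names; it assumes jq is on your PATH. The first function collects inputs into an array only to make the split visible, which gives up the streaming behaviour; the second keeps streaming and emits one result per remaining line.

```shell
# input takes exactly one line; [inputs] gathers every remaining line.
head_and_rest() {
  printf 'header\nrow1\nrow2\n' |
    jq -n -R -c 'input as $first | {first: $first, rest: [inputs]}'
}

# Streaming form: one output per remaining line, nothing is buffered.
label_rows() {
  printf 'header\nrow1\nrow2\n' |
    jq -n -R -r 'input as $first | inputs | "\($first): \(.)"'
}

head_and_rest   # {"first":"header","rest":["row1","row2"]}
label_rows      # header: row1, then header: row2
```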