How to extract only the rows of a CSV file where a specific column is even or odd

Posted at 2019-10-28

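A memo on how to take a list of user_id, item_id pairs exported as a CSV file and extract only the rows whose user_id ends in an even or odd digit.
It uses the awk command, so it can be run directly in a terminal.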

Sample data

The samples cover files with and without a header row, and with and without double quotes (") around the fields.

With header

$ cat test_header.csv
user_id,item_id
100001,001
100002,003
100003,002
100004,001
100005,003
100006,001
100007,003
100008,002
100009,002
100010,003

Without header

$ cat test_non_header.csv
100001,001
100002,003
100003,002
100004,001
100005,003
100006,001
100007,003
100008,002
100009,002
100010,003

Fields enclosed in double quotes

$ cat test_non_header_quote.csv
"100001","001"
"100002","003"
"100003","002"
"100004","001"
"100005","003"
"100006","001"
"100007","003"
"100008","002"
"100009","002"
"100010","003"

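Data extraction

Work in the directory that contains the target files. Each command below extracts the rows with an odd user_id; to extract the even rows instead, change the comparison "== 1" to "== 0".

With header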

$ awk '
	BEGIN {FS=","}
	NR == 1 {print; next}       # always keep the header row
	$1 % 2 == 1 {print}         # keep rows whose user_id is odd
' test_header.csv > test_header_odd.csv
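
The even case follows the same pattern with the remainder compared against 0. The following is a minimal sketch, not part of the original memo: it recreates the sample file in a temporary directory (the tmpdir and even_rows variable names are assumptions made for illustration) and keeps the header plus the even user_id rows.

```shell
# Assumption: recreate the sample file from above in a temporary directory.
tmpdir=$(mktemp -d)
cat > "$tmpdir/test_header.csv" <<'EOF'
user_id,item_id
100001,001
100002,003
100003,002
100004,001
100005,003
100006,001
100007,003
100008,002
100009,002
100010,003
EOF

# NR == 1 keeps the header row; $1 % 2 == 0 keeps rows whose user_id is even.
# A pattern with no action prints the matching line.
even_rows=$(awk -F',' 'NR == 1 || $1 % 2 == 0' "$tmpdir/test_header.csv")
printf '%s\n' "$even_rows"
```

This should print the header followed by the five rows for user_id 100002, 100004, 100006, 100008 and 100010.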

Without header

$ awk '
	BEGIN {FS=","}
	$1 % 2 == 1 {print}         # keep rows whose user_id is odd
' test_non_header.csv > test_non_header_odd.csv
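
Since the goal is to test the last digit of user_id, the same filter can also be written as a regular-expression match on that digit instead of a modulo. This is an alternative sketch, not the method used in this memo; it may be useful when IDs are very long, because awk handles numbers as floating point and large values can lose precision. The tmpdir, odd_rows and even_rows names are assumptions for illustration.

```shell
# Assumption: recreate the headerless sample file in a temporary directory.
tmpdir=$(mktemp -d)
cat > "$tmpdir/test_non_header.csv" <<'EOF'
100001,001
100002,003
100003,002
100004,001
100005,003
100006,001
100007,003
100008,002
100009,002
100010,003
EOF

# Match on the final character of the first field; no numeric conversion happens.
odd_rows=$(awk -F',' '$1 ~ /[13579]$/' "$tmpdir/test_non_header.csv")
even_rows=$(awk -F',' '$1 ~ /[02468]$/' "$tmpdir/test_non_header.csv")

printf 'odd:\n%s\n' "$odd_rows"
printf 'even:\n%s\n' "$even_rows"
```

Each of the two variables should end up holding five rows of the sample data.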

Fields enclosed in double quotes

$ awk '
	BEGIN {FS=","}
	# $1 looks like "100001": the last character is the closing quote,
	# so the final digit is at position length($1)-1
	substr($1, length($1)-1, 1) % 2 == 1 {print}
' test_non_header_quote.csv > test_non_header_quote_odd.csv
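
Another way to handle the quoted file is to strip the quotes from a copy of the first field and then reuse the modulo test. The sketch below wraps this in a shell function so the same command covers both odd and even; filter_by_parity, want and id are hypothetical names introduced here, not part of the original memo. It assumes simple CSV with no commas inside quoted fields.

```shell
# Assumption: recreate the quoted sample file in a temporary directory.
tmpdir=$(mktemp -d)
cat > "$tmpdir/test_non_header_quote.csv" <<'EOF'
"100001","001"
"100002","003"
"100003","002"
"100004","001"
"100005","003"
"100006","001"
"100007","003"
"100008","002"
"100009","002"
"100010","003"
EOF

# filter_by_parity REMAINDER FILE
#   REMAINDER: 1 keeps odd user_ids, 0 keeps even user_ids
filter_by_parity() {
  awk -F',' -v want="$1" '
    {
      id = $1
      gsub(/"/, "", id)            # strip quotes from a copy; $0 stays untouched
      if (id % 2 == want) print    # print the original, still-quoted line
    }
  ' "$2"
}

odd_rows=$(filter_by_parity 1 "$tmpdir/test_non_header_quote.csv")
even_rows=$(filter_by_parity 0 "$tmpdir/test_non_header_quote.csv")
printf '%s\n' "$odd_rows"
```

Because the quotes are removed only from the copy, the output rows keep their original quoting. The same function should also work on the unquoted headerless sample, since gsub simply finds nothing to remove there.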
