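Introduction
- I'm going to work through this chapter:
https://nlp100.github.io/ja/ch02.html
- Last time I solved Chapter 1:
https://qiita.com/obenkyo_hachi/items/f587d84758d6d7926eda
- For the UNIX commands, I had an Ubuntu environment I'd thrown together earlier, so I tried them there
- The output of the UNIX command for problem 19 looks fine, but I'm not sure the way I wrote it is very good
- I'll keep at it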
10. Count the lines
Python
# dirpath is assumed to be set beforehand to the directory containing popular-names.txt
with open(f"{dirpath}/popular-names.txt", "r") as f:
    lines = f.readlines()
print(len(lines))
UNIX
wc -l popular-names.txt
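f.readlines() loads the whole file into memory just to count it. A minimal sketch, assuming only the count is needed (as with wc -l), that streams the file line by line instead; count_lines is a hypothetical helper name, and io.StringIO with made-up rows stands in for the real file:

```python
import io

def count_lines(f):
    """Count lines by iterating over the file object instead of reading it all at once."""
    return sum(1 for _ in f)

# io.StringIO stands in for open(f"{dirpath}/popular-names.txt")
sample = io.StringIO("Alice\tF\t100\t1880\nBob\tM\t90\t1880\n")
print(count_lines(sample))  # 2
```

Note that wc -l counts newline characters, so if the last line has no trailing newline, wc -l reports one less than this count.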
11. Replace tabs with spaces
Python
with open(f"{dirpath}/popular-names.txt", "r") as f:
    lines = f.read().splitlines()
for line in lines:
    print(line.replace("\t", " "))
UNIX
sed 's/\t/ /g' popular-names.txt
12. Save column 1 to col1.txt and column 2 to col2.txt
Python
with open(f"{dirpath}/popular-names.txt", "r") as f:
    lines = f.read().splitlines()
with open(f"{dirpath}/col1.txt", "w") as f:
    for line in lines:
        col1 = line.split("\t")[0]
        f.write(f"{col1}\n")
with open(f"{dirpath}/col2.txt", "w") as f:
    for line in lines:
        col2 = line.split("\t")[1]
        f.write(f"{col2}\n")
UNIX
cut -f 1 popular-names.txt > col1.txt
cut -f 2 popular-names.txt > col2.txt
13. Merge col1.txt and col2.txt
Python
with open(f"{dirpath}/merge.txt", "w") as f0:
    with open(f"{dirpath}/col1.txt", "r") as f1:
        with open(f"{dirpath}/col2.txt", "r") as f2:
            lines1 = f1.read().splitlines()
            lines2 = f2.read().splitlines()
            for c1, c2 in zip(lines1, lines2):
                f0.write(f"{c1}\t{c2}\n")
UNIX
paste col1.txt col2.txt > merge.txt
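zip stops at the shorter of the two inputs, whereas paste keeps going and leaves the missing field empty. Here both files come from the same source, so the lengths match, but if they might differ, a sketch closer to paste can use itertools.zip_longest; paste_lines is a hypothetical helper name and the inputs are made up:

```python
from itertools import zip_longest

def paste_lines(lines1, lines2):
    """Join two lists of lines with a tab, padding the shorter one with "" as paste does."""
    return [f"{c1}\t{c2}" for c1, c2 in zip_longest(lines1, lines2, fillvalue="")]

print(paste_lines(["Alice", "Bob", "Carol"], ["F", "M"]))
# ['Alice\tF', 'Bob\tM', 'Carol\t']
```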
14. Output the first N lines
Python
n = int(input())
with open(f"{dirpath}/popular-names.txt", "r") as f:
    lines = f.read().splitlines()
for line in lines[:n]:
    print(line)
UNIX
read num
head -n "${num}" popular-names.txt
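head does not need the whole file to print the first N lines. A minimal Python sketch of the same idea, assuming a file-like object as input, uses itertools.islice so nothing past line N is read; head is a hypothetical helper name:

```python
import io
from itertools import islice

def head(f, n):
    """Return the first n lines (without trailing newlines), reading only what is needed."""
    return [line.rstrip("\n") for line in islice(f, n)]

sample = io.StringIO("line1\nline2\nline3\n")
print(head(sample, 2))  # ['line1', 'line2']
```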
15. Output the last N lines
Python
n = int(input())
with open(f"{dirpath}/popular-names.txt", "r") as f:
    lines = f.read().splitlines()
# max(..., 0) keeps the start index from going negative when n exceeds the line count
for line in lines[max(len(lines) - n, 0):]:
    print(line)
UNIX
read num
tail -n "${num}" popular-names.txt
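Slicing needs the whole list in memory. A sketch that works on a stream instead keeps only the last N lines in a collections.deque with maxlen=n, so older lines are dropped automatically as new ones arrive; tail is a hypothetical helper name:

```python
import io
from collections import deque

def tail(f, n):
    """Return the last n lines, holding at most n lines in memory at a time."""
    return [line.rstrip("\n") for line in deque(f, maxlen=n)]

sample = io.StringIO("line1\nline2\nline3\n")
print(tail(sample, 2))  # ['line2', 'line3']
```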
16. Split the file into N parts
Python
import os

n = int(input())
with open(f"{dirpath}/popular-names.txt", "r") as f:
    lines = f.read().splitlines()
# The first len(lines) % n files each get one extra line
lines_per_output = len(lines) // n
os.makedirs(f"{dirpath}/splited_files", exist_ok=True)
start, stop = 0, 0
for i in range(n):
    with open(f"{dirpath}/splited_files/split{i}.txt", "w") as fout:
        start = stop
        if i < len(lines) % n:
            stop = start + lines_per_output + 1
        else:
            stop = start + lines_per_output
        for line in lines[start:stop]:
            fout.write(f"{line}\n")
UNIX
read num
mkdir -p splited_files
split -n "l/${num}" -d --additional-suffix=.txt popular-names.txt ./splited_files/split
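The index arithmetic in the Python version gives the first len(lines) % n files one extra line. Pulled out into a hypothetical helper chunk_bounds, the same rule can be checked on its own without touching the file system:

```python
def chunk_bounds(total, n):
    """Return (start, stop) index pairs splitting `total` items into n nearly equal parts.

    The first total % n parts get one extra item each.
    """
    base, extra = divmod(total, n)
    bounds = []
    start = 0
    for i in range(n):
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds

print(chunk_bounds(10, 3))  # [(0, 4), (4, 7), (7, 10)]
```

GNU split -n l/N sizes its chunks by bytes while keeping lines whole, so the files it writes may not have the same line counts as this helper computes.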
17. Distinct strings in column 1
Python
with open(f"{dirpath}/popular-names.txt", "r") as f:
    lines = f.read().splitlines()
S = set()
for line in lines:
    S.add(line.split("\t")[0])
print(S)
print(len(S))
UNIX
cut -f 1 popular-names.txt | sort | uniq
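uniq only collapses adjacent duplicate lines, which is why its input has to be sorted first. A small Python sketch of that behaviour uses itertools.groupby, which likewise groups only consecutive equal items; uniq here is a hypothetical helper name and the names are made up:

```python
from itertools import groupby

def uniq(lines):
    """Drop consecutive duplicates, like the uniq command."""
    return [key for key, _ in groupby(lines)]

names = ["Mary", "Anna", "Mary", "Mary"]
print(uniq(names))          # ['Mary', 'Anna', 'Mary']
print(uniq(sorted(names)))  # ['Anna', 'Mary']
```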
18. Sort the lines in descending order of the number in column 3
Python
with open(f"{dirpath}/popular-names.txt", "r") as f:
    lines = f.read().splitlines()
l = []
for line in lines:
    l.append(line.split("\t"))
# Convert column 3 to int so it sorts numerically (like sort -n), not as a string
result = sorted(l, key=lambda x: int(x[2]), reverse=True)
for line in result:
    print("\t".join(line))
UNIX
sort -n -r -k 3 popular-names.txt
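After split("\t") the third column is a string, and strings compare character by character, so "9" ranks above "100" and "10" in a descending sort. Converting the key to int, the counterpart of sort -n, gives numeric order. A sketch with made-up rows:

```python
rows = [["A", "F", "9"], ["B", "F", "10"], ["C", "F", "100"]]

# String comparison: "9" > "100" > "10" character by character
by_string = sorted(rows, key=lambda x: x[2], reverse=True)
# Numeric comparison, like sort -n
by_number = sorted(rows, key=lambda x: int(x[2]), reverse=True)

print([r[2] for r in by_string])  # ['9', '100', '10']
print([r[2] for r in by_number])  # ['100', '10', '9']
```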
19. Count how often each column-1 string appears and list them from most to least frequent
Python
import collections

with open(f"{dirpath}/popular-names.txt", "r") as f:
    lines = f.read().splitlines()
l = []
for line in lines:
    l.append(line.split("\t")[0])
c = collections.Counter(l)
result = sorted(c.items(), key=lambda x: x[1], reverse=True)
for name, _count in result:
    print(name)
UNIX
cut -f 1 popular-names.txt | sort | uniq -c | sort -nr | awk '{print $2}'
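Counter.most_common() already returns (element, count) pairs sorted by count, highest first, so it can stand in for the explicit sorted call; printing the counts as well mirrors the uniq -c | sort -nr stage of the pipeline. A sketch with made-up input (the order of tied counts may differ from sort -nr):

```python
from collections import Counter

names = ["Mary", "Anna", "Mary", "Emma", "Mary", "Anna"]
counts = Counter(names)

# most_common() sorts by count, highest first
for name, count in counts.most_common():
    print(count, name)
# 3 Mary
# 2 Anna
# 1 Emma
```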