Working Through 言語処理100本ノック (NLP 100 Exercise), Chapter 2

Python

Posted at 2024-08-03, last updated at 2024-08-03

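Introduction

I'm working through the problems here:
https://nlp100.github.io/ja/ch02.html
I solved Chapter 1 last time:
https://qiita.com/obenkyo_hachi/items/f587d84758d6d7926eda
For the UNIX commands, I used an Ubuntu environment I had thrown together a while ago.
The output of my UNIX command for problem 19 looks correct, but I suspect the way I wrote it is less than ideal.
I'll keep at it.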

10. Counting lines

Python

# dirpath is assumed to hold the directory that contains popular-names.txt
with open(f"{dirpath}/popular-names.txt", "r") as f:
  lines = f.readlines()
  print(len(lines))

UNIX

wc -l popular-names.txt
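
As a side note, the count can also be taken without holding every line in memory, by iterating over the file object. This is only a minimal sketch: `count_lines` is a name I made up, and `io.StringIO` stands in for the real file.

```python
import io

def count_lines(f):
  # Iterate over the file object line by line instead of calling readlines()
  return sum(1 for _ in f)

# io.StringIO is a stand-in for open(f"{dirpath}/popular-names.txt")
sample = io.StringIO("a\nb\nc\n")
print(count_lines(sample))
```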

11. Replace tabs with spaces

Python

with open(f"{dirpath}/popular-names.txt", "r") as f:
  lines = f.read().splitlines()
  for line in lines:
    print(line.replace("\t", " "))

UNIX

sed 's/\t/ /g' popular-names.txt

12. Save column 1 to col1.txt and column 2 to col2.txt

Python

with open(f"{dirpath}/popular-names.txt", "r") as f:
  lines = f.read().splitlines()
with open(f"{dirpath}/col1.txt", "w") as f:
  for line in lines:
    col1 = line.split("\t")[0]
    f.write(f"{col1}\n")
with open(f"{dirpath}/col2.txt", "w") as f:
  for line in lines:
    col2 = line.split("\t")[1]
    f.write(f"{col2}\n")

UNIX

cut -f 1 popular-names.txt > col1.txt
cut -f 2 popular-names.txt > col2.txt

13. Merge col1.txt and col2.txt

Python

with open(f"{dirpath}/col1.txt", "r") as f1, open(f"{dirpath}/col2.txt", "r") as f2:
  lines1 = f1.read().splitlines()
  lines2 = f2.read().splitlines()
with open(f"{dirpath}/merge.txt", "w") as f0:
  for c1, c2 in zip(lines1, lines2):
    f0.write(f"{c1}\t{c2}\n")

UNIX

paste col1.txt col2.txt > merge.txt
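
One difference between the two versions: `zip` stops at the shorter input, while `paste` keeps going and leaves the missing field empty. col1.txt and col2.txt have the same number of lines, so it does not matter here. If the lengths could differ, something like the following sketch with `itertools.zip_longest` should behave closer to `paste`. The helper name `merge_columns` and the sample values are made up for illustration.

```python
from itertools import zip_longest

def merge_columns(lines1, lines2):
  # Pair up the lines; a missing side is filled with an empty string
  return [f"{c1}\t{c2}" for c1, c2 in zip_longest(lines1, lines2, fillvalue="")]

# Illustrative in-memory data with deliberately different lengths
merged = merge_columns(["Mary", "Anna", "Emma"], ["F", "F"])
print(merged)
```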

14. Output the first N lines

Python

n = int(input())

with open(f"{dirpath}/popular-names.txt", "r") as f:
  lines = f.read().splitlines()
for line in lines[:n]:
  print(f"{line}")

UNIX

read num
head -n ${num} popular-names.txt

15. Output the last N lines

Python

n = int(input())

with open(f"{dirpath}/popular-names.txt", "r") as f:
  lines = f.read().splitlines()
# Clamp at 0 so that an N larger than the line count prints every line
for line in lines[max(len(lines) - n, 0):]:
  print(f"{line}")

UNIX

read num
tail -n ${num} popular-names.txt
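
The slicing for the last N lines has a couple of edge cases (N = 0, and N larger than the number of lines), so here is a small sketch of it as a function on an in-memory list. `tail` and `sample` are names I made up for illustration.

```python
def tail(lines, n):
  # Clamp the start index at 0; a negative start would count from the end
  start = max(len(lines) - n, 0)
  return lines[start:]

sample = ["a", "b", "c", "d", "e"]
print(tail(sample, 2))  # the last two lines
print(tail(sample, 7))  # N exceeds the line count, so every line
print(tail(sample, 0))  # nothing
```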

16. Split the file into N parts

Python

n = int(input())
with open(f"{dirpath}/popular-names.txt", "r") as f:
  lines = f.read().splitlines()
lines_per_output = len(lines) // n
start, stop = 0, 0
for i in range(n):
  with open(f"{dirpath}/splited_files/split{i}.txt", "w") as fout:
    start = stop
    if i < len(lines) % n:
      stop = start + lines_per_output + 1
    else:
      stop = start + lines_per_output
    for line in lines[start:stop]:
      fout.write(f"{line}\n")

UNIX

read num
split -n l/${num} -d --additional-suffix=.txt popular-names.txt ./splited_files/split
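
The Python version gives the first `len(lines) % n` files one extra line each, so the sizes differ by at most one. The same idea as a function on an in-memory list, as a rough sketch (`split_evenly` is a name I made up, and the list of integers stands in for the lines of the file):

```python
def split_evenly(lines, n):
  # base lines per chunk; the first `extra` chunks get one more line
  base, extra = divmod(len(lines), n)
  chunks = []
  start = 0
  for i in range(n):
    stop = start + base + (1 if i < extra else 0)
    chunks.append(lines[start:stop])
    start = stop
  return chunks

# 10 items into 3 chunks
chunks = split_evenly(list(range(10)), 3)
print([len(c) for c in chunks])
```

Joining the chunks back together in order gives the original list, which is a handy check that no line was dropped or duplicated.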

17. Distinct strings in column 1

Python

with open(f"{dirpath}/popular-names.txt", "r") as f:
  lines = f.read().splitlines()
S = set()
for line in lines:
  S.add(line.split("\t")[0])
print(S)
print(len(S))

UNIX

cut -f 1 popular-names.txt | sort | uniq
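
A Python set has no guaranteed order, so the printed result will not necessarily line up with the sorted UNIX output. Sorting the set makes the two easier to compare. A minimal sketch on a few illustrative rows (the rows are sample data in the same tab-separated layout, not read from the file):

```python
# Illustrative rows: name, sex, count, year separated by tabs
rows = ["Mary\tF\t7065\t1880", "Anna\tF\t2604\t1880", "Mary\tF\t6919\t1881"]

# Collect column 1 into a set, then sort for a stable order
names = sorted({row.split("\t")[0] for row in rows})
print(names)
```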

18. Sort lines in descending order of the numeric value in column 3

Python

with open(f"{dirpath}/popular-names.txt", "r") as f:
  lines = f.read().splitlines()
l = []
for line in lines:
  l.append(line.split("\t"))
# Column 3 is read as a string, so convert it to int for a numeric sort
result = sorted(l, key=lambda x: int(x[2]), reverse=True)
for line in result:
  print("\t".join(line))

UNIX

sort -n -r -k 3 popular-names.txt
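
Column 3 comes out of the file as a string, so the sort key decides whether the comparison is lexicographic or numeric. This is the same distinction as `sort` with and without `-n`. A tiny sketch with made-up values shows the difference:

```python
counts = ["9", "10", "100", "25"]

# Comparing as strings: "9" sorts above "25" because "9" > "2"
as_strings = sorted(counts, reverse=True)
# Comparing as integers: the order people usually expect
as_numbers = sorted(counts, key=int, reverse=True)

print(as_strings)
print(as_numbers)
```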

19. Count how often each string appears in column 1 and sort by descending frequency

Python

import collections

with open(f"{dirpath}/popular-names.txt", "r") as f:
  lines = f.read().splitlines()
l = []
for line in lines:
  l.append(line.split("\t")[0])
c = collections.Counter(l)
result = sorted(c.items(), key=lambda x: x[1], reverse=True)
for count in result:
  print(count[0])

UNIX

cut -f 1 popular-names.txt | sort | uniq -c | sort -nr | awk '{print $2}'
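
`collections.Counter` also has a `most_common()` method that returns the (value, count) pairs already ordered from most to least frequent, so it could replace the explicit `sorted` call. A minimal sketch on a made-up list of names:

```python
from collections import Counter

# Illustrative data standing in for column 1 of the file
names = ["Mary", "Anna", "Mary", "Emma", "Anna", "Mary"]

ranking = Counter(names).most_common()
for name, count in ranking:
  print(name, count)
```

Printing the count next to the name also makes it easier to compare against the `uniq -c` output before `awk` strips the counts.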
