
[Python] Taking on the 100 Knocks! (015-019)

Last updated at 2017-09-24, posted at 2017-08-25

About the background so far

Knock progress

Added 9/24

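Chapter 2: UNIX Command Basics

hightemp.txt is a file that stores records of the highest temperatures in Japan in a tab-separated format with the columns "prefecture", "location", "℃", and "date". Write programs that perform the following processing and run them with hightemp.txt as the input file. In addition, run the same processing with UNIX commands and check the programs' results.

015. Output the last N lines

Receive a natural number N by some means such as a command-line argument, and display only the last N lines of the input. Use the tail command to check.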

tail_015.py

# -*- coding:utf-8 -*-

import codecs
import subprocess

def tail(data, N):
    # Print only the last N lines; clamp at 0 so N > len(data) prints everything
    start = max(len(data) - N, 0)
    print(''.join(data[start:]))

if __name__ == "__main__":
    filename = 'hightemp.txt'
    basepath = '/Users/masassy/PycharmProjects/Pywork/training/'
    N = 3
    with codecs.open(basepath + filename, 'r', 'utf-8') as f:
        tail(f.readlines(), N)

    # Check against the tail command
    output = subprocess.check_output(["tail", "-n", str(N), basepath + filename])
    print(output.decode('utf-8'))

result

山梨県	大月	39.9	1990-07-19
山形県	鶴岡	39.9	1978-08-03
愛知県	名古屋	39.9	1942-08-02

山梨県	大月	39.9	1990-07-19
山形県	鶴岡	39.9	1978-08-03
愛知県	名古屋	39.9	1942-08-02

Thoughts: the trick here is probably how to specify the line where the join starts.
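As an alternative to computing the start index by hand, the last N lines can also be collected with `collections.deque`. This is only a minimal sketch: the helper name `last_lines` and the sample list are made up for illustration and are not part of the script above.

```python
from collections import deque

def last_lines(lines, n):
    """Return the last n items of an iterable of lines as a list."""
    if n <= 0:
        return []
    # A deque with maxlen=n drops the oldest item once it is full,
    # so at most n lines are kept at any time.
    return list(deque(lines, maxlen=n))

# Stand-in for the lines of hightemp.txt (hypothetical data)
sample = ["a\n", "b\n", "c\n", "d\n"]
print(''.join(last_lines(sample, 3)), end='')
```

Because the deque consumes any iterable, the same helper should also work when it is handed an open file object instead of the result of `readlines()`.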

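016. Split a file into N pieces

Receive a natural number N by some means such as a command-line argument, and split the input file into N pieces on line boundaries. Achieve the same processing with the split command.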

split_016.py

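# -*- coding:utf-8 -*-

import codecs
import subprocess
import math

def split(data, N):
    index = 0
    # Work out how many files to write (N lines per file)
    page = math.ceil(len(data) / N)
    for i in range(0, page):
        # Join the next N lines of the list into the string to write out
        write_data = ''.join(data[index:N + index])
        index += N
        # The file name carries the running line count: write_data15, write_data30, ...
        with codecs.open('write_data' + str(index), 'w', 'utf-8') as f:
            f.write(write_data)

if __name__ == "__main__":
    filename = 'hightemp.txt'
    basepath = '/Users/masassy/PycharmProjects/Pywork/training/'
    N = 15
    with codecs.open(basepath + filename, 'r', 'utf-8') as f:
        split(f.readlines(), N)
    # Same split with the split command (writes xaa, xab, ... to the current directory)
    output = subprocess.check_output(["split", "-l", str(N), basepath + filename])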

result

The split function wrote the files write_data15 and write_data30
write_data15
高知県	江川崎	41	2013-08-12
埼玉県	熊谷	40.9	2007-08-16
岐阜県	多治見	40.9	2007-08-16
(rest omitted because the output is long)
write_data30
山形県	酒田	40.1	1978-08-03
岐阜県	美濃	40	2007-08-16
群馬県	前橋	40	2001-07-24
(rest omitted because the output is long)

The split command wrote the files xaa and xab
xaa
高知県	江川崎	41	2013-08-12
埼玉県	熊谷	40.9	2007-08-16
岐阜県	多治見	40.9	2007-08-16
(rest omitted because the output is long)
xab
山形県	酒田	40.1	1978-08-03
岐阜県	美濃	40	2007-08-16
群馬県	前橋	40	2001-07-24
(rest omitted because the output is long)

Process finished with exit code 0

Thoughts: what I struggled with when writing the split function was how to calculate the number of pages when each file holds N lines, and how to name the files...
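The problem statement asks for N pieces, while the script above writes N lines per file (the same thing `split -l` does). If the other reading is wanted, the chunk sizes can be computed with `divmod`. The following is a rough sketch under that assumption; `split_into_n` and the 24-line sample are hypothetical names and data, and nothing is written to disk.

```python
def split_into_n(lines, n):
    """Split lines into n consecutive chunks whose sizes differ by at most one."""
    base, extra = divmod(len(lines), n)
    chunks = []
    start = 0
    for i in range(n):
        # The first `extra` chunks take one additional line each.
        size = base + (1 if i < extra else 0)
        chunks.append(lines[start:start + size])
        start += size
    return chunks

# Stand-in input of 24 lines (hypothetical data)
sample = ["line{}\n".format(i) for i in range(24)]
print([len(chunk) for chunk in split_into_n(sample, 5)])  # [5, 5, 5, 5, 4]
```

Each chunk could then be written to its own file in the same way the split function above does, for example with the chunk number in the file name.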

017. Distinct strings in the first column

Find the kinds of strings in the first column (the set of distinct strings). Use the sort and uniq commands to check.

sort_uniq_017.py

# -*- coding:utf-8 -*-

import codecs
import subprocess

def sort_uniq(data):
    # Equivalent of cut -f 1: keep only the first column of each line
    cut_temp = [temp.split('\t')[0] for temp in data]

    # Equivalent of sort
    sort_temp = sorted(cut_temp)

    # Equivalent of uniq: skip values that have already been seen
    uniq_temp = []
    for temp in sort_temp:
        if temp not in uniq_temp:
            uniq_temp.append(temp)

    # Print one distinct value per line
    for temp in uniq_temp:
        print(temp)

if __name__ == "__main__":
    filename = 'hightemp.txt'
    basepath = '/Users/masassy/PycharmProjects/Pywork/training/'
    with codecs.open(basepath + filename, 'r', 'utf-8') as f:
        sort_uniq(f.readlines())
    print('\n')

    # cut -f 1 | sort | uniq: each process reads the previous one's stdout
    cut = subprocess.Popen(["cut", "-f", "1", basepath + filename], stdout=subprocess.PIPE)
    sort = subprocess.Popen(["sort"], stdin=cut.stdout, stdout=subprocess.PIPE)
    uniq = subprocess.Popen(["uniq"], stdin=sort.stdout, stdout=subprocess.PIPE)
    end_of_pipe = uniq.stdout
    for line in end_of_pipe:
        print(line.decode('utf-8').rstrip('\n'))

result

千葉県
和歌山県
埼玉県
大阪府
(rest omitted because the output is long)


千葉県
和歌山県
埼玉県
大阪府
山形県
(rest omitted because the output is long)
Process finished with exit code 0

Thoughts: I didn't know how to write a pipe with the subprocess module... On Linux a single | is enough, but writing it as a program makes it very clear which parameters are needed.
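Since the problem asks for a set of distinct strings, the cut, sort, and uniq steps can also be collapsed into a set comprehension. This is just a sketch of that idea: `first_column_set` is a name I made up, and the sample rows are a handful of lines copied from the results in this article rather than the whole file.

```python
def first_column_set(lines):
    """Return the sorted distinct values of the first tab-separated column."""
    # The set removes duplicates; sorted() gives a stable, printable order.
    return sorted({line.split('\t')[0] for line in lines})

# A few rows in the hightemp.txt format (subset used only for illustration)
sample = [
    "高知県\t江川崎\t41\t2013-08-12\n",
    "埼玉県\t熊谷\t40.9\t2007-08-16\n",
    "岐阜県\t多治見\t40.9\t2007-08-16\n",
    "山形県\t鶴岡\t39.9\t1978-08-03\n",
    "山形県\t酒田\t40.1\t1978-08-03\n",
]
for name in first_column_set(sample):
    print(name)
```

Python sorts these strings by code point, which may not match the order produced by the sort command under every locale.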

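018. Sort each line in descending order of the number in the third column

Sort the lines in reverse order of the number in the third column (note: reorder the lines without changing their contents). Use the sort command to check (for this problem the result need not match the result of running the command).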

r_sort_018.py

# -*- coding:utf-8 -*-
import codecs
import subprocess
import operator

def r_sort(data):
    # Split each line into a list of columns
    cut_temp = [temp.split() for temp in data]

    # Equivalent of sort: descending by the third column
    # (the column is compared as a string here, not as a number)
    sort_temp = sorted(cut_temp, key=operator.itemgetter(2), reverse=True)

    # Convert each list to str and strip the outer brackets and quotes before printing
    for temp in map(str, sort_temp):
        print(temp.lstrip("['").rstrip("']"))

if __name__ == "__main__":
    basepath = '/Users/masassy/PycharmProjects/Pywork/training/'
    filename = 'hightemp.txt'
    with codecs.open(basepath + filename, 'r', 'utf-8') as f:
        r_sort(f.readlines())
    print('\n')

    # Check against the sort command
    sort = subprocess.check_output(["sort", "-r", "-k", "3", basepath + filename])
    print(sort.decode('utf-8'))

result

高知県', '江川崎', '41', '2013-08-12
埼玉県', '熊谷', '40.9', '2007-08-16
岐阜県', '多治見', '40.9', '2007-08-16
(rest omitted because the output is long)

高知県	江川崎	41	2013-08-12
岐阜県	多治見	40.9	2007-08-16
埼玉県	熊谷	40.9	2007-08-16
(rest omitted because the output is long)
Process finished with exit code 0

Thoughts: the itemgetter function in the operator module was handy.
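The problem asks for the lines to be reordered without changing their contents, and by numeric value. One way to get both is to sort the original line strings and only parse the third column inside the key function. The sketch below assumes tab-separated input with a number in the third column; `sort_by_temperature` and `temperature` are hypothetical names, and the sample rows are a few lines taken from the results in this article.

```python
def temperature(line):
    """Key function: the third tab-separated column as a float."""
    return float(line.split('\t')[2])

def sort_by_temperature(lines):
    """Return the lines, unchanged, in descending order of the third column."""
    return sorted(lines, key=temperature, reverse=True)

# A few rows in the hightemp.txt format (subset used only for illustration)
sample = [
    "岐阜県\t美濃\t40\t2007-08-16\n",
    "高知県\t江川崎\t41\t2013-08-12\n",
    "山梨県\t大月\t39.9\t1990-07-19\n",
    "山形県\t酒田\t40.1\t1978-08-03\n",
]
print(''.join(sort_by_temperature(sample)), end='')
```

Because only the key is converted to a float, each line is printed exactly as it was read, tabs included.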

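019. Find the frequency of the first-column string of each line and sort by descending frequency

Find the frequency of occurrence of the first-column string of each line and display them in descending order of frequency. Use the cut, uniq, and sort commands to check.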

frequency_019.py

# -*- coding:utf-8 -*-
import codecs
import subprocess
import collections
import operator

def frequency(data):
    # Equivalent of cut -f 1: keep only the first column of each line
    cut_temp = [temp.split('\t')[0] for temp in data]

    # Equivalent of sort
    sort_temp = sorted(cut_temp)

    # Count how many times each value appears in the list
    # Equivalent of uniq -c + sort
    count_dict = collections.Counter(sort_temp)
    for value, count in sorted(count_dict.items(), key=operator.itemgetter(1), reverse=True):
        print(count, value)

if __name__ == "__main__":
    basepath = '/Users/masassy/PycharmProjects/Pywork/training/'
    filename = 'hightemp.txt'
    with codecs.open(basepath + filename, 'r', 'utf-8') as f:
        frequency(f.readlines())

    print('\n')
    # cut -f 1 | sort | uniq -c | sort -r
    cut = subprocess.Popen(["cut", "-f", "1", basepath + filename], stdout=subprocess.PIPE)
    sort1 = subprocess.Popen(["sort"], stdin=cut.stdout, stdout=subprocess.PIPE)
    uniq = subprocess.Popen(["uniq", "-c"], stdin=sort1.stdout, stdout=subprocess.PIPE)
    sort2 = subprocess.Popen(["sort", "-r"], stdin=uniq.stdout, stdout=subprocess.PIPE)
    end_of_pipe = sort2.stdout
    for line in end_of_pipe:
        print(line.decode('utf-8').lstrip(' ').rstrip('\n'))

result

3 山梨県
3 山形県
3 群馬県
3 埼玉県
2 岐阜県
2 千葉県
(rest omitted because the output is long)

3 群馬県
3 山梨県
3 山形県
3 埼玉県
2 静岡県
2 愛知県
(rest omitted because the output is long)
Process finished with exit code 0

Thoughts: handling the dict type and sorting it was difficult.
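For the dict handling and sorting that gave me trouble, `collections.Counter` already provides `most_common()`, which returns (value, count) pairs from the most frequent to the least. The following is a small sketch of that approach; `first_column_frequency` is a made-up name, and the sample rows are a few lines taken from the results in this article, not the full file.

```python
from collections import Counter

def first_column_frequency(lines):
    """Return (value, count) pairs for the first column, most frequent first."""
    counts = Counter(line.split('\t')[0] for line in lines)
    return counts.most_common()

# A few rows in the hightemp.txt format (subset used only for illustration)
sample = [
    "山形県\t鶴岡\t39.9\t1978-08-03\n",
    "高知県\t江川崎\t41\t2013-08-12\n",
    "山形県\t酒田\t40.1\t1978-08-03\n",
    "埼玉県\t熊谷\t40.9\t2007-08-16\n",
]
for value, count in first_column_frequency(sample):
    print(count, value)
```

The order among values with the same count is not guaranteed to match the output of `sort -r`, so ties may appear in a different order than in the command's result.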
