Notes from a Python beginner #Python

Reference articles (comprehensive, systematic summaries)

pandas

pandas.DataFrame

pandas.DataFrame.sample

Randomly samples rows from a DataFrame; by default it returns one row.
The first argument, n, sets the number of rows to return.
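
As a minimal sketch of the two call styles described above (the DataFrame contents and column names here are hypothetical, made up purely for illustration), sampling might look like this. random_state is passed only so that the result is reproducible.

```python
import pandas as pd

# Hypothetical example data; the column names are made up for illustration.
df = pd.DataFrame({"state": ["NY", "CA", "TX", "NY", "CA"],
                   "value": [1, 2, 3, 4, 5]})

one_row = df.sample()                        # default: one random row
three_rows = df.sample(n=3, random_state=0)  # three rows, reproducible via random_state

print(len(one_row), len(three_rows))
# 1 3
```

Rows are drawn without replacement by default, so the sampled index labels are all distinct.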

pandas.Series

pandas.Series.value_counts

Returns the unique values and the number of times each one occurs, as a pandas.Series

# df is assumed to be a DataFrame that has a 'state' column
vc = df['state'].value_counts()
print(vc)
print(type(vc))
# NY    2
# CA    2
# TX    1
# Name: state, dtype: int64
# <class 'pandas.core.series.Series'>

str

str.replace

Returns a copy of the string with every occurrence of a substring replaced

s = 'one two one two one'
print(s.replace(' ', '-'))
# one-two-one-two-one

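To replace several different substrings, chain calls to replace.
The calls are applied in order (left to right), so the order matters when one replacement produces text that a later one matches.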

s = 'one two one two one'
print(s.replace('one', 'XXX').replace('two', 'YYY'))
# XXX YYY XXX YYY XXX

Replacement by slicing

str has no method that replaces text at a given position.
Slicing the string and concatenating the pieces with the new text gives the same result.

s = 'abcdefghij'
print(s[:4] + 'XXX' + s[7:])
# abcdXXXhij

str.join

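Concatenates the strings in an iterable. The string the method is called on is the separator; the argument is the iterable of strings.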

test = ['ab', 'c', 'de']
result = ''.join(test)
print(result)
# abcde

str.split

Splits a string on a separator (runs of whitespace when no separator is given).
The return value is a list.
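
The section above has no example, so here is a small sketch of the three most common ways to call split. The sample strings are made up for illustration.

```python
s = "one two  three"
words = s.split()                   # no argument: split on runs of whitespace
print(words)
# ['one', 'two', 'three']

csv_line = "a,b,,c"
fields = csv_line.split(",")        # explicit separator: empty fields are kept
print(fields)
# ['a', 'b', '', 'c']

head_tail = csv_line.split(",", 1)  # maxsplit=1: split at most once
print(head_tail)
# ['a', 'b,,c']
```

Note the difference between the first two calls: with an explicit separator, consecutive separators produce empty strings, while the no-argument form collapses runs of whitespace.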

str.ljust, str.rjust, str.center

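Left-justifies, right-justifies, or centers a string within the given width.
The padding added to reach that width is spaces by default.
If a second argument is given, that character is used for the padding instead.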

s = 'abc'
s_rjust = s.rjust(8)
print(s_rjust)
#      abc
print(s.center(8, '+'))
# ++abc+++

all

Checks whether every element of an iterable is truthy.
Also works on sets and other iterables.

print(all([True, True, True]))
# True

any

Checks whether at least one element of an iterable is truthy.
Also works on sets and other iterables.

print(any([True, False, False]))
# True
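
As a further sketch of all and any, the snippet below applies them to a set through a generator expression and shows what they return for an empty iterable, which is easy to get wrong. The values are made up for illustration.

```python
nums = {2, 4, 6}

all_even = all(n % 2 == 0 for n in nums)  # every element is even
any_big = any(n > 5 for n in nums)        # at least one element exceeds 5
print(all_even, any_big)
# True True

# Edge case: an empty iterable
empty_all = all([])  # True, because no element is falsy
empty_any = any([])  # False, because no element is truthy
print(empty_all, empty_any)
# True False
```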

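str.find, str.rfind

Returns the index at which a substring first occurs (rfind searches from the right), or -1 if it is not found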

s = 'I am Sam'
print(s.find('Sam'))
# 5
print(s.find('XXX'))
# -1

index and rindex do the same thing, but raise ValueError when the substring is not found
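
A short sketch of the difference between find and index on a missing substring, reusing the string from the example above. Using None as the fallback value is just a choice made for this illustration.

```python
s = 'I am Sam'

pos_find = s.find('XXX')        # find signals "not found" with -1
print(pos_find)
# -1

try:
    pos_index = s.index('XXX')  # index raises instead of returning -1
except ValueError:
    pos_index = None            # fallback chosen for this sketch
print(pos_index)
# None

first_m = s.find('m')           # search from the left
last_m = s.rfind('m')           # search from the right
print(first_m, last_m)
# 3 7
```

find is convenient when a missing substring is an expected case; index suits code where a missing substring should be treated as an error.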

list

Slicing a list over its full range with a negative step reverses the elements

a = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(a[::-1])
# [9, 8, 7, 6, 5, 4, 3, 2, 1]

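Passing a list to print with a leading * unpacks it, so the elements are printed without the surrounding "[ ]".
The sep argument sets the separator between them (sep="" prints them with nothing in between).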
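
The behaviour described above can be sketched as follows. The final lines capture the output in a string buffer only so that the result can be inspected; that part is an addition for illustration and not something the note itself uses.

```python
import io

a = [1, 2, 3]

print(a)             # the list itself, brackets included
# [1, 2, 3]
print(*a)            # unpacked: elements separated by a space (the default sep)
# 1 2 3
print(*a, sep="")    # no separator at all
# 123
print(*a, sep=", ")  # custom separator
# 1, 2, 3

# Capture the printed text instead of writing it to the console.
buf = io.StringIO()
print(*a, sep="-", file=buf)
captured = buf.getvalue()
```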

Comprehension + conditional expression

With if ... only: [ ... for ... in ... if ... ]
With if ... else ...: [ ... if ... else ... for ... in ... ]
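
The two forms above do different jobs: a trailing if filters elements out, while if ... else in front of for transforms every element. A small sketch with made-up values:

```python
nums = list(range(6))

# Trailing "if": keeps only the elements that pass the test.
evens = [n for n in nums if n % 2 == 0]
print(evens)
# [0, 2, 4]

# Leading "if ... else": maps every element, so the length is unchanged.
labels = ["even" if n % 2 == 0 else "odd" for n in nums]
print(labels)
# ['even', 'odd', 'even', 'odd', 'even', 'odd']
```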

Comprehension + nesting

The for clauses are evaluated from the outermost (leftmost) one inward

List=[1*i + 10*j + 100*k for k in range(2) for j in range(3) for i in range(4)]
print(List)
# [0, 1, 2, 3, 10, 11, 12, 13, 20, 21, 22, 23, 100, 101, 102, 103, 110, 111, 112, 113, 120, 121, 122, 123]

flatten

init=[[1,2,3],[4,5],[6,7]]
[inner for outer in init for inner in outer]
# [1, 2, 3, 4, 5, 6, 7]

Removing elements

list.pop

Called as a method, list.pop(index); it removes the element at that index and returns it

List=["a","b","c","d","e","f"]
print(List.pop(1))
# b
print(List)
# ['a', 'c', 'd', 'e', 'f']

list.remove

Called as a method, list.remove(value); it removes the first element equal to value

List=["a","b","c","d","e","f"]
List.remove('d')
print(List)
# ['a', 'b', 'c', 'e', 'f']

del

Used as a statement, del list[index]

List=["a","b","c","d","e","f"]
del List[1]
print(List)
# ['a', 'c', 'd', 'e', 'f']

list.insert

Inserts an element into a list at the given index

List=["a","b","c"]
List.insert(1,'z')
print(List)
# ['a', 'z', 'b', 'c']

set

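Duplicate values are discarded, so only the unique values remain as elements. A set is unordered, so the order of a list built from it is not guaranteed.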

l = [2, 2, 3, 1, 3, 4]
l_unique = list(set(l))
print(l_unique)
# [1, 2, 3, 4]

Set operations can be written with the same operators used for bitwise arithmetic

s1 = set([1, 2, 3])
s2 = set([2, 3, 4])
print(s1 | s2)
# {1, 2, 3, 4}

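Set operations
- Union: |
- Intersection: &
- Difference: -
- Symmetric difference: ^
Sorting a set with sorted() returns a list.
len() returns the number of elements, just as it does for a list.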
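
The operators listed above can be tried out with the same two sets as in the earlier example. This is only a sketch. The results are passed through sorted() before printing because a set has no guaranteed order.

```python
s1 = {1, 2, 3}
s2 = {2, 3, 4}

union = s1 | s2         # elements in either set
intersection = s1 & s2  # elements in both sets
difference = s1 - s2    # elements in s1 but not in s2
symmetric = s1 ^ s2     # elements in exactly one of the sets

print(sorted(union), sorted(intersection), sorted(difference), sorted(symmetric))
# [1, 2, 3, 4] [2, 3] [1] [1, 4]

print(type(sorted(union)).__name__, len(union))
# list 4
```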

ord

Returns the Unicode code point of a character (for ASCII characters, this is the ASCII code)

ord("a")
# 97

chr

Converts a code point back to the character

chr(97)
# a

print

When the value to print depends on an if / elif / else chain, conditional expressions can be chained inside print like this

a,b,c,d = map(int, input().split())
print('TAKAHASHI' if b/a > d/c else 'AOKI' if b/a < d/c else 'DRAW')

float

float.is_integer

Checks whether a float holds an integer value (its fractional part is 0)

f_i = 100.0
print(f_i.is_integer())
# True

numpy

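numpy.reshape

With order='F', the elements are laid out in column-major order (down each column first) when reshaping, for example into a 2-D array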

mask[:, :, idx] = mask_label.reshape(256, 1600, order='F')
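
The line above depends on variables from its original context (mask, mask_label, idx), so here is a smaller, self-contained sketch of what order='F' changes. The 2x3 shape is arbitrary and chosen only to keep the output readable.

```python
import numpy as np

flat = np.arange(6)                      # [0 1 2 3 4 5]

c_order = flat.reshape(2, 3)             # default order='C': fill row by row
f_order = flat.reshape(2, 3, order='F')  # order='F': fill column by column

print(c_order.tolist())
# [[0, 1, 2], [3, 4, 5]]
print(f_order.tolist())
# [[0, 2, 4], [1, 3, 5]]
```

Column-major order is useful when the flat data was produced by walking down the columns first, which seems to be why the note uses it for mask_label.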

numpy.nditer

Iterates over every element of an ndarray in a single loop, so no nested loops are needed however many dimensions the array has
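
A minimal sketch comparing a nested loop with np.nditer on a small 2-D array. The array contents are made up. Each item yielded by nditer is a zero-dimensional array, so it is converted with int() here.

```python
import numpy as np

arr = np.arange(6).reshape(2, 3)

# Nested loops: one loop per dimension.
nested = []
for row in arr:
    for x in row:
        nested.append(int(x))

# nditer: a single loop, regardless of the number of dimensions.
flat = [int(x) for x in np.nditer(arr)]

print(nested)
# [0, 1, 2, 3, 4, 5]
print(flat)
# [0, 1, 2, 3, 4, 5]
```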

numpy.where

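Called with only a condition, returns the indices where the condition holds, as a tuple of ndarrays (one array per dimension)

import numpy as np

a = np.arange(20, 0, -2)
print(a)
# [20 18 16 14 12 10  8  6  4  2]

print(np.where(a < 10))
# (array([6, 7, 8, 9]),)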

bisect

Binary search: finds the insertion index for a value in a sorted list

import bisect

A = [1, 2, 3, 3, 3, 4, 4, 6, 6, 6, 6]
print(A)
index = bisect.bisect_left(A, 3)
print(index)
# 2 (the leftmost insertion point is returned)
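
To complement bisect_left, the sketch below uses the same list to show bisect_right and one common use of the pair, counting how often a value occurs in a sorted list. The variable names are chosen for this illustration.

```python
import bisect

A = [1, 2, 3, 3, 3, 4, 4, 6, 6, 6, 6]

left = bisect.bisect_left(A, 3)     # index of the first 3
right = bisect.bisect_right(A, 3)   # index just past the last 3
count = right - left                # number of 3s in the list
print(left, right, count)
# 2 5 3

# For a value that is absent, both functions return the same insertion point.
missing_left = bisect.bisect_left(A, 5)
missing_right = bisect.bisect_right(A, 5)
print(missing_left, missing_right)
# 7 7
```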

eval

Evaluates its argument (a string) as an expression

print(eval('1 + 2'))
# 3

reduce

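Combines the elements of a list into a single value by repeatedly applying a function, for example summing or multiplying them (a fold). In Python 3 it lives in functools.

from functools import reduce

a = [-1, 3, -5, 7, -9]
print(reduce(lambda x, y: x + y, a))
# -5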

map, filter

Objects returned by map and filter are iterators, so len() cannot be called on them
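
A small sketch of this behaviour, assuming Python 3 (where map and filter return iterators rather than lists). The has_len flag is a hypothetical name used only to record whether len() worked.

```python
squares = map(lambda x: x * x, [1, 2, 3])

try:
    len(squares)
    has_len = True
except TypeError:
    has_len = False               # iterators do not support len()
print(has_len)
# False

squares_list = list(squares)      # materialize the iterator to get a length
print(squares_list, len(squares_list))
# [1, 4, 9] 3

remaining = list(squares)         # the iterator is now exhausted
print(remaining)
# []

# Counting filter results without building a list.
odd_count = sum(1 for _ in filter(lambda x: x % 2, range(10)))
print(odd_count)
# 5
```

Converting to a list is the simplest fix, but it consumes the iterator, so keep the list if the values are needed again.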

tqdm.tqdm

Wraps the iterable it is given and shows the total amount of work and the current progress as a progress bar

import time
from tqdm import tqdm
for _ in tqdm(range(100)):
  time.sleep(0.1)

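pathlib.Path

A standard-library utility for working with paths in general

from pathlib import Path
from PIL import Image  # assumption: Image in this snippet is Pillow's PIL.Image

train_path = Path("../input/train_images/")
for img_name in train_path.iterdir():
    img = Image.open(img_name)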

seaborn

A wrapper around Matplotlib that makes its plots better looking and easier to produce

import seaborn as sns
sns.barplot(x=list(class_dict.keys()), y=list(class_dict.values()), ax=ax)

glob.glob

Given a pattern, returns a list of the file names in the directory that match it.
Patterns are written the same way as in a Unix shell.

import glob
glob.glob('*.log')
# ['abc.log', 't_1.log', 't_2.log']
glob.glob('t_*.log')
# ['t_1.log', 't_2.log']