NumPy Basics: Arrays and Vectorized Computation

ndarray

The n-dimensional array object provided by NumPy.

Creating ndarrays

import numpy as np

# Create from a list
data1 = [6, 7.5, 8, 9]
arr1 = np.array(data1)

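# Nested lists produce a multidimensional array
data2 = [[1, 2, 3, 4], [5, 6, 7, 8]]
arr2 = np.array(data2)

# Array counterpart of Python's range function
np.arange(10)

# Zero vector
np.zeros(10)

# Zero matrix
np.zeros((3, 6))

# Allocate without initializing (contents are arbitrary)
np.empty((2, 3, 2))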

# Number of dimensions
arr2.ndim

# Shape of the array
arr2.shape

# Data type
arr1.dtype

# Create with an explicit data type
arr1 = np.array([1, 2, 3], dtype=np.float64)

# Create from strings
data3 = ["1.1", "2.2", "3.3"]
arr3 = np.array(data3, dtype=np.float64)
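
As a small sketch of how a dtype is inferred and converted, astype can be used after the array has been created. The names int_arr, float_arr, str_arr, and parsed are illustrative and do not appear in the notes above.

```python
import numpy as np

# Illustrative names; not part of the original notes.
int_arr = np.array([1, 2, 3])            # integer dtype inferred from the data
float_arr = int_arr.astype(np.float64)   # astype returns a new array

str_arr = np.array(["1.1", "2.2", "3.3"])
parsed = str_arr.astype(np.float64)      # numeric strings are parsed to floats

print(int_arr.dtype, float_arr.dtype, parsed.dtype)
```

The original array keeps its dtype; only the array returned by astype has the new one.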

Operations between Arrays and Scalars

# Arithmetic between arrays is applied between elements at the same position
arr = np.array([[1, 2, 3], [4, 5, 6]])
"""
In [32]: arr
Out[32]: 
array([[1, 2, 3],
       [4, 5, 6]])
"""
arr * arr
"""
In [33]: arr * arr
Out[33]: 
array([[ 1,  4,  9],
       [16, 25, 36]])
"""

# Arithmetic with a scalar is applied to every element
arr - 1
"""
In [34]: arr - 1
Out[34]: 
array([[0, 1, 2],
       [3, 4, 5]])
"""
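1 / arr
"""
In [35]: 1 / arr
Out[35]: 
array([[1, 0, 0],
       [0, 0, 0]])
"""
# The output above is Python 2 integer division.
# Under Python 3, 1 / arr returns floats; 1 // arr gives this integer result.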

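To see the two division behaviours side by side, here is a minimal sketch assuming Python 3 and an integer array. The names true_div and floor_div are illustrative.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

true_div = 1 / arr     # true division: float result under Python 3
floor_div = 1 // arr   # floor division: stays an integer array

print(true_div)
print(floor_div)
```
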
Basic Indexing and Slicing / Fancy Indexing

\	0	1	2
0	0,0	0,1	0,2
1	1,0	1,1	1,2
2	2,0	2,1	2,2

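Elements are addressed as (row, col), the same as in a mathematical matrix.

An array slice is a view on the original array, so a change to the original also shows up in the slice. Call copy explicitly when an independent copy of the slice is needed:
arr[5:8].copy()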
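
The difference between a view and a copy can be checked directly. This is a minimal sketch; the names view and snapshot are illustrative.

```python
import numpy as np

arr = np.arange(10)

view = arr[5:8]              # a view: shares memory with arr
snapshot = arr[5:8].copy()   # an independent copy

arr[5:8] = 12                # modify the original array

print(view)      # reflects the change made through arr
print(snapshot)  # still holds the values from before the change
```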

Boolean Indexing

A boolean array can be used to mask an array.

name = np.array(["bob", "martin" ,"feed","max","rosetta","john"])
"""
In [63]: name == "bob"
Out[63]: array([ True, False, False, False, False, False], dtype=bool)
"""
arr = np.arange(6)
"""
In [68]: arr[name=="rosetta"]
Out[68]: array([4])
"""

Boolean operators
& (and)
| (or)

mask = (name=="rosetta") | (name=="martin")
"""
In [72]: mask
Out[72]: array([False,  True, False, False,  True, False], dtype=bool)
"""

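Masks built with | and & can also be negated with ~. Python's and / or keywords do not work elementwise on arrays, which is why the operator forms are used. A short sketch reusing the name array from above (either and neither are illustrative names):

```python
import numpy as np

name = np.array(["bob", "martin", "feed", "max", "rosetta", "john"])
arr = np.arange(6)

either = (name == "rosetta") | (name == "martin")  # elementwise or
neither = ~either                                  # elementwise not

print(arr[either])
print(arr[neither])
```
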
Selection with comparison operators

data = np.random.randn(10)
"""
In [78]: data
Out[78]: 
array([-0.43930899, -0.18084457,  0.50384496,  0.34177923,  0.4786331 ,
        0.0930973 ,  0.95264648,  1.29876589,  0.96616151,  0.69204729])
"""
data[data < 0] = 0
"""
In [80]: data
Out[80]: 
array([ 0.        ,  0.        ,  0.50384496,  0.34177923,  0.4786331 ,
        0.0930973 ,  0.95264648,  1.29876589,  0.96616151,  0.69204729])
"""

Transposing Arrays and Swapping Axes

!!!!! This part is hard !!!!!
Picking out just the elements I need with fancy indexing feels easier...

arr = np.arange(15).reshape((3,5))

# Transpose
arr.T

# Matrix product
np.dot(arr.T, arr)

arr = np.arange(45).reshape((3,5,3))

# Permute the axes into the given order
arr.transpose((1, 0, 2))

# Swap two axes
arr.swapaxes(1, 2)
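
Printing the shapes makes it easier to follow what each call does to the axes. This is a minimal sketch using the same 3 x 5 x 3 array.

```python
import numpy as np

arr = np.arange(45).reshape((3, 5, 3))

print(arr.T.shape)                     # axes reversed
print(arr.transpose((1, 0, 2)).shape)  # first two axes exchanged
print(arr.swapaxes(1, 2).shape)        # last two axes exchanged

# Element (i, j, k) of arr is element (j, i, k) after transpose((1, 0, 2)).
moved = arr.transpose((1, 0, 2))
print(arr[2, 4, 1] == moved[4, 2, 1])
```

None of these calls copies the data; each returns a view with the axes rearranged.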

Universal Functions: Fast Element-wise Array Functions

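Unary functions

Functions that operate elementwise.
np.func(x) applies the function to each element of x.

Function	Description
abs	Absolute value
sqrt	x ** 0.5
square	x ** 2
exp	exp(x)
log, log10, log2	log(x) with base e, 10, 2
log1p	log(1 + x), accurate even when x is very small
sign	Sign of each element (1, 0, -1)
ceil	Round up to the nearest integer
floor	Round down to the nearest integer
rint	Round to the nearest integer
modf	Split into fractional and integer parts
isnan, isinf, isfinite	Boolean array: whether each element is NaN, infinite, finite
logical_not	Boolean value of not x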

Binary functions

Used as np.func(x1, x2).

Function	Description
add, subtract, multiply, divide, power, mod	x1 (+, -, *, /, **, %) x2
maximum, minimum	Elementwise (larger, smaller) of x1 and x2
copysign	x1 with its sign replaced by the sign of x2
greater, greater_equal, less, less_equal, equal, not_equal	x1 (>, >=, <, <=, ==, !=) x2
logical_and, logical_or, logical_xor	x1 (&, |, ^) x2
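
A few of the less obvious functions from the tables, as a minimal sketch. The input values and the names frac, whole, larger, and signed are made up for illustration.

```python
import numpy as np

x = np.array([-1.5, 0.25, 2.75])
y = np.array([1.0, -3.0, 2.0])

frac, whole = np.modf(x)     # fractional and integer parts, each with the sign of x
larger = np.maximum(x, y)    # elementwise maximum of the two arrays
signed = np.copysign(x, y)   # magnitude of x combined with the sign of y

print(frac, whole)
print(larger)
print(signed)
```

modf is one of the few ufuncs that returns two arrays instead of one.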

Data Processing Using Arrays

Visualizing two-dimensional data.
As an example, display a grid of computed sqrt(x^2 + y^2) values.

import matplotlib.pyplot as plt

# Create 1000 points
points = np.arange(-5, 5, 0.01)
# Build a 2-D mesh
# xs repeats the points along each row; ys repeats them down each column
xs, ys = np.meshgrid(points, points)
# Compute
z = np.sqrt(xs ** 2 + ys ** 2)
# Display
plt.imshow(z, cmap=plt.cm.gray); plt.colorbar()
plt.title(r"Image plot of $\sqrt{x^2 + y^2}$ for a grid of values")
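
The layout of the two meshgrid outputs is easier to see on a tiny grid. This is a sketch with three made-up points instead of 1000, with no plotting.

```python
import numpy as np

points = np.array([-1.0, 0.0, 1.0])   # tiny grid so the result is easy to read
xs, ys = np.meshgrid(points, points)

print(xs)   # every row is a copy of points
print(ys)   # every column is a copy of points

z = np.sqrt(xs ** 2 + ys ** 2)        # distance of each grid cell from the origin
print(z)
```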

Expressing Conditional Logic as Array Operations

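np.where returns, element by element, either its second or its third argument depending on the value of its first argument.
That is, np.where(cond, xarr, yarr) is equivalent to [(x if c else y) for x, y, c in zip(xarr, yarr, cond)], except that the result is an array.

arr = np.random.randn(5, 5)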
"""
In [5]: arr
Out[5]: 
array([[-0.63774199, -0.76558645, -0.46003378,  0.61095653,  0.78277454],
       [ 0.25332127,  0.50226145, -1.45706102,  1.14315867,  0.28015   ],
       [-0.76326506,  0.33218657, -0.18509161, -0.3410194 , -0.29194451],
       [-0.32247669, -0.64285987, -0.61059921, -0.38261289,  0.41530912],
       [-1.7341384 ,  1.39960857,  0.78411537,  0.25922757, -0.22972615]])
"""
arrtf = np.where(arr > 0, True, False)
"""
In [6]: arrtf
Out[6]: 
array([[False, False, False,  True,  True],
       [ True,  True, False,  True,  True],
       [False,  True, False, False, False],
       [False, False, False, False,  True],
       [False,  True,  True,  True, False]], dtype=bool)
"""

Combining these calls makes it possible to classify by multiple conditions.

cond1 = np.where(np.random.randn(10) > 0, True, False)
cond2 = np.where(np.random.randn(10) > 0, True, False)
"""
In [16]: cond1
Out[16]: array([False,  True, False, False,  True,  True,  True,  True,  True,  True], dtype=bool)

In [17]: cond2
Out[17]: array([False, False, False, False, False,  True, False,  True,  True,  True], dtype=bool)
"""
result = np.where(cond1 & cond2, 0, np.where(cond1, 1, np.where(cond2, 2, 3)))
"""
In [19]: result
Out[19]: array([3, 1, 3, 3, 1, 0, 1, 0, 0, 0])
"""

The same logic written with if and else:

result = []
for i in range(len(cond1)):
    if cond1[i] and cond2[i]:
        result.append(0)
    elif cond1[i]:
        result.append(1)
    elif cond2[i]:
        result.append(2)
    else:
        result.append(3)

An arithmetic expression also works. (Note that 0 and 3 are swapped compared with the other versions.)
result = 1*cond1 + 2*cond2
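
As an alternative to nesting np.where, np.select takes a list of conditions and a list of values, and the first matching condition wins. The notes above do not use np.select; it is swapped in here as a sketch. The cond1 and cond2 values are copied from the recorded output above.

```python
import numpy as np

cond1 = np.array([False, True, False, False, True, True, True, True, True, True])
cond2 = np.array([False, False, False, False, False, True, False, True, True, True])

# Nested form, as in the notes
nested = np.where(cond1 & cond2, 0, np.where(cond1, 1, np.where(cond2, 2, 3)))

# Flat form: conditions are tested in order, default applies when none match
selected = np.select([cond1 & cond2, cond1, cond2], [0, 1, 2], default=3)

print(nested)
print(selected)
```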

Mathematical and Statistical Methods

Statistical functions are also provided.

arr = np.random.randn(5, 4)
arr.mean()
# An axis can also be specified
arr.mean(0)
arr.mean(1)
"""
In [60]: arr.mean()
Out[60]: 0.51585861805229682

In [62]: arr.mean(0)
Out[62]: array([ 0.65067115, -0.03856606,  1.06405353,  0.38727585])

In [63]: arr.mean(1)
Out[63]: array([ 1.18400902,  0.84203136,  0.50352006,  0.07445734, -0.0247247 ])
"""

sum
mean
std, var
min, max
argmin, argmax (return the indices of the minimum and maximum values)
cumsum (cumulative sum)
cumprod (cumulative product)
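
A small deterministic array makes the axis argument and the cumulative methods easier to check by hand. This is a minimal sketch with made-up values.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

total = arr.sum()                 # over all elements
col_means = arr.mean(axis=0)      # one mean per column
flat_argmax = arr.argmax()        # index into the flattened array
row_cumsum = arr.cumsum(axis=1)   # running sum along each row
row_cumprod = arr.cumprod(axis=1) # running product along each row

print(total, col_means, flat_argmax)
print(row_cumsum)
print(row_cumprod)
```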

Methods for Boolean Arrays

Because boolean True counts as 1 and False as 0, sum is often used to count the True values.

arr = np.random.randn(100)
sumnum = (arr > 0).sum()
"""
In [75]: sumnum
Out[75]: 43
"""

Other boolean methods:

any (True if at least one value is True)
all (True if every value is True)
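
A short sketch of the three boolean reductions on a made-up array:

```python
import numpy as np

bools = np.array([False, False, True, False])

count_true = bools.sum()   # True counts as 1
has_any = bools.any()      # at least one True
has_all = bools.all()      # every value True

print(count_true, has_any, has_all)
```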

Sorting

Arrays can also be sorted in place.
arr.sort()
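
The sort method modifies the array itself, while the top-level np.sort returns a sorted copy. A minimal sketch with illustrative names:

```python
import numpy as np

original = np.array([3, 1, 2])
sorted_copy = np.sort(original)   # new array; original is left unchanged
unchanged = original.copy()       # kept only to show the state before the in-place sort

original.sort()                   # sorts in place and returns None

table = np.array([[3, 1, 2], [9, 7, 8]])
table.sort(axis=1)                # sort each row independently
```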

Unique and Other Set Logic

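Functions similar to Python's built-in set operations are also available.

unique(x)
intersect1d(x, y) (unique(x) & unique(y))
union1d(x, y) (unique(x) | unique(y))
in1d(x, y) (boolean array indicating whether each element of x is contained in y)
setdiff1d(x, y) (values in x that are not in y)
setxor1d(x, y) (values that are in exactly one of x and y)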
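
A sketch of these functions on two small made-up arrays. np.isin is used here in place of in1d, which recent NumPy releases deprecate; for 1-D input it answers the same question.

```python
import numpy as np

x = np.array([3, 1, 2, 3, 1])
y = np.array([2, 3, 4])

uniq = np.unique(x)            # sorted unique values of x
common = np.intersect1d(x, y)  # values present in both
merged = np.union1d(x, y)      # values present in either
member = np.isin(x, y)         # for each element of x: is it in y?
only_x = np.setdiff1d(x, y)    # values in x that are not in y
one_side = np.setxor1d(x, y)   # values in exactly one of the two
```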

File Input and Output with Arrays

NumPy array objects can be saved to external files.
Saved files can of course be loaded back to restore the arrays.

arr = np.arange(10)

# Save in binary format (the .npy extension is appended automatically)
np.save("array_name", arr)
# Load the binary file
arr = np.load("array_name.npy")
# Save multiple arrays in a zip archive
np.savez("array_archive.npz", a=arr, b=arr)
# Load arrays from the zip archive
arr_a = np.load("array_archive.npz")["a"]
arr_b = np.load("array_archive.npz")["b"]

# Save as comma-delimited text
np.savetxt("array_ex.txt", arr, delimiter=",")
# Load the comma-delimited text file
arr = np.loadtxt("array_ex.txt", delimiter=",")
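
A round trip through an .npz archive can be sketched as follows. The temporary directory and the use of a with block around np.load are choices made for this example, not something the notes prescribe.

```python
import os
import tempfile

import numpy as np

arr = np.arange(10)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "array_archive.npz")
    np.savez(path, a=arr, b=arr * 2)

    # Open the archive once and read both arrays from it
    with np.load(path) as archive:
        arr_a = archive["a"]
        arr_b = archive["b"]

print(arr_a)
print(arr_b)
```

Opening the archive once avoids reading the file from disk twice, and the with block closes it when done.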

Linear Algebra

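Linear algebra computations are also available.

Function	Description
diag	Extract the diagonal elements
dot	Matrix multiplication
trace	Sum of the diagonal elements
det	Determinant
eig	Eigenvalues and eigenvectors of a square matrix
inv	Inverse of a square matrix
pinv	Moore-Penrose pseudo-inverse
qr	QR decomposition
svd	Singular value decomposition (SVD)
solve	Solve Ax = b for x when A is a square matrix
lstsq	Least-squares solution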
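
A minimal sketch of solve and inv on a made-up 2 x 2 system. diag, dot, and trace live in the top-level np namespace, while det, eig, inv, pinv, qr, svd, solve, and lstsq are in np.linalg.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

x = np.linalg.solve(A, b)   # solves A x = b without forming the inverse
A_inv = np.linalg.inv(A)    # explicit inverse, shown for comparison

print(x)
print(np.dot(A, A_inv))     # close to the identity matrix
```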

Random Number Generation

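Random values from many distributions can be generated quickly.

Function	Description
seed	Seed the random number generator
permutation	Return a randomly permuted copy of a sequence
shuffle	Randomly reorder a sequence in place
rand	Uniform random array over [0, 1) with the given dimensions
randint	Random integers from the given low-to-high range
binomial	Draw samples from a binomial distribution
normal	Draw samples from a normal (Gaussian) distribution
beta	Draw samples from a beta distribution
chisquare	Draw samples from a chi-square distribution
gamma	Draw samples from a gamma distribution
uniform	Draw samples from a uniform distribution over the given range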
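
The effect of seed, and the difference between permutation and shuffle, can be sketched as follows. The seed value 12345 and the variable names are arbitrary choices for this example.

```python
import numpy as np

np.random.seed(12345)
first = np.random.normal(size=3)
np.random.seed(12345)
second = np.random.normal(size=3)   # same seed, so the same draws

seq = np.arange(5)
permuted = np.random.permutation(seq)  # returns a new array; seq is untouched

deck = np.arange(5)
np.random.shuffle(deck)                # reorders deck in place, returns None
```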

Example: Random Walks

Run the following in IPython.

nsteps = 1000
draws = np.random.randint(0, 2, size=nsteps)
steps = np.where(draws > 0, 1, -1)
walk = steps.cumsum()
plt.plot(walk)

Simulating Many Random Walks at Once

nwalks = 100
nsteps = 1000
draws = np.random.randint(0, 2, size=(nwalks, nsteps))
steps = np.where(draws > 0, 1, -1)
walks = steps.cumsum(1)
plt.plot(walks.T)  # transpose so that each walk is drawn as one line

Zoomed in

The values do not look like very high-quality randomness, but NumPy uses the Mersenne Twister, so they should actually be of quite high quality...
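
One thing the simulated walks can be used for is finding when each walk first gets a given distance from the origin. The following is a sketch under assumptions: the threshold of 30 and the fixed seed are arbitrary values chosen for illustration, and the names hits and crossing_times are not from the notes above.

```python
import numpy as np

np.random.seed(0)  # fixed seed only so that the sketch is repeatable

nwalks = 100
nsteps = 1000
draws = np.random.randint(0, 2, size=(nwalks, nsteps))
steps = np.where(draws > 0, 1, -1)
walks = steps.cumsum(1)

threshold = 30  # hypothetical distance from the origin

# Which walks ever reach the threshold in either direction
hits = (np.abs(walks) >= threshold).any(1)

# For those walks, argmax returns the index of the first True along each row
crossing_times = (np.abs(walks[hits]) >= threshold).argmax(1)

print(hits.sum(), "of", nwalks, "walks reached", threshold)
```

argmax on a boolean array works here because True is the maximum value, so the first True in each row is returned.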

Python for Data Analysis Chapter 4