[Python] numpy/scipy notes

Last updated at 2021-03-28. Posted at 2019-11-30.

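A cheat sheet. Searching for the same things every time is a hassle, so much of this is reproduced from other sources (original URLs included).
Everything below assumes import numpy as np.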

numpy

Functions

To search on multiple conditions with np.where, use np.where(() & ()), wrapping each condition in its own parentheses.
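
As a minimal sketch of the multi-condition form (the array x and the thresholds are made up for illustration), each condition gets its own parentheses because & binds more tightly than the comparison operators:

```python
import numpy as np

x = np.arange(10)  # illustrative data, not from the original note

# One-argument form: indices where both conditions hold.
idx = np.where((x > 2) & (x < 7))[0]
print(idx.tolist())  # [3, 4, 5, 6]

# Three-argument form: keep x where the combined condition holds, else -1.
selected = np.where((x > 2) & (x < 7), x, -1)
print(selected.tolist())  # [-1, -1, -1, 3, 4, 5, 6, -1, -1, -1]
```

Use | in the same way for "or" and ~ for negation.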

`np.savetxt()` and `np.loadtxt()`

np.loadtxt(fname, dtype='float', comments='#', delimiter=None, converters=None, 
           skiprows=0, usecols=None, unpack=False, ndmin=0)
np.savetxt(fname, X, fmt='%.18e', delimiter=' ', newline='\n', header='', footer='', comments='#')

DeepAge: How to read and write text files with np.loadtxt and np.savetxt
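
A small round trip may help show how the two signatures fit together. This is only a sketch: the file name data.csv, the format string and the sample values are assumptions chosen for illustration.

```python
import os
import tempfile

import numpy as np

data = np.array([[1.0, 2.5], [3.0, 4.5]])  # illustrative values

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "data.csv")  # hypothetical file name
    # The header line is written with the comments prefix '#'.
    np.savetxt(path, data, fmt="%.3f", delimiter=",", header="x,y")
    # loadtxt skips lines starting with '#', so the header is ignored.
    loaded = np.loadtxt(path, delimiter=",")
    # unpack=True transposes the result, giving one array per column.
    x, y = np.loadtxt(path, delimiter=",", unpack=True)

print(loaded.shape)            # (2, 2)
print(x.tolist(), y.tolist())  # [1.0, 3.0] [2.5, 4.5]
```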

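Passing a third argument to np.maximum/minimum writes the result into that array

If you do not want to create a new array, pass the first argument (the original array) as the third argument as well.
Leaving the original array unchanged is apparently described as "doesn't mutate".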

a = np.arange(1,5)
n = 3

np.maximum(a, n) # array([3, 3, 3, 4])
a # array([1, 2, 3, 4])
np.maximum(a, n, a) # array([3, 3, 3, 4])
a # array([3, 3, 3, 4])

[stackoverflow: Comparing elements of an array to a scalar and getting the max in Python]
(https://stackoverflow.com/questions/16587427/comparing-elements-of-an-array-to-a-scalar-and-getting-the-max-in-python)

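Counting

Where it applies, count_nonzero is fast (together with size if you are counting zeros).
For a boolean array, for example, the following is fast.

np.count_nonzero(arr) # number of True values
arr.size - np.count_nonzero(arr) # number of False values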

[stackoverflow: Most efficient (and pythonic) way to count False values in 2D numpy arrays?]
(https://stackoverflow.com/questions/36470410/most-efficient-and-pythonic-way-to-count-false-values-in-2d-numpy-arrays)

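Moving average

Use the convolution function np.convolve

num = 5 # window length of the moving average
b = np.ones(num) / num
y2 = np.convolve(y, b, mode='same') # moving average of the 1-D array y

With mode='same', the output has the same length as the input.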
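
One thing worth checking with mode='same' is the behaviour at the edges: np.convolve treats values outside the array as zero, so the first and last few averages are biased low. A minimal sketch with a constant signal (chosen here only because its true moving average is 1 everywhere):

```python
import numpy as np

y = np.ones(10)        # constant test signal (illustrative)
num = 5                # window length
b = np.ones(num) / num

same = np.convolve(y, b, mode='same')    # same length as y, edges see implicit zeros
valid = np.convolve(y, b, mode='valid')  # only windows that fully overlap y

print(same.shape, valid.shape)  # (10,) (6,)
# same starts at roughly 0.6, 0.8 and only then reaches 1.0;
# valid is 1.0 throughout but is shorter than y.
```

If the edge values matter, one option is to use mode='valid' and accept the shorter output.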

https://qiita.com/wrblue_mica34/items/51adf0059b61887075d9
https://deepage.net/features/numpy-convolve.html

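Indexing and dimensions

Joining arrays

np.concatenate: joins along an existing axis of the input arrays (= the number of dimensions does not increase)
np.stack: joins along a newly defined axis (= the number of dimensions increases)

For example, to stack horizontal 2-D arrays along the time axis: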

nt = 12
nx, ny = 180, 360
# The goal is arr with shape [nt, nx, ny]

# stack
arr = []
for itime in range(nt):
    _arr = np.ones((nx, ny))
    arr += [_arr]
arr = np.stack(arr, axis=0) # a new axis is added at stack time

# concatenate
arr = []
for itime in range(nt):
    _arr = np.ones((nx, ny))[None,] # add the axis to concatenate along beforehand
    arr += [_arr]
arr = np.concatenate(arr, axis=0)

[note.nkmk.me: Joining NumPy arrays ndarray (concatenate, stack, block, etc.)]
(https://note.nkmk.me/python-numpy-concatenate-stack-block/)

Omitting dimensions with Ellipsis `...`

[note.nkmk.me: Omitting dimensions of a NumPy array ndarray with Ellipsis (...)]
(https://note.nkmk.me/python-numpy-ellipsis/)
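
A short sketch of what ... expands to, using a made-up 3-D array: it stands in for as many full slices (:) as are needed to cover the remaining axes.

```python
import numpy as np

arr = np.arange(24).reshape(2, 3, 4)  # illustrative 3-D array

last0 = arr[..., 0]    # same as arr[:, :, 0]
first0 = arr[0, ...]   # same as arr[0, :, :]
both = arr[0, ..., 1]  # same as arr[0, :, 1]

print(last0.shape, first0.shape, both.shape)  # (2, 3) (3, 4) (3,)
```

This is handy for code that should work regardless of how many leading axes an array has.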

argmax/argmin of a multidimensional array

np.unravel_index(x.argmax(), x.shape)

mzrandom: Python: argmax of a multidimensional array
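
To make the one-liner above concrete: argmax() on a multidimensional array returns an index into the flattened array, and np.unravel_index converts it back to one index per axis. A minimal sketch with made-up values:

```python
import numpy as np

x = np.array([[1, 9, 3],
              [4, 5, 6]])  # illustrative data

flat = x.argmax()                      # position in the flattened array
idx = np.unravel_index(flat, x.shape)  # tuple with one index per axis

print(int(flat))                   # 1
print(tuple(int(i) for i in idx))  # (0, 1)
print(int(x[idx]))                 # 9
```

The same pattern works with argmin().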

Expanding dimensions

There are two ways: np.newaxis or broadcast_to

field3d_mask[:,:,:] = field2d[np.newaxis,:,:] > 0.3
field3d_mask = np.broadcast_to(field2d > 0.3, field3d.shape)

stackoverflow: Mask a 3d array with a 2d mask in numpy
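
The two lines above assume field2d, field3d and field3d_mask already exist. A self-contained sketch with small made-up arrays shows one practical difference between the approaches: the np.newaxis assignment fills a writable array, whereas np.broadcast_to returns a read-only view that uses no extra memory.

```python
import numpy as np

field2d = np.array([[0.1, 0.5],
                    [0.4, 0.2]])  # illustrative 2-D field
field3d = np.zeros((3, 2, 2))     # illustrative 3-D field, shape (nt, nx, ny)

# Way 1: assign into a preallocated boolean array via np.newaxis.
mask_a = np.empty(field3d.shape, dtype=bool)
mask_a[:, :, :] = field2d[np.newaxis, :, :] > 0.3

# Way 2: broadcast the 2-D mask to the 3-D shape (read-only view).
mask_b = np.broadcast_to(field2d > 0.3, field3d.shape)

print(np.array_equal(mask_a, mask_b))  # True
print(mask_b.flags.writeable)          # False

# Either mask can be used to build a masked array.
masked = np.ma.masked_array(field3d, mask=mask_b)
print(int(masked.count()))             # 6 unmasked values (2 per time step)
```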

np.ma

np.ma.masked_equal cannot be applied to strings

ar = np.array(['a', 'b', 'c'])
mr = np.ma.masked_equal(ar, 'b') # AttributeError: 'NotImplementedType' object has no attribute 'ndim'

Use np.ma.masked_array instead

mr = np.ma.masked_array(ar, mask=ar=='b')
print(mr) # ['a' -- 'c']

MaskedArray arithmetic: beware of the masked elements

a = np.ma.masked_less   (np.arange(1,5)   , 3.0) # [-- -- 3 4]
b = np.ma.masked_greater(np.arange(5,1,-1), 3.0) # [-- -- 3 2]

c = a + b
print(c)      # [-- -- 6 6]
print(c.data) # [1 2 6 6] <- not [6, 6, 6, 6]

The addition is not applied to the masked elements, yet for some reason the values of a remain there.

am1 = a.mean()
am2 = a[:2].mean()
print(am1, type(am1))           # 3.5 <class 'numpy.float64'>
print(am2, type(am2), am2.data) # -- <class 'numpy.ma.core.MaskedConstant'> 0.0

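Averaging a MaskedArray gives (as expected) the mean with the masked values ignored.
On the other hand, if every element being averaged is masked, 0.0 is silently inserted as the underlying data.

For arrays with missing values, MaskedArray is faster than np.nan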

n    = int(1e8)
arr  = np.random.rand(n)
marr = np.ma.masked_less(arr, 0.5) # marr: missing values are masked
narr = arr.copy()
narr[marr.mask] = np.nan           # narr: missing values are set to np.nan

%timeit msum = marr.sum()       # 1.59 s ± 311 ms per loop
%timeit nsum = np.nansum(narr)  # 4.21 s ± 647 ms per loop
%timeit msum = marr.mean()      # 4.1 s ± 184 ms per loop
%timeit nsum = np.nanmean(narr) # 6.59 s ± 115 ms per loop

Data types

Getting the number of bytes from a string such as 'int32'

dtype_name = 'int32'
np.dtype(dtype_name).itemsize # 4

[Numpy: Data type objects (dtype)]
(https://numpy.org/doc/stable/reference/arrays.dtypes.html)

`<` means little-endian, `>` means big-endian

[stackoverflow: What do the > < signs in numpy dtype mean?]
(https://stackoverflow.com/questions/40589499/what-do-the-signs-in-numpy-dtype-mean)
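
A quick way to see the effect of the byte-order prefix is to look at the raw bytes of the same value under both dtypes. This is a minimal sketch; the value 1 and the 4-byte integer type are chosen only for illustration.

```python
import numpy as np

value = np.array([1], dtype='<i4')  # little-endian 4-byte integer

little_bytes = value.tobytes()
big_bytes = value.astype('>i4').tobytes()  # same value, big-endian layout

print(little_bytes)  # b'\x01\x00\x00\x00'
print(big_bytes)     # b'\x00\x00\x00\x01'
print(np.dtype('<i4').itemsize, np.dtype('>i4').itemsize)  # 4 4
```

The stored value is the same in both cases; only the order of the bytes in memory differs, which matters when reading binary files written on another platform.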

Random number generation

np.random.uniform(low=0.0, high=1.0, size=None)

Generates floating-point numbers in a given interval (low is included, high is excluded)

SciPy.org: numpy.random.uniform
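
A minimal usage sketch, with bounds, shape and seed chosen arbitrarily for illustration:

```python
import numpy as np

np.random.seed(0)  # fixed seed only so that the sketch is repeatable

# 2x3 array of samples drawn uniformly from [-1.0, 1.0)
samples = np.random.uniform(low=-1.0, high=1.0, size=(2, 3))

print(samples.shape)  # (2, 3)
print(bool((samples >= -1.0).all() and (samples < 1.0).all()))  # True
```

With size=None (the default) a single float is returned instead of an array.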

Array file I/O

Handling .npy and .npz files

# A single array; the extension is .npy
np.save(path, arr)
arr = np.load(path)

# Multiple arrays; the extension is .npz
np.savez(path, a=arr0, b=arr1, c=arr2) # write multiple arrays; unnamed arrays are stored as arr_0, arr_1, ...
npz = np.load(path) # read multiple arrays
arr0 = npz['a']
arr1 = npz['b']
arr2 = npz['c']

[note.nkmk.me: Saving NumPy arrays ndarray to binary files (npy, npz)]
(https://note.nkmk.me/python-numpy-load-save-savez-npy-npz/)

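np.save does not support np.ma.MaskedArray (even for a single array), so
reading and writing via .npz works instead.
(The article below recommends pickle output via ndarray.dump, but as the next item shows, npy is better in terms of size and speed.)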

[stackoverflow: How to save numpy masked array to file]
(https://stackoverflow.com/questions/13877063/how-to-save-numpy-masked-array-to-file)
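
The note does not spell out how the .npz route is done. One possible reading, sketched here under that assumption, is to store the data and the mask as two separate arrays and rebuild the MaskedArray on load. The key names data and mask and the file name masked.npz are made up for illustration.

```python
import os
import tempfile

import numpy as np

marr = np.ma.masked_less(np.array([0.2, 0.7, 0.4, 0.9]), 0.5)  # illustrative data

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "masked.npz")  # hypothetical file name
    # Save the underlying data and the boolean mask as two named arrays.
    np.savez(path, data=marr.data, mask=np.ma.getmaskarray(marr))
    # Load both arrays and rebuild the masked array.
    with np.load(path) as npz:
        restored = np.ma.masked_array(npz["data"], mask=npz["mask"])

print(restored.mask.tolist())          # [True, False, True, False]
print(restored.compressed().tolist())  # [0.7, 0.9]
```

np.ma.getmaskarray is used rather than marr.mask so that a full boolean array is saved even when nothing is masked.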

save or dump?

Search for "numpy save vs dump" and the like.

ndarray.dump is a wrapper around pickle.
The npy format beats the pickle format in both size and speed.

[stackoverflow: best way to preserve numpy arrays on disk]
(https://stackoverflow.com/questions/9619199/best-way-to-preserve-numpy-arrays-on-disk)
[Qiita@daenqiita: pickle vs npy]
(https://qiita.com/daenqiita/items/d86dee829ded639f929b)

`savetxt`, `loadtxt`

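A file containing only a single value

With no options specified, the result is 0-dimensional (a scalar).
The number of dimensions can be set with the ndmin= option.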

test.txt

1.0

path = './test.txt'
arr = np.loadtxt(path)
print(arr, arr.shape) # 1.0 ()
arr = np.loadtxt(path, ndmin=1)
print(arr, arr.shape) # [1.] (1,)
arr = np.loadtxt(path, ndmin=2)
print(arr, arr.shape) # [[1.]] (1, 1)

[stackoverflow: numpy loadtext returns 0-d array for single line files, annoying?]
(https://stackoverflow.com/questions/22022092/numpy-loadtext-returns-0-d-array-for-single-line-files-annoying)

Miscellaneous

`np.empty(shape)`: create an uninitialized array

For example, to create an array whose values are all NaN:

a = np.empty((3,3,))
a[:] = np.nan # a.fill(np.nan) is slightly faster, but a[:] = may be more readable?

stackoverflow: Create numpy matrix filled with NaNs

Specifying array indices dynamically

[Qiita@aisha: [Python numpy] Specifying array indices dynamically]
(https://qiita.com/aisha/items/a70ada9ca89481f35cfe)
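
The linked article is not reproduced here, so the following is only one common way to do this and not necessarily the method it describes: build the index as a list of slice objects, replace the entry for the axis chosen at run time, and index with the resulting tuple. The array and the axis value are made up for illustration.

```python
import numpy as np

arr = np.arange(24).reshape(2, 3, 4)  # illustrative array
axis = 1                              # axis chosen at run time (illustrative)

# Start from "take everything on every axis", then fix one axis.
index = [slice(None)] * arr.ndim
index[axis] = 0
sub = arr[tuple(index)]  # equivalent to arr[:, 0, :] when axis == 1

# np.take gives the same result for a single index along one axis.
taken = np.take(arr, 0, axis=axis)

print(sub.shape)                  # (2, 4)
print(np.array_equal(sub, taken)) # True
```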

Use repeat for simple downscaling (refining to a finer grid)

import matplotlib.pyplot as plt

arr = np.random.rand(3, 4)
print(arr.shape) # (3, 4)

factors = [2, 4] # repeat factor for each axis
_arr = arr.copy()
for iaxis, factor in enumerate(factors):
    _arr = np.repeat(_arr, factor, axis=iaxis)
print(_arr.shape) # (6, 16)

plt.subplot(2,1,1)
plt.imshow(arr)
plt.subplot(2,1,2)
plt.imshow(_arr)

To extract the unique values of an array, use np.unique(ndarray)

The return value is an ndarray.
There is no equivalent ndarray method.

arr = np.zeros((3,4))
arr[1] = 1
arr[2] = 2

un = np.unique(arr)
print(type(un), un) # <class 'numpy.ndarray'> [0. 1. 2.]

# The following does not work
set(arr) # TypeError: unhashable type: 'numpy.ndarray'

# Flattening to 1-D and converting to a list does run, but...
set(list(arr.reshape(-1))) # {0.0, 1.0, 2.0}, returned as a set

Applying `np.logical_or/and` to three or more arrays

Use .reduce()

arr0 = np.array([1,1,1,0,0]).astype('bool')
arr1 = np.array([0,1,1,1,0]).astype('bool')
arr2 = np.array([0,0,1,1,1]).astype('bool')

print(np.logical_or. reduce((arr0, arr1, arr2))) # [ True  True  True  True  True]
print(np.logical_and.reduce((arr0, arr1, arr2))) # [False False  True False False]

arr = np.stack([arr0, arr1, arr2], axis=0)
print(arr, arr.shape)
print(np.logical_or. reduce(arr, axis=0)) # [ True  True  True  True  True]
print(np.logical_and.reduce(arr, axis=0)) # [False False  True False False]

[stackoverflow: Numpy logical_or for more than two arguments]
(https://stackoverflow.com/questions/20528328/numpy-logical-or-for-more-than-two-arguments)

scipy

`interpolate.interp1d`: interpolation of 1-D data

Usage:
[Qiita@ground0state: Interpolating missing data with Scipy's interpolate]
(https://qiita.com/ground0state/items/5fa0743837f1bcb374ca)

Notes:

If extrapolation is required but fill_value='extrapolate' is not specified, an error is raised.
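
The note above can be reproduced with a small sketch. The sample points are made up, and the default linear interpolation is assumed.

```python
import numpy as np
from scipy import interpolate

x = np.array([0.0, 1.0, 2.0])    # illustrative sample points
y = np.array([0.0, 10.0, 20.0])

# Default settings: values inside [0, 2] work, values outside raise ValueError.
f = interpolate.interp1d(x, y)
inside = float(f(0.5))
try:
    f(3.0)
    raised = False
except ValueError:
    raised = True

# With fill_value='extrapolate' the same call extends the line beyond the data.
g = interpolate.interp1d(x, y, fill_value='extrapolate')
outside = float(g(3.0))

print(inside)   # 5.0
print(raised)   # True
print(outside)  # 30.0
```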

Upscaling/downscaling 2-D data

[Qiita@aisha: [Python scipy] Upscaling/downscaling 2-D data]
(https://qiita.com/aisha/items/e4f9dafdb6391f3c81ef)
