More than 5 years have passed since last update.

逆引き Numpy / Pandas (随時更新予定)

Last updated at 2017-01-25Posted at 2017-01-22

http://rest-term.com/archives/2999/
http://algorithm.joho.info/programming/python-numpy-sample-code/
に良いまとめがあるのでそれらを見れば事足りるのだが、記憶の定着のために自分用にもメモしておく。(諸事情により適当な英語も併記)

Numpy

配列を作る / Creating Array

1次元配列を作る / Make a one-dimensional array

>>> import numpy as np
>>> x = np.array([1, 2, 3])
>>> x
array([1, 2, 3])

2次元配列を作る / Make a two-dimensional array

>>> y = np.array([[1, 2, 3], [4, 5, 6]])
>>> y
array([[1, 2, 3],
       [4, 5, 6]])

配列のサイズを確認する / Confirm the size of an array

>>> y.shape
(2, 3)

下限値、上限値、スキップ間隔を指定して配列を作る / Make an array with the lower limit, upper limit, skip interval

>>> m = np.arange(0, 30, 2)
>>> m
array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28])

下限値、上限値、個数を指定して配列を作る / Make an array with the lower limit, upper limit and elements count.

>>> np.linspace(1, 4, 9)
array([ 1.   ,  1.375,  1.75 ,  2.125,  2.5  ,  2.875,  3.25 ,  3.625,  4.   ])

配列の形を変える / Change the shape of array

>>> m = np.arange(0, 30, 2)
>>> m.reshape(3, 5)
array([[ 0,  2,  4,  6,  8],
       [10, 12, 14, 16, 18],
       [20, 22, 24, 26, 28]])
>>> m
array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28])

※ m自体は変わっていないことに注意

配列の形とサイズを変える / Change the shape and size of array

>>> m = np.arange(0, 30, 2)
>>> m.resize(3, 3)
>>> m
array([[ 0,  2,  4],
       [ 6,  8, 10],
       [12, 14, 16]])

※ m自体が変わっていることに注意

形を指定して全ての要素が1である配列を作る / Make a two-dimensional array (all elements are 1) with the shape

>>> np.ones((4, 3))
array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.],
       [ 1.,  1.,  1.],
       [ 1.,  1.,  1.]])
>>>
>>> np.ones((2, 3), int)
array([[1, 1, 1],
       [1, 1, 1]])

形を指定して全ての要素が0である配列を作る / Make a two-dimensional array (all elements are 0) with the shape

>>> np.zeros((4, 3))
array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])

サイズを指定して単位行列的な2次元配列を作る / Make a two-dimensional array like an identity matrix with the size.

>>> np.eye(5)
array([[ 1.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  0.],
       [ 0.,  0.,  1.,  0.,  0.],
       [ 0.,  0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  0.,  1.]])

2次元配列の対角線要素を取得 / Get diagonal elements of a two-dimensional array

>>> np.diag([[ 1,  3,  5], [ 7,  9, 11], [13, 15, 17]])
array([ 1,  9, 17])

繰り返し同じ要素が登場する配列を作る / Make an array with repeating

>>> np.array([1, 2, 3] * 3)
array([1, 2, 3, 1, 2, 3, 1, 2, 3])

>>> np.repeat([1, 2, 3], 3)
array([1, 1, 1, 2, 2, 2, 3, 3, 3])

2つの配列を縦方向に結合する / Combine two arrays vertically

>>> x = np.array([[1, 2, 3]])
>>> y = np.array([[4, 5, 6], [7, 8, 9]])
>>> np.vstack([x, y])
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

2つの配列を横方向に結合する / Combine two arrays horizontally

>>> x = np.array([[1, 2], [3, 4]])
>>> y = np.array([[5, 6, 7], [8, 9, 0]])
>>> np.hstack([x, y])
array([[1, 2, 5, 6, 7],
       [3, 4, 8, 9, 0]])

乱数を使って配列を作成する / Make an array using random numbers

>>> np.random.randint(0, 10, (4, 3))
array([[6, 7, 8],
       [5, 4, 9],
       [5, 4, 9],
       [5, 9, 2]])
>>> np.random.randint(0, 10, (4, 3))
array([[5, 7, 5],
       [8, 4, 3],
       [2, 9, 6],
       [7, 9, 5]])

配列の操作 / Operating Array

配列の足し算 / Addition of arrays

>>> x = np.array([[1, 2, 3], [4, 5, 6]])
>>> x
array([[1, 2, 3],
       [4, 5, 6]])
>>> y = np.array([[7, 8, 9], [10, 11, 12]])
>>> y
array([[ 7,  8,  9],
       [10, 11, 12]])
>>> x + y
array([[ 8, 10, 12],
       [14, 16, 18]])
>>> x + x + y
array([[ 9, 12, 15],
       [18, 21, 24]])

配列の掛け算 / Multiplication of arrays

>>> x * y
array([[ 7, 16, 27],
       [40, 55, 72]])

配列の累乗 / Power of a array

>>> x ** 2
array([[ 1,  4,  9],
       [16, 25, 36]])
>>> x ** 3
array([[  1,   8,  27],
       [ 64, 125, 216]])

配列を行列として扱って内積を出す / Inner product of arrays

>>> x.dot(y)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: shapes (2,3) and (2,3) not aligned: 3 (dim 1) != 2 (dim 0)
>>>
>>> z = np.array([[1], [2], [3]])
>>> z
array([[1],
       [2],
       [3]])
>>> x.dot(z)
array([[14],
       [32]])

※ 当然、内積を計算できるように縦横の数を揃えてやらないとエラーが出る。

配列の縦横変換 / Transpose an array

>>> x
array([[1, 2, 3],
       [4, 5, 6]])
>>> x.T
array([[1, 4],
       [2, 5],
       [3, 6]])
>>> x.T.T
array([[1, 2, 3],
       [4, 5, 6]])
>>>
>>> z
array([[1],
       [2],
       [3]])
>>> z.T
array([[1, 2, 3]])

配列の要素の型を確認&変更する / Confirm and change the type of array elements

>>> x
array([[1, 2, 3],
       [4, 5, 6]])
>>>
>>> x.dtype
dtype('int64')
>>>
>>> x.astype('f')
array([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.]], dtype=float32)

配列の最大値/最小値/合計/平均/標準偏差を求める / Calculate maximum, minimum, summation, average and standard deviation value of array elements

>>> x
array([[1, 2, 3],
       [4, 5, 6]])
>>> x.max()
6
>>> np.max(x)
6
>>> x.min()
1
>>> np.min(x)
1
>>> x.sum()
21
>>> np.sum(x)
21
>>> x.mean()
3.5
>>> np.mean(x)
3.5
>>> np.average(x)
3.5
>>> x.std()
1.707825127659933
>>> np.std(x)
1.707825127659933

配列内の最大値/最小値のインデックスを求める / Get the index of maximum and minimum value in an array.

>>> x
array([[1, 2, 3],
       [4, 5, 6]])
>>> x.argmax()
5
>>> x.argmin()
0
>>>
>>> y = np.array([[1, 2, 3], [1, 2, 3]])
>>> y
array([[1, 2, 3],
       [1, 2, 3]])
>>> y.argmax()
2
>>> y.argmin()
0

※ 最大値/最小値が複数ある場合は最初のインデックスを返す。

配列のインデックスとスライス / Indexing and Slicing

インデックスを指定して、配列から要素を抽出する / Extract elements from an array by index

>>> s = np.arange(13) ** 2
>>> s
array([  0,   1,   4,   9,  16,  25,  36,  49,  64,  81, 100, 121, 144])
>>> s[0]
0
>>> s[11]
121
>>> s[0:3]
array([0, 1, 4])
>>> s[0], s[11], s[0:3]
(0, 121, array([0, 1, 4]))
>>> s[-4:]
array([ 81, 100, 121, 144])
>>> s[-4:-1]
array([ 81, 100, 121])
>>> s[-4::-1]
array([81, 64, 49, 36, 25, 16,  9,  4,  1,  0])

インデックスを指定して、2次元配列から要素を抽出する / Extract elements from a two-dimensional array by index

>>> r = np.arange(36)
>>> r.resize((6, 6))
>>> r
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35]])
>>>
>>> r[2, 2]
14
>>> r[3, 3:6]
array([21, 22, 23])
>>> r[3, 3:7]
array([21, 22, 23])
>>> r[:2, :-1]
array([[ 0,  1,  2,  3,  4],
       [ 6,  7,  8,  9, 10]])
>>> r[:-1, ::2]
array([[ 0,  2,  4],
       [ 6,  8, 10],
       [12, 14, 16],
       [18, 20, 22],
       [24, 26, 28]])

条件を指定して、2次元配列から要素を抽出/編集する / Extract and edit elements in a two-dimensional array by condition

>>> r[r > 30]
array([31, 32, 33, 34, 35])
>>> r[r > 20]
array([21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35])
>>> r[r > 20] = 20
>>> r
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 20, 20, 20],
       [20, 20, 20, 20, 20, 20],
       [20, 20, 20, 20, 20, 20]])

配列の参照渡しとコピー / Reference and copy of an array

配列の参照渡し / Reference of an array

>>> r = np.arange(36)
>>> r.resize((6, 6))
>>> r
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35]])
>>> 
>>> r2 = r[2:4, 2:4]
>>> r2
array([[14, 15],
       [20, 21]])
>>> 
>>> r2[:] = -1
>>> r2
array([[-1, -1],
       [-1, -1]])
>>> r
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, -1, -1, 16, 17],
       [18, 19, -1, -1, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35]])

※ r2 = r[2:4, 2:4]ではr2に参照を渡しているので、r2を編集するということは、rを編集するということを意味している。

配列のコピー / Copy of an array

>>> r = np.arange(36)
>>> r.resize((6, 6))
>>> r
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35]])
>>> 
>>> r2 = r[2:4, 2:4].copy()
>>> r2
array([[14, 15],
       [20, 21]])
>>> 
>>> r2[:] = -1
>>> r2
array([[-1, -1],
       [-1, -1]])
>>> r
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35]])

※ r2 = r[2:4, 2:4].copy()ではrからコピーされた新しい配列がr2に渡されているので、r2とrは別個のオブジェクト。r2を編集してもrに影響はない。

配列でイテレーション処理をする / Iterating over Arrays

配列をイテレートする / Iterate an array

>>> r = np.random.randint(0, 10, (4, 3))
>>> r
array([[1, 6, 3],
       [3, 6, 0],
       [4, 9, 3],
       [5, 9, 3]])
>>>
>>> for row in r:
...     print(row)
... 
[1 6 3]
[3 6 0]
[4 9 3]
[5 9 3]
>>>
>>> for i, row in enumerate(r):
...     print(i, ' : ', row)
... 
0  :  [1 6 3]
1  :  [3 6 0]
2  :  [4 9 3]
3  :  [5 9 3]

複数の配列を同時にイテレートする / Iterate multiple arrays in same time

>>> r
array([[1, 6, 3],
       [3, 6, 0],
       [4, 9, 3],
       [5, 9, 3]])
>>> r2 = r ** 2
>>> r2
array([[ 1, 36,  9],
       [ 9, 36,  0],
       [16, 81,  9],
       [25, 81,  9]])
>>> for x, y, z in zip(r, r2, r):
...     print(x, y, z)
... 
[1 6 3] [ 1 36  9] [1 6 3]
[3 6 0] [ 9 36  0] [3 6 0]
[4 9 3] [16 81  9] [4 9 3]
[5 9 3] [25 81  9] [5 9 3]

Pandas

Series

スカラー値のSeriesを順序有りカテゴリデータのSeriesに変換する / Convert a series from ratio scale to ordinal scale

>>> s = pd.Series([168, 180, 174, 190, 170, 185, 179, 181, 175, 169, 182, 177, 180, 171])
>>> 
>>> pd.cut(s, 3)
0     (167.978, 175.333]
1     (175.333, 182.667]
2     (167.978, 175.333]
3         (182.667, 190]
4     (167.978, 175.333]
5         (182.667, 190]
6     (175.333, 182.667]
7     (175.333, 182.667]
8     (167.978, 175.333]
9     (167.978, 175.333]
10    (175.333, 182.667]
11    (175.333, 182.667]
12    (175.333, 182.667]
13    (167.978, 175.333]
dtype: category
Categories (3, object): [(167.978, 175.333] < (175.333, 182.667] < (182.667, 190]]
>>> 
>>> pd.cut(s, 3, labels=['Small', 'Medium', 'Large'])
0      Small
1     Medium
2      Small
3      Large
4      Small
5      Large
6     Medium
7     Medium
8      Small
9      Small
10    Medium
11    Medium
12    Medium
13     Small
dtype: category
Categories (3, object): [Small < Medium < Large]

Dataframe

フィルタリング / Filtering

サンプルデータとしてAll-time Olympic Games medal tableを使用。

あるカラムの値が最大である行のラベルを取得する / Get a row label which column value is maximum

>>> df[df['Gold'] == max(df['Gold'])].index[0]
'United States'

Dataframe を複数条件でフィルタリングする / Filter a dataframe with multiple conditions

>>> df[(df['Gold'] > 0) & (df['Gold.1'] > 0)]

結合 / Merging

サンプルデータとして下記を使用する。 / Sample data is as follow:

>>> import pandas as pd
>>> staff_df = pd.DataFrame([{'Name': 'Kelly', 'Role': 'Director of HR'},
...                          {'Name': 'Sally', 'Role': 'Course liasion'},
...                          {'Name': 'James', 'Role': 'Grader'}])
>>> staff_df = staff_df.set_index('Name')
>>> student_df = pd.DataFrame([{'Name': 'James', 'School': 'Business'},
...                            {'Name': 'Mike', 'School': 'Law'},
...                            {'Name': 'Sally', 'School': 'Engineering'}])
>>> student_df = student_df.set_index('Name')
>>> 
>>> staff_df
                 Role
Name                 
Kelly  Director of HR
Sally  Course liasion
James          Grader
>>> 
>>> student_df
            School
Name              
James     Business
Mike           Law
Sally  Engineering

外部結合 / Outer merging

スタッフもしくは学生であるデータを取得する / Get data of who is student or staff

>>> pd.merge(staff_df, student_df, how='outer', left_index=True, right_index=True)
                 Role       School
Name                              
James          Grader     Business
Kelly  Director of HR          NaN
Mike              NaN          Law
Sally  Course liasion  Engineering

内部結合 / Inner merging

スタッフもしくは学生であるデータを取得する / Get data of who is student and staff

>>> pd.merge(staff_df, student_df, how='inner', left_index=True, right_index=True)
                 Role       School
Name                              
James          Grader     Business
Sally  Course liasion  Engineering

左外部結合 / Left merging

スタッフのデータを取得する。もし、そのスタッフが学生でもある場合は、Schoolデータも取得する。 / Get data of who is staff. If the staff is also student, get the data of school.

>>> pd.merge(staff_df, student_df, how='left', left_index=True, right_index=True)
                 Role       School
Name                              
Kelly  Director of HR          NaN
Sally  Course liasion  Engineering
James          Grader     Business

右外部結合 / Right merging

学生のデータを取得する。もし、その学生がスタッフでもある場合は、Roleデータも取得する。 / Get data of who is student. If the student is also staff, get the data of role.

>>> pd.merge(staff_df, student_df, how='right', left_index=True, right_index=True)
                 Role       School
Name                              
James          Grader     Business
Mike              NaN          Law
Sally  Course liasion  Engineering

インデックス以外のカラムを使って結合する / Merging not using index

>>> products = pd.DataFrame([{'Product ID': 4109, 'Price': 5.0, 'Product': 'Suchi Roll'},
...                          {'Product ID': 1412, 'Price': 0.5, 'Product': 'Egg'},
...                          {'Product ID': 8931, 'Price': 1.5, 'Product': 'Bagel'}])
>>> products = products.set_index('Product ID')
>>> products
            Price     Product
Product ID                   
4109          5.0  Suchi Roll
1412          0.5         Egg
8931          1.5       Bagel
>>> invoices = pd.DataFrame([{'Customer': 'Ali', 'Product ID': 4109, 'Quantity': 1},
...                          {'Customer': 'Eric', 'Product ID': 1412, 'Quantity': 12},
...                          {'Customer': 'Anda', 'Product ID': 8931, 'Quantity': 6},
...                          {'Customer': 'Sam', 'Product ID': 4109, 'Quantity': 2}])
>>> invoices
  Customer  Product ID  Quantity
0      Ali        4109         1
1     Eric        1412        12
2     Anda        8931         6
3      Sam        4109         2
>>>
>>> pd.merge(products, invoices, how='right', left_index=True, right_on='Product ID')
   Price     Product Customer  Product ID  Quantity
0    5.0  Suchi Roll      Ali        4109         1
1    0.5         Egg     Eric        1412        12
2    1.5       Bagel     Anda        8931         6
3    5.0  Suchi Roll      Sam        4109         2

複数のカラムをキーとして結合する / Merging with multiple keys

>>> staff_df = pd.DataFrame([{'First Name': 'Kelly', 'Last Name': 'Desjardins', 'Role': 'Director of HR'},
...                          {'First Name': 'Sally', 'Last Name': 'Brooks', 'Role': 'Course liasion'},
...                          {'First Name': 'James', 'Last Name': 'Wilde', 'Role': 'Grader'}])
>>> student_df = pd.DataFrame([{'First Name': 'James', 'Last Name': 'Hammond', 'School': 'Business'},
...                            {'First Name': 'Mike', 'Last Name': 'Smith', 'School': 'Law'},
...                            {'First Name': 'Sally', 'Last Name': 'Brooks', 'School': 'Engineering'}])
>>> staff_df
  First Name   Last Name            Role
0      Kelly  Desjardins  Director of HR
1      Sally      Brooks  Course liasion
2      James       Wilde          Grader
>>> student_df
  First Name Last Name       School
0      James   Hammond     Business
1       Mike     Smith          Law
2      Sally    Brooks  Engineering
>>> pd.merge(staff_df, student_df, how='inner', left_on=['First Name','Last Name'], right_on=['First Name','Last Name'])
  First Name Last Name            Role       School
0      Sally    Brooks  Course liasion  Engineering

集約 / Grouping

カラムAで集約して、他のカラムの合計値を出す / Group by column 'A' and calculate sum of other columns

>>> df.groupby('A').agg('sum')

>>> df.groupby('A').agg({'B': sum})

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up