Reverse-lookup Numpy / Pandas (to be updated as needed)

Last updated at Posted at 2017-01-22

http://rest-term.com/archives/2999/
http://algorithm.joho.info/programming/python-numpy-sample-code/
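The two pages above are good summaries and would be enough on their own, but I am writing my own notes here to help the material stick. (For various reasons, the original notes also carried rough English alongside the Japanese.)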


Numpy

Creating arrays

:white_check_mark: Create a one-dimensional array
>>> import numpy as np
>>> x = np.array([1, 2, 3])
>>> x
array([1, 2, 3])
:white_check_mark: Create a two-dimensional array
>>> y = np.array([[1, 2, 3], [4, 5, 6]])
>>> y
array([[1, 2, 3],
       [4, 5, 6]])
:white_check_mark: Check the shape of an array
>>> y.shape
(2, 3)
:white_check_mark: Create an array from a lower limit, an upper limit, and a step (the upper limit is excluded)
>>> m = np.arange(0, 30, 2)
>>> m
array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28])
:white_check_mark: Create an array from a lower limit, an upper limit, and an element count (the upper limit is included)
>>> np.linspace(1, 4, 9)
array([ 1.   ,  1.375,  1.75 ,  2.125,  2.5  ,  2.875,  3.25 ,  3.625,  4.   ])
:white_check_mark: Change the shape of an array
>>> m = np.arange(0, 30, 2)
>>> m.reshape(3, 5)
array([[ 0,  2,  4,  6,  8],
       [10, 12, 14, 16, 18],
       [20, 22, 24, 26, 28]])
>>> m
array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28])

Note: m itself is unchanged. reshape returns the reshaped array and leaves the shape of m as it was.
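
A minimal sketch of what this means in practice, assuming a contiguous array like the one above. The name `reshaped` is introduced here only for illustration. For such an array, reshape typically returns a view, so the two names have different shapes but share one buffer.

```python
import numpy as np

m = np.arange(0, 30, 2)
reshaped = m.reshape(3, 5)  # new array object; m keeps its own shape

print(m.shape)         # (15,)
print(reshaped.shape)  # (3, 5)

# For a contiguous array such as m, reshape returns a view,
# so both names refer to the same underlying data.
print(np.shares_memory(m, reshaped))  # True

# Writing through the view is therefore visible in m as well.
reshaped[0, 0] = -1
print(m[0])  # -1
```

If an independent array is needed, `m.reshape(3, 5).copy()` is one way to get it.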

:white_check_mark: Change the shape and size of an array
>>> m = np.arange(0, 30, 2)
>>> m.resize(3, 3)
>>> m
array([[ 0,  2,  4],
       [ 6,  8, 10],
       [12, 14, 16]])

Note: m itself has changed. The resize method works in place, and the elements that no longer fit are dropped.
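
As a rough sketch, the in-place method can be contrasted with the function `np.resize`, which returns a new array instead. The names `grown` and `result` are illustrative. `refcheck=False` is an addition to the original call, passed only so that the sketch also runs in environments that hold extra references to `m` (some interactive shells do).

```python
import numpy as np

m = np.arange(0, 30, 2)

# np.resize (the function) returns a new array and leaves m alone.
# When the requested size is larger, the data is repeated to fill it.
grown = np.resize(m, (4, 5))
print(m.shape)      # (15,)
print(grown.shape)  # (4, 5)
print(grown[3])     # [0 2 4 6 8]  (the data wraps around)

# ndarray.resize (the method) changes m in place and returns None.
result = m.resize(3, 3, refcheck=False)
print(result)   # None
print(m.shape)  # (3, 3)
```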

:white_check_mark: Create an array of a given shape with all elements set to 1
>>> np.ones((4, 3))
array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.],
       [ 1.,  1.,  1.],
       [ 1.,  1.,  1.]])
>>>
>>> np.ones((2, 3), int)
array([[1, 1, 1],
       [1, 1, 1]])
:white_check_mark: Create an array of a given shape with all elements set to 0
>>> np.zeros((4, 3))
array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])
:white_check_mark: Create an identity-matrix-like two-dimensional array of a given size
>>> np.eye(5)
array([[ 1.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  0.,  0.,  0.],
       [ 0.,  0.,  1.,  0.,  0.],
       [ 0.,  0.,  0.,  1.,  0.],
       [ 0.,  0.,  0.,  0.,  1.]])
:white_check_mark: Get the diagonal elements of a two-dimensional array
>>> np.diag([[ 1,  3,  5], [ 7,  9, 11], [13, 15, 17]])
array([ 1,  9, 17])
:white_check_mark: Create an array with repeated elements
>>> np.array([1, 2, 3] * 3)
array([1, 2, 3, 1, 2, 3, 1, 2, 3])
>>> np.repeat([1, 2, 3], 3)
array([1, 1, 1, 2, 2, 2, 3, 3, 3])
:white_check_mark: Stack two arrays vertically
>>> x = np.array([[1, 2, 3]])
>>> y = np.array([[4, 5, 6], [7, 8, 9]])
>>> np.vstack([x, y])
array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])
:white_check_mark: Stack two arrays horizontally
>>> x = np.array([[1, 2], [3, 4]])
>>> y = np.array([[5, 6, 7], [8, 9, 0]])
>>> np.hstack([x, y])
array([[1, 2, 5, 6, 7],
       [3, 4, 8, 9, 0]])
:white_check_mark: Create an array of random numbers
>>> np.random.randint(0, 10, (4, 3))
array([[6, 7, 8],
       [5, 4, 9],
       [5, 4, 9],
       [5, 9, 2]])
>>> np.random.randint(0, 10, (4, 3))
array([[5, 7, 5],
       [8, 4, 3],
       [2, 9, 6],
       [7, 9, 5]])

Operating on arrays

:white_check_mark: Add arrays
>>> x = np.array([[1, 2, 3], [4, 5, 6]])
>>> x
array([[1, 2, 3],
       [4, 5, 6]])
>>> y = np.array([[7, 8, 9], [10, 11, 12]])
>>> y
array([[ 7,  8,  9],
       [10, 11, 12]])
>>> x + y
array([[ 8, 10, 12],
       [14, 16, 18]])
>>> x + x + y
array([[ 9, 12, 15],
       [18, 21, 24]])
:white_check_mark: Multiply arrays (element-wise)
>>> x * y
array([[ 7, 16, 27],
       [40, 55, 72]])
:white_check_mark: Raise an array to a power (element-wise)
>>> x ** 2
array([[ 1,  4,  9],
       [16, 25, 36]])
>>> x ** 3
array([[  1,   8,  27],
       [ 64, 125, 216]])
:white_check_mark: Treat arrays as matrices and compute their product
>>> x.dot(y)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: shapes (2,3) and (2,3) not aligned: 3 (dim 1) != 2 (dim 0)
>>>
>>> z = np.array([[1], [2], [3]])
>>> z
array([[1],
       [2],
       [3]])
>>> x.dot(z)
array([[14],
       [32]])

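Note: as expected, an error is raised unless the shapes line up so that the product can be computed. The number of columns of the left array must equal the number of rows of the right array.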
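
As a small sketch of that rule, transposing the second (2, 3) array makes the inner dimensions match, so the product that failed above goes through. The name `product` is illustrative.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])     # shape (2, 3)
y = np.array([[7, 8, 9], [10, 11, 12]])  # shape (2, 3)

# (2, 3) times (3, 2) is valid: both inner dimensions are 3.
product = x.dot(y.T)
print(product)
# [[ 50  68]
#  [122 167]]

# The @ operator (Python 3.5+) computes the same matrix product.
print(np.array_equal(product, x @ y.T))  # True
```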

:white_check_mark: Transpose an array
>>> x
array([[1, 2, 3],
       [4, 5, 6]])
>>> x.T
array([[1, 4],
       [2, 5],
       [3, 6]])
>>> x.T.T
array([[1, 2, 3],
       [4, 5, 6]])
>>>
>>> z
array([[1],
       [2],
       [3]])
>>> z.T
array([[1, 2, 3]])
:white_check_mark: Check and change the element type of an array
>>> x
array([[1, 2, 3],
       [4, 5, 6]])
>>>
>>> x.dtype
dtype('int64')
>>>
>>> x.astype('f')
array([[ 1.,  2.,  3.],
       [ 4.,  5.,  6.]], dtype=float32)
:white_check_mark: Calculate the maximum, minimum, sum, mean, and standard deviation of an array
>>> x
array([[1, 2, 3],
       [4, 5, 6]])
>>> x.max()
6
>>> np.max(x)
6
>>> x.min()
1
>>> np.min(x)
1
>>> x.sum()
21
>>> np.sum(x)
21
>>> x.mean()
3.5
>>> np.mean(x)
3.5
>>> np.average(x)
3.5
>>> x.std()
1.707825127659933
>>> np.std(x)
1.707825127659933
:white_check_mark: Get the index of the maximum and minimum values in an array
>>> x
array([[1, 2, 3],
       [4, 5, 6]])
>>> x.argmax()
5
>>> x.argmin()
0
>>>
>>> y = np.array([[1, 2, 3], [1, 2, 3]])
>>> y
array([[1, 2, 3],
       [1, 2, 3]])
>>> y.argmax()
2
>>> y.argmin()
0

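Note: if the maximum or minimum occurs more than once, the index of the first occurrence is returned. Without an axis argument, the index refers to the flattened array.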
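
The values 5 and 2 returned above are positions in the flattened array. A short sketch of how to turn such a position back into a (row, column) pair with `np.unravel_index`, and how to get per-axis results instead; the names `flat_index`, `row`, and `col` are illustrative.

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]])

flat_index = x.argmax()  # 5, a position in the flattened array
row, col = np.unravel_index(flat_index, x.shape)
print(row, col)  # 1 2

# With an axis, argmax is computed per column (axis=0) or per row (axis=1).
print(x.argmax(axis=0))  # [1 1 1]
print(x.argmax(axis=1))  # [2 2]
```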

Indexing and slicing

:white_check_mark: Extract elements from an array by index
>>> s = np.arange(13) ** 2
>>> s
array([  0,   1,   4,   9,  16,  25,  36,  49,  64,  81, 100, 121, 144])
>>> s[0]
0
>>> s[11]
121
>>> s[0:3]
array([0, 1, 4])
>>> s[0], s[11], s[0:3]
(0, 121, array([0, 1, 4]))
>>> s[-4:]
array([ 81, 100, 121, 144])
>>> s[-4:-1]
array([ 81, 100, 121])
>>> s[-4::-1]
array([81, 64, 49, 36, 25, 16,  9,  4,  1,  0])
:white_check_mark: Extract elements from a two-dimensional array by index
>>> r = np.arange(36)
>>> r.resize((6, 6))
>>> r
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35]])
>>>
>>> r[2, 2]
14
>>> r[3, 3:6]
array([21, 22, 23])
>>> r[3, 3:7]
array([21, 22, 23])
>>> r[:2, :-1]
array([[ 0,  1,  2,  3,  4],
       [ 6,  7,  8,  9, 10]])
>>> r[:-1, ::2]
array([[ 0,  2,  4],
       [ 6,  8, 10],
       [12, 14, 16],
       [18, 20, 22],
       [24, 26, 28]])
:white_check_mark: Extract and edit elements of a two-dimensional array by condition
>>> r[r > 30]
array([31, 32, 33, 34, 35])
>>> r[r > 20]
array([21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35])
>>> r[r > 20] = 20
>>> r
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 20, 20, 20],
       [20, 20, 20, 20, 20, 20],
       [20, 20, 20, 20, 20, 20]])

References and copies of an array

:white_check_mark: Passing an array by reference
>>> r = np.arange(36)
>>> r.resize((6, 6))
>>> r
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35]])
>>> 
>>> r2 = r[2:4, 2:4]
>>> r2
array([[14, 15],
       [20, 21]])
>>> 
>>> r2[:] = -1
>>> r2
array([[-1, -1],
       [-1, -1]])
>>> r
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, -1, -1, 16, 17],
       [18, 19, -1, -1, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35]])

With r2 = r[2:4, 2:4], r2 receives a reference (a view) into r, so editing r2 means editing r.
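
One way to confirm this without modifying anything first is `np.shares_memory`. The following is a minimal sketch; it builds `r` with `reshape` rather than `resize`, which yields the same 6x6 values.

```python
import numpy as np

r = np.arange(36).reshape(6, 6)
r2 = r[2:4, 2:4]  # basic slicing returns a view

print(np.shares_memory(r, r2))  # True
print(r2.flags.owndata)         # False: r2 does not own its buffer

# Writing through r2 changes the corresponding elements of r.
r2[:] = -1
print(r[2, 2], r[3, 3])  # -1 -1
```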

:white_check_mark: Copying an array
>>> r = np.arange(36)
>>> r.resize((6, 6))
>>> r
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35]])
>>> 
>>> r2 = r[2:4, 2:4].copy()
>>> r2
array([[14, 15],
       [20, 21]])
>>> 
>>> r2[:] = -1
>>> r2
array([[-1, -1],
       [-1, -1]])
>>> r
array([[ 0,  1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10, 11],
       [12, 13, 14, 15, 16, 17],
       [18, 19, 20, 21, 22, 23],
       [24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35]])

With r2 = r[2:4, 2:4].copy(), r2 receives a new array copied from r, so r2 and r are separate objects. Editing r2 has no effect on r.
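
For comparison, a short sketch showing that boolean (condition-based) indexing, used on the right-hand side of an assignment, also produces a copy even without calling `.copy()`. The names `explicit_copy` and `masked` are illustrative.

```python
import numpy as np

r = np.arange(36).reshape(6, 6)

explicit_copy = r[2:4, 2:4].copy()
masked = r[r > 30]  # boolean indexing returns a new array, not a view

print(np.shares_memory(r, explicit_copy))  # False
print(np.shares_memory(r, masked))         # False

# Editing the extracted array leaves r untouched.
masked[:] = -1
print(r.max())  # 35
```

Assigning through the mask directly, as in `r[r > 20] = 20`, still edits `r` itself, because that form is an in-place assignment rather than an extraction.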

Iterating over arrays

:white_check_mark: Iterate over an array
>>> r = np.random.randint(0, 10, (4, 3))
>>> r
array([[1, 6, 3],
       [3, 6, 0],
       [4, 9, 3],
       [5, 9, 3]])
>>>
>>> for row in r:
...     print(row)
... 
[1 6 3]
[3 6 0]
[4 9 3]
[5 9 3]
>>>
>>> for i, row in enumerate(r):
...     print(i, ' : ', row)
... 
0  :  [1 6 3]
1  :  [3 6 0]
2  :  [4 9 3]
3  :  [5 9 3]
:white_check_mark: Iterate over multiple arrays at the same time
>>> r
array([[1, 6, 3],
       [3, 6, 0],
       [4, 9, 3],
       [5, 9, 3]])
>>> r2 = r ** 2
>>> r2
array([[ 1, 36,  9],
       [ 9, 36,  0],
       [16, 81,  9],
       [25, 81,  9]])
>>> for x, y, z in zip(r, r2, r):
...     print(x, y, z)
... 
[1 6 3] [ 1 36  9] [1 6 3]
[3 6 0] [ 9 36  0] [3 6 0]
[4 9 3] [16 81  9] [4 9 3]
[5 9 3] [25 81  9] [5 9 3]

Pandas

Series

:white_check_mark: Convert a Series of scalar values into a Series of ordered categorical data (ratio scale to ordinal scale)
>>> import pandas as pd
>>> s = pd.Series([168, 180, 174, 190, 170, 185, 179, 181, 175, 169, 182, 177, 180, 171])
>>> 
>>> pd.cut(s, 3)
0     (167.978, 175.333]
1     (175.333, 182.667]
2     (167.978, 175.333]
3         (182.667, 190]
4     (167.978, 175.333]
5         (182.667, 190]
6     (175.333, 182.667]
7     (175.333, 182.667]
8     (167.978, 175.333]
9     (167.978, 175.333]
10    (175.333, 182.667]
11    (175.333, 182.667]
12    (175.333, 182.667]
13    (167.978, 175.333]
dtype: category
Categories (3, object): [(167.978, 175.333] < (175.333, 182.667] < (182.667, 190]]
>>> 
>>> pd.cut(s, 3, labels=['Small', 'Medium', 'Large'])
0      Small
1     Medium
2      Small
3      Large
4      Small
5      Large
6     Medium
7     Medium
8      Small
9      Small
10    Medium
11    Medium
12    Medium
13     Small
dtype: category
Categories (3, object): [Small < Medium < Large]
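
Because the resulting categories are ordered (`Small < Medium < Large`), they can be compared against a label. A minimal sketch, where the names `sizes` and `at_least_medium` are illustrative:

```python
import pandas as pd

s = pd.Series([168, 180, 174, 190, 170, 185, 179, 181, 175, 169, 182, 177, 180, 171])
sizes = pd.cut(s, 3, labels=['Small', 'Medium', 'Large'])

print(sizes.cat.ordered)  # True

# Ordered categories support comparisons against one of their labels.
at_least_medium = s[sizes >= 'Medium']
print(len(at_least_medium))  # 8
```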

DataFrame

Filtering

The sample data is the All-time Olympic Games medal table, loaded into df with country names as the row labels.

:white_check_mark: Get the label of the row whose value in a given column is the maximum
>>> df[df['Gold'] == max(df['Gold'])].index[0]
'United States'
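
The same lookup can also be written with `Series.idxmax`. The sketch below uses a small hypothetical stand-in for the medal table (the country names and counts are made up for illustration), since the real data is not reproduced here.

```python
import pandas as pd

# Hypothetical stand-in for the medal table; the values are made up.
df = pd.DataFrame({'Gold': [10, 25, 7]},
                  index=['Country A', 'Country B', 'Country C'])

# idxmax returns the label of the first row holding the maximum.
print(df['Gold'].idxmax())  # Country B

# Equivalent to the boolean-mask form used above.
print(df[df['Gold'] == df['Gold'].max()].index[0])  # Country B
```
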
:white_check_mark: Filter a DataFrame with multiple conditions
>>> df[(df['Gold'] > 0) & (df['Gold.1'] > 0)]
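
A short sketch of how the conditions combine, using hypothetical data that only mirrors the column names above. Each condition needs its own parentheses because `&`, `|`, and `~` bind more tightly than comparison operators. The names `both`, `either`, and `neither` are illustrative.

```python
import pandas as pd

# Hypothetical data; only the column names follow the example above.
df = pd.DataFrame({'Gold': [3, 0, 5, 0], 'Gold.1': [0, 2, 1, 0]},
                  index=['A', 'B', 'C', 'D'])

both = df[(df['Gold'] > 0) & (df['Gold.1'] > 0)]        # AND
either = df[(df['Gold'] > 0) | (df['Gold.1'] > 0)]      # OR
neither = df[~((df['Gold'] > 0) | (df['Gold.1'] > 0))]  # NOT

print(list(both.index))     # ['C']
print(list(either.index))   # ['A', 'B', 'C']
print(list(neither.index))  # ['D']
```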

Merging

The following sample data is used:

>>> import pandas as pd
>>> staff_df = pd.DataFrame([{'Name': 'Kelly', 'Role': 'Director of HR'},
...                          {'Name': 'Sally', 'Role': 'Course liasion'},
...                          {'Name': 'James', 'Role': 'Grader'}])
>>> staff_df = staff_df.set_index('Name')
>>> student_df = pd.DataFrame([{'Name': 'James', 'School': 'Business'},
...                            {'Name': 'Mike', 'School': 'Law'},
...                            {'Name': 'Sally', 'School': 'Engineering'}])
>>> student_df = student_df.set_index('Name')
>>> 
>>> staff_df
                 Role
Name                 
Kelly  Director of HR
Sally  Course liasion
James          Grader
>>> 
>>> student_df
            School
Name              
James     Business
Mike           Law
Sally  Engineering
:white_check_mark: Outer join

Get everyone who is a staff member or a student.

>>> pd.merge(staff_df, student_df, how='outer', left_index=True, right_index=True)
                 Role       School
Name                              
James          Grader     Business
Kelly  Director of HR          NaN
Mike              NaN          Law
Sally  Course liasion  Engineering
:white_check_mark: Inner join

Get everyone who is both a staff member and a student.

>>> pd.merge(staff_df, student_df, how='inner', left_index=True, right_index=True)
                 Role       School
Name                              
James          Grader     Business
Sally  Course liasion  Engineering
:white_check_mark: Left outer join

Get the staff members. If a staff member is also a student, get the School data as well.

>>> pd.merge(staff_df, student_df, how='left', left_index=True, right_index=True)
                 Role       School
Name                              
Kelly  Director of HR          NaN
Sally  Course liasion  Engineering
James          Grader     Business
:white_check_mark: Right outer join

Get the students. If a student is also a staff member, get the Role data as well.

>>> pd.merge(staff_df, student_df, how='right', left_index=True, right_index=True)
                 Role       School
Name                              
James          Grader     Business
Mike              NaN          Law
Sally  Course liasion  Engineering
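
To see which side each row of a join came from, `pd.merge` accepts `indicator=True`, which adds a `_merge` column. The sketch below rebuilds the same two frames in a compact form and also counts the rows returned by each `how` value; the names `merged` and `row_counts` are illustrative.

```python
import pandas as pd

staff_df = pd.DataFrame({'Role': ['Director of HR', 'Course liasion', 'Grader']},
                        index=pd.Index(['Kelly', 'Sally', 'James'], name='Name'))
student_df = pd.DataFrame({'School': ['Business', 'Law', 'Engineering']},
                          index=pd.Index(['James', 'Mike', 'Sally'], name='Name'))

# indicator=True adds a _merge column: 'left_only', 'right_only', or 'both'.
merged = pd.merge(staff_df, student_df, how='outer',
                  left_index=True, right_index=True, indicator=True)
print(merged['_merge'])

# Number of rows returned by each join type.
row_counts = {how: len(pd.merge(staff_df, student_df, how=how,
                                left_index=True, right_index=True))
              for how in ['outer', 'inner', 'left', 'right']}
print(row_counts)  # {'outer': 4, 'inner': 2, 'left': 3, 'right': 3}
```
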
:white_check_mark: Merge on a column other than the index
>>> products = pd.DataFrame([{'Product ID': 4109, 'Price': 5.0, 'Product': 'Suchi Roll'},
...                          {'Product ID': 1412, 'Price': 0.5, 'Product': 'Egg'},
...                          {'Product ID': 8931, 'Price': 1.5, 'Product': 'Bagel'}])
>>> products = products.set_index('Product ID')
>>> products
            Price     Product
Product ID                   
4109          5.0  Suchi Roll
1412          0.5         Egg
8931          1.5       Bagel
>>> invoices = pd.DataFrame([{'Customer': 'Ali', 'Product ID': 4109, 'Quantity': 1},
...                          {'Customer': 'Eric', 'Product ID': 1412, 'Quantity': 12},
...                          {'Customer': 'Anda', 'Product ID': 8931, 'Quantity': 6},
...                          {'Customer': 'Sam', 'Product ID': 4109, 'Quantity': 2}])
>>> invoices
  Customer  Product ID  Quantity
0      Ali        4109         1
1     Eric        1412        12
2     Anda        8931         6
3      Sam        4109         2
>>>
>>> pd.merge(products, invoices, how='right', left_index=True, right_on='Product ID')
   Price     Product Customer  Product ID  Quantity
0    5.0  Suchi Roll      Ali        4109         1
1    0.5         Egg     Eric        1412        12
2    1.5       Bagel     Anda        8931         6
3    5.0  Suchi Roll      Sam        4109         2
:white_check_mark: Merge on multiple key columns
>>> staff_df = pd.DataFrame([{'First Name': 'Kelly', 'Last Name': 'Desjardins', 'Role': 'Director of HR'},
...                          {'First Name': 'Sally', 'Last Name': 'Brooks', 'Role': 'Course liasion'},
...                          {'First Name': 'James', 'Last Name': 'Wilde', 'Role': 'Grader'}])
>>> student_df = pd.DataFrame([{'First Name': 'James', 'Last Name': 'Hammond', 'School': 'Business'},
...                            {'First Name': 'Mike', 'Last Name': 'Smith', 'School': 'Law'},
...                            {'First Name': 'Sally', 'Last Name': 'Brooks', 'School': 'Engineering'}])
>>> staff_df
  First Name   Last Name            Role
0      Kelly  Desjardins  Director of HR
1      Sally      Brooks  Course liasion
2      James       Wilde          Grader
>>> student_df
  First Name Last Name       School
0      James   Hammond     Business
1       Mike     Smith          Law
2      Sally    Brooks  Engineering
>>> pd.merge(staff_df, student_df, how='inner', left_on=['First Name','Last Name'], right_on=['First Name','Last Name'])
  First Name Last Name            Role       School
0      Sally    Brooks  Course liasion  Engineering
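
When the key columns have the same names in both frames, `on=` is a shorter spelling of the matching `left_on`/`right_on` pair. The sketch below also shows why both keys matter here: merging on the first name alone would pair two different people named James. The names `by_both` and `by_first` are illustrative.

```python
import pandas as pd

staff_df = pd.DataFrame([{'First Name': 'Kelly', 'Last Name': 'Desjardins', 'Role': 'Director of HR'},
                         {'First Name': 'Sally', 'Last Name': 'Brooks', 'Role': 'Course liasion'},
                         {'First Name': 'James', 'Last Name': 'Wilde', 'Role': 'Grader'}])
student_df = pd.DataFrame([{'First Name': 'James', 'Last Name': 'Hammond', 'School': 'Business'},
                           {'First Name': 'Mike', 'Last Name': 'Smith', 'School': 'Law'},
                           {'First Name': 'Sally', 'Last Name': 'Brooks', 'School': 'Engineering'}])

# Same result as the left_on/right_on form: only Sally Brooks matches.
by_both = pd.merge(staff_df, student_df, how='inner', on=['First Name', 'Last Name'])
print(len(by_both))  # 1

# Matching on the first name alone also pairs James Wilde with James Hammond;
# the differing last names end up in 'Last Name_x' and 'Last Name_y'.
by_first = pd.merge(staff_df, student_df, how='inner', on='First Name')
print(len(by_first))  # 2
```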

Grouping

:white_check_mark: Group by column 'A' and calculate the sum of the other columns
>>> df.groupby('A').agg('sum')
>>> df.groupby('A').agg({'B': 'sum'})
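
The two calls above do not show a DataFrame or output, so here is a minimal sketch with hypothetical data (columns 'A', 'B', 'C' and their values are made up). The last call uses named aggregation, which assumes pandas 0.25 or later; the names `totals`, `b_only`, and `named` are illustrative.

```python
import pandas as pd

# Hypothetical data for illustration only.
df = pd.DataFrame({'A': ['x', 'y', 'x', 'y'],
                   'B': [1, 2, 3, 4],
                   'C': [10, 20, 30, 40]})

# Sum every other column within each group of A.
totals = df.groupby('A').agg('sum')
print(totals.loc['x', 'B'], totals.loc['y', 'C'])  # 4 60

# Sum only column B.
b_only = df.groupby('A').agg({'B': 'sum'})
print(list(b_only.columns))  # ['B']

# Named aggregation: choose the output column names explicitly.
named = df.groupby('A').agg(b_total=('B', 'sum'), c_mean=('C', 'mean'))
print(named.loc['x', 'c_mean'])  # 20.0
```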