
Handling Data with NumPy

Last updated at 2022-09-03 / Posted at 2022-05-06

Introduction

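I'm studying the basics of data analysis with a book, but nothing sticks unless I actually run the code myself, so I'm writing up the results as I go.
Day to day I only look up the functions I happen to need while developing, so working through a book that covers the material systematically has turned up quite a few things I didn't know.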

Reference book: Pythonによるデータ分析の教科書 (A Textbook of Data Analysis with Python)

Indexing arrays

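When referencing data you can use slices, and negative indices count from the end of an axis. A list of indices such as [1, 2] selects exactly those positions.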

import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]]) 
b = np.array([7,8,9])
c = np.array([[7],[8],[9]])
d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(a[-1:, [1,2]], b.shape, c.shape)
print(d[0:2,[1,2]])

out

[[5 6]] (3,) (3, 1)
[[2 3]
 [5 6]]

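b reports its shape as (3,), not (1, 3). That is because b is a 1-D array: its shape tuple has a single entry, and the trailing comma is just how Python writes a one-element tuple. (1, 3) would describe a 2-D array with one row.
d is indexed with the slice 0:2. The end of a slice is exclusive, so only rows 0 and 1 are returned and the third row (index 2) is left out.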
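A minimal sketch of where the difference between (3,) and (1, 3) comes from: an integer index drops an axis, a slice keeps it, and reshape or np.newaxis adds one explicitly. The variable names are illustrative only.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([7, 8, 9])

# An integer index drops that axis; a slice keeps it
row_int = a[-1]     # last row as a 1-D array
row_slice = a[-1:]  # last row as a 2-D array with one row
print(row_int.shape, row_slice.shape)  # (3,) (1, 3)

# A 1-D array has a single axis; add the second axis explicitly when needed
b_row = b.reshape(1, 3)   # one row
b_col = b[:, np.newaxis]  # one column
print(b.shape, b_row.shape, b_col.shape)  # (3,) (1, 3) (3, 1)
```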

Reshaping arrays

Changing the shape: reshape

Converts an array to a different shape. If the new shape does not hold exactly the same number of elements, a ValueError is raised.

a = np.array([1,2,3,4,5,6,7,8])
print(a.reshape(2,4))

out

[[1 2 3 4]
 [5 6 7 8]]
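As a small illustration, assuming the same 8-element array: passing -1 for one dimension lets NumPy infer it, and a shape that cannot hold 8 elements triggers the ValueError mentioned above.

```python
import numpy as np

a = np.arange(1, 9)  # [1 2 3 4 5 6 7 8]

# -1 asks NumPy to infer that dimension from the element count (8 / 4 = 2)
b = a.reshape(-1, 4)
print(b.shape)  # (2, 4)

# 8 elements cannot fill a 3 x 3 array, so reshape raises ValueError
reshape_failed = False
try:
    a.reshape(3, 3)
except ValueError:
    reshape_failed = True
print(reshape_failed)  # True
```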

Flattening to 1-D: ravel, flatten

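Both turn an array into a 1-D array and the results look the same, but ravel returns a view of the original data whenever possible, whereas flatten always returns a copy.
In the example below, the array is flattened first and then a[0][0] is overwritten in the original array.
As a result, the first element of the array produced by ravel changes as well, while the flatten result does not.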

a = np.array([[1,2,3,4],[5,6,7,8]])
a2 = a.ravel()
a3 = a.flatten()

a[0][0] = 9 

print(a2)
print(a3)

out

[9 2 3 4 5 6 7 8]
[1 2 3 4 5 6 7 8]
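"Whenever possible" matters here. A sketch using np.shares_memory to check whether two arrays overlap in memory: ravel on a C-contiguous array gives a view, but on a transposed (non-C-contiguous) array it has to copy.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

# ravel on a C-contiguous array returns a view that shares a's memory
print(np.shares_memory(a, a.ravel()))    # True
# flatten always allocates a new copy
print(np.shares_memory(a, a.flatten()))  # False

# a.T is not C-contiguous, so ravel must copy to produce row-major order
t = a.T.ravel()
print(np.shares_memory(a, t), t)  # False [1 5 2 6 3 7 4 8]
```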

Generating array data

Regular sequences: np.arange

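Creates a sequence of numbers from the given arguments.
With a single argument n you get 0, 1, ..., n-1. With three arguments (start, stop, step) you get, for example, "from 0 up to 100 in steps of 10". The stop value itself is not included.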

a = np.arange(10)
print(a)
b = np.arange(0,100,10)
print(b)

out

[0 1 2 3 4 5 6 7 8 9]
[ 0 10 20 30 40 50 60 70 80 90]
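A quick check of the exclusive stop value and of the two-argument form; the specific numbers are just examples.

```python
import numpy as np

# The stop value is exclusive, so 100 is not included...
print(np.arange(0, 100, 10)[-1])  # 90
# ...so raise stop past 100 if you want it in the result
print(np.arange(0, 101, 10)[-1])  # 100

# Two arguments: start and stop (step defaults to 1)
print(np.arange(5, 10))  # [5 6 7 8 9]
```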

Random data: np.random.rand, np.random.seed

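np.random.rand generates random numbers uniformly distributed in [0, 1) (0 inclusive, 1 exclusive), in the shape given by its arguments.
It returns different values every time it runs, but setting a seed with np.random.seed makes the output reproducible.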

np.random.seed(100)
a = np.random.rand(3,2)
print(a)

out

[[0.54340494 0.27836939]
 [0.42451759 0.84477613]
 [0.00471886 0.12156912]]
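A sketch showing that re-seeding replays the same numbers. The second half shows np.random.default_rng, the Generator-based API that newer NumPy code often uses instead of the global seed; it uses a different algorithm (PCG64), so its values will generally not match np.random.rand even with the same seed.

```python
import numpy as np

# Re-seeding the global generator replays the same sequence
np.random.seed(100)
first = np.random.rand(3, 2)
np.random.seed(100)
second = np.random.rand(3, 2)
print(np.array_equal(first, second))  # True

# Generator-based alternative: the seed lives in the rng object, not globally
rng = np.random.default_rng(100)
x = rng.random((3, 2))  # floats in [0, 1)
print(x.shape)  # (3, 2)
```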

Random data in a given range: np.random.randint, np.random.uniform

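These generate random values within a specified range. The first and second arguments give the range, and the third gives the size (shape) of the output.
For a = np.random.randint(0,10), the range is 0 ≤ a < 10.

np.random.randint returns integers, and np.random.uniform returns floating-point numbers (likewise 0 ≤ a < 10 for np.random.uniform(0,10)).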

np.random.seed(100)
a = np.random.randint(0,10)
print(a)
b = np.random.randint(0,10,(2,3))
print(b)
c = np.random.uniform(0,10)
print(c)
d = np.random.uniform(0,10,(2,3))
print(d)

out

8
[[8 3 7]
 [7 0 4]]
1.2156912078311422
[[6.70749085 8.25852755 1.3670659 ]
 [5.75093329 8.91321954 2.09202122]]
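A sketch that checks the bounds and element types on a larger sample. The seed 0 and sample size are arbitrary choices for the illustration.

```python
import numpy as np

np.random.seed(0)
ints = np.random.randint(0, 10, size=10_000)
floats = np.random.uniform(0, 10, size=10_000)

# randint: integers with 0 <= x < 10, so 10 itself never appears
print(ints.min() >= 0, ints.max() <= 9)  # True True
# uniform: floats drawn from [0.0, 10.0)
print(floats.min() >= 0.0, floats.max() < 10.0)  # True True
# dtype kinds: 'i' = signed integer, 'f' = floating point
print(ints.dtype.kind, floats.dtype.kind)  # i f
```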

Normally distributed random data: np.random.randn, np.random.normal

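These also generate random data, but drawn from a normal distribution rather than a uniform one.
np.random.randn draws from the standard normal distribution (mean 0, variance 1).
np.random.normal takes the mean and the standard deviation (not the variance) as its first two arguments, so normal(70, 10, ...) produces data with mean 70 and standard deviation 10, that is, variance 100.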

np.random.seed(100)
a = np.random.randn(2,3)
print(a)
b = np.random.normal(70,10,(2,3))
print(b)

out

[[-1.74976547  0.3426804   1.1530358 ]
 [-0.25243604  0.98132079  0.51421884]]
[[72.21179669 59.29956669 68.10504169]
 [72.55001444 65.41973014 74.35163488]]
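To confirm that the second argument is the standard deviation, a sketch with a large sample; the sample statistics are only approximately equal to the parameters, and the seed 0 is arbitrary.

```python
import numpy as np

np.random.seed(0)
# loc = mean, scale = standard deviation
x = np.random.normal(70, 10, size=100_000)
print(round(x.mean(), 1), round(x.std(), 1))  # close to 70 and 10
print(round(x.var()))  # close to 100 = 10 ** 2

# randn is the standard normal, i.e. normal(0, 1)
z = np.random.randn(100_000)
print(round(z.mean(), 2), round(z.std(), 2))  # close to 0 and 1
```

If you have a variance in mind instead, pass its square root, e.g. scale=np.sqrt(10) for variance 10.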

Filling an array with a fixed value: np.zeros, np.ones, np.full

a = np.zeros((2,3))
print(a)
b = np.ones((2,3))
print(b)
c = np.full((2,3),10)
print(c)

out

[[0. 0. 0.]
 [0. 0. 0.]]
[[1. 1. 1.]
 [1. 1. 1.]]
[[10 10 10]
 [10 10 10]]
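The output above shows "0." and "1." but plain "10". A sketch explaining why: np.zeros and np.ones default to float64, while np.full takes its dtype from the fill value; the dtype argument overrides both.

```python
import numpy as np

# zeros/ones default to float64, hence "0." and "1." in the output
print(np.zeros((2, 3)).dtype)  # float64
# np.full infers the dtype from the fill value: an int gives an integer array
print(np.full((2, 3), 10).dtype.kind)  # i

# Pass dtype explicitly to control the element type
a = np.zeros((2, 3), dtype=int)
b = np.full((2, 3), 10, dtype=float)
print(a)
print(b)
```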

Identity matrix: np.eye

np.eye(N) creates an N x N identity matrix: ones on the main diagonal and zeros everywhere else.

a = np.eye(5)
print(a)

out

[[1. 0. 0. 0. 0.]
 [0. 1. 0. 0. 0.]
 [0. 0. 1. 0. 0.]
 [0. 0. 0. 1. 0.]
 [0. 0. 0. 0. 1.]]
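A short sketch of what "identity" means in practice, plus the k parameter of np.eye, which shifts the diagonal. The matrix A is an arbitrary example.

```python
import numpy as np

I = np.eye(3)
A = np.array([[1., 2., 3.], [4., 5., 6.], [7., 8., 9.]])
# Multiplying by the identity matrix leaves A unchanged
print(np.array_equal(A @ I, A))  # True

# k=1 puts the ones just above the main diagonal
upper = np.eye(3, k=1)
print(upper)
```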

Evenly spaced data: np.linspace

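Creates a given number of evenly spaced values over a range.
Both the start and the stop value are included, so to get every multiple of 5 from 0 to 100 you need 100 / 5 + 1 = 21 points, and 21 is passed as the third argument.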

a = np.linspace(0,100,21)
print(a)

out

[  0.   5.  10.  15.  20.  25.  30.  35.  40.  45.  50.  55.  60.  65.
  70.  75.  80.  85.  90.  95. 100.]
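The point count can be computed from the step instead of worked out by hand. A sketch, with start, stop, and step as illustrative names; it also shows endpoint=False, which drops the stop value.

```python
import numpy as np

start, stop, step = 0, 100, 5
# With the default endpoint=True both ends are included: num = (stop - start) / step + 1
num = (stop - start) // step + 1
a = np.linspace(start, stop, num)
print(num, a[1] - a[0])  # 21 5.0

# endpoint=False excludes stop, so 20 points give the same spacing of 5
b = np.linspace(start, stop, 20, endpoint=False)
print(b[-1])  # 95.0
```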

Manipulating array data

Differences between elements: np.diff

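Applied to a 1-D array, np.diff returns the difference between each pair of adjacent elements (a[i+1] - a[i]), so the result is one element shorter than the input.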

a = np.array([1,9,4,5,6])
print(np.diff(a))

out

[ 8 -5  1  1]
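The same result can be written with two slices, which makes the definition explicit. The 2-D part is an added illustration (the matrix m is arbitrary): np.diff works along the last axis by default, and axis=0 switches to differences between rows.

```python
import numpy as np

a = np.array([1, 9, 4, 5, 6])
# np.diff(a)[i] == a[i + 1] - a[i]
manual = a[1:] - a[:-1]
print(np.array_equal(np.diff(a), manual))  # True

m = np.array([[1, 3, 6], [10, 20, 40]])
print(np.diff(m))          # differences within each row: [[2, 3], [10, 20]]
print(np.diff(m, axis=0))  # differences between rows: [[9, 17, 34]]
```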

Joining arrays: np.concatenate, np.hstack, np.vstack

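These join arrays together. np.concatenate takes the direction as its axis argument: axis=1 joins side by side, giving the same result as np.hstack, and axis=0 joins top to bottom, giving the same result as np.vstack.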

a = np.array([[1,2],[3,4]])
b = np.array([[5,6],[7,8]])

c = np.concatenate([a,b],axis=1)
print(c)
d = np.hstack([a,b])
print(d)
e = np.concatenate([a,b],axis=0)
print(e)
f = np.vstack([a,b])
print(f)

out

[[1 2 5 6]
 [3 4 7 8]]
[[1 2 5 6]
 [3 4 7 8]]
[[1 2]
 [3 4]
 [5 6]
 [7 8]]
[[1 2]
 [3 4]
 [5 6]
 [7 8]]
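The correspondence differs slightly for 1-D arrays, which have no second axis. A sketch with two arbitrary vectors: hstack joins them end to end, while vstack turns each one into a row.

```python
import numpy as np

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

# For 1-D input, hstack is the same as concatenate along axis 0
h = np.hstack([x, y])
print(h, h.shape)  # [1 2 3 4 5 6] (6,)

# vstack treats each 1-D array as a row of a new 2-D array
v = np.vstack([x, y])
print(v.shape)  # (2, 3)
```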

Splitting arrays: np.vsplit, np.hsplit

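np.vsplit splits an array vertically into groups of rows (axis 0), and np.hsplit splits it horizontally into groups of columns (axis 1). The list [1] gives the split position: the array is cut just before index 1.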

a = np.array([[1,2,3,4],[5,6,7,8]])

b,c = np.vsplit(a,[1])
print(b)
print(c)
d,e = np.hsplit(a,[1])
print(d)
print(e)

out

[[1 2 3 4]]
[[5 6 7 8]]
[[1]
 [5]]
[[2 3 4]
 [6 7 8]]
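Besides a list of positions, the second argument can be an integer meaning "this many equal parts". A sketch on the same array, also showing that np.split with an axis argument covers both functions.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

# An integer splits into that many equal parts (it must divide the axis length)
left, right = np.hsplit(a, 2)
print(left)   # columns 0-1
print(right)  # columns 2-3

# A list gives cut positions: before column 1 and before column 3
p, q, r = np.hsplit(a, [1, 3])
print(p.shape, q.shape, r.shape)  # (2, 1) (2, 2) (2, 1)

# np.split with axis=1 matches hsplit (axis=0 matches vsplit)
s1, s2 = np.split(a, 2, axis=1)
print(np.array_equal(s1, left))  # True
```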

Swapping rows and columns: ndarray.T

a = np.array([[1,2,3,4],[5,6,7,8]])
b = a.T

print(b)

out

[[1 5]
 [2 6]
 [3 7]
 [4 8]]
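A sketch of two properties of .T worth knowing: it is equivalent to np.transpose and returns a view, and it has no effect on a 1-D array, which ties back to the (3,) shape seen earlier.

```python
import numpy as np

a = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
b = a.T

# .T matches np.transpose(a) and shares memory with a (a view, not a copy)
print(np.array_equal(b, np.transpose(a)))  # True
print(np.shares_memory(a, b))              # True

# A 1-D array has only one axis, so .T leaves it unchanged
v = np.array([7, 8, 9])
print(v.T.shape)  # (3,)
# Reshape to get an actual column vector
print(v.reshape(-1, 1).shape)  # (3, 1)
```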

NumPy features

Universal functions (ufuncs)

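Universal functions apply an operation to every element of an array at once.
For example, the absolute value, sine, and cosine of every element can be computed as follows.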

a = np.array([[-1,2,-3,4],[-5,6,-7,8]])
b = np.abs(a)
c = np.sin(a)
d = np.cos(a)
print(b)
print(c)
print(d)

out (only the np.abs result is shown)

[[1 2 3 4]
 [5 6 7 8]]
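To see what np.sin actually computes, a sketch that compares it with an element-by-element loop over Python's math.sin. It also shows a dtype detail: sin and cos of an integer array return floats, while np.abs keeps the integer type.

```python
import math
import numpy as np

a = np.array([[-1, 2, -3, 4], [-5, 6, -7, 8]])

# A ufunc gives the same values as applying the function to each element
c = np.sin(a)
loop = np.array([[math.sin(x) for x in row] for row in a])
print(np.allclose(c, loop))  # True

# sin returns float64 for integer input; abs keeps the integer dtype
print(c.dtype, np.abs(a).dtype.kind)  # float64 i

# Rounding makes the sin/cos output easier to read
print(np.round(c, 3))
print(np.round(np.cos(a), 3))
```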

Broadcasting

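You can combine every element with a single value in one operation, as in a + 10.
Arrays with different numbers of dimensions can also be combined: here a has shape (2, 4) and b has shape (4,), so b is applied to each row of a.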

a = np.array([[-1,2,-3,4],[-5,6,-7,8]])
b = np.array([0,2,4,8])

c = a + 10
print(c)
d = a * b
print(d)

out

[[ 9 12  7 14]
 [ 5 16  3 18]]
[[  0   4 -12  32]
 [  0  12 -28  64]]
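The shapes have to be compatible, though. A sketch of the rule: shapes are compared from the trailing axis, and each pair of sizes must be equal or one of them must be 1. The offsets 100 and 200 are arbitrary example values.

```python
import numpy as np

a = np.array([[-1, 2, -3, 4], [-5, 6, -7, 8]])  # shape (2, 4)

# (2, 4) and (2, 1) are compatible: the size-1 axis is stretched to 4,
# so each row gets its own offset
row_offsets = np.array([100, 200])[:, np.newaxis]  # shape (2, 1)
shifted = a + row_offsets  # 100 added to row 0, 200 added to row 1
print(shifted)

# (2, 4) and (2,) disagree on the trailing axis (4 vs 2), so this fails
broadcast_failed = False
try:
    a + np.array([100, 200])
except ValueError:
    broadcast_failed = True
print(broadcast_failed)  # True
```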

Comparisons and logical AND

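Applying a comparison operator to an array evaluates the condition for every element and returns an array of booleans.
The example below checks whether each element is positive.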

a = np.array([[-1,2,-3,4],[-5,6,-7,8]])
b = a > 0
print(b)

out

[[False  True False  True]
 [False  True False  True]]
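The section title also mentions logical AND. A sketch of how conditions are combined elementwise with & (or np.logical_and), and how the resulting boolean array can select elements. The "divisible by 4" condition is just an example.

```python
import numpy as np

a = np.array([[-1, 2, -3, 4], [-5, 6, -7, 8]])

# Use & (not Python's `and`) for elementwise AND.
# Parentheses are required because & binds tighter than > and ==.
mask = (a > 0) & (a % 4 == 0)
print(np.array_equal(mask, np.logical_and(a > 0, a % 4 == 0)))  # True

# A boolean array used as an index returns the matching elements
print(a[a > 0])  # [2 4 6 8]
print(a[mask])   # [4 8]
```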
