pandas Memo

Posted at 2016-02-03, last updated at 2016-02-11

Overview

These are only rough notes, but I will keep a brief summary here.
I will update them from time to time (probably).

References

http://blog.brainpad.co.jp/entry/2014/12/10/204111
http://sinhrks.hatenablog.com/entry/2015/06/18/221747
On iteritems()

How-to

Reading a CSV file

In [1]: import pandas as pd
   ...: df = pd.read_csv('./input/hoge.csv')

It is easy to forget the single quotes around the path.
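
To try the rest of this memo without the original file, the sketch below rebuilds the sample table from an in-memory string. The contents of `csv_text` are an assumption reconstructed from the output shown in the next section, not the actual `hoge.csv`.

```python
import io

import pandas as pd

# Hypothetical stand-in for ./input/hoge.csv, reconstructed from the
# table printed later in this memo.
csv_text = """label,a,b,c,d
aa,1,11,111,e
bb,2,22,222,e
cc,3,33,333,e
dd,4,44,444,e
"""

# read_csv accepts a path string or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)          # (4, 5)
print(list(df.columns))  # ['label', 'a', 'b', 'c', 'd']
```

Passing a path string such as './input/hoge.csv' works the same way; the file-like object is only used here to keep the example self-contained.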

Checking the contents

In [2]: df
Out[2]:
  label  a   b    c  d
0    aa  1  11  111  e
1    bb  2  22  222  e
2    cc  3  33  333  e
3    dd  4  44  444  e

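Extracting the column at position 1 (0-based)

In [2]: df.iloc[:, [1]]  # df[[1]] looks for a column labeled 1 in current pandas
Out[2]: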
   a
0  1
1  2
2  3
3  4

Extracting the column labeled 'a'

In [3]: df['a']
Out[3]:
0    1
1    2
2    3
3    4
Name: a, dtype: int64
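
The two selections above return different types, which is easy to miss. A minimal sketch, assuming a small hand-built frame in place of the CSV data:

```python
import pandas as pd

# Hand-built sample; a stand-in for the CSV data used in this memo.
df = pd.DataFrame({
    "label": ["aa", "bb", "cc", "dd"],
    "a": [1, 2, 3, 4],
    "b": [11, 22, 33, 44],
})

col_as_series = df["a"]        # single label -> Series
col_as_frame = df[["a"]]       # list of labels -> one-column DataFrame
by_position = df.iloc[:, [1]]  # list of positions -> one-column DataFrame

print(type(col_as_series).__name__)
print(type(col_as_frame).__name__)
print(col_as_frame.equals(by_position))
```

Use the list form when the result must stay two-dimensional, for example when passing it to code that expects a DataFrame.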

Accessing elements by location, part 1 (by label)

In [4]: df.loc[:,['a','b']]
Out[4]:
   a   b
0  1  11
1  2  22
2  3  33
3  4  44

Accessing elements by location, part 2 (by integer position)

In [5]: df.iloc[:,[1,2]]
Out[5]:
   a   b
0  1  11
1  2  22
2  3  33
3  4  44
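
The difference between loc and iloc is clearer when the row index is not the default 0..n-1. The sketch below assumes a hypothetical index of 'w' to 'z' to show that label slices include their end point while position slices do not.

```python
import pandas as pd

# Sample frame with string row labels (hypothetical, for illustration only).
df = pd.DataFrame(
    {"a": [1, 2, 3, 4], "b": [11, 22, 33, 44], "c": [111, 222, 333, 444]},
    index=["w", "x", "y", "z"],
)

by_label = df.loc["w":"y", ["a", "b"]]  # label slice: 'y' is included
by_position = df.iloc[0:3, [0, 1]]      # position slice: row 3 is excluded

print(by_label.equals(by_position))

# Both accessors also return a single scalar when given one row and one column.
print(df.loc["x", "b"], df.iloc[1, 1])
```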

Summary statistics for the numeric columns

In [6]: df.describe()
Out[6]:
              a          b           c
count  4.000000   4.000000    4.000000
mean   2.500000  27.500000  277.500000
std    1.290994  14.200939  143.300384
min    1.000000  11.000000  111.000000
25%    1.750000  19.250000  194.250000
50%    2.500000  27.500000  277.500000
75%    3.250000  35.750000  360.750000
max    4.000000  44.000000  444.000000
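
As a cross-check on the table above, the sketch below recomputes two of the values for column a by hand. It relies on pandas defaults: std is the sample standard deviation (ddof=1) and quantiles use linear interpolation.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4], name="a")
summary = s.describe()

# describe() reports the sample standard deviation (ddof=1).
print(summary["std"], s.std(ddof=1))

# The 25% row is the 0.25 quantile with linear interpolation:
# position 0.25 * (4 - 1) = 0.75, i.e. 1 + 0.75 * (2 - 1).
print(summary["25%"], s.quantile(0.25))
```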

As a numpy.ndarray

In [7]: df.values
Out[7]:
array([['aa', 1, 11, 111, 'e'],
       ['bb', 2, 22, 222, 'e'],
       ['cc', 3, 33, 333, 'e'],
       ['dd', 4, 44, 444, 'e']], dtype=object)
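
The dtype=object in the output above comes from mixing string and integer columns in one array. A minimal sketch, using the to_numpy() method available in newer pandas and a cut-down sample frame, shows that selecting only the numeric columns first keeps a numeric dtype:

```python
import numpy as np
import pandas as pd

# Cut-down sample with one string column and two integer columns.
df = pd.DataFrame({"label": ["aa", "bb"], "a": [1, 2], "b": [11, 22]})

mixed = df.to_numpy()                 # strings and ints together -> object array
numeric = df[["a", "b"]].to_numpy()   # integer columns only -> integer array

print(mixed.dtype)
print(numeric.dtype)
```

An integer array is usually what numerical code expects, so it can be worth dropping label columns before converting.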

Getting the dtype of each column (df here is a different dataset from the one above)

>>> df.dtypes
target      int64
v1        float64
v2        float64
v3         object
v4        float64
v5        float64
v6        float64
v7        float64
v8          int64
v9        float64
dtype: object

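Selecting columns by dtype

>>> import numpy as np
>>> df.loc[:, df.dtypes == np.int64]  # .ix was removed in pandas 1.0; .loc accepts the boolean mask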
         target     v8
   No.1      1      2
   No.2      2      2

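iteritems() / items(), iterrows()

>>> df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': ['x', 'y', 'z']}, index=['a', 'b', 'c'])
>>> df
   A  B  C
a  1  4  x
b  2  5  y
c  3  6  z

>>> # iteritems() was removed in pandas 2.0; items() is the current name
>>> for (key, column) in df.items():
...     print(key)
...     print(column)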
A
a    1
b    2
c    3
Name: A, dtype: int64
B
a    4
b    5
c    6
Name: B, dtype: int64
C
a    x
b    y
c    z
Name: C, dtype: object

>>> for (key, row) in df.iterrows():
...     print(key)
...     print(row)
a
A    1
B    4
C    x
Name: a, dtype: object
b
A    2
B    5
C    y
Name: b, dtype: object
c
A    3
B    6
C    z
Name: c, dtype: object
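
Note that each row above comes back as a Series with dtype object, because one Series has to hold both the integers and the string. If per-column types matter, itertuples() is an alternative; a minimal sketch on the same data:

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [1, 2, 3], "B": [4, 5, 6], "C": ["x", "y", "z"]},
    index=["a", "b", "c"],
)

# itertuples() yields one namedtuple per row; the row label is in .Index
# and each column is an attribute, without converting the row to a Series.
for row in df.itertuples():
    print(row.Index, row.A, row.B, row.C)

rows = [(r.Index, r.A, r.C) for r in df.itertuples()]
```

Attribute access only works for column names that are valid Python identifiers; other names are replaced with positional names.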

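factorize

The output below comes from Python 2 with an older pandas; current versions print Index instead of Int64Index and omit the u prefix on strings.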

>>> pd.factorize(df['A'])
(array([0, 1, 2]), Int64Index([1, 2, 3], dtype='int64'))
>>> pd.factorize(df['B'])
(array([0, 1, 2]), Int64Index([4, 5, 6], dtype='int64'))
>>> pd.factorize(df['C'])
(array([0, 1, 2]), Index([u'x', u'y', u'z'], dtype='object'))

>>> df['C'], indexer = pd.factorize(df['C'])
>>> df
   A  B  C
a  1  4  0
b  2  5  1
c  3  6  2
>>> indexer
Index([u'x', u'y', u'z'], dtype='object')
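
The second return value (called indexer above) is what lets you map the integer codes back to the original values. A minimal sketch, assuming a series with repeated values and no missing data:

```python
import pandas as pd

s = pd.Series(["x", "y", "x", "z"])

# codes: one integer per element; uniques: distinct values in order of first appearance.
codes, uniques = pd.factorize(s)
print(codes.tolist())   # [0, 1, 0, 2]
print(list(uniques))    # ['x', 'y', 'z']

# Looking the codes up in uniques restores the original values.
restored = uniques.take(codes)
print(list(restored))   # ['x', 'y', 'x', 'z']
```

Missing values are encoded as -1 by default, so this round trip needs extra handling if the column contains NaN.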
