pandas memo

Posted at 2016-02-03

Overview

  • These are just notes, summarized briefly.
  • I will update them from time to time (probably).

References

How-to

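Reading a CSV file

In [1]: df = pd.read_csv('./input/hoge.csv')
  • Assumes pandas has been imported as pd (import pandas as pd).
  • The path must be quoted; the single quotes are easy to forget.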
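
To follow along without a file on disk, the same table can be loaded from an in-memory string. This is a minimal sketch: the CSV text is reconstructed from the table printed in this memo (the original hoge.csv is not available), and csv_text is a name made up for this example.

```python
import io

import pandas as pd

# CSV text reconstructed from the table shown in this memo (an assumption;
# the original ./input/hoge.csv is not available).
csv_text = """label,a,b,c,d
aa,1,11,111,e
bb,2,22,222,e
cc,3,33,333,e
dd,4,44,444,e
"""

# read_csv accepts a path or any file-like object, so StringIO works too.
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)  # (rows, columns)
```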

Checking the contents

In [2]: df
Out[2]:
  label  a   b    c  d
0    aa  1  11  111  e
1    bb  2  22  222  e
2    cc  3  33  333  e
3    dd  4  44  444  e

Extracting the column at position 1 (0-based)

In [2]: df.iloc[:, [1]]
Out[2]:
   a
0  1
1  2
2  3
3  4

Extracting the column labeled 'a'

In [3]: df['a']
Out[3]:
0    1
1    2
2    3
3    4
Name: a, dtype: int64
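
The two extractions above return different types: a single label gives a Series, while a list of labels or positions gives a DataFrame. The following is a small sketch on a made-up two-row frame (the variable names are only for illustration).

```python
import pandas as pd

# Small stand-in for the table used in this memo.
df = pd.DataFrame({"label": ["aa", "bb"], "a": [1, 2], "b": [11, 22]})

col_as_series = df["a"]          # single label -> Series
col_as_frame = df[["a"]]         # list of labels -> one-column DataFrame
by_position = df.iloc[:, [1]]    # list of positions -> one-column DataFrame

print(type(col_as_series).__name__)
print(type(col_as_frame).__name__)
print(col_as_frame.equals(by_position))
```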

Accessing elements by location, part 1 (by label)

In [4]: df.loc[:,['a','b']]
Out[4]:
   a   b
0  1  11
1  2  22
2  3  33
3  4  44

Accessing elements by location, part 2 (by integer position)

In [5]: df.iloc[:,[1,2]]
Out[5]:
   a   b
0  1  11
1  2  22
2  3  33
3  4  44
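
One difference between loc and iloc that is easy to trip over: label slices include both endpoints, while position slices exclude the end. A minimal sketch, assuming a made-up frame with string row labels r0..r2:

```python
import pandas as pd

# Hypothetical frame with string row labels so labels and positions differ.
df = pd.DataFrame(
    {"label": ["aa", "bb", "cc"], "a": [1, 2, 3], "b": [11, 22, 33]},
    index=["r0", "r1", "r2"],
)

by_label = df.loc["r0":"r1", "a":"b"]  # label slices include both ends
by_position = df.iloc[0:2, 1:3]        # position slices exclude the end

print(by_label.equals(by_position))
```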

Summary statistics of the numeric columns

In [6]: df.describe()
Out[6]:
              a          b           c
count  4.000000   4.000000    4.000000
mean   2.500000  27.500000  277.500000
std    1.290994  14.200939  143.300384
min    1.000000  11.000000  111.000000
25%    1.750000  19.250000  194.250000
50%    2.500000  27.500000  277.500000
75%    3.250000  35.750000  360.750000
max    4.000000  44.000000  444.000000
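
By default describe() reports only numeric columns, which is why label and d are missing above. As a sketch on a reduced copy of the table, passing include="all" also summarizes the non-numeric columns:

```python
import pandas as pd

# Reduced copy of the table used in this memo.
df = pd.DataFrame(
    {"label": ["aa", "bb", "cc", "dd"], "a": [1, 2, 3, 4], "d": ["e"] * 4}
)

numeric_summary = df.describe()            # numeric columns only
full_summary = df.describe(include="all")  # adds unique/top/freq rows

print(list(numeric_summary.columns))
print(list(full_summary.columns))
```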

As a numpy.ndarray

In [7]: df.values
Out[7]:
array([['aa', 1, 11, 111, 'e'],
       ['bb', 2, 22, 222, 'e'],
       ['cc', 3, 33, 333, 'e'],
       ['dd', 4, 44, 444, 'e']], dtype=object)
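
The dtype=object above comes from mixing string and integer columns in one array. A short sketch, using to_numpy() (the method the pandas documentation recommends over .values) on a made-up two-row frame, shows that selecting only the numeric columns first yields an integer array:

```python
import pandas as pd

# Small stand-in for the table used in this memo.
df = pd.DataFrame({"label": ["aa", "bb"], "a": [1, 2], "b": [11, 22]})

mixed = df.to_numpy()                # strings + ints -> object array
numeric = df[["a", "b"]].to_numpy()  # ints only -> integer array

print(mixed.dtype)
print(numeric.dtype)
```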

Getting the data type of each column

>>> df.dtypes
target      int64
v1        float64
v2        float64
v3         object
v4        float64
v5        float64
v6        float64
v7        float64
v8          int64
v9        float64
dtype: object

Selecting columns by data type

>>> df.loc[:, df.dtypes == np.int64]
         target     v8
   No.1      1      2
   No.2      2      2
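
pandas also provides select_dtypes for this kind of selection. The sketch below compares it with the boolean-mask approach on a made-up frame; only the column names target, v1, v3 and v8 are taken from the output above, and the v1 and v3 values are invented.

```python
import numpy as np
import pandas as pd

# Hypothetical frame; only the column names come from the memo.
df = pd.DataFrame(
    {"target": [1, 2], "v1": [0.5, 1.5], "v3": ["x", "y"], "v8": [2, 2]},
    index=["No.1", "No.2"],
)

by_mask = df.loc[:, df.dtypes == np.int64]        # boolean mask over columns
by_select = df.select_dtypes(include=["int64"])   # built-in dtype filter

print(list(by_select.columns))
```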

items(), iterrows()

>>> df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': ['x', 'y', 'z']}, index=['a', 'b', 'c'])
>>> df
   A  B  C
a  1  4  x
b  2  5  y
c  3  6  z

>>> for key, column in df.items():  # iteritems() was removed in pandas 2.0
...     print(key)
...     print(column)
A
a    1
b    2
c    3
Name: A, dtype: int64
B
a    4
b    5
c    6
Name: B, dtype: int64
C
a    x
b    y
c    z
Name: C, dtype: object
>>> for key, row in df.iterrows():
...     print(key)
...     print(row)
a
A    1
B    4
C    x
Name: a, dtype: object
b
A    2
B    5
C    y
Name: b, dtype: object
c
A    3
B    6
C    z
Name: c, dtype: object
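
As the output above shows, each row from iterrows() is a Series with dtype object, because one Series has to hold both the integers and the string. A hedged sketch of the three iteration styles follows; itertuples() is not covered in the original memo and is shown only as an alternative that keeps each value's own type.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [1, 2, 3], "B": [4, 5, 6], "C": ["x", "y", "z"]},
    index=["a", "b", "c"],
)

# items() yields (column label, Series) pairs.
column_labels = [key for key, column in df.items()]

# iterrows() yields (index label, Series) pairs; mixed rows become object dtype.
row_dtypes = [row.dtype for key, row in df.iterrows()]

# itertuples() yields namedtuples with the index in the Index field.
tuples = [(t.Index, t.A, t.C) for t in df.itertuples()]

print(column_labels)
print(tuples)
```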

factorize

>>> pd.factorize(df['A'])
(array([0, 1, 2]), Index([1, 2, 3], dtype='int64'))
>>> pd.factorize(df['B'])
(array([0, 1, 2]), Index([4, 5, 6], dtype='int64'))
>>> pd.factorize(df['C'])
(array([0, 1, 2]), Index(['x', 'y', 'z'], dtype='object'))
>>> df['C'], indexer = pd.factorize(df['C'])
>>> df
   A  B  C
a  1  4  0
b  2  5  1
c  3  6  2
>>> indexer
Index(['x', 'y', 'z'], dtype='object')
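
The second return value of factorize is what makes the encoding reversible: indexing the unique values with the integer codes recovers the original labels. A minimal sketch on a made-up Series with a repeated value (the names s, codes, uniques and restored are only for illustration):

```python
import pandas as pd

# Made-up Series with a repeated value, to show that codes are shared.
s = pd.Series(["x", "y", "x", "z"])

codes, uniques = pd.factorize(s)  # codes: integer array, uniques: Index
restored = uniques.take(codes)    # map each code back to its label

print(codes.tolist())
print(list(restored))
```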