
More than 5 years have passed since last update.

python pandas: Adding a Series to a DataFrame as a column


I got badly stuck on this, so I'm posting it here.

When the index of the Series being added starts from 0

>>> import pandas as pd
>>> s1 = pd.Series(data=[10,20,30])
>>> s1
0    10
1    20
2    30
dtype: int64

>>> s2 = pd.Series(data=[100,200,300])
>>> s2
0    100
1    200
2    300
dtype: int64

We add these two Series to a DataFrame as columns.

>>> df = pd.DataFrame()
>>> df[1]=s1
>>> df[2]=s2
>>> df
    1    2
0  10  100
1  20  200
2  30  300

This is easy.

When the Series being added have different indexes

>>> s1 = pd.Series(data=[10,20,30], index=[1,2,3])
>>> s1
1    10
2    20
3    30
dtype: int64
>>> s2 = pd.Series(data=[100,200,300], index=[2,3,4])
>>> s2
2    100
3    200
4    300
dtype: int64

The indexes of s1 and s2 do not start from 0, and some labels are not shared between them.

Suppose we add them to the DataFrame from before (whose index is 0, 1, 2) in the same way:

>>> df[1]=s1
>>> df[2]=s2
>>> df
      1      2
0   NaN    NaN
1  10.0    NaN
2  20.0  100.0

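The number of rows stays fixed, and the values are lined up against the existing index 0 to 2. Labels that df does not have (3 and 4) are silently dropped, and labels with no matching value become NaN.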
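The behaviour above can be reproduced with an explicit reindex. This is a minimal sketch, assuming that column assignment aligns the Series to the DataFrame's existing index; the name `aligned` is only for illustration.

```python
import pandas as pd

# df already has the index 0, 1, 2, as left over from the first example.
df = pd.DataFrame(index=[0, 1, 2])
s1 = pd.Series(data=[10, 20, 30], index=[1, 2, 3])

# Assignment aligns s1 to df.index:
# label 0 has no value in s1 (NaN), and label 3 is not in df.index (dropped).
df[1] = s1

# The same alignment written out explicitly.
aligned = s1.reindex(df.index)

print(df)
print(aligned)
```

Both printouts should show NaN at label 0 and no row for label 3.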

Trying to read what was originally s1[3] through df raises an error, because label 3 no longer exists there.

>>> s1[3]
30
>>> df[1][3]
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/pandas/core/series.py", line 603, in __getitem__
    result = self.index.get_value(self, key)
  File "/usr/local/lib/python2.7/dist-packages/pandas/indexes/base.py", line 2169, in get_value
    tz=getattr(series.dtype, 'tz', None))
  File "pandas/index.pyx", line 98, in pandas.index.IndexEngine.get_value (pandas/index.c:3557)
  File "pandas/index.pyx", line 106, in pandas.index.IndexEngine.get_value (pandas/index.c:3240)
  File "pandas/index.pyx", line 154, in pandas.index.IndexEngine.get_loc (pandas/index.c:4279)
  File "pandas/src/hashtable_class_helper.pxi", line 404, in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:8564)
  File "pandas/src/hashtable_class_helper.pxi", line 410, in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:8508)
KeyError: 3
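To check for a missing label without hitting the KeyError, something like the following may help. It is only a sketch: it rebuilds the situation above with a fresh DataFrame, and the variable name `value` is illustrative.

```python
import pandas as pd

# Rebuild the situation: df has index 0, 1, 2 and s1 has index 1, 2, 3.
df = pd.DataFrame(index=[0, 1, 2])
df[1] = pd.Series(data=[10, 20, 30], index=[1, 2, 3])

# Label 3 was dropped during alignment, so it is not in the index.
print(3 in df.index)   # False

# Series.get returns a default instead of raising KeyError.
value = df[1].get(3, default=None)
print(value)           # None

# .loc makes it explicit that the lookup is by label (row 2, column 1).
print(df.loc[2, 1])    # 20.0
```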

In this case, use pandas.concat.

>>> df = pd.DataFrame()
>>> df = pd.concat([df, s1], axis=1)
>>> df
    0
1  10
2  20
3  30
>>> df = pd.concat([df, s2], axis=1)
>>> df
      0      0
1  10.0    NaN
2  20.0  100.0
3  30.0  200.0
4   NaN  300.0

Passing axis=1 appends in the column direction. Positions with no value are filled with numpy.nan.
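The two Series can also be passed to concat in a single call. The sketch below compares the default outer join with `join="inner"`; the names `outer` and `inner` are illustrative only.

```python
import pandas as pd

s1 = pd.Series(data=[10, 20, 30], index=[1, 2, 3])
s2 = pd.Series(data=[100, 200, 300], index=[2, 3, 4])

# Default join="outer": the result keeps the union of the labels (1, 2, 3, 4),
# and missing positions become NaN.
outer = pd.concat([s1, s2], axis=1)

# join="inner": only labels present in both Series (2 and 3) are kept.
inner = pd.concat([s1, s2], axis=1, join="inner")

print(outer)
print(inner)
```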

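However, the column name becomes 0, because an unnamed Series has no name to use as the column label.
One way around this is to convert each Series to a one-column DataFrame first and then concat.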

>>> df = pd.DataFrame()
>>> df = pd.concat([df, pd.DataFrame(s1, columns=[1])], axis=1)
>>> df = pd.concat([df, pd.DataFrame(s2, columns=[2])], axis=1)
>>> df
      1      2
1  10.0    NaN
2  20.0  100.0
3  30.0  200.0
4   NaN  300.0
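As far as I can tell, the column labels can also be set without wrapping each Series in a DataFrame. The sketch below shows three alternatives that should give the same result as above; the names `df_keys`, `df_named`, and `df_dict` are illustrative only.

```python
import pandas as pd

s1 = pd.Series(data=[10, 20, 30], index=[1, 2, 3])
s2 = pd.Series(data=[100, 200, 300], index=[2, 3, 4])

# Option 1: keys= supplies the column labels for the concatenated Series.
df_keys = pd.concat([s1, s2], axis=1, keys=[1, 2])

# Option 2: give each Series a name; concat uses it as the column label.
df_named = pd.concat([s1.rename(1), s2.rename(2)], axis=1)

# Option 3: a dict of Series also aligns on the union of the indexes.
df_dict = pd.DataFrame({1: s1, 2: s2})

print(df_keys)
```

All three keep the rows 1 to 4 and fill the gaps with NaN, so the choice is mostly a matter of which reads best in the surrounding code.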