More than 3 years have passed since last update.

Expanding comma-separated values into rows per group with pandas

Introduction

Every now and then you have data like this:

user_id  created_at  editor
12345    2020-01-01  atom,vim,vi
23456    2020-01-02  vim,pycharm

and want to expand the editor column into one row per value for each user_id, like this:
user_id  created_at  editor
12345    2020-01-01  atom
12345    2020-01-01  vim
12345    2020-01-01  vi
23456    2020-01-02  vim
23456    2020-01-02  pycharm

So how do you reshape this with pandas, given that each row holds a different number of values?

Reshaping

There are several approaches; here I'll use stack.
stack is handy when there are many columns, but when there are only a few, itertools.chain or numpy.repeat can also do the job.
Admittedly, the code is not very readable...
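The numpy.repeat / itertools.chain route mentioned above can be sketched roughly as follows. `expand_csv_column` is a hypothetical helper name used only for this illustration, and the sketch assumes every cell in the split column is a non-missing string. Each key column is repeated once per value with numpy.repeat, and the split lists are flattened in the same row order with itertools.chain.from_iterable, so the two line up.

```python
import itertools

import numpy as np
import pandas as pd


def expand_csv_column(df, column, sep=','):
    """Hypothetical helper: one output row per sep-separated value in `column`.

    Assumes every cell in `column` is a non-missing string.
    """
    # Split each string into a list, e.g. 'atom,vim,vi' -> ['atom', 'vim', 'vi']
    parts = df[column].str.split(sep)
    # Number of values per original row
    counts = parts.str.len().to_numpy()
    # Repeat every other column's values once per value in that row
    out = {c: np.repeat(df[c].to_numpy(), counts) for c in df.columns if c != column}
    # Flatten the lists in row order so they line up with the repeated keys
    out[column] = list(itertools.chain.from_iterable(parts))
    # Keep the original column order
    return pd.DataFrame(out, columns=df.columns)


df = pd.DataFrame({'user_id': [12345, 23456],
                   'created_at': ['2020-01-01', '2020-01-02'],
                   'editor': ['atom,vim,vi', 'vim,pycharm']})
expanded = expand_csv_column(df, 'editor')
print(expanded)
```

Unlike the stack version, the result gets a fresh 0 to 4 index, since the DataFrame is built from plain arrays.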

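>>> import pandas as pd
>>> df = pd.DataFrame({'user_id': [12345, 23456],
...                    'created_at': ['2020-01-01', '2020-01-02'],
...                    'editor': ['atom,vim,vi', 'vim,pycharm']})
>>> df = (df.set_index(['user_id', 'created_at'])  # keep the grouping keys in the index
...         .apply(lambda x: x.str.split(','))   # 'atom,vim,vi' -> ['atom', 'vim', 'vi']
...         .stack()                             # Series of lists indexed by (user_id, created_at, 'editor')
...         .apply(pd.Series)                    # one column per list position, NaN where a list is shorter
...         .stack()                             # back to long form; list position becomes the last index level
...         .dropna()                            # drop NaN padding (older pandas stack() already drops it)
...         .unstack(level=2)                    # turn the 'editor' level back into a column
...         .reset_index(level=[0, 1]))          # move user_id and created_at back to columns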
>>> df
   user_id  created_at   editor
0    12345  2020-01-01     atom
1    12345  2020-01-01      vim
2    12345  2020-01-01       vi
0    23456  2020-01-02      vim
1    23456  2020-01-02  pycharm
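Not the approach used in this article, but worth knowing for comparison: since pandas 0.25, `DataFrame.explode` performs this reshaping directly on a column of lists. A minimal sketch on the same sample data:

```python
import pandas as pd

df = pd.DataFrame({'user_id': [12345, 23456],
                   'created_at': ['2020-01-01', '2020-01-02'],
                   'editor': ['atom,vim,vi', 'vim,pycharm']})

# Replace the comma-separated strings with lists, then give each list
# element its own row; the other columns are repeated automatically.
expanded = (df.assign(editor=df['editor'].str.split(','))
              .explode('editor')
              .reset_index(drop=True))  # explode keeps the original labels (0, 0, 0, 1, 1)
print(expanded)
```

Without `reset_index(drop=True)`, the result keeps the original row labels, which plays a similar role to the per-row position index in the stack version.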