Introduction
Suppose we have data like this:
import numpy as np
import pandas as pd
tmp = pd.DataFrame(dict(key=np.arange(10)//5, time=np.arange(10)%5, val1=np.random.randn(10), val2=np.random.randn(10)))
tmp
|   | key | time | val1 | val2 |
|---|---|---|---|---|
| 0 | 0 | 0 | -1.152069 | 0.788045 |
| 1 | 0 | 1 | 0.133470 | 0.347221 |
| 2 | 0 | 2 | 0.483643 | 0.487755 |
| 3 | 0 | 3 | 1.691687 | 1.426061 |
| 4 | 0 | 4 | -0.575070 | 0.050923 |
| 5 | 1 | 0 | -2.627809 | -0.251222 |
| 6 | 1 | 1 | 0.668707 | -1.490587 |
| 7 | 1 | 2 | 0.674961 | 0.623323 |
| 8 | 1 | 3 | -1.788848 | -0.915043 |
| 9 | 1 | 4 | -1.027477 | 0.880744 |
If we apply a per-group operation to 'val1' and 'val2' as below, the group key disappears from the result, because only the selected columns are returned:
tmp.groupby(['key'])[['val1','val2']].transform(lambda s: s)
|   | val1 | val2 |
|---|---|---|
| 0 | -1.152069 | 0.788045 |
| 1 | 0.133470 | 0.347221 |
| 2 | 0.483643 | 0.487755 |
| 3 | 1.691687 | 1.426061 |
| 4 | -0.575070 | 0.050923 |
| 5 | -2.627809 | -0.251222 |
| 6 | 0.668707 | -1.490587 |
| 7 | 0.674961 | 0.623323 |
| 8 | -1.788848 | -0.915043 |
| 9 | -1.027477 | 0.880744 |
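The behavior above can be checked on a small frame. The following is a minimal sketch, assuming a made-up deterministic frame `df` with the same layout as `tmp` (the values are illustrative, not from the post). It shows that `transform` returns only the selected columns while leaving the row index unchanged.

```python
import numpy as np
import pandas as pd

# Hypothetical deterministic frame with the same layout as tmp.
df = pd.DataFrame({
    "key": np.arange(6) // 3,
    "time": np.arange(6) % 3,
    "val1": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
    "val2": [0.5, 1.5, 2.5, 5.0, 15.0, 25.0],
})

out = df.groupby("key")[["val1", "val2"]].transform(lambda s: s)

# Only the selected columns come back; 'key' and 'time' are not included.
print(list(out.columns))
# The row index is the same as the input's, so rows still line up with df.
print(out.index.equals(df.index))
```

Because the index is preserved, anything stored in the index survives the transform, which is what the solution in the next section relies on.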
Solution
Move the labels you do not want to lose into the index beforehand.
tmp.set_index(['key','time']).groupby(['key'])[['val1','val2']].transform(lambda s: s)
| key | time | val1 | val2 |
|---|---|---|---|
| 0 | 0 | -1.152069 | 0.788045 |
| 0 | 1 | 0.133470 | 0.347221 |
| 0 | 2 | 0.483643 | 0.487755 |
| 0 | 3 | 1.691687 | 1.426061 |
| 0 | 4 | -0.575070 | 0.050923 |
| 1 | 0 | -2.627809 | -0.251222 |
| 1 | 1 | 0.668707 | -1.490587 |
| 1 | 2 | 0.674961 | 0.623323 |
| 1 | 3 | -1.788848 | -0.915043 |
| 1 | 4 | -1.027477 | 0.880744 |
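If the labels are needed as ordinary columns again afterwards, one option is to call `reset_index()` on the result. This is a sketch that goes slightly beyond the original post: the frame `df` and the centering lambda are made-up illustrations.

```python
import numpy as np
import pandas as pd

# Hypothetical deterministic frame with the same layout as tmp.
df = pd.DataFrame({
    "key": np.arange(6) // 3,
    "time": np.arange(6) % 3,
    "val1": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
    "val2": [0.5, 1.5, 2.5, 5.0, 15.0, 25.0],
})

centered = (
    df.set_index(["key", "time"])          # keep the labels in the index
      .groupby("key")[["val1", "val2"]]    # group by the 'key' index level
      .transform(lambda s: s - s.mean())   # subtract each group's mean
      .reset_index()                       # turn 'key' and 'time' back into columns
)
print(centered)
```

The round trip through the index leaves `key` and `time` as the first two columns of `centered`, followed by the transformed values.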
Usage examples
tmp.set_index(['key','time']).groupby(['key'])[['val1','val2']].transform(lambda s: (s-s.mean())/s.std())
| key | time | val1 | val2 |
|---|---|---|---|
| 0 | 0 | -1.169663 | 0.321366 |
| 0 | 1 | 0.015803 | -0.521662 |
| 0 | 2 | 0.338717 | -0.252906 |
| 0 | 3 | 1.452722 | 1.541503 |
| 0 | 4 | -0.637580 | -1.088302 |
| 1 | 0 | -1.225673 | -0.020611 |
| 1 | 1 | 1.009441 | -1.256779 |
| 1 | 2 | 1.013681 | 0.851676 |
| 1 | 3 | -0.656838 | -0.682719 |
| 1 | 4 | -0.140611 | 1.108433 |
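The per-group standardization above can be sanity-checked by confirming that each group ends up with mean 0 and standard deviation 1. This is a minimal sketch: the frame `df` and the helper name `zscore` are hypothetical, introduced only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical deterministic frame with the same layout as tmp.
df = pd.DataFrame({
    "key": np.arange(6) // 3,
    "time": np.arange(6) % 3,
    "val1": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
    "val2": [0.5, 1.5, 2.5, 5.0, 15.0, 25.0],
})

def zscore(s):
    # pandas' std() defaults to ddof=1, i.e. the sample standard deviation.
    return (s - s.mean()) / s.std()

z = (
    df.set_index(["key", "time"])
      .groupby("key")[["val1", "val2"]]
      .transform(zscore)
)

# Per-group mean and std of the standardized values.
group_mean = z.groupby("key").mean()
group_std = z.groupby("key").std()
print(group_mean)
print(group_std)
```

Note that the result depends on the `ddof` convention: with NumPy's default population standard deviation (`ddof=0`) the values would be scaled differently.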
tmp.set_index(['key','time']).groupby(['key'])[['val1','val2']].diff()
| key | time | val1 | val2 |
|---|---|---|---|
| 0 | 0 | NaN | NaN |
| 0 | 1 | 1.285538 | -0.440823 |
| 0 | 2 | 0.350173 | 0.140534 |
| 0 | 3 | 1.208045 | 0.938305 |
| 0 | 4 | -2.266757 | -1.375138 |
| 1 | 0 | NaN | NaN |
| 1 | 1 | 3.296516 | -1.239366 |
| 1 | 2 | 0.006253 | 2.113910 |
| 1 | 3 | -2.463809 | -1.538365 |
| 1 | 4 | 0.761371 | 1.795787 |
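The grouped `diff()` above takes differences between consecutive rows within each key, so the first row of every group is NaN. As a rough sketch on a made-up frame `df` (an assumption for illustration), the same result can be obtained by subtracting a grouped `shift()`:

```python
import numpy as np
import pandas as pd

# Hypothetical deterministic frame with the same layout as tmp.
df = pd.DataFrame({
    "key": np.arange(6) // 3,
    "time": np.arange(6) % 3,
    "val1": [1.0, 2.0, 3.0, 10.0, 20.0, 30.0],
    "val2": [0.5, 1.5, 2.5, 5.0, 15.0, 25.0],
})

indexed = df.set_index(["key", "time"])
grouped = indexed.groupby("key")[["val1", "val2"]]

# Difference from the previous row within the same key.
d = grouped.diff()

# Equivalent formulation: current value minus the previous value in the group.
manual = indexed[["val1", "val2"]] - grouped.shift()

print(d)
print(d.equals(manual))
```

The differences follow row order, so this assumes the rows are already sorted by `time` within each `key`; sort first if that is not guaranteed.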
That's all.