More than 5 years have passed since last update.

[Python] Replacing values in a pandas.DataFrame

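I had been doing this with a for loop, which was slow, but it turns out there is a way to do the replacement almost instantly, so I am noting it here for future reference.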

import numpy as np
import pandas as pd

cols = ['var1', 'var2', 'var3', 'var4']
df1 = pd.DataFrame(np.random.randn(4, 4), columns=cols)
df2 = pd.DataFrame(np.arange(16).reshape(4, 4), columns=cols)

df1

       var1      var2      var3      var4
0 -0.083782  0.964222  0.832664 -0.528963
1  0.017696  0.144067  0.093823  0.147779
2 -0.082808 -0.893112 -0.477983 -0.623641
3  0.581019 -1.603081 -0.717007  0.849844

df2

   var1  var2  var3  var4
0     0     1     2     3
1     4     5     6     7
2     8     9    10    11
3    12    13    14    15

Replace the negative elements of df1 with the corresponding values from df2.

# target.where(condition for values to keep, values to substitute elsewhere)
df1.where(df1 >= 0, df2)

       var1       var2       var3       var4
0  0.000000   0.964222   0.832664   3.000000
1  0.017696   0.144067   0.093823   0.147779
2  8.000000   9.000000  10.000000  11.000000
3  0.581019  13.000000  14.000000   0.849844
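
For comparison, here is a minimal sketch of the loop-based approach next to `where`. It assumes df1 holds the rounded values printed above (the original frame was random), and the variable names `looped`, `vectorized`, and `masked` are only illustrative. `DataFrame.mask` is included because it is the inverse of `where`: it replaces the cells where the condition is true.

```python
import numpy as np
import pandas as pd

cols = ['var1', 'var2', 'var3', 'var4']

# Assumption: the rounded values printed for df1 earlier in this article.
df1 = pd.DataFrame(
    [[-0.083782, 0.964222, 0.832664, -0.528963],
     [0.017696, 0.144067, 0.093823, 0.147779],
     [-0.082808, -0.893112, -0.477983, -0.623641],
     [0.581019, -1.603081, -0.717007, 0.849844]],
    columns=cols,
)
df2 = pd.DataFrame(np.arange(16).reshape(4, 4), columns=cols)

# Loop version: visit every cell and copy from df2 where df1 is negative.
looped = df1.copy()
for col in cols:
    for idx in df1.index:
        if df1.at[idx, col] < 0:
            looped.at[idx, col] = df2.at[idx, col]

# Vectorized version: keep cells where the condition holds, else take df2.
vectorized = df1.where(df1 >= 0, df2)

# mask is the inverse of where: replace cells where the condition holds.
masked = df1.mask(df1 < 0, df2)

print(looped.equals(vectorized))
print(vectorized.equals(masked))
```

Note that the two conditions only agree when there are no missing values: comparisons with NaN are False, so `where(df1 >= 0, df2)` would replace a NaN cell while `mask(df1 < 0, df2)` would leave it alone.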

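This also works on individual columns.
In the example below, the result takes the value from var1 where var2 is zero or greater, and the value from var3 where var2 is negative.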

df1['var1'].where(df1['var2'] >= 0, df1['var3'])

0   -0.083782
1    0.017696
2   -0.477983
3   -0.717007
Name: var1, dtype: float64
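
The column-wise selection above can also be written with `numpy.where`, which takes the condition first and then the "true" and "false" values. The sketch below again assumes the rounded df1 values printed earlier; the column name `var5` is hypothetical and only shows how the result might be stored as a new column.

```python
import numpy as np
import pandas as pd

# Assumption: the rounded values printed for df1 earlier in this article.
df1 = pd.DataFrame({
    'var1': [-0.083782, 0.017696, -0.082808, 0.581019],
    'var2': [0.964222, 0.144067, -0.893112, -1.603081],
    'var3': [0.832664, 0.093823, -0.477983, -0.717007],
    'var4': [-0.528963, 0.147779, -0.623641, 0.849844],
})

# Series.where: keep var1 where var2 >= 0, otherwise take var3.
picked = df1['var1'].where(df1['var2'] >= 0, df1['var3'])

# numpy.where(condition, value_if_true, value_if_false) returns a plain
# ndarray, so the index and the Series name are not carried over.
via_numpy = np.where(df1['var2'] >= 0, df1['var1'], df1['var3'])

# Store the result as a new column ('var5' is a hypothetical name).
df1['var5'] = picked

print(df1['var5'].tolist())
print(np.array_equal(via_numpy, picked.to_numpy()))
```

`Series.where` keeps the index and the name of the calling Series (`var1` here), which is why the output above is labeled `Name: var1`.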