
Building a table diff-check function with pandas

Posted at 2023-06-20

Creating the data

import pandas as pd

# file_name: path to the CSV file to load
df = pd.read_csv(file_name, encoding="utf-8")
df1 = df[0:11].copy()
df2 = df[0:11].copy()

# Introduce a difference
df2.loc[10, "name"] = "John Deo"


df1

    id  name         class  mark  gender
0   1   John Deo     Four   75
1   2   Max Ruin     Three  85
2   3   Arnold       Three  55
3   4   Krish Star   Four   60
4   5   John Mike    Four   60
5   6   Alex John    Four   55
6   7   My John Rob  Fifth  78
8   9   Tes Qry      Six    78
9   10  Big John     Four   55
10  11  Ronald       Six    89

df2

    id  name         class  mark  gender
0   1   John Deo     Four   75
1   2   Max Ruin     Three  85
2   3   Arnold       Three  55
3   4   Krish Star   Four   60
4   5   John Mike    Four   60
5   6   Alex John    Four   55
6   7   My John Rob  Fifth  78
8   9   Tes Qry      Six    78
9   10  Big John     Four   55
10  11  John Deo     Six    89

The name at index 10 of df2 is changed to John Deo to create a difference.

We compare these two tables and check for differences.
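For readers without the CSV file, the two tables can be rebuilt by hand. This is only a sketch: the rows are copied from the printed output above, the gender column is omitted because its values are not shown there, and index 7 is skipped because it does not appear in the printout.

```python
import pandas as pd

# Index labels and rows as they appear in the printed df1 (index 7 is not shown)
index = [0, 1, 2, 3, 4, 5, 6, 8, 9, 10]
records = [
    (1, "John Deo", "Four", 75),
    (2, "Max Ruin", "Three", 85),
    (3, "Arnold", "Three", 55),
    (4, "Krish Star", "Four", 60),
    (5, "John Mike", "Four", 60),
    (6, "Alex John", "Four", 55),
    (7, "My John Rob", "Fifth", 78),
    (9, "Tes Qry", "Six", 78),
    (10, "Big John", "Four", 55),
    (11, "Ronald", "Six", 89),
]

df1 = pd.DataFrame(records, columns=["id", "name", "class", "mark"], index=index)
df2 = df1.copy()

# Introduce the single difference at index label 10
df2.loc[10, "name"] = "John Deo"
```

Because df2 starts as a copy of df1, the only cell that differs afterwards is name at index 10.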

Diff-check function


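from typing import Union

import pandas as pd

# Python 3.10 and later: the return type can be written with the | operator
# def diff_check(df1: pd.DataFrame, df2: pd.DataFrame) -> bool | pd.DataFrame:

# Python 3.9
def diff_check(df1: pd.DataFrame, df2: pd.DataFrame) -> Union[bool, pd.DataFrame]:
    # Keep only the rows of df1 whose id also appears in df2
    df_diff = df1[df1["id"].isin(df2["id"])]
    if df_diff.equals(df2):
        return True
    # compare() requires identically labeled frames, so compare the filtered rows
    return df_diff.compare(df2)


diff_check(df1, df2)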


If the tables match, True is returned;
if they do not, a DataFrame containing the mismatched cells is returned.
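Since the function returns either True or a DataFrame, the caller has to tell the two apart. The following is a minimal sketch of both outcomes on a tiny hypothetical table (left and right are made-up names, not from the article). It calls compare on the id-filtered frame, which is an assumption made here so that the row labels line up.

```python
from typing import Union

import pandas as pd


def diff_check(df1: pd.DataFrame, df2: pd.DataFrame) -> Union[bool, pd.DataFrame]:
    """Return True when the frames match, otherwise the differing cells."""
    df_diff = df1[df1["id"].isin(df2["id"])]
    if df_diff.equals(df2):
        return True
    return df_diff.compare(df2)


# Hypothetical three-row tables for illustration
left = pd.DataFrame({"id": [1, 2, 3], "name": ["John Deo", "Max Ruin", "Ronald"]})
right = left.copy()

same = diff_check(left, right)

right.loc[2, "name"] = "John Deo"
changed = diff_check(left, right)

for label, result in [("same", same), ("changed", changed)]:
    # A DataFrame has no single truth value, so test for True explicitly
    if result is True:
        print(label, "-> no differences")
    else:
        print(label, "-> differences found")
        print(result)
```

Writing `if diff_check(...)` directly would raise a ValueError in the mismatch case, because pandas refuses to evaluate a DataFrame as a single boolean. Checking `result is True` (or `isinstance(result, pd.DataFrame)`) avoids that.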

If you are on Python 3.10 or later, consider writing the return type hint in the upper form (bool | pd.DataFrame).

Diff result

    name
    self    other
10  Ronald  John Deo

This shows that the name column differs at index 10: Ronald in df1 (self) versus John Deo in df2 (other).
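The result of compare has two-level column labels, so individual values are read with a (column, "self"/"other") pair. Here is a small sketch on a hypothetical two-row excerpt of the tables; the variable names before, after, changed_rows, changed_columns and full are illustrative only.

```python
import pandas as pd

# Two-row excerpt using the same index labels as the article's tables
df1 = pd.DataFrame(
    {"id": [10, 11], "name": ["Big John", "Ronald"], "mark": [55, 89]},
    index=[9, 10],
)
df2 = df1.copy()
df2.loc[10, "name"] = "John Deo"

diff = df1.compare(df2)

# Columns form a two-level index: (column name, "self" or "other")
before = diff.loc[10, ("name", "self")]
after = diff.loc[10, ("name", "other")]
print(f"index 10, name: {before!r} -> {after!r}")

# Which rows and which original columns contain a difference
changed_rows = diff.index.tolist()
changed_columns = diff.columns.get_level_values(0).unique().tolist()

# keep_shape=True keeps every row and column; unchanged cells become NaN
full = df1.compare(df2, keep_shape=True)
```

By default compare drops rows and columns that are entirely equal, which is why only name at index 10 appears. keep_shape=True is handy when the result has to stay aligned with the original table.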

By the way

# You can also extract the rows of df1 whose id also appears in df2
df_diff = df1[df1["id"].isin(df2["id"])]
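The filter only has a visible effect when the two tables hold different sets of ids. As a rough illustration, assuming a hypothetical df2 that lacks id 2 and carries an extra id 5, the same isin mask can split df1 into shared and missing rows:

```python
import pandas as pd

df1 = pd.DataFrame(
    {"id": [1, 2, 3, 4], "name": ["John Deo", "Max Ruin", "Arnold", "Krish Star"]}
)
# Hypothetical second table: id 2 is missing and id 5 is extra
df2 = pd.DataFrame(
    {"id": [1, 3, 4, 5], "name": ["John Deo", "Arnold", "Krish Star", "John Mike"]}
)

# Boolean mask: True where the df1 id also exists in df2
mask = df1["id"].isin(df2["id"])

common = df1[mask]        # rows of df1 whose id is in both tables
only_in_df1 = df1[~mask]  # rows of df1 whose id is missing from df2
```

Note that the mask looks at ids only. Rows with the same id but different values in other columns still count as common, which is why the cell-level comparison is a separate step.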



References:

https://stackoverflow.com/questions/33945261/how-to-specify-multiple-return-types-using-type-hints

https://motamemo.com/python/pandas-tips/pandas-dataframe-change-value/
