
I tried writing a table diff-check function with pandas

Posted at 2023-06-20

Creating the data

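import pandas as pd

# file_name: path to the CSV file to load
df = pd.read_csv(file_name, encoding="utf-8")

# Take the first 11 rows twice; .copy() keeps the two frames independent
df1 = df[0:11].copy()
df2 = df[0:11].copy()

# Introduce a difference
df2.loc[10, "name"] = "John Deo"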


df1

    id  name         class  mark  gender
0    1  John Deo     Four     75
1    2  Max Ruin     Three    85
2    3  Arnold       Three    55
3    4  Krish Star   Four     60
4    5  John Mike    Four     60
5    6  Alex John    Four     55
6    7  My John Rob  Fifth    78
8    9  Tes Qry      Six      78
9   10  Big John     Four     55
10  11  Ronald       Six      89

df2

    id  name         class  mark  gender
0    1  John Deo     Four     75
1    2  Max Ruin     Three    85
2    3  Arnold       Three    55
3    4  Krish Star   Four     60
4    5  John Mike    Four     60
5    6  Alex John    Four     55
6    7  My John Rob  Fifth    78
8    9  Tes Qry      Six      78
9   10  Big John     Four     55
10  11  John Deo     Six      89

The name at index 10 of df2 is changed to John Deo to create a difference.

We then compare these two tables and check for differences.
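The article loads its data from a CSV whose path is not shown. As a minimal sketch for trying the same setup without that file, the frame can be built inline. The four rows below are copied from the tables above (the gender column is left out because its values are not shown), and using label 3 for the edited row is an assumption of this smaller example, not the article's index 10.

```python
import pandas as pd

# Hypothetical inline data: a few rows copied from the tables above,
# so no CSV file is needed.
df = pd.DataFrame(
    {
        "id": [1, 2, 3, 11],
        "name": ["John Deo", "Max Ruin", "Arnold", "Ronald"],
        "class": ["Four", "Three", "Three", "Six"],
        "mark": [75, 85, 55, 89],
    }
)

# .copy() makes independent frames, so editing df2 leaves df1 untouched.
df1 = df.copy()
df2 = df.copy()

# Change one cell in df2 to create a difference (label 3 is the last row here).
df2.loc[3, "name"] = "John Deo"

print(df1.equals(df2))
```

Without .copy(), pandas may treat the slices as views of df and warn about chained assignment, which is why the original code copies both frames.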

Diff-check function


from typing import Union

# Python 3.10+ can write the return type with the | operator instead:
# def diff_check(df1: pd.DataFrame, df2: pd.DataFrame) -> bool | pd.DataFrame:

# Python 3.9
def diff_check(df1: pd.DataFrame, df2: pd.DataFrame) -> Union[bool, pd.DataFrame]:
    # Keep only the rows of df1 whose id also appears in df2
    df_diff = df1[df1["id"].isin(df2["id"])]
    if df_diff.equals(df2):
        return True
    # compare() needs identically labeled frames and returns only the differing cells
    return df_diff.compare(df2)


diff_check(df1, df2)


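If the tables match, True is returned.
If they do not match, a DataFrame containing the mismatches is returned.

On Python 3.10 or later, consider using the first form (bool | pd.DataFrame) for the type hint.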
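Because the function returns either a bool or a DataFrame, the caller has to branch on the type. Here is a minimal sketch of that, assuming a diff_check equivalent to the one above; the names left, right, and n_rows are hypothetical and only for illustration.

```python
from typing import Union

import pandas as pd


def diff_check(df1: pd.DataFrame, df2: pd.DataFrame) -> Union[bool, pd.DataFrame]:
    # Same idea as the function above: True on a match, otherwise the differing cells.
    df_diff = df1[df1["id"].isin(df2["id"])]
    if df_diff.equals(df2):
        return True
    return df_diff.compare(df2)


# Hypothetical sample frames for illustration.
left = pd.DataFrame({"id": [1, 2], "name": ["Arnold", "Ronald"]})
right = pd.DataFrame({"id": [1, 2], "name": ["Arnold", "John Deo"]})

result = diff_check(left, right)

# A DataFrame cannot be used directly in `if result:` (pandas raises ValueError
# for ambiguous truth values), so test the type explicitly.
if isinstance(result, pd.DataFrame):
    n_rows = len(result)
    print(f"{n_rows} row(s) differ")
else:
    print("tables match")
```

The isinstance check also lets a type checker narrow the Union, so DataFrame methods can be used safely inside the first branch.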

Diff result

      name
      self     other
10  Ronald  John Deo

This shows that the name column differs at index 10: Ronald in df1 (self) versus John Deo in df2 (other).
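The layout of that result can be adjusted through keyword arguments of DataFrame.compare (available since pandas 1.1). The sketch below uses two small hypothetical frames, a and b, rather than the article's data.

```python
import pandas as pd

# Hypothetical sample frames with one differing cell.
a = pd.DataFrame({"id": [1, 2], "name": ["Arnold", "Ronald"]})
b = pd.DataFrame({"id": [1, 2], "name": ["Arnold", "John Deo"]})

# Default: only differing rows and columns, with self/other side by side.
wide = a.compare(b)

# align_axis=0 stacks self/other as rows instead of columns.
tall = a.compare(b, align_axis=0)

# keep_shape=True keeps every row and column;
# keep_equal=True shows equal values instead of NaN.
full = a.compare(b, keep_shape=True, keep_equal=True)

print(wide)
print(tall)
print(full)
```

The stacked form can be easier to read when many columns differ, while keep_shape is useful when the surrounding rows are needed for context.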

By the way

# You can also extract the rows of df1 whose id also appears in df2
df_diff = df1[df1["id"].isin(df2["id"])]
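The same isin mask can be negated to find rows that exist in only one table. This is a minimal sketch with hypothetical frames left and right, where id 3 is assumed to be missing from right.

```python
import pandas as pd

# Hypothetical frames: id 3 exists only in left.
left = pd.DataFrame({"id": [1, 2, 3], "name": ["John Deo", "Max Ruin", "Arnold"]})
right = pd.DataFrame({"id": [1, 2], "name": ["John Deo", "Max Ruin"]})

# Boolean Series: True where the id in left also appears in right.
mask = left["id"].isin(right["id"])

common = left[mask]      # rows whose id appears in both frames
only_left = left[~mask]  # rows whose id is missing from right

print(only_left)
```

Checking for unmatched ids first is worthwhile because compare() raises an error when the two frames do not share the same row and column labels.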



References:

https://stackoverflow.com/questions/33945261/how-to-specify-multiple-return-types-using-type-hints

https://motamemo.com/python/pandas-tips/pandas-dataframe-change-value/
