Creating the data
import pandas as pd

df = pd.read_csv(file_name, encoding="utf-8")
# copy the first 11 rows into two independent DataFrames
df1 = df[0:11].copy()
df2 = df[0:11].copy()
# introduce a difference
df2.loc[10, "name"] = "John Deo"
df1
| | id | name | class | mark |
|---|---|---|---|---|
| 0 | 1 | John Deo | Four | 75 |
| 1 | 2 | Max Ruin | Three | 85 |
| 2 | 3 | Arnold | Three | 55 |
| 3 | 4 | Krish Star | Four | 60 |
| 4 | 5 | John Mike | Four | 60 |
| 5 | 6 | Alex John | Four | 55 |
| 6 | 7 | My John Rob | Fifth | 78 |
| 8 | 9 | Tes Qry | Six | 78 |
| 9 | 10 | Big John | Four | 55 |
| 10 | 11 | Ronald | Six | 89 |
df2
| | id | name | class | mark |
|---|---|---|---|---|
| 0 | 1 | John Deo | Four | 75 |
| 1 | 2 | Max Ruin | Three | 85 |
| 2 | 3 | Arnold | Three | 55 |
| 3 | 4 | Krish Star | Four | 60 |
| 4 | 5 | John Mike | Four | 60 |
| 5 | 6 | Alex John | Four | 55 |
| 6 | 7 | My John Rob | Fifth | 78 |
| 8 | 9 | Tes Qry | Six | 78 |
| 9 | 10 | Big John | Four | 55 |
| 10 | 11 | John Deo | Six | 89 |
Setting the name at index 10 of df2 to "John Deo" creates a difference between the two tables.
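Why `.copy()` matters here: without it, a slice such as `df[0:11]` may share data with `df`, and assigning into it can trigger pandas' SettingWithCopyWarning. The sketch below uses a hypothetical three-row DataFrame in place of the CSV (which the article does not include) and checks that editing df2 leaves df1 and df unchanged.

```python
import pandas as pd

# Hypothetical 3-row sample standing in for the CSV read from file_name
df = pd.DataFrame(
    {
        "id": [1, 2, 3],
        "name": ["John Deo", "Max Ruin", "Ronald"],
        "class": ["Four", "Three", "Six"],
        "mark": [75, 85, 89],
    }
)

# .copy() gives each frame its own data
df1 = df[0:3].copy()
df2 = df[0:3].copy()
df2.loc[2, "name"] = "John Deo"  # only df2 changes

print(df1.loc[2, "name"])  # Ronald
print(df2.loc[2, "name"])  # John Deo
```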
Now compare these two tables and check them for differences.
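Two built-in checks give a first look before writing a helper: `DataFrame.equals` answers "identical or not" with a single bool, while `DataFrame.ne` marks each differing cell. A minimal sketch on hypothetical two-row data; both calls assume the frames share the same index and columns.

```python
import pandas as pd

# Hypothetical two-row sample (not the article's CSV)
df1 = pd.DataFrame({"id": [1, 2], "name": ["John Deo", "Ronald"]})
df2 = df1.copy()
df2.loc[1, "name"] = "John Deo"

print(df1.equals(df2))  # False

# True wherever a cell in df1 differs from the same cell in df2
mask = df1.ne(df2)
changed_rows = mask.any(axis=1)
print(changed_rows.tolist())  # [False, True]
print(df1[changed_rows])
```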
Function for checking data differences
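from typing import Union

import pandas as pd

# Python 3.10 and later can use the PEP 604 union syntax:
# def diff_check(df1: pd.DataFrame, df2: pd.DataFrame) -> bool | pd.DataFrame:
# Python 3.9 and earlier need typing.Union:
def diff_check(df1: pd.DataFrame, df2: pd.DataFrame) -> Union[bool, pd.DataFrame]:
    # keep only the rows of df1 whose id also appears in df2
    df_diff = df1[df1["id"].isin(df2["id"])]
    if df_diff.equals(df2):
        return True
    # DataFrame.compare (pandas 1.1+) returns only the differing cells, side by side
    return df_diff.compare(df2)

diff_check(df1, df2)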
If the two tables match, the function returns True; if they do not, it returns a DataFrame containing only the mismatched cells.
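To try the function without the article's CSV, here is a minimal sketch on a hypothetical three-row table (`base`, `same` and `changed` are illustrative names). This version calls `compare` on the filtered `df_diff`, since `DataFrame.compare` only accepts identically labeled frames.

```python
from typing import Union

import pandas as pd


def diff_check(df1: pd.DataFrame, df2: pd.DataFrame) -> Union[bool, pd.DataFrame]:
    # Rows of df1 whose id also appears in df2
    df_diff = df1[df1["id"].isin(df2["id"])]
    if df_diff.equals(df2):
        return True
    # Only the differing cells, as (column, "self"/"other") pairs
    return df_diff.compare(df2)


# Hypothetical sample data
base = pd.DataFrame({"id": [1, 2, 3], "name": ["John Deo", "Max Ruin", "Ronald"]})
same = base.copy()
changed = base.copy()
changed.loc[2, "name"] = "John Deo"

print(diff_check(base, same))  # True
print(diff_check(base, changed))
```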
On Python 3.10 or later, you could try the `bool | pd.DataFrame` form of the type hint shown above instead of `Union`.
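With a union return type, the caller has to check which of the two it received before using DataFrame methods. A minimal sketch with a hypothetical `report` helper that narrows `bool | pd.DataFrame` via `isinstance`; the `from __future__ import annotations` line (PEP 563) keeps annotations unevaluated, so the `|` form also works on Python 3.7 to 3.9.

```python
from __future__ import annotations

import pandas as pd


def report(result: bool | pd.DataFrame) -> str:
    """Describe a diff_check-style result (hypothetical helper)."""
    # Narrow the union with isinstance before touching DataFrame-only attributes
    if isinstance(result, pd.DataFrame):
        return f"{len(result)} differing row(s)"
    return "tables match" if result else "tables differ"


a = pd.DataFrame({"id": [1, 2], "name": ["A", "B"]})
b = a.copy()
b.loc[1, "name"] = "C"

print(report(True))          # tables match
print(report(a.compare(b)))  # 1 differing row(s)
```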
Diff result
| | name | |
|---|---|---|
| | self | other |
| 10 | Ronald | John Deo |
This shows that the name column differs at index 10: df1 (self) has Ronald, while df2 (other) has John Deo.
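The result of `DataFrame.compare` has MultiIndex columns: the original column name on the first level and `self`/`other` on the second. A minimal sketch on hypothetical two-row data (index 9 and 10, echoing the article) showing how to read individual cells from it.

```python
import pandas as pd

# Hypothetical rows echoing the article's index 9 and 10
df1 = pd.DataFrame({"id": [10, 11], "name": ["Big John", "Ronald"]}, index=[9, 10])
df2 = df1.copy()
df2.loc[10, "name"] = "John Deo"

diff = df1.compare(df2)
# Unchanged rows (index 9) and columns (id) are dropped from the result
print(diff.columns.tolist())            # [('name', 'self'), ('name', 'other')]
print(diff.loc[10, ("name", "self")])   # Ronald
print(diff.loc[10, ("name", "other")])  # John Deo

# align_axis=0 stacks self/other as rows instead of columns
print(df1.compare(df2, align_axis=0))
```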
Incidentally,
# you can also get the rows of df1 whose name also appears in df2
df_diff = df1[df1["name"].isin(df2["name"])]
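`Series.isin` checks membership value by value, so it works on any column, not only id. A minimal sketch on hypothetical data: the mask selects rows of df1 whose name also occurs in df2, and `~` flips it to find names that exist only in df1.

```python
import pandas as pd

# Hypothetical sample: df2 renames "Ronald" to "John Deo", as in the article
df1 = pd.DataFrame({"id": [1, 2, 11], "name": ["John Deo", "Max Ruin", "Ronald"]})
df2 = df1.copy()
df2.loc[2, "name"] = "John Deo"

# Rows of df1 whose name also appears somewhere in df2
in_both = df1[df1["name"].isin(df2["name"])]
# Negating the mask gives rows whose name appears only in df1
only_df1 = df1[~df1["name"].isin(df2["name"])]

print(in_both["name"].tolist())   # ['John Deo', 'Max Ruin']
print(only_df1["name"].tolist())  # ['Ronald']
```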
References:
https://stackoverflow.com/questions/33945261/how-to-specify-multiple-return-types-using-type-hints
https://motamemo.com/python/pandas-tips/pandas-dataframe-change-value/