More than 3 years have passed since last update.

[pandas] Dropping duplicates while filling in missing values

Introduction

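When deduplicating a pandas DataFrame on some key, you sometimes want the records that are judged to be the same record to fill in each other's missing values before the duplicates are dropped.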

import pandas as pd

df = pd.DataFrame({
    'building_name': ['Aビル', 'Aビル', None, 'Cビル', 'Bビル', None, 'Dビル'],
    'property_scale': ['large', 'large', 'small', 'small', 'small', 'small', 'large'],
    'city_code': [1, 1, 1, 2, 1, 1, 1]
})
df
   building_name  property_scale  city_code
0  Aビル          large           1
1  Aビル          large           1
2  None           small           1
3  Cビル          small           2
4  Bビル          small           1
5  None           small           1
6  Dビル          large           1
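
To make the motivation concrete, here is a minimal sketch of what a plain `drop_duplicates` does with this sample. It rebuilds the frame shown above (the third `property_scale` value is assumed to be 'small', as in the printed table) and deduplicates on `property_scale` and `city_code` without any filling. The variable names `subset` and `naive` are only for illustration.

```python
import pandas as pd

# Sample frame from the article; the third property_scale value is assumed
# to be 'small', matching the printed table.
df = pd.DataFrame({
    'building_name': ['Aビル', 'Aビル', None, 'Cビル', 'Bビル', None, 'Dビル'],
    'property_scale': ['large', 'large', 'small', 'small', 'small', 'small', 'large'],
    'city_code': [1, 1, 1, 2, 1, 1, 1],
})
subset = ['property_scale', 'city_code']

# Plain deduplication keeps the first row of each key as-is,
# including any missing values that row happens to contain.
naive = df.drop_duplicates(subset=subset)
print(naive)
```

For the key ('small', 1), the first row has no `building_name`, so the name 'Bビル' held by a later duplicate is dropped with it. That is the loss the function in the next section is meant to avoid.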

Fill-and-deduplicate function

import pandas as pd
from pandas import DataFrame

def drop_duplicates(df: DataFrame, subset: list, fillna: bool = False) -> DataFrame:
    """Drop duplicates keyed on subset, optionally filling missing values first.

    Args:
        df (DataFrame): Any DataFrame.
        subset (list): Key columns used to detect duplicates.
        fillna (bool): Whether duplicate records fill in each other's missing values. Defaults to False.

    Returns:
        DataFrame

    """
    if fillna:
        # Within each key group, fill backward and then forward so that
        # the first row of the group carries every value the group has.
        df = pd.concat([
            group.bfill().ffill()
            for _, group
            in df.groupby(by=subset)])
    # Keep the first row for each key.
    return df.drop_duplicates(subset=subset)
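
One caveat that may be worth checking before relying on the grouping step: by default, pandas `groupby` skips rows whose key columns contain a missing value, so such rows would not survive the fill step. The sketch below uses a small hypothetical frame (not the article's data; 'Eビル' is made up) and counts how many rows come out of the grouping with and without `dropna=False`.

```python
import pandas as pd

# Hypothetical frame: the last row has a missing value in a key column.
sample = pd.DataFrame({
    'building_name': ['Aビル', None, 'Eビル'],
    'property_scale': ['large', 'large', None],
    'city_code': [1, 1, 1],
})
subset = ['property_scale', 'city_code']

# Default behaviour: groups whose key contains a missing value are skipped.
rows_default = sum(len(group) for _, group in sample.groupby(by=subset))

# dropna=False (available since pandas 1.1) keeps those groups.
rows_kept = sum(len(group) for _, group in sample.groupby(by=subset, dropna=False))

print(rows_default, rows_kept)
```

If the key columns can contain missing values, passing `dropna=False` to `groupby` is one possible way to keep those rows. Whether they should be treated as duplicates of each other is a separate decision.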

Running it

drop_duplicates(df, ['property_scale', 'city_code'], True)
   building_name  property_scale  city_code
0  Aビル          large           1
2  Bビル          small           1
3  Cビル          small           2
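
As a point of comparison, and not the method used in this article, pandas' `GroupBy.first()` returns the first non-null entry of each column per group, which should give the same rows for this sample. The sketch below is self-contained; the third `property_scale` value is assumed to be 'small' as in the printed table, and the names `subset` and `result` are only for illustration.

```python
import pandas as pd

# Sample frame from the article; the third property_scale value is assumed
# to be 'small', matching the printed table.
df = pd.DataFrame({
    'building_name': ['Aビル', 'Aビル', None, 'Cビル', 'Bビル', None, 'Dビル'],
    'property_scale': ['large', 'large', 'small', 'small', 'small', 'small', 'large'],
    'city_code': [1, 1, 1, 2, 1, 1, 1],
})
subset = ['property_scale', 'city_code']

# first() takes the first non-null value of every non-key column per group.
result = df.groupby(subset, as_index=False).first()

# Restore the original column order (key columns come first after groupby).
result = result[list(df.columns)]
print(result)
```

Unlike the function above, this variant resets the index and always returns the groups in sorted key order, so it is a drop-in replacement only when the original index does not matter.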