Assorted notes on deduplicating data in Python

Python

Posted at 2024-11-15

Use case 1. Collapsing duplicate values in a plain list

In any language that has a Set type, this is all you need.

data = [1, 2, 3, 3, 5, 7, 13, 13]
list(set(data))  # => e.g. [1, 2, 3, 5, 7, 13] (a set does not guarantee element order)
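If the original order matters, keep in mind that `set` makes no ordering promise. A minimal sketch, assuming Python 3.7+ (where `dict` preserves insertion order) and hashable values, is to go through `dict.fromkeys` instead. The variable name `unique_in_order` is only for illustration.

```python
data = [13, 1, 3, 3, 2, 13, 7, 5]

# dict keys are unique and keep insertion order (Python 3.7+),
# so the first occurrence of each value decides its position.
unique_in_order = list(dict.fromkeys(data))
print(unique_in_order)  # => [13, 1, 3, 2, 7, 5]
```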

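Use case 2. Deduplicating entities keyed on a property

There are two approaches: last wins (the later entry is kept) and first wins (the earlier entry is kept).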

from dataclasses import dataclass

@dataclass
class Person:
    id: int
    name: str

Assuming the class above exists:

persons = [Person(id=1, name="foo"), Person(id=1, name="bar")]

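# Last wins: a later entry overwrites an earlier one with the same id
last_merge = {p.id: p for p in persons}
list(last_merge.values())  # => [Person(id=1, name='bar')]

# First wins: setdefault stores only the first entry seen for each id
first_come: dict[int, Person] = {}
for p in persons:
    first_come.setdefault(p.id, p)
list(first_come.values())  # => [Person(id=1, name='foo')]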
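The two patterns above could be wrapped in one small helper. This is only a sketch: `dedupe_by` and its `keep` parameter are hypothetical names made up for this example, not part of any library, and the type hints assume Python 3.9+.

```python
from dataclasses import dataclass
from typing import Callable, Hashable, Iterable, TypeVar

T = TypeVar("T")


@dataclass
class Person:
    id: int
    name: str


def dedupe_by(
    items: Iterable[T], key: Callable[[T], Hashable], keep: str = "first"
) -> list[T]:
    """Hypothetical helper: keep one item per key, first or last seen."""
    if keep not in ("first", "last"):
        raise ValueError("keep must be 'first' or 'last'")
    result: dict[Hashable, T] = {}
    for item in items:
        k = key(item)
        if keep == "last":
            result[k] = item  # overwrite: the later item wins
        else:
            result.setdefault(k, item)  # no-op if the key already exists
    return list(result.values())


persons = [Person(1, "foo"), Person(2, "baz"), Person(1, "bar")]
first = dedupe_by(persons, key=lambda p: p.id)
last = dedupe_by(persons, key=lambda p: p.id, keep="last")
print(first)  # => [Person(id=1, name='foo'), Person(id=2, name='baz')]
print(last)   # => [Person(id=1, name='bar'), Person(id=2, name='baz')]
```

Note that in the last-wins result, the entry for id 1 stays in the first position: overwriting an existing dict key replaces its value but does not move the key.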

I plan to add more here as I come across them.
