More than 1 year has passed since last update.

Pythonで比較演算子を再定義する

Python

Last updated at 2024-04-10Posted at 2024-04-07

問題

以下の問題を解きたい。

Find the most frequent K words in the list of words.
If there are multiple words with the same frequency, prioritize the lexicographically smaller one.
Returned array does not need to be sorted.

解答

以下のようにヒープを使いたい。

def top_k_words(words, k):
    top_words = [] # as min heap length k.

    for word, cnt in Counter(words).items():
        # ヒープの中の要素数がkより多くなったら、以下の優先順位で要素をヒープからpopしたい。
        # - 出現回数の少ないもの
        # - その中では辞書順で後ろ側のもの
        heappush(top_words, (cnt, - word)) ###
        if k < len(top_words):
            heappop(top_words)

    return [- neg_word for _, neg_word in top_words]

しかし、- word(stringのマイナス)なんてできない。
(与えられた配列の要素が数字であればこれで可能。)
どうするべきか？
-> heapの要素のためのクラスを定義し、その中で比較演算子を再実装すればいい。

class WordCnt:
    def __init__(self, word, cnt):
        self.word = word
        self.cnt = cnt

    def __lt__(self, other):
        if self.cnt != other.cnt:
            return self.cnt < other.cnt

        return self.word > other.word # 比較演算子を逆にした

def top_k_words(words, k):
    top_words = [] # as min heap length k.

    for word, cnt in Counter(words).items():
        # 出現回数がkより多くなったら、以下の優先順位で要素をヒープからpopしたい。
        # - 出現回数の少ないもの
        # - その中では辞書順で後ろ側のもの
        heappush(top_words, WordCnt(word, cnt))
        if k < len(top_words):
            heappop(top_words)

    return [wc.word for wc in top_words]

考察

Pythonの比較演算子は、以下がある。

演算子	実装するメソッド	注意
<	`__lt__`	未実装時に、もし `比較相手.__gt__()`されていれば、それが使用される
<=	`__le__`	TODO: 調べる。 ⚠️ `__lt__` or `__eq__`ではない。
==	`__eq__`	TODO: 調べる。
>=	`__ge__`	TODO: 調べる。 ⚠️ `__gt__` or `__eq__`ではない。
>	`__gt__`	未実装時にもし `比較相手.__lt__()`定義されていれば、それが使用される

パッと確認したところ、heapqの内部では、__eq__は使用されていなさそう。
heapの要素は、__lt__()さえ実装してあれば良さそう。

Ref.

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up