Polars: Dropping the First Row of Each Group

Posted at 2023-10-10

import polars as pl

# Build the sample DataFrame from its printed representation.
df = pl.from_repr(
    """
    shape: (8, 2)
    ┌─────┬───────┐
    │ id  ┆ value │
    │ --- ┆ ---   │
    │ i64 ┆ i64   │
    ╞═════╪═══════╡
    │ 1   ┆ 5     │
    │ 2   ┆ 6     │
    │ 2   ┆ 1     │
    │ 2   ┆ 2     │
    │ 2   ┆ 3     │
    │ 3   ┆ 30    │
    │ 3   ┆ 10    │
    │ 3   ┆ 20    │
    └─────┴───────┘
    """
)

When several rows share the same value in the id column, we want to delete the first row for each such id.

result_output = pl.from_repr(
    """
    ┌─────┬───────┐
    │ id  ┆ value │
    │ --- ┆ ---   │
    │ i64 ┆ i64   │
    ╞═════╪═══════╡
    │ 1   ┆ 5     │
    │ 2   ┆ 1     │
    │ 2   ┆ 2     │
    │ 2   ┆ 3     │
    │ 3   ┆ 10    │
    │ 3   ┆ 20    │
    └─────┴───────┘
    """
)

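The problem can be broken down as follows.

1. For each group defined by the id column, delete the row that appears first.
2. However, if a group originally consists of only one row, keep that row.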

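Since this is a row-filtering operation, df.filter() is a natural choice. The problem above can then be read as building an Expr that behaves as follows.

1. Within each group defined by the id column, the first row to appear is False and every other row is True.
2. Among the rows where condition 1 is False, those whose id has a group length of 1 are True.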

┌─────┬───────┬───────┬────────┬───────┐
│ id  ┆ value ┆ cond1 ┆ cond2  ┆ cond  │
│ --- ┆ ---   ┆ ---   ┆ ---    ┆ ---   │
│ i64 ┆ i64   ┆ bool  ┆ bool   ┆ bool  │
╞═════╪═══════╪═══════╪════════╪═══════╡
│ 1   ┆ 5     ┆ false > true  -> true  │
│ 2   ┆ 6     ┆ false > false -> false │
│ 2   ┆ 1     ┆ true  ---------> true  │
│ 2   ┆ 2     ┆ true  ---------> true  │
│ 2   ┆ 3     ┆ true  ---------> true  │
│ 3   ┆ 30    ┆ false > false -> false │
│ 3   ┆ 10    ┆ true  ---------> true  │
│ 3   ┆ 20    ┆ true  ---------> true  │
└─────┴───────┴───────┴────────┴───────┘

For the first condition, Expr.is_first_distinct() identifies the first row to appear in each group. Since we need the negation, append .not_() to invert True/False.

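For the second condition, apply pl.count(), which returns the number of rows, to each group and test whether the result is 1. In other words, pl.count().over("id") == 1 gives the answer. (Recent Polars releases deprecate pl.count() in favor of pl.len(), which can be used the same way.)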

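The phrase "among the rows where condition 1 is False" amounts to an OR condition: a row is kept if condition 1 is True, or otherwise if condition 2 is True.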

The final answer is therefore as follows.

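out = df.filter(
    # Condition 1: the row is not the first one in its id group
    pl.col("id").is_first_distinct().not_()
    # Condition 2: the id group consists of a single row
    | (pl.count().over("id") == 1)
)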

print(out)
# shape: (6, 2)
# ┌─────┬───────┐
# │ id  ┆ value │
# │ --- ┆ ---   │
# │ i64 ┆ i64   │
# ╞═════╪═══════╡
# │ 1   ┆ 5     │
# │ 2   ┆ 1     │
# │ 2   ┆ 2     │
# │ 2   ┆ 3     │
# │ 3   ┆ 10    │
# │ 3   ┆ 20    │
# └─────┴───────┘
