4
2

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?

More than 5 years have passed since last update.

PysparkのDataFrameで横持ちデータを縦持ちに変換する

Posted at

はじめに

PysparkのDataFrameで縦→横の変換は簡単だけど、逆は結構難しいので備忘のために記載。

解決策

PySQLのbuldin関数のstackを使う。
引数は、stack(行数, クエリ1, クエリ2, ...)
stackで指定した行数分、後続のクエリで指定した列を集計して、行数を膨らませるような処理を行う。

このstack関数は、pysparkのfunctionでは提供されていない?ようなので、selectExprでSQLのselectクエリを記述して実行する。

サンプル

横持ち→縦持ち
from pyspark.sql import functions as F

# 横持ちで各列のNull値の件数を集計
null_count_pivot = df.select([F.count(F.when(F.col(c).isNull(), c).otherwise(None)) for c in df.columns])

# 横持ちを縦持ちに変換
null_count_unpivot = (
  null_count_pivot
  .selectExpr(
    f"""
    stack(
      {len(df.columns)},
      {', '.join(f'"{c}", {c}' for c in df.columns)}
    ) as (column, null_count)
    """
  )
)

参考

4
2
0

Register as a new user and use Qiita more conveniently

  1. You get articles that match your needs
  2. You can efficiently read back useful information
  3. You can use dark theme
What you can do with signing up
4
2

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?