PySpark: Replacing Strings

Posted at 2021-08-11

Overview

|    1|  sample1|sample_1|2021/07/14|  A|    null|       null|       null|       11:22:33|
|    2|  sample2|sample_2|2021/07/15|  B|    null|       null|       null|       11時間22分33秒|

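Row 1 is the normal case; row 2 is the abnormal one.
The last column should read 11:22:33, but in row 2 it contains 11時間22分33秒 instead, so that value needs to be replaced.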
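To make the target transformation concrete, here is a minimal plain-Python sketch of the conversion, independent of Spark. The helper name `normalize_time` and the regular expression are assumptions for illustration and do not appear in the original post. Unlike a simple chain of replacements, this version also zero-pads each field, which is a design choice rather than a requirement stated above.

```python
import re

# Hypothetical pattern and helper, not from the original article.
# Matches strings such as "11時間22分33秒" (hours, minutes, seconds).
_JP_TIME = re.compile(r"^(\d{1,2})時間(\d{1,2})分(\d{1,2})秒$")


def normalize_time(value):
    """Convert '11時間22分33秒' to '11:22:33'; return other values unchanged."""
    if value is None:
        # Nulls pass through untouched.
        return None
    match = _JP_TIME.match(value)
    if match is None:
        # Already in HH:MM:SS form (or some other format): leave as-is.
        return value
    hours, minutes, seconds = (int(group) for group in match.groups())
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


print(normalize_time("11時間22分33秒"))  # 11:22:33
print(normalize_time("11:22:33"))        # 11:22:33
```

Values that do not match the pattern are returned unchanged, so rows that are already correct are not affected.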

What to do

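from pyspark.sql.types import StringType
from pyspark.sql import functions as F

def time_format_method(x):
    # Pass nulls through unchanged; str.replace cannot be called on None.
    if x is None:
        return None
    # "11時間22分33秒" -> "11:22:33"
    return x.replace("時間", ":").replace("分", ":").replace("秒", "")

# Wrap the Python function as a UDF that returns a string column.
time_format = F.udf(time_format_method, StringType())

df_format = df_input \
    .select(
        F.col("--"),
        F.col("--"),
        F.col("--"),
        F.col("--"),
        F.col("--"),
        F.col("--"),
        # Apply the UDF (not Python's built-in format) to the column to fix.
        time_format(F.col("column_to_fix")).alias("new_column_name")
    )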

References

How to Turn Python Functions into PySpark Functions (UDF)
pyspark.sql.functions.udf
