How to rename DataFrame columns in bulk on Databricks (Spark)

Overview

This article shares a way to rename all the columns of a DataFrame at once on Databricks (Spark).

You can rename a column with the withColumnRenamed function, but when a DataFrame has many columns, a helper function like the following is recommended.

def rename_df_cols(
    df,
    cols_names,
):
    for idx, new_col in enumerate(cols_names):
        # Look up the current name by position, then rename that column
        old_col_name = df.columns[idx]
        df = df.withColumnRenamed(old_col_name, new_col)
    return df

When a DataFrame is created from a CSV source with the header option set to False, the columns are named _c0, _c1, and so on, so the function above comes in handy.
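
As a small illustration of that naming scheme, the sketch below builds the mapping from the default names to meaningful ones. `default_col_map` is a hypothetical helper name, not part of Spark, and it assumes the default names follow the `_c0`, `_c1`, ... pattern that appears when a CSV is read without a header.

```python
def default_col_map(new_names):
    """Map default CSV column names (_c0, _c1, ...) to the given new names."""
    # Assumption: the i-th column of a header-less CSV is named "_c<i>"
    return {f"_c{i}": name for i, name in enumerate(new_names)}


col_map = default_col_map(["id", "name", "mail", "weight", "height"])
print(col_map)
```

A dictionary of this shape can be passed to a dictionary-based rename function instead of being typed out by hand.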

Example

Setup

import inspect
from pyspark.sql import DataFrame, SparkSession
from pyspark.dbutils import DBUtils
 
def put_files_to_storage(
    path,
    content,
    is_overwrite = True,
):
    # Strip the leading and trailing blank lines (and any common indentation)
    content = inspect.cleandoc(content)
   
    spark = SparkSession.getActiveSession()
    dbutils = DBUtils(spark)
    return dbutils.fs.put(path, content, is_overwrite)

# Place the test file
path = 'dbfs:/test/test.csv'
 
csv_data_rows = """
1,"user_aaa","user_aaa@test.com",53.5,180
2,"user_bbb","user_bbb@test.com",53.5,180
3,"user_ccc","user_ccc@test.com",53.5,180
"""
 
_ = put_files_to_storage(path, csv_data_rows)
 
# Check the file
file_content = dbutils.fs.head(path)
print(file_content)
Wrote 125 bytes.
1,"user_aaa","user_aaa@test.com",53.5,180
2,"user_bbb","user_bbb@test.com",53.5,180
3,"user_ccc","user_ccc@test.com",53.5,180
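
The byte count reported above can be checked without a cluster. The following sketch uses only the standard library to show what `inspect.cleandoc` does to the triple-quoted string: it drops the blank first and last lines, so the content that is written starts directly with the first data row.

```python
import inspect

csv_data_rows = """
1,"user_aaa","user_aaa@test.com",53.5,180
2,"user_bbb","user_bbb@test.com",53.5,180
3,"user_ccc","user_ccc@test.com",53.5,180
"""

cleaned = inspect.cleandoc(csv_data_rows)

# The raw literal begins with a newline; the cleaned text does not
print(repr(csv_data_rows[0]))
print(cleaned.splitlines()[0])

# Size in bytes of the text that would be written to storage
print(len(cleaned.encode("utf-8")))
```

Three rows of 41 characters plus two separating newlines give 125 bytes, which matches the message printed by `dbutils.fs.put`.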

# Create the DataFrame
df = (
    spark
    .read
    .format('csv')
    .option("inferSchema", "False")
    .option('header', 'False')
    .load(path)
)
df.display()

1. Renaming with a list of column names

def rename_df_cols(
    df,
    cols_names,
):
    for idx, new_col in enumerate(cols_names):
        # Look up the current name by position, then rename that column
        old_col_name = df.columns[idx]
        df = df.withColumnRenamed(old_col_name, new_col)
    return df

col_names = ['id','name','mail','weight','height',]

df_2 = rename_df_cols(df, col_names)
df_2.display()
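
The list-based function renames only the leading columns when the list is shorter than the DataFrame, and raises an IndexError when it is longer. A minimal sketch of a stricter variant follows. It relies on `DataFrame.toDF(*cols)`, which returns a new DataFrame with the given column names, and `rename_df_cols_checked` is a hypothetical name introduced here for illustration.

```python
def rename_df_cols_checked(df, cols_names):
    """Rename every column by position, failing fast on a length mismatch."""
    cols_names = list(cols_names)
    if len(cols_names) != len(df.columns):
        raise ValueError(
            f"expected {len(df.columns)} column names, got {len(cols_names)}"
        )
    # toDF replaces all column names in one call instead of one rename per column
    return df.toDF(*cols_names)
```

It would be called the same way as the original, for example `df_2 = rename_df_cols_checked(df, col_names)`.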

2. Renaming with a dictionary of column names

def rename_df_cols_with_col_map(
    df,
    renamed_cols_names,
):
    for existing_col, new_col in renamed_cols_names.items():
        df = df.withColumnRenamed(existing_col, new_col)
    return df

# The source file has no header, so rename the columns
renamed_cols_names = {
    '_c0':'id',
    '_c1':'name',
    '_c2':'mail',
    '_c3':'weight',
    '_c4':'height',
}
df_3 = rename_df_cols_with_col_map(df, renamed_cols_names)
 
df_3.display()
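
One caveat with the dictionary approach is that `withColumnRenamed` does nothing when the existing column name is not in the schema, so a typo in a key goes unnoticed. The sketch below adds a check for that. It also assumes that PySpark 3.4 or later provides `DataFrame.withColumnsRenamed`, which takes the whole dictionary in one call, and falls back to the loop otherwise. `rename_df_cols_with_col_map_checked` is a hypothetical name.

```python
def rename_df_cols_with_col_map_checked(df, renamed_cols_names):
    """Rename columns from an {old: new} dict, rejecting unknown old names."""
    missing = [c for c in renamed_cols_names if c not in df.columns]
    if missing:
        raise KeyError(f"columns not found in DataFrame: {missing}")
    # Assumption: withColumnsRenamed is available on PySpark 3.4 or later
    if hasattr(df, "withColumnsRenamed"):
        return df.withColumnsRenamed(renamed_cols_names)
    # Fallback for older versions: rename one column at a time
    for existing_col, new_col in renamed_cols_names.items():
        df = df.withColumnRenamed(existing_col, new_col)
    return df
```

With the mapping defined above, `df_3 = rename_df_cols_with_col_map_checked(df, renamed_cols_names)` should give the same result as the original function, while a misspelled key such as `'_c5'` would raise an error instead of being ignored.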

Cleanup

dbutils.fs.rm(path, True)
