
[python] Making use of pandas, Part 1: Creating a DataFrame and using apply

Last updated at 2021-09-02 / Posted at 2021-08-19

Introduction

Versions
- Python: 3.6.8
- pandas: 1.1.5
Overview
- Given a YAML file like the one below, pandas lets you build a readable table with very little code, so I'll show how.

member.yml

---
member:
  - name: "Aoki"
    age: 23
  - name: "Akiyama"
    age: 38
  - name: "Tom"
    age: 31
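To see what pandas actually receives from this file, the sketch below parses the same YAML with PyYAML's `yaml.safe_load` (shorthand for `yaml.load(..., Loader=yaml.SafeLoader)`, which the sample code uses). Inlining the YAML as a string is an assumption made only so the snippet runs without `member.yml` on disk.

```python
import yaml

# Same content as member.yml, inlined so no file is needed (illustrative assumption)
MEMBER_YAML = """\
---
member:
  - name: "Aoki"
    age: 23
  - name: "Akiyama"
    age: 38
  - name: "Tom"
    age: 31
"""

# safe_load is equivalent to yaml.load(..., Loader=yaml.SafeLoader)
data = yaml.safe_load(MEMBER_YAML)

# data["member"] is a list of dicts: one dict per future table row
print(data["member"])
# → [{'name': 'Aoki', 'age': 23}, {'name': 'Akiyama', 'age': 38}, {'name': 'Tom', 'age': 31}]
```

Because each list element is a dict with the same keys, it maps naturally onto rows and columns of a table.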

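This article covers the following patterns:

1. Create a table from YAML
2. Apply a function to a column of the table
   - Check whether age is 30 or over and return True/False
3. Add the function's result as a new column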

1. Sample code

Here is the code I actually wrote.

test_pandas.py

import pandas as pd
import yaml


def read_yml(input_file):
    with open(input_file, "r") as yml:
        yml_data = yaml.load(yml, Loader=yaml.SafeLoader)
    return yml_data


def check_age(age):
    return age >= 30


def main():

    print("1. yaml to dataframe")
    file_yaml = read_yml("./member.yml")
    file_df = pd.DataFrame(data=file_yaml["member"])      # point-1
    print(file_df)
    print("")

    print("2. apply function")
    print(file_df["age"].apply(check_age))                # point-2
    print("")

    print("3. merge function result")
    file_df["result"] = file_df["age"].apply(check_age)   # point-3
    print(file_df)


if __name__ == "__main__":
    main()

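Key points
- 1. Creating a table from YAML
  - Pass the data you want to turn into a table as the data= argument:
    pd.DataFrame(data=data_to_tabulate)
- 2. Applying a function to a column
  - Select the column (which gives a Series) and call .apply on it with the function:
    DataFrame_name[column_name].apply(function_name)
- 3. Adding the function's result as a new column
  - Put the new column on the left-hand side and the result to add (a Series) on the right-hand side:
    DataFrame_name[new_column_name] = series_to_add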
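Putting the three points together, here is a minimal sketch that skips the YAML file and builds the same table from an inline list of dicts. The `members` and `is_30_or_over` names are hypothetical stand-ins chosen for illustration; `members` mirrors what `file_yaml["member"]` holds in the sample code.

```python
import pandas as pd

# Hypothetical inline data, equivalent to file_yaml["member"] after reading member.yml
members = [
    {"name": "Aoki", "age": 23},
    {"name": "Akiyama", "age": 38},
    {"name": "Tom", "age": 31},
]


def check_age(age):
    # True when the member is 30 or older
    return age >= 30


# Point 1: each dict becomes a row; the dict keys become the column names
df = pd.DataFrame(data=members)

# Point 2: df["age"] is a Series; apply calls check_age once per element
is_30_or_over = df["age"].apply(check_age)

# Point 3: assigning to a new column name adds the Series as a column
df["result"] = is_30_or_over

print(df)
```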

2. Execution results

Execution log
- The script prints intermediate values so you can see what each variable holds, so let's go through them in detail.

(venv) [centos@ip-<ip addr> pandas]$ python3 test_pandas.py 
1. yaml to dataframe
      name  age
0     Aoki   23
1  Akiyama   38
2      Tom   31

2. apply function
0    False
1     True
2     True
Name: age, dtype: bool

3. merge function result
      name  age  result
0     Aoki   23   False
1  Akiyama   38    True
2      Tom   31    True

1. After reading the YAML
  - The row numbers in the leftmost column (the index) are assigned automatically.

(index)	name	age
0	Aoki	23
1	Akiyama	38
2	Tom	31
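The automatically assigned row numbers are the DataFrame's index. When no index is given, pandas uses a `RangeIndex` starting at 0. The sketch below confirms this and, as an optional alternative not used in the article, shows `set_index` turning the name column into the row labels.

```python
import pandas as pd

members = [
    {"name": "Aoki", "age": 23},
    {"name": "Akiyama", "age": 38},
    {"name": "Tom", "age": 31},
]

# No index passed: pandas numbers the rows 0, 1, 2 automatically
df = pd.DataFrame(data=members)
print(df.index)  # RangeIndex(start=0, stop=3, step=1)

# Optional alternative (not in the article): use name as the row labels instead
df_by_name = df.set_index("name")
print(df_by_name.loc["Tom", "age"])  # 31
```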

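2. After applying the function
  - Only the result of applying the function to the age column is returned (a Series named age), so the other column (name) is not shown.

(index)	age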
0	False
1	True
2	True
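Why does only one column come back? Selecting `df["age"]` yields a Series, and `Series.apply` returns a new Series with the same index and name, which is why the log shows `Name: age, dtype: bool`. A minimal check, using a lambda in place of `check_age` purely for brevity:

```python
import pandas as pd

df = pd.DataFrame(data=[
    {"name": "Aoki", "age": 23},
    {"name": "Akiyama", "age": 38},
    {"name": "Tom", "age": 31},
])

# Selecting one column yields a Series, not a DataFrame
ages = df["age"]

# Series.apply returns a new Series that keeps the index and the name "age"
result = ages.apply(lambda age: age >= 30)

print(type(result).__name__)  # Series
print(result.name)            # age
print(result.dtype)           # bool
```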

3. After merging in the function's result
  - The function's result has been added as a column named result.

(index)	name	age	result
0	Aoki	23	False
1	Akiyama	38	True
2	Tom	31	True
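The merge lines up correctly because assigning a Series to a column matches rows by index label, not by position. The sketch below deliberately reverses the result Series (the `reversed_result` name is hypothetical) before assigning it, and each row still receives its own value.

```python
import pandas as pd

df = pd.DataFrame(data=[
    {"name": "Aoki", "age": 23},
    {"name": "Akiyama", "age": 38},
    {"name": "Tom", "age": 31},
])

result = df["age"].apply(lambda age: age >= 30)

# Reverse the order; the index labels (2, 1, 0) travel with their values
reversed_result = result.iloc[::-1]

# Column assignment aligns on index labels, so rows still match up
df["result"] = reversed_result
print(df)
```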

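Summary

Once you are comfortable with pandas, you can apply a function to all of your data in one go without writing a for loop yourself, which I find convenient.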
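For comparison, the sketch below computes the same True/False column three ways: an explicit for loop, `apply`, and a vectorized comparison. Note that `apply` still calls the Python function once per element internally; for a simple check like this one, the vectorized form `df["age"] >= 30` is a common alternative that avoids the per-element call. All three give the same result here.

```python
import pandas as pd

df = pd.DataFrame(data=[
    {"name": "Aoki", "age": 23},
    {"name": "Akiyama", "age": 38},
    {"name": "Tom", "age": 31},
])


def check_age(age):
    return age >= 30


# Explicit for loop: build the result list by hand
loop_result = []
for age in df["age"]:
    loop_result.append(check_age(age))

# apply: pandas calls check_age on every element for you
apply_result = df["age"].apply(check_age)

# Vectorized comparison: no Python-level function call per row
vectorized_result = df["age"] >= 30

print(loop_result)
print(apply_result.tolist())
print(vectorized_result.tolist())
```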

