
Personal pandas notes

Last updated at 2023-07-15, posted at 2023-07-15

pandas

Supplement

Code that generates table markup for Qiita

def make_table(lines):
    rows = []       # split cells of every input line
    table = "|  |"  # empty top-left cell above the row labels
    for i in range(len(lines)):
        rows.append(lines[i].split())
        if i == 0:
            # header row: one cell per column name
            for cell in rows[0]:
                table += " " + cell + " |"
            # alignment row: center every column, including the label column
            table += "\n" + "|"
            for _ in range(len(rows[0]) + 1):
                table += ":-:|"
        else:
            # data row: the row label followed by its values
            table += "\n" + "|"
            for cell in rows[i]:
                table += " " + cell + " |"
    return table
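As a usage sketch, the block below shows what kind of input the helper expects and what it returns. `make_table_compact` is a hypothetical, shorter variant written for this illustration; it is meant to produce the same string as `make_table` above for non-empty input, and the sample lines are made up.

```python
def make_table_compact(lines):
    """Hypothetical compact variant of make_table (same output for non-empty input)."""
    rows = [line.split() for line in lines]
    # Header row with an empty first cell above the row labels
    header = "|  |" + "".join(f" {cell} |" for cell in rows[0])
    # Alignment row: one centered column per header cell, plus the label column
    separator = "|" + ":-:|" * (len(rows[0]) + 1)
    # Data rows: row label followed by its values
    body = ["|" + "".join(f" {cell} |" for cell in row) for row in rows[1:]]
    return "\n".join([header, separator, *body])


sample = ["weight height", "alice 68 172", "bob 83 181"]  # made-up input
markup = make_table_compact(sample)
print(markup)
# |  | weight | height |
# |:-:|:-:|:-:|
# | alice | 68 | 172 |
# | bob | 83 | 181 |
```

The first input line holds only the column names, while every following line starts with its row label, which is why the header gets an extra empty cell.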

Importing pandas

pandas is conventionally imported as pd.

import pandas as pd

`Series` objects

Creating a Series object. By default, the index labels are integers starting from 0.

s = pd.Series([1,2,3,4])
s2 = pd.Series([68, 83, 112, 68], index=["alice", "bob", "charles", "darwin"])
meaning = pd.Series(42, ["life", "universe", "everything"])  # every element is 42

A Series can also be created from a dictionary.

weights = {"alice": 68, "bob": 83, "colin": 86, "darwin": 68}
s3 = pd.Series(weights)
s4 = pd.Series(weights, index=["colin", "alice"])  # select labels; the result follows this order
s6 = pd.Series([83, 68], index=["bob", "alice"], name="weights")  # a Series can also have a name

The keys() method returns the index labels.
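A minimal sketch of how the `index` argument interacts with a dictionary. The label `"eugene"` is an assumption added here to show what happens when a requested label is missing from the dictionary.

```python
import pandas as pd

weights = {"alice": 68, "bob": 83, "colin": 86, "darwin": 68}

# Select and reorder labels while building the Series
picked = pd.Series(weights, index=["colin", "alice"], name="weights")
print(picked.index.tolist())  # ['colin', 'alice']
print(picked.tolist())        # [86, 68]
print(picked.name)            # weights

# A label that is not in the dictionary becomes NaN
with_missing = pd.Series(weights, index=["alice", "eugene"])
print(with_missing.isna().tolist())  # [False, True]
```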

s2.keys()

Elements can be accessed either by integer position or by label.

s2[1]
s2["bob"]

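However, to make it explicit whether you are indexing by integer position or by label, use the iloc and loc indexers (integer location, location).

s2.iloc[1]
s2.loc["bob"]
s2.iloc[0:2]
s2.loc["bob":]  # label slices include both endpoints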
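The difference between the two indexers matters most when the labels are themselves integers. The sketch below uses a hypothetical Series, `s_int`, whose integer labels deliberately do not match the positions.

```python
import pandas as pd

# Hypothetical Series: integer labels in reverse order of position
s_int = pd.Series([10, 20, 30, 40], index=[3, 2, 1, 0])

print(s_int.iloc[0])  # 10 -> element at the first position
print(s_int.loc[0])   # 40 -> element whose label is 0

# Position slices exclude the end; label slices include it
s2 = pd.Series([68, 83, 112, 68], index=["alice", "bob", "charles", "darwin"])
print(s2.iloc[0:2].tolist())               # [68, 83]
print(s2.loc["alice":"charles"].tolist())  # [68, 83, 112]
```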

Arithmetic between a Series and an array-like object such as a list is possible, element by element.

s + [5,6,7,8]

Arithmetic and comparison operators between a Series and a scalar are applied to every element. This is called broadcasting.

s + 100
s * 2
s < 4
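The expressions above can be checked with a short sketch that prints the results. The final line, which uses the boolean result as a filter, is an addition for illustration.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])

print((s + [5, 6, 7, 8]).tolist())  # [6, 8, 10, 12]   element-wise with a list
print((s + 100).tolist())           # [101, 102, 103, 104]
print((s * 2).tolist())             # [2, 4, 6, 8]
print((s < 4).tolist())             # [True, True, True, False]

# The boolean Series from a comparison can be used directly as a filter
print(s[s < 4].tolist())            # [1, 2, 3]
```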

`DataFrame` objects

Creating a `DataFrame`

DataFrame objects can be created in several ways.

Passing a dictionary of Series

people_dict = {
    "weight": pd.Series([68, 83, 112], index=["alice", "bob", "charles"]),
    "birthyear": pd.Series([1984, 1985, 1992], index=["bob", "alice", "charles"], name="year"),
    "children": pd.Series([0, 3], index=["charles", "bob"]),
    "hobby": pd.Series(["Biking", "Dancing"], index=["alice", "bob"]),
}
people = pd.DataFrame(people_dict)

	weight	birthyear	children	hobby
alice	68	1985	NaN	Biking
bob	83	1984	3.0	Dancing
charles	112	1992	0.0	NaN
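A point worth noting in the table above is that values are matched by index label, not by their order inside each Series. The sketch below rebuilds a reduced version of `people` (without the hobby column) to check this.

```python
import pandas as pd

people = pd.DataFrame({
    "weight": pd.Series([68, 83, 112], index=["alice", "bob", "charles"]),
    "birthyear": pd.Series([1984, 1985, 1992], index=["bob", "alice", "charles"]),
    "children": pd.Series([0, 3], index=["charles", "bob"]),
})

# Values follow their labels: bob was listed first in "birthyear"
print(people.loc["alice", "birthyear"])  # 1985
print(people.loc["bob", "birthyear"])    # 1984

# alice has no "children" entry, so the cell is NaN and the column is float
print(people["children"].isna().tolist())  # [True, False, False]
print(people["children"].dtype)            # float64
```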

You can also specify which rows and columns to include. Labels that do not exist in the data are filled with NaN.

d2 = pd.DataFrame(
        people_dict,
        columns=["birthyear", "weight", "height"],
        index=["bob", "alice", "eugene"]
     )

	birthyear	weight	height
bob	1984.0	83.0	NaN
alice	1985.0	68.0	NaN
eugene	NaN	NaN	NaN

Creating a DataFrame from a matrix (a list of rows)

import numpy as np
values = [
            [1985, np.nan, "Biking",   68],
            [1984, 3,      "Dancing",  83],
            [1992, 0,      np.nan,    112]
         ]
d3 = pd.DataFrame(
        values,
        columns=["birthyear", "children", "hobby", "weight"],
        index=["alice", "bob", "charles"]
     )

	birthyear	children	hobby	weight
alice	1985	NaN	Biking	68
bob	1984	3.0	Dancing	83
charles	1992	0.0	NaN	112

Passing a dictionary of dictionaries

d4 = pd.DataFrame({
    "birthyear": {"alice": 1985, "bob": 1984, "charles": 1992},
    "hobby": {"alice": "Biking", "bob": "Dancing"},
    "weight": {"alice": 68, "bob": 83, "charles": 112},
    "children": {"bob": 3, "charles": 0}
})

	birthyear	hobby	weight	children
alice	1985	Biking	68	NaN
bob	1984	Dancing	83	3.0
charles	1992	NaN	112	0.0

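Accessing and searching rows, columns, and sub-matrices

Select columns by column name. A single name returns a Series; a list of names returns a DataFrame.

df1 = people["weight"]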
df2 = people[["weight", "children"]]

Select a row by its row label.

df = people.loc["charles"]

The iloc indexer selects rows, columns, or sub-matrices by integer position.

d5 = people.iloc[:,2:]

Select with a boolean array (only the rows marked True are returned).

people[np.array([True, False, True])]

The same mechanism lets you select only the rows that satisfy a condition (the comparison is broadcast to every row).

people[people["birthyear"] < 1990]
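To make the intermediate step visible, the sketch below stores the comparison result in a variable before filtering. It uses a reduced, hypothetical version of `people`, and the combined condition is an addition for illustration.

```python
import pandas as pd

people = pd.DataFrame(
    {"birthyear": [1985, 1984, 1992], "weight": [68, 83, 112]},
    index=["alice", "bob", "charles"],
)

mask = people["birthyear"] < 1990   # boolean Series, one value per row
print(mask.tolist())                # [True, True, False]
print(people[mask].index.tolist())  # ['alice', 'bob']

# Combine conditions with & or | and parentheses (not `and` / `or`)
both = people[(people["birthyear"] < 1990) & (people["weight"] > 70)]
print(both.index.tolist())          # ['bob']
```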

You can also search with the query() method.

people.query('birthyear<1990')
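As a small sketch of query() beyond a literal threshold: a Python variable can be referenced inside the expression by prefixing it with `@`. The variable name `cutoff` and the reduced `people` frame are assumptions made for this example.

```python
import pandas as pd

people = pd.DataFrame(
    {"birthyear": [1985, 1984, 1992], "weight": [68, 83, 112]},
    index=["alice", "bob", "charles"],
)

cutoff = 1990  # hypothetical local variable
older = people.query("birthyear < @cutoff")
print(older.index.tolist())     # ['alice', 'bob']

# Conditions can be combined with `and` / `or` inside the expression
combined = people.query("birthyear < @cutoff and weight > 70")
print(combined.index.tolist())  # ['bob']
```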

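To get everything except specific columns, use the drop() method.

people.drop(columns=["hobby"])  # returns a new DataFrame without the "hobby" column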
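A short sketch of how drop() behaves, using a reduced, hypothetical `people` frame. It shows that the original DataFrame is left unchanged and that labels passed without `columns=` are looked up in the row index.

```python
import pandas as pd

people = pd.DataFrame(
    {"weight": [68, 83], "hobby": ["Biking", "Dancing"]},
    index=["alice", "bob"],
)

without_hobby = people.drop(columns=["hobby"])
print(without_hobby.columns.tolist())  # ['weight']
print(people.columns.tolist())         # ['weight', 'hobby'] -> original unchanged

# Without `columns=`, the label is looked up in the row index
without_bob = people.drop("bob")
print(without_bob.index.tolist())      # ['alice']
```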

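Adding columns and rows

Because a DataFrame behaves like a dictionary of Series (its columns), columns can be added and removed the same way as dictionary entries.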

people["age"] = 2018 - people["birthyear"]  # adds a new column "age"
people["over 30"] = people["age"] > 30      # adds another column "over 30"
birthyears = people.pop("birthyear")
del people["children"]
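The effect of these four statements can be traced with a small sketch. The two-row `people` frame here is a reduced stand-in for the one built earlier.

```python
import pandas as pd

people = pd.DataFrame(
    {"birthyear": [1985, 1984], "children": [0, 3], "weight": [68, 83]},
    index=["alice", "bob"],
)

people["age"] = 2018 - people["birthyear"]  # new column, computed per row
people["over 30"] = people["age"] > 30
print(people["age"].tolist())               # [33, 34]

birthyears = people.pop("birthyear")        # removes the column and returns it
print(birthyears.tolist())                  # [1985, 1984]

del people["children"]                      # removes the column in place
print(people.columns.tolist())              # ['weight', 'age', 'over 30']
```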

The insert() method adds a column at a specified position.

people.insert(1, "height", [172, 181, 185])
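A minimal sketch of insert(), assuming a reduced three-row `people` frame. The first argument is the zero-based column position, and the DataFrame is modified in place.

```python
import pandas as pd

people = pd.DataFrame(
    {"weight": [68, 83, 112], "hobby": ["Biking", "Dancing", None]},
    index=["alice", "bob", "charles"],
)

result = people.insert(1, "height", [172, 181, 185])
print(result)                   # None -> insert() modifies people in place
print(people.columns.tolist())  # ['weight', 'height', 'hobby']
```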

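The assign() method returns a new DataFrame with the added columns; the original is not modified.

# assumes people already has "height" and "pets" columns
people.assign(
    body_mass_index = people["weight"] / (people["height"] / 100) ** 2,
    has_pets = people["pets"] > 0
)

A column created in an assign() call cannot be referenced through people[...] in the same call, because it does not exist in people yet. To use it, pass a lambda function, which is evaluated against the DataFrame that assign() is operating on.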

(people
     .assign(body_mass_index = lambda df: df["weight"] / (df["height"] / 100) ** 2)
     .assign(overweight = lambda df: df["body_mass_index"] > 25)
)
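As a variation on the chained form, the sketch below puts both lambdas in a single assign() call. This relies on pandas evaluating keyword arguments in order, so a later lambda can see a column created by an earlier one. The heights reuse the values from the insert() example.

```python
import pandas as pd

people = pd.DataFrame(
    {"weight": [68, 83, 112], "height": [172, 181, 185]},
    index=["alice", "bob", "charles"],
)

result = people.assign(
    body_mass_index=lambda df: df["weight"] / (df["height"] / 100) ** 2,
    overweight=lambda df: df["body_mass_index"] > 25,  # sees the column above
)

print(result["overweight"].tolist())        # [False, True, True]
print("body_mass_index" in people.columns)  # False -> original unchanged
```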

Other `DataFrame` operations

Use the T attribute to transpose a DataFrame (matrix).

df.T
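A small sketch of transposition on a hypothetical 2 x 3 frame: the row labels and column names swap places.

```python
import pandas as pd

frame = pd.DataFrame(
    {"weight": [68, 83], "height": [172, 181], "age": [33, 34]},
    index=["alice", "bob"],
)

transposed = frame.T
print(frame.shape)                  # (2, 3)
print(transposed.shape)             # (3, 2)
print(transposed.index.tolist())    # ['weight', 'height', 'age']
print(transposed.columns.tolist())  # ['alice', 'bob']
```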

Get the keys (column names) and the index (row labels) as array-like Index objects.

df.keys()
df.columns
df.index
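For a DataFrame, keys() and columns give the same labels, which fits the dictionary-of-columns picture. The sketch below checks this on a hypothetical two-column frame.

```python
import pandas as pd

people = pd.DataFrame(
    {"weight": [68, 83], "height": [172, 181]},
    index=["alice", "bob"],
)

print(people.keys().equals(people.columns))  # True
print(list(people.columns))                  # ['weight', 'height']
print(list(people.index))                    # ['alice', 'bob']

# Iterating over a DataFrame yields its column names, like a dictionary
print([name for name in people])             # ['weight', 'height']
```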
