[Pandas] The Python Pandas DataFrame Operations Worth Memorizing

Last updated at 2024-06-06 / Posted at 2023-02-18

Introduction

DataFrames are just too hard, so I'm collecting the bare minimum that I think should let you handle them reasonably well. (Admittedly, this is mostly a note to myself...)

Summary

Import
df = pd.read_csv(filepath, index_col=0)
Getting/setting elements
df.loc["indexname i":"indexname j", "colname k":"colname l"]
df.iloc[i:j, k:l]
Conditional extraction of elements
df_condition = df.loc[:, "a"] > 1
df = df[df_condition]
Getting/setting index and column names
df.index, df.columns
Adding rows (index)
df = pd.concat([df, df_add])
Adding a column
df["newcol"] = listlike
Deleting rows and columns
df = df.drop(index=["index_name"], columns=["col_name"])
Converting to NumPy
df.to_numpy(dtype="float")
Export
df.to_csv(filepath)
df.to_clipboard()

First, import Pandas.

import pandas as pd

Loading data

Before anything else, we need a DataFrame.
I usually load data with df = pd.read_csv(filepath, index_col=0).
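
To make the effect of index_col=0 concrete, here is a minimal sketch. The CSV text is made up for illustration, and io.StringIO stands in for a real file path; neither comes from the original article.

```python
import io
import pandas as pd

# Hypothetical CSV content; the first column holds the row labels.
csv_text = """,a,b,c
u,1,2,3
v,4,5,6
w,7,8,9
"""

# io.StringIO stands in for a file path here (an assumption for this sketch).
df_csv = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(df_csv)

# Without index_col=0 the labels are read as an ordinary column,
# so the frame has four columns instead of three.
df_no_index = pd.read_csv(io.StringIO(csv_text))
print(df_no_index.shape)
```

With index_col=0 the first CSV column becomes the index, which is what makes label-based access with loc[] work later.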

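Here I prepare a two-dimensional list and turn it into a DataFrame with pd.DataFrame(). An np.ndarray works in place of the list too.
The index names (rows) are u, v, w and the column names are a, b, c. Each example in the sections below starts from this original df.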

data = [[1,2,3],
        [4,5,6],
        [7,8,9]]
df = pd.DataFrame(data,
                  index=["u","v","w"],
                  columns=["a","b","c"])
print(df)
#    a  b  c
# u  1  2  3
# v  4  5  6
# w  7  8  9

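Working with elements

Use df.loc[] or df.iloc[].
The difference is whether you specify rows and columns by index name and column name (loc) or by index number and column number (iloc).

	Row i, column j	Rows i to j, columns k to l	Everything
Syntax	`[i, j]`	`[i:j, k:l]`	`[:, :]`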

print(df.loc["u", "a":"b"])
# a    1
# b    2
# Name: u, dtype: int64

print(df.iloc[:, 0:2])
#    a  b
# u  1  2
# v  4  5
# w  7  8

With loc[], writing i:j returns the elements from i through j inclusive, whereas with iloc[] it returns the elements from i up to but not including j.
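
The inclusive/exclusive difference is easy to trip over, so here is a small sketch that slices the same sample frame both ways. The variable names by_label and by_position are just for illustration.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]],
                  index=["u", "v", "w"],
                  columns=["a", "b", "c"])

# Label slices include both endpoints: rows u and v, columns a and b.
by_label = df.loc["u":"v", "a":"b"]

# Position slices exclude the end: only row 0 and column 0.
by_position = df.iloc[0:1, 0:1]

print(by_label.shape)     # (2, 2)
print(by_position.shape)  # (1, 1)
```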

Conditional extraction of elements

df_condition = df.loc[:, "a"] > 1
df = df[df_condition]
print(df)
#    a  b  c
# v  4  5  6
# w  7  8  9
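
It can help to look at the condition itself: it is a boolean Series with one True/False per row. The sketch below shows that, plus one way to combine two conditions with & and parentheses. Combining conditions goes slightly beyond the original article and is included only as an illustration.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]],
                  index=["u", "v", "w"],
                  columns=["a", "b", "c"])

# The condition is a boolean Series aligned with the row index.
mask = df["a"] > 1          # equivalent to df.loc[:, "a"] > 1
print(mask.tolist())        # [False, True, True]

# Two conditions combined with & (each one needs its own parentheses).
both = df[(df["a"] > 1) & (df["c"] < 9)]
print(both.index.tolist())  # ['v']
```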

Working with index and column names

Use df.index and df.columns. Assigning a list-like object of matching length renames them.

print(df.index)
# Index(['u', 'v', 'w'], dtype='object')

print(df.columns)
# Index(['a', 'b', 'c'], dtype='object')

df.columns = ["x", "y", "z"]
print(df)
#    x  y  z
# u  1  2  3
# v  4  5  6
# w  7  8  9
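
The same assignment works on the row side. This is a minimal sketch with made-up labels r1, r2, r3; it also shows that a list of the wrong length is rejected.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]],
                  index=["u", "v", "w"],
                  columns=["a", "b", "c"])

# Rename the rows by assigning a list with one label per row.
# (r1, r2, r3 are hypothetical labels for this sketch.)
df.index = ["r1", "r2", "r3"]
print(df.index.tolist())

# A list of the wrong length raises ValueError and leaves df unchanged.
try:
    df.index = ["only", "two"]
except ValueError as err:
    print("ValueError:", err)
```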

Adding new rows (index) or columns

To add rows, join DataFrames together with pd.concat([df1, df2, ...]).

# Define the DataFrame to append
add_list = [[100, 200, 300]]
df_add = pd.DataFrame(add_list,
                      index=["t"],
                      columns=["a", "b", "c"])
print(df_add)
#      a    b    c
# t  100  200  300

df = pd.concat([df, df_add])
print(df)
#      a    b    c
# u    1    2    3
# v    4    5    6
# w    7    8    9
# t  100  200  300
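
One thing worth knowing is that pd.concat lines the frames up by column name, not by position. The sketch below appends a hypothetical row that has no c column to see what happens; df_partial is a name invented for this example.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]],
                  index=["u", "v", "w"],
                  columns=["a", "b", "c"])

# A row to append that only has columns a and b (hypothetical data).
df_partial = pd.DataFrame([[100, 200]], index=["t"], columns=["a", "b"])

combined = pd.concat([df, df_partial])
print(combined)

# The missing value is filled with NaN, so column c becomes float.
print(combined["c"].dtype)
```

If the column names of the appended frame do not match exactly, you get NaN-filled cells rather than an error, so it is worth checking the names before concatenating.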

To add a column, assign a list-like object to a column name that does not exist yet.
If the column already exists, it is overwritten.

df["t"] = [100, 200, 300]
print(df)
#    a  b  c    t
# u  1  2  3  100
# v  4  5  6  200
# w  7  8  9  300
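
The overwrite case is not shown above, so here is a short sketch of both behaviors side by side. The replacement values are arbitrary.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]],
                  index=["u", "v", "w"],
                  columns=["a", "b", "c"])

# "t" does not exist yet, so this adds a new column at the end.
df["t"] = [100, 200, 300]

# "a" already exists, so this replaces its values in place.
df["a"] = [0, 0, 0]

print(df.columns.tolist())  # ['a', 'b', 'c', 't']
print(df["a"].tolist())     # [0, 0, 0]
```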

Deleting specific rows and columns

Use df = df.drop(index=[], columns=[]), passing lists of the labels to remove (strings in this example).

print(df.drop(index=["u"], columns=["a", "b"]))
#    c
# v  6
# w  9
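
Note that drop() returns a new DataFrame and leaves the original alone, which is why the summary reassigns the result to df. A minimal sketch of that, plus the errors="ignore" option for labels that may not exist ("zzz" is a made-up label):

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]],
                  index=["u", "v", "w"],
                  columns=["a", "b", "c"])

dropped = df.drop(index=["u"], columns=["a", "b"])
print(df.shape)       # (3, 3): the original is untouched
print(dropped.shape)  # (2, 1)

# A label that does not exist raises KeyError by default;
# errors="ignore" skips it instead.
safe = df.drop(index=["zzz"], errors="ignore")
print(safe.shape)     # (3, 3)
```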

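Converting to numpy.ndarray

Use df.to_numpy(). An ndarray holds a single dtype for all its elements, and changing it later means making a converted copy with astype(), so I think it is better to specify dtype up front.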

arr = df.to_numpy(dtype="float")
print(arr)
# [[1. 2. 3.]
#  [4. 5. 6.]
#  [7. 8. 9.]]
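
To see what the dtype argument changes, the sketch below converts the same frame with and without it, then converts back with astype(). This is only an illustration; the default integer dtype can differ by platform, so its output is not shown.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]],
                  index=["u", "v", "w"],
                  columns=["a", "b", "c"])

arr_default = df.to_numpy()             # keeps the frame's integer dtype
arr_float = df.to_numpy(dtype="float")  # converted while exporting

print(arr_default.dtype)
print(arr_float.dtype)  # float64

# astype() returns a converted copy; arr_float itself stays float64.
arr_int = arr_float.astype("int64")
print(arr_int.dtype)    # int64
```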

Exporting data

I mostly use df.to_csv() and df.to_clipboard().
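
As a rough sketch of the CSV round trip: calling to_csv() without a path returns the CSV text as a string, which is handy for checking what would be written. Reading it back with io.StringIO is an assumption made here to keep the example self-contained. to_clipboard() is left out because it needs a clipboard to be available in the environment.

```python
import io
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]],
                  index=["u", "v", "w"],
                  columns=["a", "b", "c"])

# With no path given, to_csv() returns the CSV text instead of writing a file.
csv_text = df.to_csv()
print(csv_text)

# Reading it back with index_col=0 restores the row labels.
restored = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(restored)
```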

Closing

If you can do this much, I think you can manipulate and reshape well-formed data.
For messy data, work through it with help from Google.
