
[pandas] Notes on pandas DataFrame and Series

Last updated at 2023-03-23. Posted at 2022-07-04.

It has been quite a while since my earlier NumPy notes, but here are some notes on pandas as well.

Official documentation

User Guide -- pandas 1.4.3 documentation

Environment

Item	Version
MacBook Air	macOS Monterey 12.4
python	3.8.9
jupyter notebook	6.4.10
pandas	1.4.3

First, import the package.

import pandas as pd

By convention, pandas is imported under the alias pd.

How to create a Series and a DataFrame

Series

Here is an example using a simple list.

data = [1, 2, 3, 4, 5]
se = pd.Series(data)
print(se)
# 0    1
# 1    2
# 2    3
# 3    4
# 4    5
# dtype: int64

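A Series can be created from a list like this.
In the printed output, the first column is the index (by default 0, 1, 2, 3, ...) and the second column holds the elements of the list that was passed in.
The index can also be specified explicitly.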

data = [1, 2, 3, 4, 5]
se = pd.Series(data, index=['A', 'B', 'C', 'D', 'E'])
print(se)
# A    1
# B    2
# C    3
# D    4
# E    5
# dtype: int64
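
As a small sketch of what the index is for, the labels and the values of the Series above can be read back separately, and an element can be looked up either by its label or by its position. The variable names here (index_labels, values, by_label, by_position) are chosen only for illustration.

```python
import pandas as pd

se = pd.Series([1, 2, 3, 4, 5], index=['A', 'B', 'C', 'D', 'E'])

index_labels = list(se.index)  # the labels: 'A' through 'E'
values = se.tolist()           # the elements as a plain Python list
by_label = se['C']             # lookup by index label
by_position = se.iloc[2]       # lookup by 0-based position

print(index_labels)
print(values)
print(by_label, by_position)
```

Both lookups refer to the same element here, because 'C' is the third label.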

DataFrame

Here is an example using a simple list.

data = [[1, 2, 3], [4, 5, 6]]
df = pd.DataFrame(data)
print(df)
#    0  1  2
# 0  1  2  3
# 1  4  5  6

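A DataFrame can be created from a matrix written as a list of lists.
The vertical labels (0 1) are called the index and the horizontal labels (0 1 2) are called the columns. By default, both are integer sequences starting at 0.
There are several ways to specify the index and the columns.
The first is to pass them directly as parameters.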

data = [[1, 2, 3], [4, 5, 6]]
df = pd.DataFrame(data, index=['A', 'B'], columns=['apple', 'orange', 'lemon'])
print(df)
#    apple  orange  lemon
# A      1       2      3
# B      4       5      6
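
To check which labels ended up where, the index and columns can be read back from the DataFrame, and a single cell can be selected by its row and column labels. This is a minimal sketch; the variable names (row_labels, col_labels, cell) are illustrative only.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]],
                  index=['A', 'B'],
                  columns=['apple', 'orange', 'lemon'])

row_labels = list(df.index)    # labels along the vertical axis
col_labels = list(df.columns)  # labels along the horizontal axis
cell = df.loc['B', 'orange']   # select by (row label, column label)

print(row_labels)
print(col_labels)
print(cell)
```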

Next is the dictionary-style approach: the columns are given as dictionary keys, and the index is specified separately.

df = pd.DataFrame({'apple': [1, 4], 
                   'orange': [2, 5], 
                   'lemon': [3, 6]}, index=['A', 'B'])
print(df)
#    apple  orange  lemon
# A      1       2      3
# B      4       5      6
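
Another option, shown here only as a sketch and not taken from the examples above, is to build the DataFrame with default labels first and then assign to its index and columns attributes. The assigned lists must have the same length as the corresponding axis.

```python
import pandas as pd

# Start with the default 0-based integer labels.
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]])

# Replace the labels after construction.
df.index = ['A', 'B']
df.columns = ['apple', 'orange', 'lemon']

print(df)
#    apple  orange  lemon
# A      1       2      3
# B      4       5      6
```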

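Incidentally, the reverse did not work: giving the row labels as dictionary keys and the column names through columns. Dictionary keys are always treated as column names, and columns then selects among those keys, so when none of them match the result is an empty DataFrame.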

df = pd.DataFrame({'A': [1, 2, 3], 
                   'B': [4, 5, 6]}, columns=['apple', 'orange', 'lemon'])
print(df)
# Empty DataFrame
# Columns: [apple, orange, lemon]
# Index: []
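
If the dictionary keys really are meant to be row labels, one way to get that layout is pandas.DataFrame.from_dict with orient='index', which accepts columns only in that orientation. This is a sketch of an alternative, not something the original example used.

```python
import pandas as pd

data = {'A': [1, 2, 3], 'B': [4, 5, 6]}

# orient='index' treats each dictionary key as a row label,
# so the column names can be supplied separately.
df = pd.DataFrame.from_dict(data, orient='index',
                            columns=['apple', 'orange', 'lemon'])

print(df)
#    apple  orange  lemon
# A      1       2      3
# B      4       5      6
```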

Reference URLs

Renaming the row and column labels of a pandas.DataFrame
