
[Python] What are pandas Series and DataFrame?

Last updated at 2016-01-23, posted at 2016-01-23

pandas Series and DataFrame

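I started working through the book ITエンジニアのための機械学習理論入門 (an introduction to machine learning theory for IT engineers) and almost immediately got stuck,
because I did not understand what DataFrame meant.
These are my notes from looking it up. I cover Series as well while I am at it.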

From the official pandas 0.17.1 documentation:

The two primary data structures of pandas, Series (1-dimensional) and DataFrame (2-dimensional), handle the vast majority of typical use cases in finance, statistics, social science, and many areas of engineering. For R users, DataFrame provides everything that R’s data.frame provides and much more.

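Rough summary
pandas has two primary data structures.
Series is 1-dimensional.
DataFrame is 2-dimensional.
Together they cover most typical use cases across many fields (finance, statistics, ...).
From an R user's point of view, DataFrame provides everything R's data.frame provides and more.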
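To make the "1-dimensional versus 2-dimensional" point concrete, here is a minimal sketch that inspects the `ndim` and `shape` attributes of both structures. The variable names `s` and `df` are only for illustration, and the sample values mirror the ones used in the usage section of this article.

```python
import numpy
from pandas import Series, DataFrame

# A Series is a single labeled column of values (1-dimensional).
s = Series(data=[2, 3], index=['x', 'y'], name='value')

# A DataFrame is a labeled table with rows and columns (2-dimensional).
df = DataFrame(numpy.array([[0, 0], [1, 1]]),
               index=['a', 'b'], columns=['x', 'y'])

print(s.ndim, s.shape)    # 1 (2,)
print(df.ndim, df.shape)  # 2 (2, 2)
```

The shape of the Series has a single entry (its length), while the shape of the DataFrame has two (rows, columns).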

Usage

# Import the numerical computing library
import numpy
# Import Series and DataFrame from the data analysis library
from pandas import Series, DataFrame

# Series
# data parameter  : the data. array-like, dict, or scalar value
# index parameter : labels for the data. array-like or Index (1d)
# dtype parameter : data type. numpy.dtype or None
# copy parameter  : whether to copy the input data. Defaults to False
# name parameter  : name to give the resulting Series
# 1
print(Series(data=[0, 1]))
# 2
print(Series(data=[2, 3], index=['x', 'y'], name='value'))

# DataFrame
# data parameter    : the data (numpy ndarray (structured or homogeneous), dict, or DataFrame)
# index parameter   : row labels. Defaults to integers (0, 1, ...), like array indices
# columns parameter : column labels (the index along the second dimension). Defaults to integers
# dtype parameter   : data type. dtype, default None
# copy parameter    : whether to copy the input data. Defaults to False
# 3
print(DataFrame(numpy.array([[0, 0], [1, 1]])))
# 4
print(DataFrame(numpy.array([[0, 0], [1, 1]]), index=['a', 'b']))
# 5
print(DataFrame(numpy.array([[0, 0], [1, 1]]), index=['a', 'b'], columns=['x', 'y']))

Output

# 1
0    0
1    1
dtype: int64

# 2
x    2
y    3
Name: value, dtype: int64

# 3
   0  1
0  0  0
1  1  1

# 4
   0  1
a  0  0
b  1  1

# 5
   x  y
a  0  0
b  1  1
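One way to see how the two structures relate is to pull a single column out of a DataFrame: the result is a Series that shares the row labels. The sketch below assumes the same data as example 5; the names `col` and `rebuilt` are hypothetical and only used for illustration. It also goes the other way and builds a DataFrame from a dict of Series.

```python
import numpy
from pandas import Series, DataFrame

df = DataFrame(numpy.array([[0, 0], [1, 1]]),
               index=['a', 'b'], columns=['x', 'y'])

# Selecting one column returns a Series labeled by the row index.
col = df['x']
print(type(col).__name__)  # Series
print(list(col.index))     # ['a', 'b']
print(col.name)            # x

# A dict of Series that share an index can be combined into a DataFrame;
# the dict keys become the column labels.
rebuilt = DataFrame({'x': Series([0, 1], index=['a', 'b']),
                     'y': Series([0, 1], index=['a', 'b'])})
print(rebuilt.values.tolist())  # [[0, 0], [1, 1]]
```

Seen this way, a DataFrame can be thought of as a collection of Series that share one row index, which is a helpful mental model when reading code that mixes the two.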
