This article works through the basic Pandas operations described on Kame's (@usdatascientist) blog (https://datawokagaku.com/python_for_ds_summary/), coding each of them by hand in Jupyter Lab.
Summary of basic Pandas operations
Part 10
import pandas as pd
import numpy as np
Series
data = {'name':'John', 'sex':'male', 'age': 22}
john_s = pd.Series(data)
print(john_s)
name John
sex male
age 22
dtype: object
array = np.array([10,20,30])
pd.Series(array)
0 10
1 20
2 30
dtype: int64
array = np.array([10,20,30])
labels = ['a','b','c']
pd.Series(array, index=labels)
a 10
b 20
c 30
dtype: int64
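Once a Series has labels, its values can be looked up either by label or by position. The following is a minimal sketch reusing the values and labels from the example above; the variable names `s`, `by_label`, `by_position`, and `subset` are chosen here for illustration only.

```python
import numpy as np
import pandas as pd

# Same values and labels as in the example above
s = pd.Series(np.array([10, 20, 30]), index=['a', 'b', 'c'])

by_label = s['b']          # label-based lookup
by_position = s.iloc[1]    # position-based lookup (0-indexed)
subset = s[['a', 'c']]     # a list of labels returns a new Series

print(by_label, by_position)
print(subset)
```

Using `.iloc` for positions keeps the intent explicit, which matters once the labels themselves are integers.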
Part 11
Creating a DataFrame
Creating from an ndarray
data = {'name':'John', 'sex':'male', 'age': 22}
john_s = pd.Series(data)
print(john_s)
print(john_s['age'])
name John
sex male
age 22
dtype: object
22
ndarray = np.random.randint(5, size=(5,4))
pd.DataFrame(data=ndarray)
 | 0 | 1 | 2 | 3 |
---|---|---|---|---|
0 | 1 | 1 | 1 | 0 |
1 | 4 | 1 | 0 | 0 |
2 | 3 | 2 | 1 | 0 |
3 | 3 | 1 | 1 | 3 |
4 | 4 | 0 | 1 | 3 |
columns = ['a','b','c','d']
index = np.arange(0,50,10)
pd.DataFrame(data=ndarray, index=index, columns=columns)
 | a | b | c | d |
---|---|---|---|---|
0 | 1 | 1 | 1 | 0 |
10 | 4 | 1 | 0 | 0 |
20 | 3 | 2 | 1 | 0 |
30 | 3 | 1 | 1 | 3 |
40 | 4 | 0 | 1 | 3 |
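Because the example above uses random integers, the values change on every run. As a rough sketch, the same construction with fixed values (an assumption made here so the result is reproducible) shows how `index` and `columns` become the labels used for lookups; `values` and `df_demo` are illustrative names.

```python
import numpy as np
import pandas as pd

# Fixed values 0..19 instead of random integers, so the result is reproducible
values = np.arange(20).reshape(5, 4)

df_demo = pd.DataFrame(data=values,
                       index=np.arange(0, 50, 10),    # row labels 0, 10, 20, 30, 40
                       columns=['a', 'b', 'c', 'd'])  # column labels

print(df_demo.shape)
print(df_demo.loc[20, 'c'])   # row labelled 20, column labelled 'c'
```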
Creating from dictionaries
data1 = {
'name':'John',
'sex':'male',
'age':22
}
data2 = {
'name':'Zack',
'sex':'male',
'age':30
}
data3 ={
'name':'Emily',
'sex':'female',
'age':32
}
pd.DataFrame([data1, data2, data3])
 | name | sex | age |
---|---|---|---|
0 | John | male | 22 |
1 | Zack | male | 30 |
2 | Emily | female | 32 |
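The example above passes a list of per-row dictionaries. As a sketch of an alternative, a single dictionary that maps each column name to a list of values should build the same DataFrame; the names `records`, `columns_dict`, and `same` are illustrative.

```python
import pandas as pd

# One dictionary per row, as in the example above
records = [
    {'name': 'John', 'sex': 'male', 'age': 22},
    {'name': 'Zack', 'sex': 'male', 'age': 30},
    {'name': 'Emily', 'sex': 'female', 'age': 32},
]
df_records = pd.DataFrame(records)

# One dictionary for the whole table: column name -> list of values
columns_dict = {
    'name': ['John', 'Zack', 'Emily'],
    'sex': ['male', 'male', 'female'],
    'age': [22, 30, 32],
}
df_columns = pd.DataFrame(columns_dict)

same = df_records.equals(df_columns)
print(same)
```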
df = pd.read_csv('train.csv')
df.head()
 | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
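To try `pd.read_csv` without having `train.csv` on disk, the CSV text can be fed in through `io.StringIO`. This is only a sketch: the three rows and six columns below are a small subset copied from the output above, and `csv_text` and `df_small` are illustrative names.

```python
import io
import pandas as pd

# A tiny subset of the Titanic data shown above (names contain commas, so they are quoted)
csv_text = '''PassengerId,Survived,Pclass,Name,Sex,Age
1,0,3,"Braund, Mr. Owen Harris",male,22
3,1,3,"Heikkinen, Miss. Laina",female,26
6,0,3,"Moran, Mr. James",male,
'''

df_small = pd.read_csv(io.StringIO(csv_text))

print(df_small)
print(df_small['Age'].isna().sum())   # the empty Age field is read as NaN
```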
Part 12
Show the first 5 rows with .head()
df.head()
 | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
Check summary statistics with .describe()
df.describe()
 | PassengerId | Survived | Pclass | Age | SibSp | Parch | Fare |
---|---|---|---|---|---|---|---|
count | 891.000000 | 891.000000 | 891.000000 | 714.000000 | 891.000000 | 891.000000 | 891.000000 |
mean | 446.000000 | 0.383838 | 2.308642 | 29.699118 | 0.523008 | 0.381594 | 32.204208 |
std | 257.353842 | 0.486592 | 0.836071 | 14.526497 | 1.102743 | 0.806057 | 49.693429 |
min | 1.000000 | 0.000000 | 1.000000 | 0.420000 | 0.000000 | 0.000000 | 0.000000 |
25% | 223.500000 | 0.000000 | 2.000000 | 20.125000 | 0.000000 | 0.000000 | 7.910400 |
50% | 446.000000 | 0.000000 | 3.000000 | 28.000000 | 0.000000 | 0.000000 | 14.454200 |
75% | 668.500000 | 1.000000 | 3.000000 | 38.000000 | 1.000000 | 0.000000 | 31.000000 |
max | 891.000000 | 1.000000 | 3.000000 | 80.000000 | 8.000000 | 6.000000 | 512.329200 |
type(df.describe())  # the result is itself a DataFrame
pandas.core.frame.DataFrame
List the columns with .columns
df.columns
Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',
'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],
dtype='object')
type(df.columns)  # the type is Index
pandas.core.indexes.base.Index
df.index  # the rows have an index as well
RangeIndex(start=0, stop=891, step=1)
Use brackets [] to extract a single column as a Series.
df['Age'].head()
0 22.0
1 38.0
2 26.0
3 35.0
4 35.0
Name: Age, dtype: float64
type(df['Age'])
pandas.core.series.Series
Pass a list of column names inside the brackets [] to extract several columns at once
df[['Age','Parch','Fare']].head()
 | Age | Parch | Fare |
---|---|---|---|
0 | 22.0 | 0 | 7.2500 |
1 | 38.0 | 0 | 71.2833 |
2 | 26.0 | 0 | 7.9250 |
3 | 35.0 | 0 | 53.1000 |
4 | 35.0 | 0 | 8.0500 |
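The difference between single and double brackets is easy to miss: a column name returns a Series, while a list of names returns a DataFrame even when the list holds one name. A minimal sketch, using the first three rows of the values above and illustrative names (`df_demo`, `as_series`, `as_frame`):

```python
import pandas as pd

df_demo = pd.DataFrame({
    'Age': [22.0, 38.0, 26.0],
    'Parch': [0, 0, 0],
    'Fare': [7.25, 71.2833, 7.925],
})

as_series = df_demo['Age']     # single brackets -> Series
as_frame = df_demo[['Age']]    # list of names   -> DataFrame with one column

print(type(as_series).__name__, as_series.shape)
print(type(as_frame).__name__, as_frame.shape)
```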
Use .iloc[int] to get a specific row as a Series
df.iloc[888]  # iloc selects by integer position
PassengerId 889
Survived 0
Pclass 3
Name Johnston, Miss. Catherine Helen "Carrie"
Sex female
Age NaN
SibSp 1
Parch 2
Ticket W./C. 6607
Fare 23.45
Cabin NaN
Embarked S
Name: 888, dtype: object
df.iloc[888]['Age']
nan
np.isnan(df.iloc[888]['Age'])
True
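`np.isnan` only accepts numeric input, so it cannot be applied to every field of a mixed-type row. As a sketch of one alternative, `pd.isna` handles strings and numbers alike. The Series below is a hand-built stand-in for three fields of row 888 shown above, not the real row object.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for a few fields of row 888
row = pd.Series({
    'Name': 'Johnston, Miss. Catherine Helen "Carrie"',
    'Age': np.nan,
    'Cabin': np.nan,
})

age_missing = pd.isna(row['Age'])   # works for a float NaN
missing_per_field = row.isna()      # element-wise check over the whole row

# np.isnan raises TypeError when given a string
try:
    np.isnan(row['Name'])
    numpy_ok = True
except TypeError:
    numpy_ok = False

print(age_missing, numpy_ok)
print(missing_per_field)
```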
np.random.seed(1)
ndarray = np.random.randint(10, size=(5,5))
columns = [0,1,2,3,4]
index = ['a','b','c','d','e']
df_1 = pd.DataFrame(data=ndarray, index=index, columns=columns)
df_1
 | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
a | 5 | 8 | 9 | 5 | 0 |
b | 0 | 1 | 7 | 6 | 9 |
c | 2 | 4 | 5 | 2 | 4 |
d | 2 | 4 | 7 | 7 | 9 |
e | 1 | 7 | 0 | 6 | 9 |
df_1[0]
a 5
b 0
c 2
d 2
e 1
Name: 0, dtype: int64
df_1.loc['c']  # .loc looks rows up by label, so pass the string label when the index is not integer-based
0 2
1 4
2 5
3 2
4 4
Name: c, dtype: int64
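To make the label/position distinction concrete, the sketch below rebuilds `df_1` from the values printed above (typed in by hand rather than regenerated from the random seed) and fetches the same row both ways. `row_by_label`, `row_by_position`, and `cell` are illustrative names.

```python
import pandas as pd

# Values copied from the df_1 output above
df_1 = pd.DataFrame(
    [[5, 8, 9, 5, 0],
     [0, 1, 7, 6, 9],
     [2, 4, 5, 2, 4],
     [2, 4, 7, 7, 9],
     [1, 7, 0, 6, 9]],
    index=['a', 'b', 'c', 'd', 'e'],
    columns=[0, 1, 2, 3, 4],
)

row_by_label = df_1.loc['c']      # label-based: the row labelled 'c'
row_by_position = df_1.iloc[2]    # position-based: the third row
cell = df_1.loc['c', 2]           # row label 'c', column label 2

print(row_by_label.equals(row_by_position))
print(cell)
```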
Drop specific rows or columns with .drop()
Drop the row with index 0 (the first row)
df.drop(0).head()
 | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
5 | 6 | 0 | 3 | Moran, Mr. James | male | NaN | 0 | 0 | 330877 | 8.4583 | NaN | Q |
Drop the 'Age' column
df.drop('Age', axis=1).head()
 | PassengerId | Survived | Pclass | Name | Sex | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 1 | 0 | 113803 | 53.1000 | C123 | S |
4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 0 | 0 | 373450 | 8.0500 | NaN | S |
To drop several columns, pass a list as the argument: .drop([...]). Dropping does not modify the original df.
df.drop(['Age','PassengerId'], axis=1).head()
 | Survived | Pclass | Name | Sex | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 3 | Braund, Mr. Owen Harris | male | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
1 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
2 | 1 | 3 | Heikkinen, Miss. Laina | female | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
3 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 1 | 0 | 113803 | 53.1000 | C123 | S |
4 | 0 | 3 | Allen, Mr. William Henry | male | 0 | 0 | 373450 | 8.0500 | NaN | S |
df.head()  # the original df is unchanged after drop
 | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
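The fact that `.drop()` returns a new DataFrame and leaves the original alone can be checked directly. A small sketch with a three-row stand-in for the Titanic data (`df_demo`, `dropped`, and `dropped_kw` are illustrative names); it also shows the `columns=` keyword, which expresses the same thing as `axis=1`.

```python
import pandas as pd

df_demo = pd.DataFrame({
    'PassengerId': [1, 2, 3],
    'Age': [22.0, 38.0, 26.0],
    'Fare': [7.25, 71.2833, 7.925],
})

dropped = df_demo.drop(['Age', 'PassengerId'], axis=1)      # returns a new DataFrame
dropped_kw = df_demo.drop(columns=['Age', 'PassengerId'])   # same result via the columns keyword

print(list(dropped.columns))   # only the remaining column
print(list(df_demo.columns))   # the original still has all three
```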
There are two ways to overwrite df: pass inplace=True, which modifies the original DataFrame, or reassign the result to df.
df = pd.read_csv('train.csv')
df.drop(['Age', 'Cabin'], axis=1, inplace=True)
df.head()
 | PassengerId | Survived | Pclass | Name | Sex | SibSp | Parch | Ticket | Fare | Embarked |
---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 1 | 0 | A/5 21171 | 7.2500 | S |
1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 1 | 0 | PC 17599 | 71.2833 | C |
2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 0 | 0 | STON/O2. 3101282 | 7.9250 | S |
3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 1 | 0 | 113803 | 53.1000 | S |
4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 0 | 0 | 373450 | 8.0500 | S |
df = pd.read_csv('train.csv')
df = df.drop(['Age', 'Cabin'], axis=1)
id(df)
140285150057616
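One practical difference between the two approaches is the return value: with `inplace=True`, `.drop()` returns `None`, so its result must not be assigned back. A minimal sketch with a two-row stand-in DataFrame (`df_a`, `df_b`, and `result` are illustrative names):

```python
import pandas as pd

df_a = pd.DataFrame({
    'Age': [22.0, 38.0],
    'Cabin': [None, 'C85'],
    'Fare': [7.25, 71.2833],
})
df_b = df_a.copy()

# Approach 1: modify df_a in place; the call itself returns None
result = df_a.drop(['Age', 'Cabin'], axis=1, inplace=True)

# Approach 2: build a new DataFrame and rebind the name
df_b = df_b.drop(['Age', 'Cabin'], axis=1)

print(result)
print(df_a.equals(df_b))
```

Writing `df = df.drop(..., inplace=True)` would therefore leave `df` set to `None`, which is a common slip when mixing the two styles.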
Get multiple rows with slicing
df.iloc[5:10]
 | PassengerId | Survived | Pclass | Name | Sex | SibSp | Parch | Ticket | Fare | Embarked |
---|---|---|---|---|---|---|---|---|---|---|
5 | 6 | 0 | 3 | Moran, Mr. James | male | 0 | 0 | 330877 | 8.4583 | Q |
6 | 7 | 0 | 1 | McCarthy, Mr. Timothy J | male | 0 | 0 | 17463 | 51.8625 | S |
7 | 8 | 0 | 3 | Palsson, Master. Gosta Leonard | male | 3 | 1 | 349909 | 21.0750 | S |
8 | 9 | 1 | 3 | Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) | female | 0 | 2 | 347742 | 11.1333 | S |
9 | 10 | 1 | 2 | Nasser, Mrs. Nicholas (Adele Achem) | female | 1 | 0 | 237736 | 30.0708 | C |
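Slices behave differently under `.iloc` and `.loc`: `.iloc[5:10]` excludes the end position, as Python slicing does, while `.loc[5:10]` includes the end label. A short sketch on a 12-row stand-in DataFrame (`df_demo`, `by_position`, and `by_label` are illustrative names):

```python
import pandas as pd

df_demo = pd.DataFrame({'PassengerId': range(1, 13)})   # default index 0..11

by_position = df_demo.iloc[5:10]   # positions 5..9, end excluded
by_label = df_demo.loc[5:10]       # labels 5..10, end included

print(len(by_position), len(by_label))
```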
Part 13
Filtering a DataFrame by a condition
df = pd.read_csv('train.csv')
survived_mask = df['Survived'] == 1  # boolean mask that is True for survivors
survived_mask.head()
0 False
1 True
2 True
3 True
4 False
Name: Survived, dtype: bool
survived_filter = df['Survived'] == 1  # store the condition in a variable (the name filter would shadow a Python built-in)
df[survived_filter].head()
 | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
8 | 9 | 1 | 3 | Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) | female | 27.0 | 0 | 2 | 347742 | 11.1333 | NaN | S |
9 | 10 | 1 | 2 | Nasser, Mrs. Nicholas (Adele Achem) | female | 14.0 | 1 | 0 | 237736 | 30.0708 | NaN | C |
df[df['Survived'] == 1].head()  # writing the condition inline like this is more common
 | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
8 | 9 | 1 | 3 | Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) | female | 27.0 | 0 | 2 | 347742 | 11.1333 | NaN | S |
9 | 10 | 1 | 2 | Nasser, Mrs. Nicholas (Adele Achem) | female | 14.0 | 1 | 0 | 237736 | 30.0708 | NaN | C |
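A boolean mask is an ordinary Series of `True`/`False` values, so it can be counted as well as used for selection. The sketch below uses the `Survived` values of the first five passengers shown above, with names shortened to surnames for brevity; `mask`, `n_survived`, and `survivor_names` are illustrative names.

```python
import pandas as pd

df_demo = pd.DataFrame({
    'Name': ['Braund', 'Cumings', 'Heikkinen', 'Futrelle', 'Allen'],  # surnames only
    'Survived': [0, 1, 1, 1, 0],
})

mask = df_demo['Survived'] == 1            # boolean Series, one entry per row
n_survived = mask.sum()                    # True counts as 1
survivor_names = df_demo.loc[mask, 'Name'] # filter rows and pick a column in one step

print(n_survived)
print(survivor_names.tolist())
```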
df[df['Survived'] == 1].describe()  # describe only the survivors
 | PassengerId | Survived | Pclass | Age | SibSp | Parch | Fare |
---|---|---|---|---|---|---|---|
count | 342.000000 | 342.0 | 342.000000 | 290.000000 | 342.000000 | 342.000000 | 342.000000 |
mean | 444.368421 | 1.0 | 1.950292 | 28.343690 | 0.473684 | 0.464912 | 48.395408 |
std | 252.358840 | 0.0 | 0.863321 | 14.950952 | 0.708688 | 0.771712 | 66.596998 |
min | 2.000000 | 1.0 | 1.000000 | 0.420000 | 0.000000 | 0.000000 | 0.000000 |
25% | 250.750000 | 1.0 | 1.000000 | 19.000000 | 0.000000 | 0.000000 | 12.475000 |
50% | 439.500000 | 1.0 | 2.000000 | 28.000000 | 0.000000 | 0.000000 | 26.000000 |
75% | 651.500000 | 1.0 | 3.000000 | 36.000000 | 1.000000 | 1.000000 | 57.000000 |
max | 890.000000 | 1.0 | 3.000000 | 80.000000 | 4.000000 | 5.000000 | 512.329200 |
df.describe()  # the original data
 | PassengerId | Survived | Pclass | Age | SibSp | Parch | Fare |
---|---|---|---|---|---|---|---|
count | 891.000000 | 891.000000 | 891.000000 | 714.000000 | 891.000000 | 891.000000 | 891.000000 |
mean | 446.000000 | 0.383838 | 2.308642 | 29.699118 | 0.523008 | 0.381594 | 32.204208 |
std | 257.353842 | 0.486592 | 0.836071 | 14.526497 | 1.102743 | 0.806057 | 49.693429 |
min | 1.000000 | 0.000000 | 1.000000 | 0.420000 | 0.000000 | 0.000000 | 0.000000 |
25% | 223.500000 | 0.000000 | 2.000000 | 20.125000 | 0.000000 | 0.000000 | 7.910400 |
50% | 446.000000 | 0.000000 | 3.000000 | 28.000000 | 0.000000 | 0.000000 | 14.454200 |
75% | 668.500000 | 1.000000 | 3.000000 | 38.000000 | 1.000000 | 0.000000 | 31.000000 |
max | 891.000000 | 1.000000 | 3.000000 | 80.000000 | 8.000000 | 6.000000 | 512.329200 |
df[df['Age'] >= 60].describe()  # only passengers with Age >= 60
 | PassengerId | Survived | Pclass | Age | SibSp | Parch | Fare |
---|---|---|---|---|---|---|---|
count | 26.000000 | 26.000000 | 26.000000 | 26.000000 | 26.000000 | 26.000000 | 26.000000 |
mean | 455.807692 | 0.269231 | 1.538462 | 65.096154 | 0.230769 | 0.307692 | 43.467950 |
std | 240.078490 | 0.452344 | 0.811456 | 5.110811 | 0.429669 | 0.837579 | 51.269998 |
min | 34.000000 | 0.000000 | 1.000000 | 60.000000 | 0.000000 | 0.000000 | 6.237500 |
25% | 277.250000 | 0.000000 | 1.000000 | 61.250000 | 0.000000 | 0.000000 | 10.500000 |
50% | 489.000000 | 0.000000 | 1.000000 | 63.500000 | 0.000000 | 0.000000 | 28.275000 |
75% | 629.750000 | 0.750000 | 2.000000 | 69.000000 | 0.000000 | 0.000000 | 58.860450 |
max | 852.000000 | 1.000000 | 3.000000 | 80.000000 | 1.000000 | 4.000000 | 263.000000 |
df[(df['Age']>=60) & (df['Sex']=='female')]  # only women aged 60 or over
 | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
---|---|---|---|---|---|---|---|---|---|---|---|---|
275 | 276 | 1 | 1 | Andrews, Miss. Kornelia Theodosia | female | 63.0 | 1 | 0 | 13502 | 77.9583 | D7 | S |
366 | 367 | 1 | 1 | Warren, Mrs. Frank Manley (Anna Sophia Atkinson) | female | 60.0 | 1 | 0 | 110813 | 75.2500 | D37 | C |
483 | 484 | 1 | 3 | Turkula, Mrs. (Hedwig) | female | 63.0 | 0 | 0 | 4134 | 9.5875 | NaN | S |
829 | 830 | 1 | 1 | Stone, Mrs. George Nelson (Martha Evelyn) | female | 62.0 | 0 | 0 | 113572 | 80.0000 | B28 | NaN |
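When combining conditions, each comparison needs its own parentheses because `&` and `|` bind more tightly than `>=` or `==`, and the Python keywords `and`/`or` do not work on Series. A sketch on four made-up rows (the values are hypothetical, not taken from the Titanic data):

```python
import pandas as pd

# Hypothetical rows for illustration only
df_demo = pd.DataFrame({
    'Sex': ['female', 'male', 'female', 'male'],
    'Age': [63.0, 70.0, 26.0, 22.0],
    'Pclass': [1, 2, 3, 3],
})

# AND: both conditions must hold
older_women = df_demo[(df_demo['Age'] >= 60) & (df_demo['Sex'] == 'female')]

# OR: either condition may hold
first_or_older = df_demo[(df_demo['Pclass'] == 1) | (df_demo['Age'] >= 60)]

# The keyword `and` raises ValueError because a Series has no single truth value
try:
    df_demo[(df_demo['Age'] >= 60) and (df_demo['Sex'] == 'female')]
    keyword_and_works = True
except ValueError:
    keyword_and_works = False

print(list(older_women.index), list(first_or_older.index), keyword_and_works)
```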
df[(df['Pclass']==1) | (df['Age']<10)].head(4)  # only first-class passengers or those under 10 years old
 | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
6 | 7 | 0 | 1 | McCarthy, Mr. Timothy J | male | 54.0 | 0 | 0 | 17463 | 51.8625 | E46 | S |
7 | 8 | 0 | 3 | Palsson, Master. Gosta Leonard | male | 2.0 | 3 | 1 | 349909 | 21.0750 | NaN | S |
Prefixing a condition with ~ (tilde) filters with a NOT operation
data =[{'Name':'John', 'Survived':True},
{'Name':'Emily', 'Survived':False},
{'Name':'Ben', 'Survived':True}]
df = pd.DataFrame(data)
df
 | Name | Survived |
---|---|---|
0 | John | True |
1 | Emily | False |
2 | Ben | True |
This is often used when filtering on a column whose values are boolean.
df[df['Survived']==True]
 | Name | Survived |
---|---|---|
0 | John | True |
2 | Ben | True |
The Survived column is already boolean, so == True is unnecessary. df['Survived'] is itself a boolean Series, so you can filter with it directly, as shown below.
df[df['Survived']]
 | Name | Survived |
---|---|---|
0 | John | True |
2 | Ben | True |
To narrow the data down to Survived == False, there is no need to write df[df['Survived'] == False]; you can do it as follows.
df[~df['Survived']]
 | Name | Survived |
---|---|---|
1 | Emily | False |
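The `~` operator can also negate a compound condition, provided the whole condition is wrapped in parentheses. A sketch extending the three-row example above with an `Age` column; the ages for John and Emily match the earlier dictionary example, while Ben's age is a made-up value.

```python
import pandas as pd

df_demo = pd.DataFrame({
    'Name': ['John', 'Emily', 'Ben'],
    'Survived': [True, False, True],
    'Age': [22, 32, 8],   # Ben's age is hypothetical
})

not_survived = df_demo[~df_demo['Survived']]

# Negate a compound condition: everyone who is NOT an adult survivor
adult_survivor = df_demo['Survived'] & (df_demo['Age'] >= 18)
not_adult_survivor = df_demo[~adult_survivor]

print(not_survived['Name'].tolist())
print(not_adult_survivor['Name'].tolist())
```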
Changing the index
Reassign the index with .reset_index()
df = pd.read_csv('train.csv')
df = df[df['Sex']=='male']
df.head()  # the index now has gaps
 | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
5 | 6 | 0 | 3 | Moran, Mr. James | male | NaN | 0 | 0 | 330877 | 8.4583 | NaN | Q |
6 | 7 | 0 | 1 | McCarthy, Mr. Timothy J | male | 54.0 | 0 | 0 | 17463 | 51.8625 | E46 | S |
7 | 8 | 0 | 3 | Palsson, Master. Gosta Leonard | male | 2.0 | 3 | 1 | 349909 | 21.0750 | NaN | S |
Realigning the index
As with .drop(), the original df is not overwritten, so to update df either pass inplace=True or reassign with df = df.reset_index().
df.reset_index().head()
 | index | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
1 | 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
2 | 5 | 6 | 0 | 3 | Moran, Mr. James | male | NaN | 0 | 0 | 330877 | 8.4583 | NaN | Q |
3 | 6 | 7 | 0 | 1 | McCarthy, Mr. Timothy J | male | 54.0 | 0 | 0 | 17463 | 51.8625 | E46 | S |
4 | 7 | 8 | 0 | 3 | Palsson, Master. Gosta Leonard | male | 2.0 | 3 | 1 | 349909 | 21.0750 | NaN | S |
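As the output above shows, `.reset_index()` keeps the old index as a new `index` column. When that column is not needed, `drop=True` discards it. A minimal sketch on a three-row stand-in with a gapped index (`df_demo`, `kept`, and `discarded` are illustrative names; the names are shortened to surnames):

```python
import pandas as pd

# Stand-in for a filtered DataFrame whose index has gaps
df_demo = pd.DataFrame({'Name': ['Braund', 'Allen', 'Moran']}, index=[0, 4, 5])

kept = df_demo.reset_index()                 # old index becomes an 'index' column
discarded = df_demo.reset_index(drop=True)   # old index is thrown away

print(list(kept.columns))
print(list(discarded.columns), list(discarded.index))
```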
Make a specific column the index with .set_index()
Set the index to 'Name'
As with .reset_index(), inplace=True overwrites the original df.
df.set_index('Name').head()
 | PassengerId | Survived | Pclass | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
---|---|---|---|---|---|---|---|---|---|---|---|
Name | |||||||||||
Braund, Mr. Owen Harris | 1 | 0 | 3 | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
Allen, Mr. William Henry | 5 | 0 | 3 | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
Moran, Mr. James | 6 | 0 | 3 | male | NaN | 0 | 0 | 330877 | 8.4583 | NaN | Q |
McCarthy, Mr. Timothy J | 7 | 0 | 1 | male | 54.0 | 0 | 0 | 17463 | 51.8625 | E46 | S |
Palsson, Master. Gosta Leonard | 8 | 0 | 3 | male | 2.0 | 3 | 1 | 349909 | 21.0750 | NaN | S |
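Once a column is the index, rows can be looked up by its values with `.loc`, and `.reset_index()` turns the index back into a regular column. A sketch using two of the rows above, limited to three columns for brevity (`df_demo`, `by_name`, `allen_age`, and `restored` are illustrative names):

```python
import pandas as pd

df_demo = pd.DataFrame({
    'Name': ['Braund, Mr. Owen Harris', 'Allen, Mr. William Henry'],
    'Age': [22.0, 35.0],
    'Fare': [7.25, 8.05],
})

by_name = df_demo.set_index('Name')                           # 'Name' moves from columns to index
allen_age = by_name.loc['Allen, Mr. William Henry', 'Age']    # look up a row by passenger name
restored = by_name.reset_index()                              # move 'Name' back to a column

print(allen_age)
print(list(by_name.columns), list(restored.columns))
```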