I tried out Jupyter Notebook on a whim.
I'll use the Titanic passenger data, which is well known as Kaggle's tutorial dataset.
I set up the environment by following this guide:
http://qiita.com/mix_dvd/items/29dfb8d47a596b4df36d
Import the required libraries.
import pandas as pd
from pandas import DataFrame, Series
import numpy as np
Read the CSV file into a DataFrame.
titanic_df = pd.read_csv('train.csv')
Show the first five rows.
titanic_df.head()
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S |
Import the libraries needed for plotting.
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
Count passengers by sex.
sns.countplot(x='Sex', data=titanic_df)
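A function that returns 'child' for passengers under 16 and the passenger's sex otherwise.
def male_female_child(passenger):
    age, sex = passenger
    # NaN < 16 is False, so a passenger with a missing age falls through to sex
    if age < 16:
        return 'child'
    else:
        return sex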
Add the function's result as a new person column.
titanic_df['person'] = titanic_df[['Age', 'Sex']].apply(male_female_child, axis=1)
Check that the person column has been added.
titanic_df.head(10)
| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked | person |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 0 | 3 | Braund, Mr. Owen Harris | male | 22.0 | 1 | 0 | A/5 21171 | 7.2500 | NaN | S | male |
| 1 | 2 | 1 | 1 | Cumings, Mrs. John Bradley (Florence Briggs Th... | female | 38.0 | 1 | 0 | PC 17599 | 71.2833 | C85 | C | female |
| 2 | 3 | 1 | 3 | Heikkinen, Miss. Laina | female | 26.0 | 0 | 0 | STON/O2. 3101282 | 7.9250 | NaN | S | female |
| 3 | 4 | 1 | 1 | Futrelle, Mrs. Jacques Heath (Lily May Peel) | female | 35.0 | 1 | 0 | 113803 | 53.1000 | C123 | S | female |
| 4 | 5 | 0 | 3 | Allen, Mr. William Henry | male | 35.0 | 0 | 0 | 373450 | 8.0500 | NaN | S | male |
| 5 | 6 | 0 | 3 | Moran, Mr. James | male | NaN | 0 | 0 | 330877 | 8.4583 | NaN | Q | male |
| 6 | 7 | 0 | 1 | McCarthy, Mr. Timothy J | male | 54.0 | 0 | 0 | 17463 | 51.8625 | E46 | S | male |
| 7 | 8 | 0 | 3 | Palsson, Master. Gosta Leonard | male | 2.0 | 3 | 1 | 349909 | 21.0750 | NaN | S | child |
| 8 | 9 | 1 | 3 | Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) | female | 27.0 | 0 | 2 | 347742 | 11.1333 | NaN | S | female |
| 9 | 10 | 1 | 2 | Nasser, Mrs. Nicholas (Adele Achem) | female | 14.0 | 1 | 0 | 237736 | 30.0708 | NaN | C | child |
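As a side note, the row-by-row apply used for the person column could probably also be written as a single vectorized expression. The following is only a sketch under my own assumptions: `sample_df` is a hypothetical stand-in built from a few Age/Sex pairs visible in the table, and `np.where` is a swapped-in technique, not what the notebook itself ran.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: Age/Sex pairs copied from rows 0, 1, 5, 7 and 9 above.
sample_df = pd.DataFrame({
    "Age": [22.0, 38.0, np.nan, 2.0, 14.0],
    "Sex": ["male", "female", "male", "male", "female"],
})

# "Age < 16" evaluates to False for NaN, so a passenger with a missing age
# keeps their sex, which matches what male_female_child does.
sample_df["person"] = np.where(sample_df["Age"] < 16, "child", sample_df["Sex"])

print(sample_df)
```

On the full data the same expression should work with `titanic_df` in place of `sample_df`, and it avoids calling a Python function once per row.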
Plot the counts per Pclass (cabin class: first, second, third), split by person.
sns.countplot(x='Pclass', data=titanic_df, hue='person')
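To read the numbers behind a plot like this, a cross-tabulation of the two columns may help. This is a minimal sketch, assuming a hypothetical `sample_df` built from the Pclass/person values of a few rows in the table above; the real counts would come from running it on `titanic_df`.

```python
import pandas as pd

# Hypothetical sample: Pclass/person pairs copied from rows 0-4, 7 and 9 above.
sample_df = pd.DataFrame({
    "Pclass": [3, 1, 3, 1, 3, 3, 2],
    "person": ["male", "female", "female", "female", "male", "child", "child"],
})

# Rows are cabin classes, columns are person categories, cells are counts.
counts = pd.crosstab(sample_df["Pclass"], sample_df["person"])

print(counts)
```

Each cell of the resulting table corresponds to the height of one bar in the grouped count plot.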
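That was just a first try, but Jupyter turned out to be convenient.
I like that the code and its results are kept together.
I'll do a proper survival analysis of the Kaggle Titanic passengers next time.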