Python
python3
Jupyter

Trying out Jupyter Notebook on a whim




I use the Titanic passenger dataset, which is well known as a tutorial dataset on Kaggle.

I set up the environment by following this article:
http://qiita.com/mix_dvd/items/29dfb8d47a596b4df36d

Import the required libraries.

import pandas as pd
from pandas import DataFrame, Series
import numpy as np

Read the CSV file into a DataFrame.

titanic_df = pd.read_csv('train.csv')

Show the first five rows.

titanic_df.head()
   PassengerId  Survived  Pclass  Name                                               Sex     Age   SibSp  Parch  Ticket            Fare     Cabin  Embarked
0  1            0         3       Braund, Mr. Owen Harris                            male    22.0  1      0      A/5 21171         7.2500   NaN    S
1  2            1         1       Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0  1      0      PC 17599          71.2833  C85    C
2  3            1         3       Heikkinen, Miss. Laina                             female  26.0  0      0      STON/O2. 3101282  7.9250   NaN    S
3  4            1         1       Futrelle, Mrs. Jacques Heath (Lily May Peel)       female  35.0  1      0      113803            53.1000  C123   S
4  5            0         3       Allen, Mr. William Henry                           male    35.0  0      0      373450            8.0500   NaN    S
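Besides head(), a quick look at the shape and the number of missing values per column is handy right after loading. The sketch below is only an illustration: it reads a tiny inline CSV (a few columns from the rows shown in this post, including passenger 6 whose Age is missing) instead of the real train.csv, and the names csv_text and sample_df are my own placeholders.

```python
import io

import pandas as pd

# Stand-in for train.csv: a handful of columns from rows shown in this post.
# Empty fields are parsed as NaN, just as in the real file.
csv_text = """PassengerId,Survived,Pclass,Sex,Age,Cabin
1,0,3,male,22,
2,1,1,female,38,C85
3,1,3,female,26,
6,0,3,male,,
"""
sample_df = pd.read_csv(io.StringIO(csv_text))

# Number of rows and columns
n_rows, n_cols = sample_df.shape

# Count of missing values in each column
missing_per_column = sample_df.isnull().sum()

print(n_rows, n_cols)
print(missing_per_column)
```

On the real data the same two calls would be titanic_df.shape and titanic_df.isnull().sum().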

Import the libraries needed for plotting.

import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

Count passengers by sex.

sns.countplot(x='Sex', data=titanic_df)

[Figure: output_10_2.png, count plot of passengers by sex]

A function that returns 'child' if the passenger is under 16, and the passenger's sex otherwise.

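def male_female_child(passenger):
    # passenger is one row holding (Age, Sex)
    age, sex = passenger
    # A missing age (NaN) fails this comparison, so the sex is returned
    if age < 16:
        return 'child'
    else:
        return sex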

Add the function's result as a person column.

titanic_df['person'] = titanic_df[['Age', 'Sex']].apply(male_female_child, axis=1)

Confirm that the person column has been added.

titanic_df.head(10)
   PassengerId  Survived  Pclass  Name                                               Sex     Age   SibSp  Parch  Ticket            Fare     Cabin  Embarked  person
0  1            0         3       Braund, Mr. Owen Harris                            male    22.0  1      0      A/5 21171         7.2500   NaN    S         male
1  2            1         1       Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0  1      0      PC 17599          71.2833  C85    C         female
2  3            1         3       Heikkinen, Miss. Laina                             female  26.0  0      0      STON/O2. 3101282  7.9250   NaN    S         female
3  4            1         1       Futrelle, Mrs. Jacques Heath (Lily May Peel)       female  35.0  1      0      113803            53.1000  C123   S         female
4  5            0         3       Allen, Mr. William Henry                           male    35.0  0      0      373450            8.0500   NaN    S         male
5  6            0         3       Moran, Mr. James                                   male    NaN   0      0      330877            8.4583   NaN    Q         male
6  7            0         1       McCarthy, Mr. Timothy J                            male    54.0  0      0      17463             51.8625  E46    S         male
7  8            0         3       Palsson, Master. Gosta Leonard                     male    2.0   3      1      349909            21.0750  NaN    S         child
8  9            1         3       Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)  female  27.0  0      2      347742            11.1333  NaN    S         female
9  10           1         2       Nasser, Mrs. Nicholas (Adele Achem)                female  14.0  1      0      237736            30.0708  NaN    C         child
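As a side note, the same person column can probably be built without a row-by-row apply. The following is a minimal sketch, not what I actually ran in the notebook: it uses np.where on the whole Age column and checks it against the row-wise function on four passengers taken from the output above (PassengerId 6, 7, 8 and 10). The name sample_df is a placeholder of mine.

```python
import numpy as np
import pandas as pd

# Age and Sex of PassengerId 6, 7, 8 and 10 from the output above
sample_df = pd.DataFrame({
    'Age': [np.nan, 54.0, 2.0, 14.0],
    'Sex': ['male', 'male', 'male', 'female'],
})


def male_female_child(passenger):
    # Row-wise version, same logic as the function in this post
    age, sex = passenger
    return 'child' if age < 16 else sex


row_wise = sample_df[['Age', 'Sex']].apply(male_female_child, axis=1)

# np.where evaluates the condition for the whole column at once.
# NaN < 16 is False, so a missing age falls back to the sex,
# which matches the behavior of the row-wise function.
vectorized = pd.Series(
    np.where(sample_df['Age'] < 16, 'child', sample_df['Sex']),
    index=sample_df.index,
)

print(vectorized.tolist())  # ['male', 'male', 'child', 'child']
```

Either way, a passenger whose age is missing (such as PassengerId 6) is labeled by sex rather than as a child, which is worth keeping in mind when reading the counts.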

Plot passenger counts by Pclass (cabin class: first, second, third), broken down by person.

sns.countplot(x='Pclass', data=titanic_df, hue='person')

[Figure: output_17_2.png, count plot of passengers by Pclass, colored by person]
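If you want the numbers behind a plot like this, a cross-tabulation gives the same breakdown as a table. This is a rough sketch under the assumption that only the ten rows shown in the output above are used, so the counts are not those of the full dataset; sample_df and counts are names I made up for the illustration.

```python
import pandas as pd

# Pclass and person for the ten rows shown in the head(10) output above
sample_df = pd.DataFrame({
    'Pclass': [3, 1, 3, 1, 3, 3, 1, 3, 3, 2],
    'person': ['male', 'female', 'female', 'female', 'male',
               'male', 'male', 'child', 'female', 'child'],
})

# Rows are Pclass values, columns are person categories, cells are counts
counts = pd.crosstab(sample_df['Pclass'], sample_df['person'])

print(counts)
```

With the full data, pd.crosstab(titanic_df['Pclass'], titanic_df['person']) should give the bar heights of the plot.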

For now I have only tried Jupyter out, but it is convenient.
I like that the code and its results are kept together.

Next time I will work through the survival analysis of the Kaggle Titanic passengers properly.
