
Using Python's enumerate to visualize the data of columns stored in a list all at once

Posted at 2022-12-10

A memo, because Python's enumerate helped me write readable code when visualizing many columns of data at once.

Code used

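import matplotlib.pyplot as plt
import seaborn as sns

# train_df and test_df are assumed to be pandas DataFrames that are already loaded.
# The ~ in [~DISASTER_TWEETS] is the negation operator; it means the same as [DISASTER_TWEETS==False].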

METAFEATURES = ['word_count', 'unique_word_count', 'stop_word_count', 'url_count', 'mean_word_length',
                'char_count', 'punctuation_count', 'hashtag_count', 'mention_count']

DISASTER_TWEETS = train_df['target']==1

fig,ax = plt.subplots(ncols=2, nrows=len(METAFEATURES), figsize=(20, 50), dpi=100)

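# Note: sns.distplot has been deprecated since seaborn 0.11; histplot/displot are its replacements.
for i, feature in enumerate(METAFEATURES):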
    sns.distplot(train_df.loc[~DISASTER_TWEETS][feature], label='Not Disaster', ax=ax[i][0], kde=True, color='#ffb6c1')
    sns.distplot(train_df.loc[DISASTER_TWEETS][feature], label='Disaster', ax=ax[i][0], kde=True, color='#4169e1')
    
    sns.distplot(train_df[feature], label='Training', ax=ax[i][1])
    sns.distplot(test_df[feature], label='Test', ax=ax[i][1])
    
    for j in range(2):
        ax[i][j].set_xlabel('')
        ax[i][j].tick_params(axis='x', labelsize=12)
        ax[i][j].tick_params(axis='y', labelsize=12)
        ax[i][j].legend()
        
    ax[i][0].set_title(f'{feature} Target Distribution in Training Set', fontsize=13)
    ax[i][1].set_title(f'{feature} Training & Test Set Distribution', fontsize=13)
    
plt.show()

Explanation of the code

1) Prepare the data that will be used later by storing it in variables

METAFEATURES = ['word_count', 'unique_word_count', 'stop_word_count', 'url_count', 'mean_word_length',
                'char_count', 'punctuation_count', 'hashtag_count', 'mention_count']

DISASTER_TWEETS = train_df['target']==1

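METAFEATURES stores, as a list, the names of the columns whose data we want to visualize.
DISASTER_TWEETS stores a boolean mask that is True for the rows of train_df where target is 1.

Note: the columns named in METAFEATURES hold values as shown in the image below.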
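To make concrete what these two variables hold, here is a minimal sketch on a four-row toy DataFrame. The names `toy_train_df`, `toy_features`, and `toy_disaster` and all the values are made up for illustration; they are not taken from the actual dataset.

```python
import pandas as pd

# Hypothetical toy data standing in for train_df
toy_train_df = pd.DataFrame({
    "word_count": [12, 7, 20, 5],
    "target": [1, 0, 1, 0],
})

# List of column names to visualize (the role METAFEATURES plays)
toy_features = ["word_count"]

# Comparing a column with == yields a boolean Series (the role DISASTER_TWEETS plays)
toy_disaster = toy_train_df["target"] == 1

# Passing the mask to .loc keeps only the rows where the mask is True
disaster_counts = toy_train_df.loc[toy_disaster, "word_count"].tolist()
print(toy_disaster.tolist())
print(disaster_counts)
```

The mask has one True/False entry per row, so it can be reused to filter any column of the same DataFrame.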

2) Use enumerate to loop over the column names stored in the list, assigning each index to i and each name to feature in the for statement.

for i, feature in enumerate(METAFEATURES):
    sns.distplot(train_df.loc[~DISASTER_TWEETS][feature], label='Not Disaster', ax=ax[i], kde=True, color='#ffb6c1')
    sns.distplot(train_df.loc[DISASTER_TWEETS][feature], label='Disaster', ax=ax[i], kde=True, color='#4169e1')
        
    ax[i].set_title(f'{feature} Target Distribution in Training Set', fontsize=13)

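for i, feature in enumerate(METAFEATURES) assigns the index number to i and the element to feature on each iteration of the loop.
The ~ in [~DISASTER_TWEETS] is the negation operator; it means the same as [DISASTER_TWEETS==False].
└ It is used to switch between train_df['target']==1 and train_df['target']==0.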
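The two points above can be checked in a small sketch. The list `features` and the DataFrame `toy_df` are hypothetical examples, and the equivalence with `target == 0` relies on the assumption that target only takes the values 0 and 1.

```python
import pandas as pd

# enumerate yields (index, element) pairs
features = ["word_count", "url_count", "char_count"]
pairs = [(i, feature) for i, feature in enumerate(features)]
print(pairs)

# Hypothetical toy data: target contains only 0 and 1
toy_df = pd.DataFrame({"target": [1, 0, 1, 0]})
mask = toy_df["target"] == 1

# ~ flips every True/False in the boolean Series
negated = ~mask

# Same result as comparing the mask with False ...
same_as_false = negated.equals(mask == False)
# ... and, because target is only 0 or 1, same as target == 0
same_as_zero = negated.equals(toy_df["target"] == 0)
print(same_as_false, same_as_zero)
```

Using the index from `enumerate` is what lets each feature be drawn on its own row of axes (`ax[i]`) without keeping a separate counter.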

Note: conceptual diagram
