[Python/pandas] Counting missing values in a DataFrame

Posted at 2018-03-17

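To decide how to preprocess a dataset, you first need to look through its contents to some extent, and checking for missing values is something I end up doing in almost every case.

I wanted a clean one-liner for this, so I am writing it up here.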

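Addendum (2018.05.05)

After looking into this further, I found that the following approach is cleaner.

データ欠損の状況を把握する - Python vs. R (Understanding the state of missing data - Python vs. R)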

train.isnull().sum()

Use the one-liner above to count missing values.
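As a minimal sketch of what this call returns, the example below runs it on a small toy DataFrame. The data is hypothetical (only the column names imitate the wine-reviews dataset), and `missing_counts` is a name introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not taken from the wine-reviews dataset
train = pd.DataFrame({
    "country": ["US", "Spain", None, "France"],
    "price": [235.0, np.nan, np.nan, 66.0],
    "points": [96, 96, 95, 95],
})

# isnull() marks every missing cell as True;
# sum() then counts the True values in each column
missing_counts = train.isnull().sum()
print(missing_counts)
```

The result is a Series indexed by column name, so a single column's count can be read with, for example, `missing_counts["price"]`.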

Read csv

The dataset used here is wine-reviews.

# Data source:
#   https://www.kaggle.com/zynicide/wine-reviews
import pandas as pd

# Use the first CSV column as the row index
df_train = pd.read_csv("winemag-data_first150k.csv", index_col=0)
df_train.head()
| | country | description | designation | points | price | province | region_1 | region_2 | variety | winery |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | US | ... | Martha's Vineyard | 96 | 235.0 | California | Napa Valley | Napa | Cabernet Sauvignon | Heitz |
| 1 | Spain | ... | Carodorum Selección Especial Reserva | 96 | 110.0 | Northern Spain | Toro | | Tinta de Toro | Bodega Carmen Rodríguez |
| 2 | US | ... | Special Selected Late Harvest | 96 | 90.0 | California | Knights Valley | Sonoma | Sauvignon Blanc | Macauley |
| 3 | US | ... | Reserve | 96 | 65.0 | Oregon | Willamette Valley | Willamette Valley | Pinot Noir | Ponzi |
| 4 | France | ... | La Brûlade | 95 | 66.0 | Provence | Bandol | | Provence red blend | Domaine de la Bégude |

Note: "description" is long free text, so it is replaced with "..." here.

Counting missing values


# np.int was removed in NumPy 1.24; the builtin int behaves the same here
df_train.isnull().apply(lambda col: col.value_counts(), axis=0).fillna(0).astype(int)
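| | country | description | designation | points | price | province | region_1 | region_2 | variety | winery |
|---|---|---|---|---|---|---|---|---|---|---|
| False | 150925 | 150930 | 105195 | 150930 | 137235 | 150925 | 125870 | 60953 | 150930 | 150930 |
| True | 5 | 0 | 45735 | 0 | 13695 | 5 | 25060 | 89977 | 0 | 0 |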
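To make the chained one-liner easier to follow, the sketch below splits it into steps on a small toy DataFrame. The data and the intermediate names (`mask`, `counts`, `table`) are hypothetical and introduced only for illustration; the builtin `int` is used because `np.int` is no longer available in recent NumPy releases.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not taken from the wine-reviews dataset
df = pd.DataFrame({
    "country": ["US", "Spain", None, "France"],
    "price": [235.0, np.nan, np.nan, 66.0],
    "points": [96, 96, 95, 95],
})

# Step 1: boolean DataFrame, True where a value is missing
mask = df.isnull()

# Step 2: count the False/True values in each column
counts = mask.apply(lambda col: col.value_counts(), axis=0)

# Step 3: a column with no missing values has no True entry and
# comes out as NaN, so fill it with 0 and cast back to integers
table = counts.fillna(0).astype(int)
print(table)
```

The `fillna(0)` step is what keeps fully populated columns such as `points` in the table with a count of 0 instead of NaN.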

Missing rate

# np.float was removed in NumPy 1.24; the builtin float behaves the same here
df_train.isnull().apply(lambda col: col.value_counts(), axis=0).fillna(0).astype(float).apply(lambda col: col/col.sum(), axis=0)
| | country | description | designation | points | price | province | region_1 | region_2 | variety | winery |
|---|---|---|---|---|---|---|---|---|---|---|
| False | 0.9999668720598953 | 1.0 | 0.6969787318624527 | 1.0 | 0.9092625720532698 | 0.9999668720598953 | 0.8339627641953223 | 0.4038494666401643 | 1.0 | 1.0 |
| True | 3.312794010468429e-05 | 0.0 | 0.3030212681375472 | 0.0 | 0.09073742794673027 | 3.312794010468429e-05 | 0.16603723580467766 | 0.5961505333598357 | 0.0 | 0.0 |
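The True row of this table is the per-column missing rate. Since the mean of a boolean column equals its fraction of True values, that row can also be computed more directly. The sketch below assumes a small toy DataFrame (hypothetical data, not the wine dataset), and `missing_rate` is a name introduced for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not taken from the wine-reviews dataset
df = pd.DataFrame({
    "country": ["US", "Spain", None, "France"],
    "price": [235.0, np.nan, np.nan, 66.0],
    "points": [96, 96, 95, 95],
})

# The mean of a boolean column is the fraction of True values,
# i.e. the share of missing entries in that column
missing_rate = df.isnull().mean()
print(missing_rate)
```

If the share of present values is needed as well, it is simply `1 - missing_rate`.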

So isnull works on a DataFrame too. I had always assumed it was a Series-only method.
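A short sketch of that difference, again on hypothetical toy data: `isnull` on a DataFrame returns a boolean DataFrame of the same shape, while `isnull` on a single column returns a boolean Series. The names `frame_mask` and `series_mask` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not taken from the wine-reviews dataset
df = pd.DataFrame({"price": [235.0, np.nan, 66.0], "points": [96, 96, 95]})

frame_mask = df.isnull()            # DataFrame.isnull -> boolean DataFrame
series_mask = df["price"].isnull()  # Series.isnull -> boolean Series

print(type(frame_mask).__name__, type(series_mask).__name__)

# isna() is the same operation under another name
print(frame_mask.equals(df.isna()))
```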
