[Python/pandas] Counting missing values in a DataFrame

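To decide how to preprocess a dataset you need to look through its contents to some extent, and checking for missing values is something I do in almost every case.

I wanted a neat one-liner for this, so here is a summary.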

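Addendum (2018.05.05)

After looking into it later, I found that the following approach is cleaner.

データ欠損の状況を把握する - Python vs. R (Understanding where data is missing - Python vs. R)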

train.isnull().sum()

Use the snippet above to count missing values.
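As a minimal sketch of what `isnull().sum()` returns, the example below runs it on a small made-up DataFrame. The name `train` follows the snippet above, but the columns and values here are hypothetical and are not taken from the wine-reviews data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the `train` DataFrame.
train = pd.DataFrame({
    "country": ["US", "Spain", None, "France"],
    "price": [235.0, np.nan, 90.0, np.nan],
    "points": [96, 96, 96, 95],
})

# isnull() marks every cell as True (missing) or False;
# sum() then counts the True values in each column.
missing_counts = train.isnull().sum()
print(missing_counts)
```

The result is a Series indexed by column name, so a single column's count can be read with `missing_counts["price"]`.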

Read csv

The dataset used here is wine-reviews.

# Data source:
#   https://www.kaggle.com/zynicide/wine-reviews
import pandas as pd

# The first CSV column holds the row index.
df_train = pd.read_csv("winemag-data_first150k.csv", index_col=0)
df_train.head()
   country  description  designation                           points  price  province        region_1           region_2           variety             winery
0  US       ...          Martha's Vineyard                     96      235.0  California      Napa Valley        Napa               Cabernet Sauvignon  Heitz
1  Spain    ...          Carodorum Selección Especial Reserva  96      110.0  Northern Spain  Toro               NaN                Tinta de Toro       Bodega Carmen Rodríguez
2  US       ...          Special Selected Late Harvest         96      90.0   California      Knights Valley     Sonoma             Sauvignon Blanc     Macauley
3  US       ...          Reserve                               96      65.0   Oregon          Willamette Valley  Willamette Valley  Pinot Noir          Ponzi
4  France   ...          La Brûlade                            95      66.0   Provence        Bandol             NaN                Provence red blend  Domaine de la Bégude

Note: "description" holds long free text, so it is replaced with "..." here.

Counting missing values

df_train.isnull().apply(lambda col: col.value_counts(), axis=0).fillna(0).astype(int)
       country  description  designation  points  price   province  region_1  region_2  variety  winery
False  150925   150930       105195       150930  137235  150925    125870    60953     150930   150930
True   5        0            45735        0       13695   5         25060     89977     0        0
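To see what the counting one-liner does, it can be split into its three steps. The sketch below runs them on a small made-up DataFrame (`df` and its values are hypothetical, not taken from the wine-reviews data) and uses the built-in `int` for the final cast.

```python
import pandas as pd

# Hypothetical toy data; "points" deliberately has no missing values.
df = pd.DataFrame({
    "country": ["US", "Spain", None, "France"],
    "price": [235.0, None, 90.0, None],
    "points": [96, 96, 96, 95],
})

# Step 1: boolean mask, True where a cell is missing.
mask = df.isnull()

# Step 2: count False/True per column. A column without missing values
# has no True entry, so its True row comes out as NaN here.
counts = mask.apply(lambda col: col.value_counts(), axis=0)

# Step 3: replace those NaN entries with 0 and cast back to integers.
counts = counts.fillna(0).astype(int)
print(counts)
```

The `fillna(0)` step is what keeps columns such as `points` in the table with an explicit zero instead of NaN.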

Missing-value ratio

df_train.isnull().apply(lambda col: col.value_counts(), axis=0).fillna(0).astype(float).apply(lambda col: col/col.sum(), axis=0)
       country                description  designation         points  price                province               region_1             region_2            variety  winery
False  0.9999668720598953     1.0          0.6969787318624527  1.0     0.9092625720532698   0.9999668720598953     0.8339627641953223   0.4038494666401643  1.0      1.0
True   3.312794010468429e-05  0.0          0.3030212681375472  0.0     0.09073742794673027  3.312794010468429e-05  0.16603723580467766  0.5961505333598357  0.0      0.0
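The True row of this table is the fraction of missing cells per column. Because the mean of a boolean column equals its share of True values, the same ratio can also be sketched with `isnull().mean()`. This is an alternative to the one-liner, not the method used above, and `df` is again hypothetical toy data.

```python
import pandas as pd

# Hypothetical toy data, not the wine-reviews dataset.
df = pd.DataFrame({
    "country": ["US", "Spain", None, "France"],
    "price": [235.0, None, 90.0, None],
    "points": [96, 96, 96, 95],
})

# True counts as 1 and False as 0, so the column mean
# is the fraction of missing values in that column.
missing_ratio = df.isnull().mean()
print(missing_ratio)
```

This returns only the missing share as a Series; the non-missing share is `1 - missing_ratio`.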

It turns out isnull works on a DataFrame as well. I had assumed it was only a Series method.
