LoginSignup
1
1

More than 1 year has passed since last update.

[python] pandas 条件によるフィルタリング

Posted at

はじめに

最近はpandasの統計をよく使っている
条件式として普段はquery分をよく用いるが、pandas本来の文法の方が機械語ぽっくてちょっと馴染みがある
ここではqueryを使用せず、pandasの条件式をまとめておく

環境

window 10
python 3.7.8
pandas version: 1.3.5

Code

データフレーム

今回は統計の練習で使われるpandasの国別統計のライブラリーを使用する

import pandas as pd
import plotly.express as px
# pip install plotly

df = px.data.gapminder()
print(df)

#           country continent  year  lifeExp       pop   gdpPercap iso_alpha  iso_num
# 0     Afghanistan      Asia  1952   28.801   8425333  779.445314       AFG        4
# 1     Afghanistan      Asia  1957   30.332   9240934  820.853030       AFG        4
# 2     Afghanistan      Asia  1962   31.997  10267083  853.100710       AFG        4
# 3     Afghanistan      Asia  1967   34.020  11537966  836.197138       AFG        4
# 4     Afghanistan      Asia  1972   36.088  13079460  739.981106       AFG        4
# ...           ...       ...   ...      ...       ...         ...       ...      ...
# 1699     Zimbabwe    Africa  1987   62.351   9216418  706.157306       ZWE      716
# 1700     Zimbabwe    Africa  1992   60.377  10704340  693.420786       ZWE      716
# 1701     Zimbabwe    Africa  1997   46.809  11404948  792.449960       ZWE      716
# 1702     Zimbabwe    Africa  2002   39.989  11926563  672.038623       ZWE      716
# 1703     Zimbabwe    Africa  2007   43.487  12311143  469.709298       ZWE      716

条件式

countryAfghanistanであるindexを絞りだした。
条件を満たすindexTrue, 満たさないのはFalseとなる
このindexdf[index]として絞った範囲を値を得ることができる

df["country"] == "Afghanistan"

# 0        True
# 1        True
# 2        True
# 3        True
# 4        True
#         ...
# 1699    False
# 1700    False
# 1701    False
# 1702    False
# 1703    False

df[df["country"] == "Japan"]

#     country continent  year  lifeExp        pop     gdpPercap iso_alpha  iso_num
# 792   Japan      Asia  1952   63.030   86459025   3216.956347       JPN      392
# 793   Japan      Asia  1957   65.500   91563009   4317.694365       JPN      392
# 794   Japan      Asia  1962   68.730   95831757   6576.649461       JPN      392
# 795   Japan      Asia  1967   71.430  100825279   9847.788607       JPN      392
# 796   Japan      Asia  1972   73.420  107188273  14778.786360       JPN      392
# 797   Japan      Asia  1977   75.380  113872473  16610.377010       JPN      392
# 798   Japan      Asia  1982   77.110  118454974  19384.105710       JPN      392
# 799   Japan      Asia  1987   78.670  122091325  22375.941890       JPN      392
# 800   Japan      Asia  1992   79.360  124329269  26824.895110       JPN      392
# 801   Japan      Asia  1997   80.690  125956499  28816.584990       JPN      392
# 802   Japan      Asia  2002   82.000  127065841  28604.591900       JPN      392
# 803   Japan      Asia  2007   82.603  127467972  31656.068060       JPN      392

複数条件

条件式は NOT(~), AND(&), OR(|) があり、優先度としても NOT(~), AND(&), OR(|) の順番となる

country = Japan AND year >= 2000の条件

df[(df["country"] == "Japan") & (df["year"] >= 2000)]
#     country continent  year  lifeExp        pop    gdpPercap iso_alpha  iso_num
# 802   Japan      Asia  2002   82.000  127065841  28604.59190       JPN      392
# 803   Japan      Asia  2007   82.603  127467972  31656.06806       JPN      392

country = Japan AND NOTyear >= 1960の条件

df[(df["country"] == "Japan") & ~(df["year"] >= 1960)]

#     country continent  year  lifeExp       pop    gdpPercap iso_alpha  iso_num
# 792   Japan      Asia  1952    63.03  86459025  3216.956347       JPN      392
# 793   Japan      Asia  1957    65.50  91563009  4317.694365       JPN      392

NOTyear >= 1960 AND country = Japan OR country = United States の条件

演算子の優先度は NOT(~) > AND(&) > OR(|) であるので,
[ { ( NOTyear >= 1960 ) AND country = Japan } OR country = United States ]
のように区切られた。

df[ ~(df["year"] >= 1960) & (df["country"] == "Japan") | (df["country"] == "United States") ]

#             country continent  year  lifeExp        pop     gdpPercap iso_alpha  iso_num
# 792           Japan      Asia  1952   63.030   86459025   3216.956347       JPN      392
# 793           Japan      Asia  1957   65.500   91563009   4317.694365       JPN      392
# 1608  United States  Americas  1952   68.440  157553000  13990.482080       USA      840
# 1609  United States  Americas  1957   69.490  171984000  14847.127120       USA      840
# 1610  United States  Americas  1962   70.210  186538000  16173.145860       USA      840
# 1611  United States  Americas  1967   70.760  198712000  19530.365570       USA      840
# 1612  United States  Americas  1972   71.340  209896000  21806.035940       USA      840
# 1613  United States  Americas  1977   73.380  220239000  24072.632130       USA      840
# 1614  United States  Americas  1982   74.650  232187835  25009.559140       USA      840
# 1615  United States  Americas  1987   75.020  242803533  29884.350410       USA      840
# 1616  United States  Americas  1992   76.090  256894189  32003.932240       USA      840
# 1617  United States  Americas  1997   76.810  272911760  35767.433030       USA      840
# 1618  United States  Americas  2002   77.310  287675526  39097.099550       USA      840
# 1619  United States  Americas  2007   78.242  301139947  42951.653090       USA      840

条件が複数あるときは、予め括弧をつけておくのは適切である

df[ ~(df["year"] >= 1960) & ((df["country"] == "Japan") | (df["country"] == "United States")) ]

#             country continent  year  lifeExp        pop    gdpPercap iso_alpha  iso_num
# 802           Japan      Asia  2002   82.000  127065841  28604.59190       JPN      392
# 803           Japan      Asia  2007   82.603  127467972  31656.06806       JPN      392
# 1618  United States  Americas  2002   77.310  287675526  39097.09955       USA      840
# 1619  United States  Americas  2007   78.242  301139947  42951.65309       USA      840

関連

1
1
1

Register as a new user and use Qiita more conveniently

  1. You get articles that match your needs
  2. You can efficiently read back useful information
  3. You can use dark theme
What you can do with signing up
1
1