
pandas: Extracting rows that match a condition with the query method

Last updated at 2022-08-10, Posted at 2022-08-10

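Extracting data with pandas

Referencing rows and columns of a df

iloc[10, :] → takes the 11th row (positions start at 0)
iloc[:, 10] → takes the 11th column
loc → selects by label instead of by position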
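
To make the difference between row and column selection concrete, here is a minimal sketch. The 12x12 frame and its labels (r0..r11, c0..c11) are made up for illustration and are not from the original data.

```python
import pandas as pd

# Hypothetical 12x12 frame: rows labeled r0..r11, columns labeled c0..c11
df = pd.DataFrame(
    [[row * 100 + col for col in range(12)] for row in range(12)],
    index=[f"r{i}" for i in range(12)],
    columns=[f"c{j}" for j in range(12)],
)

row_11th = df.iloc[10, :]      # 11th row (position 10), returned as a Series
col_11th = df.iloc[:, 10]      # 11th column (position 10), returned as a Series
same_row = df.loc["r10", :]    # loc picks the same row by its label
same_col = df.loc[:, "c10"]    # loc picks the same column by its label
```

The first index passed to iloc is always the row position, so iloc[10, :] is a row, not a column.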

Extracting rows that match a condition from a df with query

Correct code

    import pandas as pd

    # Read each file, then keep only the rows where TM_sum > 0.99
    df_list = [
        pd.read_excel('pareto_1.xlsx', index_col=0).query('TM_sum > 0.99'),
        pd.read_excel('pareto_2.xlsx', index_col=0).query('TM_sum > 0.99'),
        pd.read_excel('pareto_3.xlsx', index_col=0).query('TM_sum > 0.99'),
        pd.read_excel('pareto_4.xlsx', index_col=0).query('TM_sum > 0.99'),
    ]

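Incorrect code

    # The commas make df_all a tuple of four DataFrames, not a DataFrame
    df_all = pd.read_excel('pareto_1.xlsx', index_col=0),pd.read_excel('pareto_2.xlsx', index_col=0),pd.read_excel('pareto_3.xlsx', index_col=0),pd.read_excel('pareto_4.xlsx', index_col=0)
    # Fails with AttributeError: a tuple has no query method
    df_list = [df_all.query('TM_sum > 0.99')]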

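query is a DataFrame method, so it can only be called on a DataFrame. It cannot be called on a list of dfs.
It cannot be called on a tuple of dfs either. In the incorrect code, the comma-separated pd.read_excel() calls make df_all a tuple, so query has to be applied to each df individually, for example right after reading it.
Inside the [] of the correct code, the return value of each pd.read_excel() is a df, so query can be chained onto it directly.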
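
A small sketch of the same point, using in-memory frames instead of the Excel files. The names df_a and df_b and their values are hypothetical stand-ins for the data read from the pareto files.

```python
import pandas as pd

# Hypothetical stand-ins for the frames read from pareto_1.xlsx and pareto_2.xlsx
df_a = pd.DataFrame({"TM_sum": [0.50, 0.995, 1.20]})
df_b = pd.DataFrame({"TM_sum": [0.999, 0.10]})

df_all = df_a, df_b                          # comma-separated values form a tuple
tuple_has_query = hasattr(df_all, "query")   # a tuple has no query method

# query works once it is applied to each DataFrame inside the tuple
df_list = [df.query("TM_sum > 0.99") for df in df_all]
```

If the frames are already collected in a tuple or list, looping over them like this is one way to filter each one without re-reading the files.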

Applying query to multiple files

Correct code

    df_list = [
        pd.read_excel(f'pareto_{i}.xlsx', index_col=0).astype(float).query("TM_sum > 0.95")
        for i in range(1, 5)
    ]

Code before the rewrite (it runs, but repeats the same call for every file)

    df_list = [
        pd.read_excel('pareto_1.xlsx', index_col=0).query('TM_sum > 0.99'),
        pd.read_excel('pareto_2.xlsx', index_col=0).query('TM_sum > 0.99'),
        pd.read_excel('pareto_3.xlsx', index_col=0).query('TM_sum > 0.99'),
        pd.read_excel('pareto_4.xlsx', index_col=0).query('TM_sum > 0.99'),
    ]

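A numeric comparison such as TM_sum > 0.95 does not work on values stored as strings, so convert the data type with .astype(float) before calling query. Note that .astype(float) converts every column, so it assumes all columns hold numeric values.
The for clause of the list comprehension is worth memorizing. range(1, 5) yields 1, 2, 3, 4: the stop value 5 is excluded, and range(5) with a single argument would start from 0.
This is more concise than typing each file name by hand.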
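
The conversion and the range behavior can be checked without any Excel files. In this sketch, df_raw is a hypothetical frame whose numbers arrived as strings. Comparing such a column with a float typically raises a TypeError, which is why the conversion comes first.

```python
import pandas as pd

# Hypothetical frame whose numeric values were read as strings
df_raw = pd.DataFrame({"TM_sum": ["0.90", "0.97", "1.00"]})

# Convert every column to float, then filter
df_num = df_raw.astype(float)
filtered = df_num.query("TM_sum > 0.95")

# range(1, 5) produces 1, 2, 3, 4 (the stop value 5 is excluded)
file_names = [f"pareto_{i}.xlsx" for i in range(1, 5)]
```

Here file_names holds pareto_1.xlsx through pareto_4.xlsx, the same four names that were typed out by hand before.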
