0
0

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?

More than 3 years have passed since last update.

Pythonがいくら便利だからってあんまり雑に使っていると痛い目を見る~pandas編~

Posted at

 Pythonは普通に書いてるだけで勝手に読みやすくなる文法に,型キャストや引数もいろいろ良しなに扱ってくれる便利な言語ですが,だからといって適当に使っていると変なところでハマるよというお話.

この記事で扱うデータ

sample.csv
Height,Weight,Gender
161.6124725,65.98017883,Female
150.4265289,59.49276352,Female
172.5548706,82.67674255,Male
164.2229767,67.56816864,Female
185.3570404,99.14295197,Male
178.6737518,80.50052643,Male
184.1519165,91.4691925,Male
169.6814117,68.50371552,Female
171.8485107,87.14485931,Male

以上のようなcsvファイルを例に扱います.こいつがなかなか曲者なんだ...

ケース1:DataFrameの行抽出

 DataFrameは条件を指定して行を抽出できます.しかし...

input
df1[df1.Gender == 'Male']
output
	Height	Weight	Gender
0	NaN	NaN	NaN
1	NaN	NaN	NaN
2	NaN	NaN	Male
3	NaN	NaN	NaN
4	NaN	NaN	Male
...	...	...	...
995	NaN	NaN	Male
996	NaN	NaN	NaN
997	NaN	NaN	NaN
998	NaN	NaN	NaN
999	NaN	NaN	Male
1000 rows × 3 columns

 ...なんじゃこりゃ.よく調べてみると,

input
type(df1['Height'])
output
pandas.core.frame.DataFrame

本来であれば,これはSeries型になるはずです.参考

解決策

 DataFrameを作り直すと正常に動作しました.

input
ans = pd.DataFrame({'Height': df1['Height'].values.reshape(1000),
                    'Weight': df1['Weight'].values.reshape(1000),
                    'Gender': df1['Gender'].values.reshape(1000)
                   })
ans[ans['Gender'] == 'Male']
output
	Height	Weight	Gender
2	172.554871	82.676743	Male
4	185.357040	99.142952	Male
5	178.673752	80.500526	Male
6	184.151916	91.469193	Male
8	171.848511	87.144859	Male
...	...	...	...
989	165.647507	79.347435	Male
992	170.756546	82.422096	Male
994	180.842758	95.315422	Male
995	183.990356	95.574455	Male
999	186.798447	93.485558	Male
514 rows × 3 columns

ケース2:Seabornでのプロット

 Seabornでのエラー.

input
sns.scatterplot(data=df1, x='Height', y='Weight')
plt.show()
output
----> 1 sns.scatterplot(data=df1, x='Height', y='Weight')
      2 plt.show()
...
ValueError: If using all scalar values, you must pass an index

ドキュメントにも載っているシンプルな書き方ですが,変なエラーが出ました.どうやら辞書型からDataFrameに変換するときにインデックス周りで出るエラーのようで,実際エラー文の途中にそのような処理をしている箇所が書かれていました.

 解決策

input
plt.scatter(df1.Height.values, df1.Weight.values)
plt.show()

ダウンロード.png

valuesメソッドでndarray型にしてやれば大丈夫.でもラベルをサクッと貼れないのは不便...

まとめ

 一時間強ハマってしまいました.自分が調べた限りでは根本的な解決策は見つかりませんでしたが,時間がある時にでもissueを漁ってみようかと思います.

0
0
0

Register as a new user and use Qiita more conveniently

  1. You get articles that match your needs
  2. You can efficiently read back useful information
  3. You can use dark theme
What you can do with signing up
0
0

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?