More than 5 years have passed since last update.

Manipulating string columns in pandas

This had been sitting in my drafts, so I'm publishing it to give it a proper send-off.

This is a summary of the following Stack Overflow question.
Convert categorical data in pandas dataframe - Stack Overflow

Converting a string column to an integer column without making it categorical

In [1]: import pandas as pd
In [2]: pd.factorize(['B', 'C', 'D', 'B'])[0]
Out[2]: array([0, 1, 2, 0])
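
pd.factorize actually returns a pair: the integer codes and the unique values those codes index into. The following is a minimal sketch of using both halves, assuming the labels are held in a Series; the variable names (labels, codes, uniques, restored) are illustrative and not from the original question.

```python
import pandas as pd

# Illustrative input: the same labels as in the example above.
labels = pd.Series(['B', 'C', 'D', 'B'])

# factorize returns (codes, uniques); codes[i] is a position in uniques.
codes, uniques = pd.factorize(labels)

# The original labels can be recovered by looking the codes up in uniques.
restored = uniques.take(codes)

print(codes.tolist())
print(list(uniques))
print(list(restored))
```

Keeping uniques around is what makes the encoding reversible; the codes alone do not say which label each integer stands for.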

Treating the columns as categorical variables

In [1]: df = pd.DataFrame(
                {'col1':[1,2,3,4,5],
                 'col2':list('abcab'), # str
                 'col3':list('ababb')  # str
                })
In [2]: df.columns
Out[2]: Index(['col1', 'col2', 'col3'], dtype='object')

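Note that dtype='object' above describes the column index, not the data. At this point col2 and col3 still hold plain strings, so cast them to category: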

In [3]: df['col2'] = df['col2'].astype('category')
In [4]: df['col3'] = df['col3'].astype('category')
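
As a variation on the casts above, several columns can be converted in one call by passing a column-to-dtype mapping to DataFrame.astype. This is only a sketch that rebuilds the same example frame; the names is_cat and categories are illustrative.

```python
import pandas as pd

# Same example frame as above.
df = pd.DataFrame({
    'col1': [1, 2, 3, 4, 5],
    'col2': list('abcab'),  # str
    'col3': list('ababb'),  # str
})

# Cast both string columns in one call; col1 is left untouched.
df = df.astype({'col2': 'category', 'col3': 'category'})

# Check the result: which columns are categorical, and which labels col2 holds.
is_cat = {name: isinstance(dtype, pd.CategoricalDtype)
          for name, dtype in df.dtypes.items()}
categories = list(df['col2'].cat.categories)

print(is_cat)
print(categories)
```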

select_dtypes(['category']) selects the columns that are now categorical, so their names can be collected like this:

In [5]: cat_columns = df.select_dtypes(['category']).columns
In [6]: cat_columns
Out[6]: Index(['col2', 'col3'], dtype='object')
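
Once the categorical columns are known, a common next step is to replace each of them with its integer codes through the .cat.codes accessor. The sketch below rebuilds the example frame so that it runs on its own; the name encoded is illustrative, and the explicit loop is just one way to write this.

```python
import pandas as pd

# Same example frame as above, with the string columns cast to category.
df = pd.DataFrame({
    'col1': [1, 2, 3, 4, 5],
    'col2': list('abcab'),
    'col3': list('ababb'),
}).astype({'col2': 'category', 'col3': 'category'})

cat_columns = df.select_dtypes(['category']).columns

# Work on a copy so the categorical version stays available.
encoded = df.copy()
for name in cat_columns:
    # .cat.codes gives each value's position in that column's categories.
    encoded[name] = encoded[name].cat.codes

print(encoded)
```

Missing values are encoded as -1 by .cat.codes, so it may be worth checking for them before feeding the codes to a model.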
