More than 5 years have passed since last update.

[WIP] Making Full Use of Vectorizer


TfidfVectorizer

Main parameters

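  • strip_accents : {‘ascii’, ‘unicode’, None}
    • Removes accents and applies other character normalization during preprocessing
    • ‘ascii’ is fast but only handles characters that have a direct ASCII mapping; ‘unicode’ is slightly slower and handles any character
  • tokenizer : callable or None (default)
    • Pass a custom tokenizer function (only used when analyzer='word')
    • It takes a string as its argument and returns a list of strings
  • ngram_range : tuple (min_n, max_n), default=(1, 1)
    • Sets which n-grams are extracted as features
    • min_n and max_n are the lower and upper bounds on n; every n with min_n <= n <= max_n is used
  • max_df : float in range [0.0, 1.0] or int, default=1.0
    • Terms whose document frequency is strictly higher than this threshold are dropped (removes overly common words)
    • A float is a proportion of documents; an int is an absolute document count
  • min_df : float in range [0.0, 1.0] or int, default=1
    • Terms whose document frequency is strictly lower than this threshold are dropped (removes rare words)
    • A float is a proportion of documents; an int is an absolute document count
  • use_idf : boolean, default=True
    • Whether to reweight terms by inverse document frequency (IDF)
  • smooth_idf : boolean, default=True
    • When True, 1 is added to every document frequency, as if one extra document containing every term had been seen; this prevents division by zero: idf(t) = ln((1 + n) / (1 + df(t))) + 1
    • When False, the unsmoothed form is used: idf(t) = ln(n / df(t)) + 1
  • sublinear_tf : boolean, default=False
    • Whether to apply sublinear (logarithmic) scaling to the term frequency, replacing tf with 1 + log(tf)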
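
To make a few of these parameters concrete, here is a minimal sketch on a toy corpus. The corpus and the helper names `vocab` and `idf_of` are made up for illustration and are not part of the original article; only `TfidfVectorizer` and its `vocabulary_` and `idf_` attributes come from scikit-learn.

```python
import math
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus (hypothetical data, for illustration only).
docs = ["the cat sat", "the dog sat", "the bird flew"]


def vocab(corpus=docs, **params):
    """Fit a TfidfVectorizer with the given parameters; return its sorted vocabulary."""
    return sorted(TfidfVectorizer(**params).fit(corpus).vocabulary_)


def idf_of(term, **params):
    """Return the learned IDF weight of one term in `docs`."""
    vec = TfidfVectorizer(**params).fit(docs)
    return vec.idf_[vec.vocabulary_[term]]


# Defaults: word unigrams, no document-frequency filtering.
default_vocab = vocab()  # ['bird', 'cat', 'dog', 'flew', 'sat', 'the']

# max_df=0.9 drops "the" (present in 3 of 3 documents, i.e. more than 90%).
# min_df=2 drops every term that appears in fewer than 2 documents.
filtered_vocab = vocab(max_df=0.9, min_df=2)  # ['sat']

# ngram_range=(1, 2) keeps the unigrams and adds word bigrams such as "the cat".
bigram_vocab = vocab(ngram_range=(1, 2))

# A custom tokenizer replaces the default token pattern, which ignores
# single-character tokens. str.split keeps them.
split_vocab = vocab(corpus=["a cat", "a dog"], tokenizer=str.split)  # ['a', 'cat', 'dog']

# smooth_idf switches between the two IDF formulas.
n, df = len(docs), 1  # "cat" appears in 1 of the 3 documents
smooth = idf_of("cat", smooth_idf=True)       # ln((1 + n) / (1 + df)) + 1
unsmoothed = idf_of("cat", smooth_idf=False)  # ln(n / df) + 1
```

Passing `tokenizer` without changing `token_pattern` may trigger a warning that the pattern is unused; the result is unaffected.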

Usage


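from sklearn.feature_extraction.text import TfidfVectorizer
# `train` is assumed to be a pandas DataFrame with a text column 'feature'.
vectorizer = TfidfVectorizer()  # pass any of the parameters listed above here
fitted = vectorizer.fit(train['feature'])  # learn the vocabulary and IDF weights
counted = fitted.transform(train['feature'])  # sparse TF-IDF matrix (documents x terms)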
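
The snippet above depends on a `train` object that the article does not define. As a self-contained sketch, the same flow can be run on made-up data. The `train` and `test` dictionaries below are hypothetical stand-ins; a pandas DataFrame with a 'feature' column of strings would be indexed the same way.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical stand-ins for the article's data (not from the original).
train = {"feature": ["the cat sat", "the dog sat", "the bird flew"]}
test = {"feature": ["the cat flew", "a fish swam"]}

vectorizer = TfidfVectorizer()

# fit() learns the vocabulary and IDF weights; transform() builds the matrix.
train_matrix = vectorizer.fit(train["feature"]).transform(train["feature"])

# fit_transform() performs both steps in one call and yields the same values.
same_matrix = TfidfVectorizer().fit_transform(train["feature"])

# Unseen data is transformed with the already fitted vectorizer, so the
# columns line up with the training matrix. Terms outside the learned
# vocabulary are ignored, so the second test document becomes an all-zero row.
test_matrix = vectorizer.transform(test["feature"])

print(train_matrix.shape)  # (3, 6): 3 documents, 6 vocabulary terms
print(test_matrix.shape)   # (2, 6)
```

Fitting only on the training data and reusing the fitted vectorizer for new data keeps the feature columns consistent between the two matrices.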