Determining gender from a name

Posted at 2020-04-28
pip install nltk
import nltk
import random
from nltk import classify
from nltk import NaiveBayesClassifier as NBC

Data

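Prepare a list of male names (malelist) and a list of female names (femalelist). The names should be romanized, contain the given name only (no family name), and the two lists should hold the same number of samples.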

Example: malelist=['kazuo','kenji', ...]
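One way to meet the requirements above (lowercase romanized given names, equal sample counts) is to normalize both lists and trim the larger one. The following is only a minimal sketch: the helper name `balance_name_lists` and the sample names are hypothetical and not part of the original article.

```python
import random


def balance_name_lists(male_names, female_names, seed=0):
    """Lowercase, de-duplicate, and trim both lists to the same length.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    males = sorted({n.strip().lower() for n in male_names if n.strip()})
    females = sorted({n.strip().lower() for n in female_names if n.strip()})
    size = min(len(males), len(females))
    # Randomly keep `size` names from each list so the classes are balanced.
    return rng.sample(males, size), rng.sample(females, size)


# Illustrative input; real lists should be far larger.
malelist, femalelist = balance_name_lists(
    ["Kazuo", "Kenji", "Hiroshi", "Takashi"],
    ["Yoko", "Keiko", "Emi"],
)
print(len(malelist), len(femalelist))
```

Balancing the classes keeps the Naive Bayes label prior from favoring whichever gender happens to have more samples.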

name.py
def feature_extraction(word):
    # Use only the last three characters of the name as the feature
    return {"last": word[-3:]}

maleNames = [(name, "male") for name in malelist]
femaleNames = [(name, "female") for name in femalelist]
allNames = maleNames + femaleNames  # Concatenate the male and female lists
random.shuffle(allNames)  # Shuffle the combined list in place

# Convert to the form [({"last": last three characters}, gender), ...]
featureData = [(feature_extraction(n), gender) for (n, gender) in allNames]

genderIdentifier=NBC.train(featureData)

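# To check the accuracy, split the data along these lines:
#num = 7 * len(featureData) // 10  # index at 70% of the data
#train_data = featureData[:num]  # first 70% for training
#test_data = featureData[num:]  # remaining 30% for testing
#genderIdentifier = NBC.train(train_data)
#print(classify.accuracy(genderIdentifier, test_data))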
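
The accuracy check hinted at in the comments above can be written out as a small hold-out evaluation. This is a minimal sketch, assuming the first 70% of the shuffled data is used for training and the remaining 30% for testing; the toy name lists are hypothetical, and a score on so few samples is not meaningful in itself.

```python
import random

from nltk import NaiveBayesClassifier, classify


def feature_extraction(word):
    # Same feature as in name.py: the last three characters of the name.
    return {"last": word[-3:]}


# Hypothetical toy data for illustration; real lists should be far larger.
malelist = ["kazuo", "kenji", "hiroshi", "takashi", "akio",
            "yukio", "tetsuo", "masao", "koji", "shinji"]
femalelist = ["yoko", "keiko", "emi", "yumi", "naomi",
              "akiko", "mayumi", "hiromi", "tomoko", "kumi"]

labeled = [(n, "male") for n in malelist] + [(n, "female") for n in femalelist]
random.Random(0).shuffle(labeled)  # fixed seed so the split is reproducible

feature_data = [(feature_extraction(n), g) for n, g in labeled]

# First 70% for training, remaining 30% held out for testing.
num = 7 * len(feature_data) // 10
train_data, test_data = feature_data[:num], feature_data[num:]

classifier = NaiveBayesClassifier.train(train_data)
accuracy = classify.accuracy(classifier, test_data)
print(f"held-out accuracy: {accuracy:.2f}")
```

Evaluating on names the model never saw during training gives a more honest estimate than scoring it on its own training data.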

### How to save the trained model

import pickle
with open('my_classifier.pickle', 'wb') as f:
    pickle.dump(genderIdentifier, f)

### How to load the saved model

import pickle
with open('my_classifier.pickle', 'rb') as f:
    classifier = pickle.load(f)
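
Once a classifier has been trained or loaded, a new name can be classified by passing it through the same feature function used for training. The sketch below is illustrative only: the helper `predict_gender` and the toy training names are hypothetical, and an in-memory `pickle.dumps`/`pickle.loads` round trip stands in for the file-based save and load shown above.

```python
import pickle

from nltk import NaiveBayesClassifier


def feature_extraction(word):
    # Same feature as in name.py: the last three characters of the name.
    return {"last": word[-3:]}


# Hypothetical toy training data for illustration.
male = ["kazuo", "kenji", "hiroshi", "takashi", "akio"]
female = ["yoko", "keiko", "emi", "yumi", "naomi"]
train = [(feature_extraction(n), "male") for n in male]
train += [(feature_extraction(n), "female") for n in female]

model = NaiveBayesClassifier.train(train)

# In-memory round trip; a file opened in 'wb'/'rb' mode works the same way.
restored = pickle.loads(pickle.dumps(model))


def predict_gender(classifier, name):
    """Hypothetical helper: lowercase the name, extract features, classify."""
    return classifier.classify(feature_extraction(name.lower()))


print(predict_gender(restored, "kazuo"))
print(predict_gender(restored, "Keiko"))

# prob_classify returns a probability distribution over the labels.
dist = restored.prob_classify(feature_extraction("keiko"))
print(round(dist.prob("female"), 3))
```

Names whose last three characters never appeared in training give the model little to go on, so predictions for them should be treated with caution.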

References

Save Naive Bayes Trained Classifier in NLTK

Machine Learning Model - Gender Identifier with NLTK in less than 15 lines of code
