

[Recommended Tagging with Machine Learning #4] The Machine Learning Script...?



Hi, I hope you're doing well.
I'm so sleepy... because I went to the gym this morning. But I'd like to pick up where I left off... with a drink :stuck_out_tongue_closed_eyes: Yahoo!

So today's topic is, finally... machine learning! We already have all the elements needed for training and testing, so all that's left is to train my machine!
Let's start... but I have to say one thing before starting.

I can't code the machine learning part myself...!

I'm really sorry. Oh, stop!! Don't throw that stone in your hand... Yes, that's right. I'm not writing it myself; actually, I can't.
Instead, I'd like to reuse the naive Bayes sample code from another site. I think you may already know it. Here it is.
機械学習 はじめよう 第3回 ベイジアンフィルタを実装してみよう - gihyo.jp ("Let's Start Machine Learning, Part 3: Let's Implement a Bayesian Filter")
This is a very good introductory site for learning machine learning. I really recommend it.

So, shall we call it a day...? Hmm. Actually, I have to change a few points to fit my purpose, so I'd like to show what I changed and how. Nothing about machine learning itself today...

def train(self, doc, cat):
    word = getwords(doc)
    for w in word:
        self.wordcountup(w, cat)

This is the train function: it gets the words in doc, then counts each word up under the category cat. However, this handles only one category per web page, while a single web page can be tagged with two or more categories. So I changed the script like this.

def train(self, doc, cats):
    word = getwords(doc)
    for w in word:
        # count the word up once for every category attached to the page
        for cat in cats:
            self.wordcountup(w, cat)

cats is now a list, not a single string. The inner for loop counts up each word for every category in the list.
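To make the effect of that change concrete, here is a minimal Python 3 sketch of the multi-category counting. It is not the gihyo.jp code: the `MultiLabelCounter` class, the nested-dictionary layout of `wordcount`, and the whitespace-splitting `getwords` are all stand-ins I assumed for illustration (the real tokenizer is surely different).

```python
from collections import defaultdict


def getwords(doc):
    # Hypothetical stand-in tokenizer: lowercase and split on whitespace.
    return [w.lower() for w in doc.split()]


class MultiLabelCounter:
    """Assumed minimal container; only the counting part of the classifier."""

    def __init__(self):
        # wordcount[cat][word] -> times `word` was seen in pages tagged `cat`
        self.wordcount = defaultdict(lambda: defaultdict(int))

    def wordcountup(self, word, cat):
        self.wordcount[cat][word] += 1

    def train(self, doc, cats):
        # Same shape as the modified train(): every word is counted
        # once for each category in the list.
        for w in getwords(doc):
            for cat in cats:
                self.wordcountup(w, cat)


counter = MultiLabelCounter()
counter.train("Python makes machine learning fun", ["python", "ml"])
counter.train("Python web framework", ["python"])

print(counter.wordcount["python"]["python"])  # 2 (seen in both pages)
print(counter.wordcount["ml"]["python"])      # 1 (only the first page has "ml")
```

One page with two tags simply feeds the same words into both categories, so each tag builds up its own word statistics.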

Next, I modified how the result is shown. The original script looks like this.

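def classifier(self, doc):
    best = None  # best category found so far
    max = -sys.maxint  # Python 2 only; Python 3 would need -sys.maxsize or float("-inf")
    word = getwords(doc)

    # Score every category and keep the one with the highest score
    for cat in self.catcount.keys():
        prob = self.score(word, cat)
        if prob > max:
            max = prob
            best = cat
    return best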

This function returns only the name of the best category. However, I'd like to see every category and its score. So I modified it like this.

def classifier(self, doc):
    word = getwords(doc)
    pList = []

    # Collect the score of every category instead of keeping only the best one
    for cat in self.catcount.keys():
        # log-scale score; math.exp(score) would convert it back to a probability
        score = self.score(word, cat)
        pList.append([cat, score])

    # Highest score first
    return sorted(pList, key=lambda pair: pair[1], reverse=True)

The previous code just returns the most probable tag. But I'd like to know the result for every tag, so it returns the whole list, sorted from the highest score down.
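Here is a rough, self-contained Python 3 sketch of what such a ranked result looks like. Please treat it as an illustration only: the `TinyNaiveBayes` class, the add-one smoothing inside `score`, the whitespace `getwords`, and the sample pages are my assumptions, and the details are not necessarily identical to the gihyo.jp implementation.

```python
import math
from collections import defaultdict


def getwords(doc):
    # Hypothetical stand-in tokenizer: lowercase and split on whitespace.
    return [w.lower() for w in doc.split()]


class TinyNaiveBayes:
    def __init__(self):
        self.vocabularies = set()
        self.wordcount = defaultdict(lambda: defaultdict(int))
        self.catcount = defaultdict(int)

    def train(self, doc, cats):
        for w in getwords(doc):
            self.vocabularies.add(w)
            for cat in cats:
                self.wordcount[cat][w] += 1
        for cat in cats:
            self.catcount[cat] += 1

    def score(self, words, cat):
        # log P(cat) + sum of log P(word | cat), with add-one smoothing (assumed)
        total = sum(self.catcount.values())
        score = math.log(self.catcount[cat] / total)
        denom = sum(self.wordcount[cat].values()) + len(self.vocabularies)
        for w in words:
            score += math.log((self.wordcount[cat].get(w, 0) + 1) / denom)
        return score

    def classifier(self, doc):
        words = getwords(doc)
        # Keep every [category, log score] pair, best first
        ranked = [[cat, self.score(words, cat)] for cat in self.catcount]
        return sorted(ranked, key=lambda pair: pair[1], reverse=True)


nb = TinyNaiveBayes()
nb.train("python pandas numpy tutorial", ["python", "datascience"])
nb.train("python flask web app", ["python", "web"])
nb.train("javascript react web app", ["javascript", "web"])

ranking = nb.classifier("flask web tutorial")
for cat, score in ranking:
    print(cat, round(score, 3))
```

The scores stay on the log scale here. Calling `math.exp` on them gives back a product of many small probabilities, which for long pages can underflow toward 0.0, so the log values are the safer thing to sort and compare.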

The machine learning engine itself just borrows someone else's idea... Next time, I'd like to show you the results of the machine learning and my thoughts on them.









