[Java Beginner] Machine Learning with Weka 04: Logistic Regression with the Java API

Posted at 2023-09-15

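What is logistic regression?

Many sites describe it as "a method that analyzes the probability of an event occurring so that the outcome can be predicted."

For example, suppose you have various conditions (numeric values, categorical values, and so on) and a specific outcome (event). After the relationship between them has been learned, the model calculates the probability that the outcome occurs for newly observed conditions. The outcome is binary, such as Yes or No. (Please let me know if I have this wrong.)

This time I program against the API of Weka, a data mining tool written in Java. As in the previous articles, I use mini-weka.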
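
To make the idea of "calculating a probability from conditions" concrete, here is a minimal sketch of the logistic (sigmoid) function applied to a weighted sum of inputs. It is plain Java and does not use Weka; the class name `SigmoidSketch`, the method names, and the coefficient values are all hypothetical and chosen only for illustration, not fitted to any data in this article.

```java
// Minimal sketch of the logistic (sigmoid) function; this is not Weka's implementation.
final class SigmoidSketch {

    // Maps any real-valued score z to a probability in the range (0, 1).
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // Weighted sum of the inputs plus an intercept, passed through the sigmoid.
    static double probability(double intercept, double[] weights, double[] features) {
        double z = intercept;
        for (int i = 0; i < weights.length; i++) {
            z += weights[i] * features[i];
        }
        return sigmoid(z);
    }

    public static void main(String[] args) {
        // Hypothetical coefficients for illustration only.
        double intercept = -5.0;
        double[] weights = {0.01};
        double[] features = {500.0}; // a single numeric condition
        System.out.println(probability(intercept, weights, features));
    }
}
```

A score of 0 corresponds to a probability of 0.5; larger scores push the probability toward 1 and smaller scores toward 0. Training amounts to finding the intercept and weights that best fit the observed outcomes.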

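Reference site and data

Here it is: https://corvus-window.com/python_logistic-regression-analysis/

The data lists conditions such as annual income, sex, age, and marital status, together with whether each person has a side job. In the ARFF files below these attributes are named 年収 (annual income), 性別 (sex), 年齢 (age), 結婚 (married), and 副業 (side job).
Based on that data, we estimate the probability that a newly encountered person has a side job.

The site above presents Python code; here I check whether Java produces the same result.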

Training data

Save this as fukugyou.arff.

@relation fukugyou

@attribute 年収 numeric
@attribute 性別 {M,F}
@attribute 年齢 numeric
@attribute 結婚 {n,y}
@attribute 副業 {y,n}

@data

580,M,32,n,y
430,F,28,n,n
800,M,45,y,y
780,F,36,n,y
690,F,42,y,y
510,M,30,y,n
740,F,49,y,n
350,F,23,n,n
620,M,29,y,y
500,F,25,n,n
430,M,33,n,n
590,F,36,y,y
1200,M,51,y,y
810,M,53,n,n
620,F,46,y,n
430,F,31,y,n
570,F,39,n,y
460,M,28,y,n
620,M,30,n,y
320,M,25,n,n
430,F,41,n,n

Test data

Save this as fukugyou-test.arff. The class value (副業) of each row is `?` because it is what we want to predict.

@relation fukugyou

@attribute 年収 numeric
@attribute 性別 {M,F}
@attribute 年齢 numeric
@attribute 結婚 {n,y}
@attribute 副業 {y,n}

@data

740,M,24,n,?
490,M,31,n,?

Writing it in Java

package wekatest;

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.math.BigDecimal;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import weka.classifiers.functions.Logistic;
import weka.core.Instance;
import weka.core.Instances;

public class LogisticRegTest {

    public static void main(String[] args) throws IOException, Exception {
        var insFilename = "fukugyou.arff";
        var testFilename = "fukugyou-test.arff";
        List<String> yORn = List.of("Y", "N");
        byte[] readAllBytes = Files.readAllBytes(Path.of(insFilename));
        Instances datasetInstances = new Instances(
                new InputStreamReader(new ByteArrayInputStream(readAllBytes)));
        // Uncomment to inspect the training data
        // System.out.println(datasetInstances.toString());
        datasetInstances.setClassIndex(datasetInstances.numAttributes() - 1);

        Logistic classifier = new Logistic();

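        // Options (these are the default values, so setting them is optional)
        // -R: ridge value, -M: maximum number of iterations (-1 = until convergence)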
        String[] options = new String[4];
        options[0] = "-R";
        options[1] = "1.0E-8";
        options[2] = "-M";
        options[3] = "-1";
        classifier.setOptions(options);

        classifier.buildClassifier(datasetInstances);
        // Uncomment to inspect the fitted model
        // System.out.println(classifier.toString());
        byte[] readAllBytes1 = Files.readAllBytes(Path.of(testFilename));
        Instances preds = new Instances(new InputStreamReader(new ByteArrayInputStream(readAllBytes1)));
        preds.setClassIndex(preds.numAttributes() - 1);
        for (Instance pred : preds) {
            double answer = classifier.classifyInstance(pred);
            double[] distributionForInstance = classifier.distributionForInstance(pred);
            System.out.println("YかNか:" + yORn.get((int) answer));
            BigDecimal b = new BigDecimal(distributionForInstance[0]);
            System.out.println("Yである確率:" + b.toString());
        }
    }
}
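
The program above calls both `classifyInstance` and `distributionForInstance`. As far as I understand, for a nominal class Weka's default `classifyInstance` simply returns the index of the class with the highest probability in the distribution. The following plain-Java sketch illustrates that relationship without depending on Weka; `ArgMaxSketch` and `argMax` are hypothetical names, and the probability values are illustrative.

```java
// Sketch of picking the most probable class from a probability distribution.
// This mirrors the idea behind classifyInstance; it is not Weka's code.
final class ArgMaxSketch {

    // Returns the index of the largest value in the distribution.
    static int argMax(double[] distribution) {
        int best = 0;
        for (int i = 1; i < distribution.length; i++) {
            if (distribution[i] > distribution[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Same order as the class attribute declaration {y,n}: index 0 = Y, index 1 = N.
        String[] labels = {"Y", "N"};
        double[] distribution = {0.9997, 0.0003}; // illustrative values
        System.out.println(labels[argMax(distribution)]);
    }
}
```

This is also why the label list in the program must follow the order of the `{y,n}` declaration in the ARFF file: index 0 of the distribution is the probability of `y`.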

Execution results

The output was as follows. (The labels YかNか and Yである確率 mean "Y or N" and "probability of Y".)

YかNか:Y
Yである確率:0.99971232241931284878688757089548744261264801025390625
YかNか:N
Yである確率:0.08817471710428510800472423625251394696533679962158203125

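The first person (annual income 740万, i.e. 7.4 million) has a 99.97% probability of having a side job, and the second person (annual income 490万, i.e. 4.9 million) has an 8.82% probability.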
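
The raw output prints the full binary expansion of each `double`, which is hard to read. A minimal sketch of rounding a probability to a two-decimal percentage is shown below; `PercentFormatSketch` and `toPercent` are hypothetical names, and the input values are the two probabilities above, truncated.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

// Sketch of formatting a probability as a percentage with two decimal places.
final class PercentFormatSketch {

    static String toPercent(double probability) {
        return BigDecimal.valueOf(probability)      // uses the double's short decimal form
                .multiply(BigDecimal.valueOf(100))  // convert to percent
                .setScale(2, RoundingMode.HALF_UP)  // round to two decimal places
                .toPlainString() + "%";
    }

    public static void main(String[] args) {
        System.out.println(toPercent(0.9997123224193128)); // 99.97%
        System.out.println(toPercent(0.0881747171042851)); // 8.82%
    }
}
```

Using `BigDecimal.valueOf(double)` rather than `new BigDecimal(double)` avoids the long digit string seen in the output above.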
