Running the ML.NET Sentiment Analysis Tutorial with F# Script

Posted at 2019-12-02

This post describes how to run the ML.NET 1.4.0 sentiment analysis tutorial using .NET Core and F# Script.

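Setting up the development environment

Installing the .NET Core SDK

Download and install the .NET Core SDK from the page below (as of 2019/12/03, this is .NET Core 3.0).

Download .NET (Linux, macOS, and Windows)

Creating a directory

Create a directory to hold the F# script, for example SentimentAnalysisFs.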

Installing Paket

Paket is a tool for conveniently managing NuGet packages.
In a terminal, move into the directory you created and run the following:

dotnet new tool-manifest
dotnet tool install paket
dotnet tool restore

The dotnet paket command is now available.

Reference: "Get started" on the official Paket website

Adding NuGet package references with Paket

Next, add references to the NuGet packages.

# Initialize Paket (creates the paket.dependencies file)
dotnet paket init

# Add the Microsoft.ML NuGet package and install it
dotnet paket add Microsoft.ML
dotnet paket install

# Generate load scripts for use from .fsx files
dotnet paket generate-load-scripts

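Preparing the data

Prepare the data as described on the tutorial site.
Here, the download and extraction are done from the command line with PowerShell.

# Create a directory to store the dataset file
mkdir Data

# Download the ZIP file of the UCI Sentiment Labelled Sentences dataset and extract it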
Invoke-WebRequest https://archive.ics.uci.edu/ml/machine-learning-databases/00331/sentiment%20labelled%20sentences.zip -OutFile sentiment_labelled_sentences.zip
Expand-Archive .\sentiment_labelled_sentences.zip
Remove-Item .\sentiment_labelled_sentences.zip

# Copy yelp_labelled.txt into the Data directory created above
Copy-Item '.\sentiment_labelled_sentences\sentiment labelled sentences\yelp_labelled.txt' Data\

# Delete the extracted files that are no longer needed
Remove-Item .\sentiment_labelled_sentences -Recurse -Force
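The PowerShell commands above assume Windows. As a sketch of an alternative that stays inside F#, the same download and extraction could be done from a script using only HttpClient and ZipArchive from the .NET base library. I have not run this against the UCI server, and the helper names downloadZip, extractEntry and countLabels are hypothetical. countLabels assumes the "<sentence><TAB><0 or 1>" line format that the LoadColumn 0 / LoadColumn 1 attributes in the script below rely on, so it doubles as a quick sanity check of the downloaded file.

```fsharp
open System.IO
open System.IO.Compression
open System.Net.Http

// URL copied from the Invoke-WebRequest command above.
let datasetUrl =
    "https://archive.ics.uci.edu/ml/machine-learning-databases/00331/sentiment%20labelled%20sentences.zip"

/// Download the ZIP into memory. Blocking on .Result is acceptable in a one-off script.
let downloadZip (url : string) : byte[] =
    use client = new HttpClient()
    client.GetByteArrayAsync(url).Result

/// Write the first archive entry whose file name is `entryName` into `targetDir`.
/// Returns the written path, or None when the archive has no such entry.
let extractEntry (zipBytes : byte[]) (entryName : string) (targetDir : string) =
    use archive = new ZipArchive(new MemoryStream(zipBytes), ZipArchiveMode.Read)
    archive.Entries
    |> Seq.tryFind (fun entry -> entry.Name = entryName)
    |> Option.map (fun entry ->
        Directory.CreateDirectory targetDir |> ignore
        let target = Path.Combine(targetDir, entryName)
        use input = entry.Open()
        use output = File.Create target
        input.CopyTo output
        target)

/// Count lines labelled positive (true) and negative (false) in a file of
/// "<sentence>\t<0 or 1>" lines; lines that don't match are skipped.
let countLabels (path : string) : Map<bool, int> =
    File.ReadLines path
    |> Seq.choose (fun line ->
        let tab = line.LastIndexOf '\t'
        if tab < 0 then None
        else
            match line.Substring(tab + 1).Trim() with
            | "1" -> Some true
            | "0" -> Some false
            | _ -> None)
    |> Seq.countBy id
    |> Map.ofSeq
```

For example, extractEntry (downloadZip datasetUrl) "yelp_labelled.txt" "Data" |> Option.map countLabels would save Data/yelp_labelled.txt and return the number of positive and negative lines.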

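Writing the script and running ML.NET

Create the script as follows.
The main points to note are:

The directory containing the native DLL must be added to the PATH environment variable so that it can be loaded (the path in the script is for Windows x64).
F# records can be used. However, the record that receives the prediction results needs the CLIMutable attribute.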
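Why the prediction record needs [<CLIMutable>]: as far as I understand, ML.NET's PredictionEngine creates an instance of the output type and assigns its properties, which an ordinary F# record cannot support because it has no parameterless constructor and its properties are read-only. The sketch below does not call ML.NET; it uses reflection to show what [<CLIMutable>] changes on a record shaped like part of SentimentPrediction. The type names PlainPrediction and MutablePrediction are hypothetical.

```fsharp
open System
open System.Reflection

// Hypothetical types for illustration; they mirror part of SentimentPrediction.
// A plain F# record: only a constructor taking every field, getter-only properties.
type PlainPrediction = { Prediction : bool; Probability : single }

// Same shape with [<CLIMutable>]: the compiler also emits a parameterless
// constructor and a setter for each property.
[<CLIMutable>]
type MutablePrediction = { Prediction : bool; Probability : single }

/// True if the type has a public parameterless constructor.
let hasDefaultCtor (t : Type) =
    not (isNull (t.GetConstructor(Type.EmptyTypes)))

/// True if every public instance property has a setter.
let allPropsWritable (t : Type) =
    t.GetProperties(BindingFlags.Public ||| BindingFlags.Instance)
    |> Array.forall (fun p -> p.CanWrite)

for t in [ typeof<PlainPrediction>; typeof<MutablePrediction> ] do
    printfn "%s: parameterless ctor = %b, writable properties = %b"
        t.Name (hasDefaultCtor t) (allPropsWritable t)
```

The plain record should report neither a parameterless constructor nor writable properties, while the CLIMutable version should report both. The input record (SentimentData) is only read from, which is presumably why it works without the attribute.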

SentimentAnalysis.fsx

// Load the NuGet packages through the load script generated by Paket
#load @".paket/load/netstandard2.0/main.group.fsx"

open System
open Microsoft.ML
open Microsoft.ML.Data

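// Add the directory containing the native DLL to PATH so that it can be loaded.
// This path is for Windows x64 and microsoft.ml.cpumath 1.4.0; adjust it if your version or platform differs.
let nativeDirectory =
    IO.Path.Combine(
        Environment.GetFolderPath(Environment.SpecialFolder.UserProfile),
        ".nuget", "packages", "microsoft.ml.cpumath", "1.4.0",
        "runtimes", "win-x64", "nativeassets", "netstandard2.0")
// ";" is the PATH separator on Windows
Environment.SetEnvironmentVariable("Path", Environment.GetEnvironmentVariable("Path") + ";" + nativeDirectory)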

type SentimentData = {
    [<LoadColumn 0>]
    SentimentText : string
    [<LoadColumn 1; ColumnName "Label">]
    Sentiment : bool
}

[<CLIMutable>]
type SentimentPrediction = {
    [<ColumnName "PredictedLabel">]
    Prediction : bool
    Probability : single
    Score : single
}

let mlContext = MLContext()

// Create training data
let trainingData = mlContext.Data.LoadFromTextFile<SentimentData>("Data/yelp_labelled.txt")

// Split data
let splitDataView = mlContext.Data.TrainTestSplit(trainingData, testFraction = 0.2)

// Specify data preparation and model training pipeline
let estimator =
    mlContext.Transforms.Text.FeaturizeText(
        outputColumnName = "Features",
        inputColumnName = "SentimentText"
    ).Append(
        mlContext.BinaryClassification.Trainers.SdcaLogisticRegression(
            labelColumnName = "Label",
            featureColumnName = "Features"
        )
    )

// Train model
printfn "=============== Create and Train the Model ==============="
let model = estimator.Fit(splitDataView.TrainSet)
printfn "=============== End of training ==============="
printfn ""

// Evaluate
printfn "=============== Evaluating Model accuracy with Test data==============="
let predictions = model.Transform(splitDataView.TestSet)

let metrics =
    mlContext.BinaryClassification.Evaluate(predictions, "Label")

printfn ""
printfn "Model quality metrics evaluation"
printfn "--------------------------------"
printfn "Accuracy: %f" metrics.Accuracy
printfn "Auc: %f" metrics.AreaUnderRocCurve
printfn "F1Score: %f" metrics.F1Score
printfn "=============== End of model evaluation ==============="

// Predict
let predictionFunction =
    mlContext.Model.CreatePredictionEngine<SentimentData, SentimentPrediction>(model)

let sampleStatement = {
    SentimentText = "This was a very bad steak"
    Sentiment = false
}

let resultPrediction = predictionFunction.Predict(sampleStatement)

printfn ""
printfn "=============== Prediction Test of model with a single sample and test dataset ==============="

printfn ""
printfn "Sentiment: %s | Prediction: %s | Probability: %f"
    sampleStatement.SentimentText
    (if resultPrediction.Prediction then "Positive" else "Negative")
    resultPrediction.Probability

printfn "=============== End of Predictions ==============="

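Running the script should produce output like the following (the end of the output is shown; the metric values may differ slightly between runs because the train/test split is random).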

=============== Evaluating Model accuracy with Test data===============

Model quality metrics evaluation
--------------------------------
Accuracy: 0.834225
Auc: 0.904484
F1Score: 0.835979
=============== End of model evaluation ===============

=============== Prediction Test of model with a single sample and test dataset ===============

Sentiment: This was a very bad steak | Prediction: Negative | Probability: 0.110419
=============== End of Predictions ===============
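The Accuracy, Auc and F1Score lines summarize how the model's predictions on the test set compare with the true labels. The sketch below spells out the textbook definitions on made-up numbers; the counts, scores and the helper names accuracy, f1 and auc are hypothetical and not taken from this run, and ML.NET's own implementation may differ in details such as how tied scores are handled.

```fsharp
/// Confusion-matrix counts for a binary classifier.
type Counts = { TP : int; FP : int; FN : int; TN : int }

/// Fraction of all test rows that were classified correctly.
let accuracy c =
    float (c.TP + c.TN) / float (c.TP + c.FP + c.FN + c.TN)

/// Harmonic mean of precision TP/(TP+FP) and recall TP/(TP+FN),
/// written in the equivalent form 2TP / (2TP + FP + FN).
let f1 c =
    float (2 * c.TP) / float (2 * c.TP + c.FP + c.FN)

/// ROC AUC as the probability that a randomly chosen positive row scores higher
/// than a randomly chosen negative row (ties count as half), over all pairs.
/// Both lists must be non-empty.
let auc (positiveScores : float list) (negativeScores : float list) =
    [ for p in positiveScores do
        for n in negativeScores do
            yield (if p > n then 1.0 elif p = n then 0.5 else 0.0) ]
    |> List.average

// Hypothetical example values
let example = { TP = 80; FP = 20; FN = 10; TN = 90 }
printfn "Accuracy: %f" (accuracy example)                // 170 / 200
printfn "F1Score: %f" (f1 example)                       // 160 / 190
printfn "Auc: %f" (auc [ 0.9; 0.8; 0.4 ] [ 0.7; 0.3 ])   // 5 of 6 pairs ordered correctly
```

Accuracy and F1Score depend on the predicted labels, while AUC only depends on how the scores rank positive rows against negative ones.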
