
Running Jubatus with Docker

Last updated at 2020-01-13, posted at 2020-01-13

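Benefits of using Docker

Pulling the official image is all it takes to start using Jubatus.
The host environment stays clean, and there is no struggling with mismatched middleware versions.
Cleanup is easy once it is no longer needed.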

Docker environment

・Docker for Mac

% docker version
Client: Docker Engine - Community
 Version:           19.03.4
 API version:       1.40
 Go version:        go1.12.10
 Git commit:        9013bf5
 Built:             Thu Oct 17 23:44:48 2019
 OS/Arch:           darwin/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.4
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.10
  Git commit:       9013bf5
  Built:            Thu Oct 17 23:50:38 2019
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          v1.2.10
  GitCommit:        b34a5c8af56e510852c35414db4c1f4fa6172339
 runc:
  Version:          1.0.0-rc8+dev
  GitCommit:        3e425f80a8c931f88e6d94a8c831b9d5aa481657
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

Installing the Jubatus server

See the official Jubatus site and Docker Hub for reference.

% docker pull jubatus/jubatus
Using default tag: latest
latest: Pulling from jubatus/jubatus
8284e13a281d: Pull complete 
26e1916a9297: Pull complete 
4102fc66d4ab: Pull complete 
1cf2b01777b2: Pull complete 
7f7a2d5e04ed: Pull complete 
4cb7073bed3b: Pull complete 
Digest: sha256:df700aa354604de1ae6b06d169e4203ac883385d803ec80a6e1fc5cf3fba118a
Status: Downloaded newer image for jubatus/jubatus:latest
docker.io/jubatus/jubatus:latest

Check the image size:

% docker images jubatus/jubatus
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
jubatus/jubatus     latest              349bb156c215        11 months ago       415MB

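The image appears to have been pulled correctly, so start a Jubatus server inside a container. Here the classifier (Classifier) is started with a configuration file bundled in the image.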

$ docker run --expose 9199 jubatus/jubatus jubaclassifier -f /opt/jubatus/share/jubatus/example/config/classifier/pa.json

The log below shows that the server started, so stop it for now with Ctrl+C.

2020-01-13 12:29:03,261 1 INFO  [server_util.cpp:429] starting jubaclassifier 1.1.1 RPC server at 172.17.0.2:9199
    pid                  : 1
    user                 : root
    mode                 : standalone mode
    timeout              : 10
    thread               : 2
    datadir              : /tmp
    logdir               : 
    log config           : 
    zookeeper            : 
    name                 : 
    interval sec         : 16
    interval count       : 512
    zookeeper timeout    : 10
    interconnect timeout : 10

2020-01-13 12:29:03,325 1 INFO  [server_util.cpp:165] load config from local file: /opt/jubatus/share/jubatus/example/config/classifier/pa.json
2020-01-13 12:29:03,325 1 INFO  [classifier_serv.cpp:116] config loaded: {
  "converter" : {
    "string_filter_types" : {},
    "string_filter_rules" : [],
    "num_filter_types" : {},
    "num_filter_rules" : [],
    "string_types" : {},
    "string_rules" : [
      { "key" : "*", "type" : "str", "sample_weight" : "bin", "global_weight" : "bin" }
    ],
    "num_types" : {},
    "num_rules" : [
      { "key" : "*", "type" : "num" }
    ]
  },
  "method" : "PA"
}

2020-01-13 12:29:03,326 1 INFO  [server_helper.hpp:226] start listening at port 9199
2020-01-13 12:29:03,326 1 INFO  [server_helper.hpp:233] jubaclassifier RPC server startup

Installing the Jubatus client

Client applications for Jubatus can be written in C++, Python, Ruby, or Java. This article uses Python.

pip3 install jubatus

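Next, try outlier detection

Following the official Jubatus documentation, create a training configuration file (anomaly_config.json) and a client application that uses Jubatus (anomaly.py), and place each file in a directory of your choice.

Here the configuration file (anomaly_config.json) was placed in "/Users/katuemon/Documents/jubatus/conf/" and the client application (anomaly.py) in "/Users/katuemon/Documents/jubatus/".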

anomaly_config.json

{
 "method" : "lof",
 "parameter" : {
  "nearest_neighbor_num" : 10,
  "reverse_nearest_neighbor_num" : 30,
  "method" : "euclid_lsh",
  "parameter" : {
   "hash_num" : 8,
   "table_num" : 16,
   "probe_num" : 64,
   "bin_width" : 10,
   "seed" : 1234
  }
 },

 "converter" : {
  "string_filter_types": {},
  "string_filter_rules": [],
  "num_filter_types": {},
  "num_filter_rules": [],
  "string_types": {},
  "string_rules": [{"key":"*", "type":"str", "global_weight" : "bin", "sample_weight" : "bin"}],
  "num_types": {},
  "num_rules": [{"key" : "*", "type" : "num"}]
 }
}
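Because the server only reads this file at startup, a quick syntax check before mounting it into the container can save a restart. The following is a minimal sketch: the helper name `check_config` is hypothetical, and treating the three top-level keys seen above as required is an assumption made for illustration, not a rule taken from the Jubatus documentation.

```python
import json

# Top-level keys that appear in anomaly_config.json above. Treating them
# as required is an assumption for this sketch, not a Jubatus rule.
EXPECTED_KEYS = ("method", "parameter", "converter")


def check_config(text):
    """Parse JSON text and return the list of missing top-level keys."""
    config = json.loads(text)  # raises json.JSONDecodeError on bad syntax
    return [key for key in EXPECTED_KEYS if key not in config]


if __name__ == "__main__":
    with open("anomaly_config.json", mode="r") as file:
        missing = check_config(file.read())
    print("missing keys:", missing)
```

An empty list means the file parsed as JSON and contains all three keys; it says nothing about whether the parameter values themselves are sensible.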

anomaly.py (the client application that uses Jubatus) appears to read data from a CSV file, send it to the Jubatus server, detect outliers, and write the results to standard output.

anomaly.py

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import signal
import sys
from jubatus.anomaly import client
from jubatus.common import Datum

NAME = "anom_kddcup"

# handle keyboard interruption
def do_exit(sig, stack):
    print('You pressed Ctrl+C.')
    print('Stop running the job.')
    sys.exit(0)

if __name__ == '__main__':
    # 0. set KeyboardInterrupt handler
    signal.signal(signal.SIGINT, do_exit)

    # 1. set jubatus server
    anom = client.Anomaly("127.0.0.1", 9199, NAME)

    # 2. prepare training data
    with open('kddcup.data_10_percent.txt', mode='r') as file:
        for line in file:
            duration, protocol_type, service, flag, src_bytes, dst_bytes, land, wrong_fragment, urgent, hot, num_failed_logins, logged_in, num_compromised, root_shell, su_attempted, num_root, num_file_creations, num_shells, num_access_files, num_outbound_cmds, is_host_login, is_guest_login, count, srv_count, serror_rate, srv_serror_rate, rerror_rate, srv_rerror_rate, same_srv_rate, diff_srv_rate, srv_diff_host_rate, dst_host_count, dst_host_srv_count, dst_host_same_srv_rate, dst_host_diff_srv_rate, dst_host_same_src_port_rate, dst_host_srv_diff_host_rate, dst_host_serror_rate, dst_host_srv_serror_rate, dst_host_rerror_rate, dst_host_srv_rerror_rate, label = line[:-1].split(",")

            datum = Datum()
            for (k, v) in [
                    ["protocol_type", protocol_type],
                    ["service", service],
                    ["flag", flag],
                    ["land", land],
                    ["logged_in", logged_in],
                    ["is_host_login", is_host_login],
                    ["is_guest_login", is_guest_login],
                    ]:
                datum.add_string(k, v)

            for (k, v) in [
                    ["duration",float(duration)],
                    ["src_bytes", float(src_bytes)],
                    ["dst_bytes", float(dst_bytes)],
                    ["wrong_fragment", float(wrong_fragment)],
                    ["urgent", float(urgent)],
                    ["hot", float(hot)],
                    ["num_failed_logins", float(num_failed_logins)],
                    ["num_compromised", float(num_compromised)],
                    ["root_shell", float(root_shell)],
                    ["su_attempted", float(su_attempted)],
                    ["num_root", float(num_root)],
                    ["num_file_creations", float(num_file_creations)],
                    ["num_shells", float(num_shells)],
                    ["num_access_files", float(num_access_files)],
                    ["num_outbound_cmds",float(num_outbound_cmds)],
                    ["count", float(count)],
                    ["srv_count",float(srv_count)],
                    ["serror_rate", float(serror_rate)],
                    ["srv_serror_rate", float(srv_serror_rate)],
                    ["rerror_rate", float(rerror_rate)],
                    ["srv_rerror_rate",float( srv_rerror_rate)],
                    ["same_srv_rate", float(same_srv_rate)],
                    ["diff_srv_rate", float(diff_srv_rate)],
                    ["srv_diff_host_rate", float(srv_diff_host_rate)],
                    ["dst_host_count",float( dst_host_count)],
                    ["dst_host_srv_count", float(dst_host_srv_count)],
                    ["dst_host_same_srv_rate",float( dst_host_same_srv_rate)],
                    ["dst_host_same_src_port_rate",float( dst_host_same_src_port_rate)],
                    ["dst_host_diff_srv_rate", float(dst_host_diff_srv_rate)],
                    ["dst_host_srv_diff_host_rate",float(dst_host_srv_diff_host_rate)],
                    ["dst_host_serror_rate",float(dst_host_serror_rate)],
                    ["dst_host_srv_serror_rate",float(dst_host_srv_serror_rate)],
                    ["dst_host_rerror_rate",float(dst_host_rerror_rate)],
                    ["dst_host_srv_rerror_rate",float(dst_host_srv_rerror_rate)],
                    ]:
                datum.add_number(k, v)

            # 3. train data and update jubatus model
            ret = anom.add(datum)

            # 4. output results
            if ret.score != float('inf') and ret.score != 1.0:
                print(ret, label)
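The parsing step in anomaly.py unpacks 42 columns by hand and then lists them again for the string and numeric features. As an alternative, the same split could be driven by a single list of column names. This is only a sketch: `split_record` is a hypothetical helper, and it deliberately leaves out the Jubatus calls so that it can be run without a server.

```python
# Column names in the order used by anomaly.py (the last column is the label).
COLUMNS = [
    "duration", "protocol_type", "service", "flag", "src_bytes", "dst_bytes",
    "land", "wrong_fragment", "urgent", "hot", "num_failed_logins",
    "logged_in", "num_compromised", "root_shell", "su_attempted", "num_root",
    "num_file_creations", "num_shells", "num_access_files",
    "num_outbound_cmds", "is_host_login", "is_guest_login", "count",
    "srv_count", "serror_rate", "srv_serror_rate", "rerror_rate",
    "srv_rerror_rate", "same_srv_rate", "diff_srv_rate", "srv_diff_host_rate",
    "dst_host_count", "dst_host_srv_count", "dst_host_same_srv_rate",
    "dst_host_diff_srv_rate", "dst_host_same_src_port_rate",
    "dst_host_srv_diff_host_rate", "dst_host_serror_rate",
    "dst_host_srv_serror_rate", "dst_host_rerror_rate",
    "dst_host_srv_rerror_rate", "label",
]

# Columns that anomaly.py passes to datum.add_string(); the rest are numeric.
STRING_KEYS = {
    "protocol_type", "service", "flag", "land",
    "logged_in", "is_host_login", "is_guest_login",
}


def split_record(line):
    """Return (string_fields, numeric_fields, label) for one CSV line."""
    values = line.rstrip("\n").split(",")
    if len(values) != len(COLUMNS):
        raise ValueError("expected %d columns, got %d" % (len(COLUMNS), len(values)))
    record = dict(zip(COLUMNS, values))
    label = record.pop("label")
    strings = {k: v for k, v in record.items() if k in STRING_KEYS}
    numbers = {k: float(v) for k, v in record.items() if k not in STRING_KEYS}
    return strings, numbers, label
```

Inside the read loop, the two dictionaries could then be fed to `datum.add_string(k, v)` and `datum.add_number(k, v)` exactly as the original script does.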

Running the sample program

Downloading the data

% cd /Users/katuemon/Documents/jubatus/
% curl -OL http://kdd.ics.uci.edu/databases/kddcup99/kddcup.data_10_percent.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 2094k  100 2094k    0     0  1025k      0  0:00:02  0:00:02 --:--:-- 1025k
% gunzip kddcup.data_10_percent.gz 

# Place kddcup.data_10_percent.txt in the same directory as anomaly.py.
% mv kddcup.data_10_percent kddcup.data_10_percent.txt
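On a machine without gunzip, the decompress-and-rename steps above could also be done from Python with the standard library. A minimal sketch, where `gunzip_to` is a hypothetical helper name and the download itself is assumed to have been done already:

```python
import gzip
import shutil


def gunzip_to(src_path, dst_path):
    """Decompress a .gz file to dst_path, streaming so memory use stays small."""
    with gzip.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return dst_path


if __name__ == "__main__":
    # Writes the file name that anomaly.py expects; the .gz file is kept.
    gunzip_to("kddcup.data_10_percent.gz", "kddcup.data_10_percent.txt")
```

Unlike the gunzip command, this leaves the original .gz file in place.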

Starting jubaanomaly

% docker run -p 9199:9199 -v /Users/katuemon/Documents/jubatus/conf:/tmp/config jubatus/jubatus jubaanomaly -f /tmp/config/anomaly_config.json
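In the command above, `-p 9199:9199` publishes the container port on the host so that the client can reach 127.0.0.1:9199, and `-v` mounts the host directory that holds the configuration file at /tmp/config inside the container. If the command needs to be launched from a script, the argument list could be assembled as in the sketch below; `build_run_command` is a hypothetical helper, and the paths are simply the ones used in this article.

```python
import shlex


def build_run_command(conf_dir, config_name="anomaly_config.json", port=9199):
    """Return the docker run argument list used in this article."""
    return [
        "docker", "run",
        "-p", "%d:%d" % (port, port),          # host_port:container_port
        "-v", "%s:/tmp/config" % conf_dir,     # host_dir:container_dir
        "jubatus/jubatus",
        "jubaanomaly", "-f", "/tmp/config/%s" % config_name,
    ]


if __name__ == "__main__":
    cmd = build_run_command("/Users/katuemon/Documents/jubatus/conf")
    print(shlex.join(cmd))  # shlex.join requires Python 3.8 or later
```

The returned list can be passed as-is to `subprocess.run`, which avoids shell quoting problems when the directory name contains spaces.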

It appears to have started:

2020-01-13 12:57:16,401 1 INFO  [server_util.cpp:429] starting jubaanomaly 1.1.1 RPC server at 172.17.0.2:9199
    pid                  : 1
    user                 : root
    mode                 : standalone mode
    timeout              : 10
    thread               : 2
    datadir              : /tmp
    logdir               : 
    log config           : 
    zookeeper            : 
    name                 : 
    interval sec         : 16
    interval count       : 512
    zookeeper timeout    : 10
    interconnect timeout : 10

2020-01-13 12:57:16,464 1 INFO  [server_util.cpp:165] load config from local file: /tmp/config/anomaly_config.json
2020-01-13 12:57:16,469 1 INFO  [anomaly_serv.cpp:140] config loaded: {
 "method" : "lof",
 "parameter" : {
  "nearest_neighbor_num" : 10,
  "reverse_nearest_neighbor_num" : 30,
  "method" : "euclid_lsh",
  "parameter" : {
   "hash_num" : 8,
   "table_num" : 16,
   "probe_num" : 64,
   "bin_width" : 10,
   "seed" : 1234
  }
 },

 "converter" : {
  "string_filter_types": {},
  "string_filter_rules": [],
  "num_filter_types": {},
  "num_filter_rules": [],
  "string_types": {},
  "string_rules": [{"key":"*", "type":"str", "global_weight" : "bin", "sample_weight" : "bin"}],
  "num_types": {},
  "num_rules": [{"key" : "*", "type" : "num"}]
 }
}

2020-01-13 12:57:16,470 1 INFO  [server_helper.hpp:226] start listening at port 9199
2020-01-13 12:57:16,470 1 INFO  [server_helper.hpp:233] jubaanomaly RPC server startup
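Before running the client, it can be handy to confirm from the host that the published port accepts connections. The sketch below uses only the standard library; `wait_for_port` is a hypothetical helper, and the timeout values are arbitrary. Note that a successful TCP connection only shows that something is listening on the port (with Docker this may be the port forwarder), not that Jubatus has finished loading its configuration.

```python
import socket
import time


def wait_for_port(host, port, timeout=10.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, else False."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    print(wait_for_port("127.0.0.1", 9199))
```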

Running the Jubatus client

% python3 anomaly.py

Results

Scores are printed as expected, so everything appears to be working.

id_with_score{id: 190, score: 0.9999999999999419} normal.
id_with_score{id: 195, score: 1.0000313006855042} normal.
id_with_score{id: 308, score: 0.9999999999986386} normal.
id_with_score{id: 476, score: 0.9999999999999977} normal.
id_with_score{id: 485, score: 0.9999999999999997} normal.
id_with_score{id: 490, score: 0.9999999999999836} normal.
id_with_score{id: 495, score: 1.459560361462793} normal.
id_with_score{id: 498, score: 0.999999999999998} normal.
id_with_score{id: 642, score: 0.9999999999999577} normal.
id_with_score{id: 643, score: 0.999999999999923} normal.
id_with_score{id: 654, score: 0.9999999999999812} normal.
id_with_score{id: 657, score: 0.9999999999999506} normal.
id_with_score{id: 683, score: 0.9999999999999567} normal.
id_with_score{id: 696, score: 0.9999999999899615} normal.
id_with_score{id: 697, score: 0.999999999999993} normal.
id_with_score{id: 698, score: 0.9999999999996164} normal.
id_with_score{id: 704, score: 0.9999999999999855} normal.
id_with_score{id: 705, score: 0.9999999999999941} normal.
id_with_score{id: 711, score: 0.9999999999994069} normal.
id_with_score{id: 717, score: 0.9999999999999987} normal.
id_with_score{id: 835, score: 0.9999999999999999} normal.
id_with_score{id: 1123, score: 0.9999999999999933} normal.
id_with_score{id: 1128, score: 1.0641230203490077} normal.
id_with_score{id: 1129, score: 0.9999999999999983} normal.
id_with_score{id: 1149, score: 1.0401485737893268} normal.
id_with_score{id: 1150, score: 0.9999999999996244} normal.
id_with_score{id: 1166, score: 0.9999999999999964} normal.
id_with_score{id: 1173, score: 0.9999999999999908} normal.
id_with_score{id: 1175, score: 0.999999999999569} normal.
id_with_score{id: 1662, score: 0.9999999999999974} normal.
id_with_score{id: 1678, score: 0.9999999999999984} normal.
id_with_score{id: 1681, score: 0.9999999999999952} normal.
id_with_score{id: 1692, score: 0.9999999999992727} normal.
id_with_score{id: 1710, score: 1.2717678419359209} normal.
id_with_score{id: 1711, score: 0.9999999999999989} normal.
id_with_score{id: 1720, score: 0.9999999999999992} normal.
id_with_score{id: 1732, score: 0.9999999999999959} normal.
id_with_score{id: 1733, score: 0.9999999999999953} normal.
id_with_score{id: 1745, score: 0.9999999999999792} normal.
id_with_score{id: 1746, score: 0.9999999999999954} normal.
id_with_score{id: 1881, score: 0.9999999999999983} normal.
id_with_score{id: 2212, score: 0.9999999999999997} normal.
id_with_score{id: 2285, score: 0.9999999999999966} normal.
id_with_score{id: 2287, score: 0.9999999999999962} normal.
id_with_score{id: 2288, score: 0.9999999999999981} normal.
id_with_score{id: 2292, score: 1.388673102986125} normal.
id_with_score{id: 2337, score: 0.9999999999999991} normal.
id_with_score{id: 2347, score: 0.999999999999961} normal.
id_with_score{id: 2355, score: 0.9999999999999994} normal.
id_with_score{id: 2358, score: 1.0560593847508284} normal.
id_with_score{id: 2383, score: 0.9994229663750211} normal.
id_with_score{id: 2394, score: 0.9999999999999631} normal.
id_with_score{id: 2463, score: 0.9999999999999991} normal.
id_with_score{id: 2500, score: 0.7581637350497832} normal.
id_with_score{id: 2501, score: 0.0} normal.
id_with_score{id: 2509, score: 0.9999999999999997} normal.
id_with_score{id: 2529, score: 0.999999999999994} normal.
id_with_score{id: 2548, score: 0.9999999936138952} normal.

(remaining output omitted)
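With LOF, a score close to 1.0 means the point sits in a region about as dense as its neighbors, while a score well above 1.0 suggests an outlier. To pick such lines out of the output above, something like the following sketch could be used. The helper names are hypothetical, the regular expression is written against the output format shown in this article only, and the threshold of 1.2 is an arbitrary assumption, not a recommended value.

```python
import re

# Matches lines such as:
#   id_with_score{id: 495, score: 1.459560361462793} normal.
LINE_RE = re.compile(r"id_with_score\{id: (\d+), score: ([^}]+)\} (\S+)")


def parse_result(line):
    """Return (id, score, label) for one output line, or None if no match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    return match.group(1), float(match.group(2)), match.group(3)


def outliers(lines, threshold=1.2):
    """Return parsed records whose score is above the threshold."""
    records = (parse_result(line) for line in lines)
    return [r for r in records if r is not None and r[1] > threshold]
```

For example, saving the client output with `python3 anomaly.py > result.txt` and passing the opened file to `outliers` would leave only the records scoring above the chosen threshold.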
