Running Jubatus with Docker

Posted at 2020-01-13

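Benefits of using Docker

  • You can use Jubatus just by pulling the official image.
  • Your environment stays clean, so you do not have to worry about conflicting middleware versions.
  • It is easy to remove once you no longer need it.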

Docker environment

・Docker for Mac

% docker version
Client: Docker Engine - Community
 Version:           19.03.4
 API version:       1.40
 Go version:        go1.12.10
 Git commit:        9013bf5
 Built:             Thu Oct 17 23:44:48 2019
 OS/Arch:           darwin/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.4
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.10
  Git commit:       9013bf5
  Built:            Thu Oct 17 23:50:38 2019
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          v1.2.10
  GitCommit:        b34a5c8af56e510852c35414db4c1f4fa6172339
 runc:
  Version:          1.0.0-rc8+dev
  GitCommit:        3e425f80a8c931f88e6d94a8c831b9d5aa481657
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

Installing the Jubatus server

Refer to the official Jubatus site and Docker Hub.

% docker pull jubatus/jubatus
Using default tag: latest
latest: Pulling from jubatus/jubatus
8284e13a281d: Pull complete 
26e1916a9297: Pull complete 
4102fc66d4ab: Pull complete 
1cf2b01777b2: Pull complete 
7f7a2d5e04ed: Pull complete 
4cb7073bed3b: Pull complete 
Digest: sha256:df700aa354604de1ae6b06d169e4203ac883385d803ec80a6e1fc5cf3fba118a
Status: Downloaded newer image for jubatus/jubatus:latest
docker.io/jubatus/jubatus:latest 

Check the image size.

% docker images jubatus/jubatus
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
jubatus/jubatus     latest              349bb156c215        11 months ago       415MB

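The image seems to have been pulled correctly, so let's start a Jubatus server inside a container. Here the classifier (Classifier) is started with a configuration file bundled in the image.

% docker run --expose 9199 jubatus/jubatus jubaclassifier -f /opt/jubatus/share/jubatus/example/config/classifier/pa.json

As the log below shows, the server appears to have started, so stop it with Ctrl+C for now.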

2020-01-13 12:29:03,261 1 INFO  [server_util.cpp:429] starting jubaclassifier 1.1.1 RPC server at 172.17.0.2:9199
    pid                  : 1
    user                 : root
    mode                 : standalone mode
    timeout              : 10
    thread               : 2
    datadir              : /tmp
    logdir               : 
    log config           : 
    zookeeper            : 
    name                 : 
    interval sec         : 16
    interval count       : 512
    zookeeper timeout    : 10
    interconnect timeout : 10

2020-01-13 12:29:03,325 1 INFO  [server_util.cpp:165] load config from local file: /opt/jubatus/share/jubatus/example/config/classifier/pa.json
2020-01-13 12:29:03,325 1 INFO  [classifier_serv.cpp:116] config loaded: {
  "converter" : {
    "string_filter_types" : {},
    "string_filter_rules" : [],
    "num_filter_types" : {},
    "num_filter_rules" : [],
    "string_types" : {},
    "string_rules" : [
      { "key" : "*", "type" : "str", "sample_weight" : "bin", "global_weight" : "bin" }
    ],
    "num_types" : {},
    "num_rules" : [
      { "key" : "*", "type" : "num" }
    ]
  },
  "method" : "PA"
}

2020-01-13 12:29:03,326 1 INFO  [server_helper.hpp:226] start listening at port 9199
2020-01-13 12:29:03,326 1 INFO  [server_helper.hpp:233] jubaclassifier RPC server startup
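
A note on the command above: `--expose` only records the port in the container's metadata and does not publish it to the host, whereas `-p` does. With Docker for Mac the container address in the log (172.17.0.2) is generally not reachable from the host either. The following is a minimal sketch for checking whether a TCP port accepts connections; the helper name `is_port_open` and the 1-second timeout are illustrative choices, not part of Jubatus.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # 9199 is the port from the startup log. 127.0.0.1 assumes the container
    # port was published to the host (for example with -p 9199:9199).
    print(is_port_open("127.0.0.1", 9199))
```

Run against a container started with `--expose` only, this check would be expected to report that the port is closed from the host side.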

Installing the Jubatus client

Client applications that use Jubatus can be written in C++, Python, Ruby, or Java. This time I try Python.

% pip3 install jubatus
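
To confirm that the package is visible to the interpreter you intend to use, a quick check along these lines may help. This is only a sketch: `is_installed` is a hypothetical helper built on the standard library, and it checks that the module can be found, not that it works.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the import system can locate the named top-level module."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "jubatus" is the package installed by the pip3 command above.
    print("jubatus installed:", is_installed("jubatus"))
```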

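Trying anomaly detection next

Following the official Jubatus documentation, create a training configuration file (anomaly_config.json) and a client application that uses Jubatus (anomaly.py), and put each file in a directory of your choice.

I placed the configuration file (anomaly_config.json) in /Users/katuemon/Documents/jubatus/conf/ and the client application (anomaly.py) in /Users/katuemon/Documents/jubatus/.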

anomaly_config.json
{
 "method" : "lof",
 "parameter" : {
  "nearest_neighbor_num" : 10,
  "reverse_nearest_neighbor_num" : 30,
  "method" : "euclid_lsh",
  "parameter" : {
   "hash_num" : 8,
   "table_num" : 16,
   "probe_num" : 64,
   "bin_width" : 10,
   "seed" : 1234
  }
 },

 "converter" : {
  "string_filter_types": {},
  "string_filter_rules": [],
  "num_filter_types": {},
  "num_filter_rules": [],
  "string_types": {},
  "string_rules": [{"key":"*", "type":"str", "global_weight" : "bin", "sample_weight" : "bin"}],
  "num_types": {},
  "num_rules": [{"key" : "*", "type" : "num"}]
 }
}
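
Because a syntax error in this file only surfaces when the server starts, it can be convenient to check it beforehand. The sketch below parses the JSON and reports missing top-level keys. The key list simply mirrors the file above and is an assumption, not an official Jubatus schema; `find_missing_keys` is a hypothetical helper.

```python
import json

# Top-level keys present in the anomaly_config.json shown above.
# Assumption: this mirrors that file only; it is not an official schema.
EXPECTED_KEYS = ("method", "parameter", "converter")


def find_missing_keys(config_text: str) -> list:
    """Parse JSON text and return the expected top-level keys that are absent."""
    config = json.loads(config_text)  # raises json.JSONDecodeError on bad syntax
    return [key for key in EXPECTED_KEYS if key not in config]


if __name__ == "__main__":
    sample = '{"method": "lof", "parameter": {}, "converter": {}}'
    print(find_missing_keys(sample))
```

To check the real file, read it with `open(path).read()` and pass the text to the function.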

anomaly.py (the client application that uses Jubatus) appears to feed data read from a CSV file to the Jubatus server, detect outliers, and print the results to standard output.

anomaly.py
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import signal
import sys

from jubatus.anomaly import client
from jubatus.common import Datum

NAME = "anom_kddcup"

# handle keyboard interruption
def do_exit(sig, stack):
    print('You pressed Ctrl+C.')
    print('Stop running the job.')
    sys.exit(0)

if __name__ == '__main__':
    # 0. set KeyboardInterrupt handler
    signal.signal(signal.SIGINT, do_exit)

    # 1. set jubatus server
    anom = client.Anomaly("127.0.0.1", 9199, NAME)

    # 2. prepare training data
    with open('kddcup.data_10_percent.txt', mode='r') as file:
        for line in file:
            duration, protocol_type, service, flag, src_bytes, dst_bytes, land, wrong_fragment, urgent, hot, num_failed_logins, logged_in, num_compromised, root_shell, su_attempted, num_root, num_file_creations, num_shells, num_access_files, num_outbound_cmds, is_host_login, is_guest_login, count, srv_count, serror_rate, srv_serror_rate, rerror_rate, srv_rerror_rate, same_srv_rate, diff_srv_rate, srv_diff_host_rate, dst_host_count, dst_host_srv_count, dst_host_same_srv_rate, dst_host_diff_srv_rate, dst_host_same_src_port_rate, dst_host_srv_diff_host_rate, dst_host_serror_rate, dst_host_srv_serror_rate, dst_host_rerror_rate, dst_host_srv_rerror_rate, label = line.rstrip("\n").split(",")

            datum = Datum()
            for (k, v) in [
                    ["protocol_type", protocol_type],
                    ["service", service],
                    ["flag", flag],
                    ["land", land],
                    ["logged_in", logged_in],
                    ["is_host_login", is_host_login],
                    ["is_guest_login", is_guest_login],
                    ]:
                datum.add_string(k, v)

            for (k, v) in [
                    ["duration", float(duration)],
                    ["src_bytes", float(src_bytes)],
                    ["dst_bytes", float(dst_bytes)],
                    ["wrong_fragment", float(wrong_fragment)],
                    ["urgent", float(urgent)],
                    ["hot", float(hot)],
                    ["num_failed_logins", float(num_failed_logins)],
                    ["num_compromised", float(num_compromised)],
                    ["root_shell", float(root_shell)],
                    ["su_attempted", float(su_attempted)],
                    ["num_root", float(num_root)],
                    ["num_file_creations", float(num_file_creations)],
                    ["num_shells", float(num_shells)],
                    ["num_access_files", float(num_access_files)],
                    ["num_outbound_cmds", float(num_outbound_cmds)],
                    ["count", float(count)],
                    ["srv_count", float(srv_count)],
                    ["serror_rate", float(serror_rate)],
                    ["srv_serror_rate", float(srv_serror_rate)],
                    ["rerror_rate", float(rerror_rate)],
                    ["srv_rerror_rate", float(srv_rerror_rate)],
                    ["same_srv_rate", float(same_srv_rate)],
                    ["diff_srv_rate", float(diff_srv_rate)],
                    ["srv_diff_host_rate", float(srv_diff_host_rate)],
                    ["dst_host_count", float(dst_host_count)],
                    ["dst_host_srv_count", float(dst_host_srv_count)],
                    ["dst_host_same_srv_rate", float(dst_host_same_srv_rate)],
                    ["dst_host_same_src_port_rate", float(dst_host_same_src_port_rate)],
                    ["dst_host_diff_srv_rate", float(dst_host_diff_srv_rate)],
                    ["dst_host_srv_diff_host_rate", float(dst_host_srv_diff_host_rate)],
                    ["dst_host_serror_rate", float(dst_host_serror_rate)],
                    ["dst_host_srv_serror_rate", float(dst_host_srv_serror_rate)],
                    ["dst_host_rerror_rate", float(dst_host_rerror_rate)],
                    ["dst_host_srv_rerror_rate", float(dst_host_srv_rerror_rate)],
                    ]:
                datum.add_number(k, v)

            # 3. train on the datum and update the Jubatus model
            ret = anom.add(datum)

            # 4. print results, skipping scores of infinity and exactly 1.0
            if ret.score != float('inf') and ret.score != 1.0:
                print(ret, label)
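
The long unpacking line and the two hand-written lists in anomaly.py could also be driven by a single list of column names. The following is a sketch of that alternative, not the approach used in the script above: `COLUMNS`, `STRING_KEYS`, and `parse_record` are names introduced here for illustration, with the column order copied from the unpacking line. It deliberately does not import jubatus, so it can be run on its own.

```python
# Column names in file order, copied from the unpacking line in anomaly.py.
COLUMNS = [
    "duration", "protocol_type", "service", "flag", "src_bytes", "dst_bytes",
    "land", "wrong_fragment", "urgent", "hot", "num_failed_logins",
    "logged_in", "num_compromised", "root_shell", "su_attempted", "num_root",
    "num_file_creations", "num_shells", "num_access_files",
    "num_outbound_cmds", "is_host_login", "is_guest_login", "count",
    "srv_count", "serror_rate", "srv_serror_rate", "rerror_rate",
    "srv_rerror_rate", "same_srv_rate", "diff_srv_rate", "srv_diff_host_rate",
    "dst_host_count", "dst_host_srv_count", "dst_host_same_srv_rate",
    "dst_host_diff_srv_rate", "dst_host_same_src_port_rate",
    "dst_host_srv_diff_host_rate", "dst_host_serror_rate",
    "dst_host_srv_serror_rate", "dst_host_rerror_rate",
    "dst_host_srv_rerror_rate", "label",
]

# Columns that anomaly.py passes to datum.add_string(); the rest are numeric.
STRING_KEYS = {
    "protocol_type", "service", "flag", "land",
    "logged_in", "is_host_login", "is_guest_login",
}


def parse_record(line: str):
    """Split one CSV line into (string features, numeric features, label)."""
    values = line.rstrip("\n").split(",")
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(values)}")
    row = dict(zip(COLUMNS, values))
    label = row.pop("label")
    strings = {k: v for k, v in row.items() if k in STRING_KEYS}
    numbers = {k: float(v) for k, v in row.items() if k not in STRING_KEYS}
    return strings, numbers, label
```

In the client, the two dictionaries would then be passed to `datum.add_string(k, v)` and `datum.add_number(k, v)` in a loop, as the script above already does with its lists.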

Running the sample program

Downloading the data

% cd /Users/katuemon/Documents/jubatus/
% curl -OL http://kdd.ics.uci.edu/databases/kddcup99/kddcup.data_10_percent.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 2094k  100 2094k    0     0  1025k      0  0:00:02  0:00:02 --:--:-- 1025k
% gunzip kddcup.data_10_percent.gz 

# Place kddcup.data_10_percent.txt in the same directory as anomaly.py.
% mv kddcup.data_10_percent kddcup.data_10_percent.txt
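
If `gunzip` is not available, the decompress-and-rename step can be approximated with Python's standard library. This is a sketch under that assumption; `gunzip_to` is a hypothetical helper, and unlike `gunzip` it leaves the original .gz file in place.

```python
import gzip
import shutil


def gunzip_to(src_path: str, dst_path: str) -> None:
    """Decompress the .gz file at src_path and write the result to dst_path."""
    with gzip.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        # Stream in chunks so the whole file is never held in memory.
        shutil.copyfileobj(src, dst)
```

For the file used here, the call would be `gunzip_to("kddcup.data_10_percent.gz", "kddcup.data_10_percent.txt")`.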

Starting jubaanomaly

% docker run -p 9199:9199 -v /Users/katuemon/Documents/jubatus/conf:/tmp/config jubatus/jubatus jubaanomaly -f /tmp/config/anomaly_config.json

It appears to have started.

2020-01-13 12:57:16,401 1 INFO  [server_util.cpp:429] starting jubaanomaly 1.1.1 RPC server at 172.17.0.2:9199
    pid                  : 1
    user                 : root
    mode                 : standalone mode
    timeout              : 10
    thread               : 2
    datadir              : /tmp
    logdir               : 
    log config           : 
    zookeeper            : 
    name                 : 
    interval sec         : 16
    interval count       : 512
    zookeeper timeout    : 10
    interconnect timeout : 10

2020-01-13 12:57:16,464 1 INFO  [server_util.cpp:165] load config from local file: /tmp/config/anomaly_config.json
2020-01-13 12:57:16,469 1 INFO  [anomaly_serv.cpp:140] config loaded: {
 "method" : "lof",
 "parameter" : {
  "nearest_neighbor_num" : 10,
  "reverse_nearest_neighbor_num" : 30,
  "method" : "euclid_lsh",
  "parameter" : {
   "hash_num" : 8,
   "table_num" : 16,
   "probe_num" : 64,
   "bin_width" : 10,
   "seed" : 1234
  }
 },

 "converter" : {
  "string_filter_types": {},
  "string_filter_rules": [],
  "num_filter_types": {},
  "num_filter_rules": [],
  "string_types": {},
  "string_rules": [{"key":"*", "type":"str", "global_weight" : "bin", "sample_weight" : "bin"}],
  "num_types": {},
  "num_rules": [{"key" : "*", "type" : "num"}]
 }
}

2020-01-13 12:57:16,470 1 INFO  [server_helper.hpp:226] start listening at port 9199
2020-01-13 12:57:16,470 1 INFO  [server_helper.hpp:233] jubaanomaly RPC server startup
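
The `docker run` command used for jubaanomaly publishes port 9199 with `-p` and bind-mounts the host configuration directory to /tmp/config with `-v`. If you want to launch it from a script, the argument list could be assembled as in the sketch below. `build_run_command` is a hypothetical helper; it only builds the list and does not start Docker. Resolving the host directory to an absolute path is a precaution, since a non-absolute source given to `-v` may be interpreted as a named volume.

```python
from pathlib import Path


def build_run_command(config_dir: str, config_name: str, port: int = 9199) -> list:
    """Build the argv list for the docker run invocation used for jubaanomaly."""
    # Use an absolute host path for the bind mount.
    host_dir = Path(config_dir).expanduser().resolve()
    return [
        "docker", "run",
        "-p", f"{port}:{port}",            # publish the RPC port to the host
        "-v", f"{host_dir}:/tmp/config",   # mount the config directory
        "jubatus/jubatus",
        "jubaanomaly", "-f", f"/tmp/config/{config_name}",
    ]
```

The resulting list can be passed to `subprocess.run(..., check=True)` to start the server in the foreground.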

Running the Jubatus client

% python3 anomaly.py

Execution results

Scores are printed as expected, so everything seems to be working.

id_with_score{id: 190, score: 0.9999999999999419} normal.
id_with_score{id: 195, score: 1.0000313006855042} normal.
id_with_score{id: 308, score: 0.9999999999986386} normal.
id_with_score{id: 476, score: 0.9999999999999977} normal.
id_with_score{id: 485, score: 0.9999999999999997} normal.
id_with_score{id: 490, score: 0.9999999999999836} normal.
id_with_score{id: 495, score: 1.459560361462793} normal.
id_with_score{id: 498, score: 0.999999999999998} normal.
id_with_score{id: 642, score: 0.9999999999999577} normal.
id_with_score{id: 643, score: 0.999999999999923} normal.
id_with_score{id: 654, score: 0.9999999999999812} normal.
id_with_score{id: 657, score: 0.9999999999999506} normal.
id_with_score{id: 683, score: 0.9999999999999567} normal.
id_with_score{id: 696, score: 0.9999999999899615} normal.
id_with_score{id: 697, score: 0.999999999999993} normal.
id_with_score{id: 698, score: 0.9999999999996164} normal.
id_with_score{id: 704, score: 0.9999999999999855} normal.
id_with_score{id: 705, score: 0.9999999999999941} normal.
id_with_score{id: 711, score: 0.9999999999994069} normal.
id_with_score{id: 717, score: 0.9999999999999987} normal.
id_with_score{id: 835, score: 0.9999999999999999} normal.
id_with_score{id: 1123, score: 0.9999999999999933} normal.
id_with_score{id: 1128, score: 1.0641230203490077} normal.
id_with_score{id: 1129, score: 0.9999999999999983} normal.
id_with_score{id: 1149, score: 1.0401485737893268} normal.
id_with_score{id: 1150, score: 0.9999999999996244} normal.
id_with_score{id: 1166, score: 0.9999999999999964} normal.
id_with_score{id: 1173, score: 0.9999999999999908} normal.
id_with_score{id: 1175, score: 0.999999999999569} normal.
id_with_score{id: 1662, score: 0.9999999999999974} normal.
id_with_score{id: 1678, score: 0.9999999999999984} normal.
id_with_score{id: 1681, score: 0.9999999999999952} normal.
id_with_score{id: 1692, score: 0.9999999999992727} normal.
id_with_score{id: 1710, score: 1.2717678419359209} normal.
id_with_score{id: 1711, score: 0.9999999999999989} normal.
id_with_score{id: 1720, score: 0.9999999999999992} normal.
id_with_score{id: 1732, score: 0.9999999999999959} normal.
id_with_score{id: 1733, score: 0.9999999999999953} normal.
id_with_score{id: 1745, score: 0.9999999999999792} normal.
id_with_score{id: 1746, score: 0.9999999999999954} normal.
id_with_score{id: 1881, score: 0.9999999999999983} normal.
id_with_score{id: 2212, score: 0.9999999999999997} normal.
id_with_score{id: 2285, score: 0.9999999999999966} normal.
id_with_score{id: 2287, score: 0.9999999999999962} normal.
id_with_score{id: 2288, score: 0.9999999999999981} normal.
id_with_score{id: 2292, score: 1.388673102986125} normal.
id_with_score{id: 2337, score: 0.9999999999999991} normal.
id_with_score{id: 2347, score: 0.999999999999961} normal.
id_with_score{id: 2355, score: 0.9999999999999994} normal.
id_with_score{id: 2358, score: 1.0560593847508284} normal.
id_with_score{id: 2383, score: 0.9994229663750211} normal.
id_with_score{id: 2394, score: 0.9999999999999631} normal.
id_with_score{id: 2463, score: 0.9999999999999991} normal.
id_with_score{id: 2500, score: 0.7581637350497832} normal.
id_with_score{id: 2501, score: 0.0} normal.
id_with_score{id: 2509, score: 0.9999999999999997} normal.
id_with_score{id: 2529, score: 0.999999999999994} normal.
id_with_score{id: 2548, score: 0.9999999936138952} normal.

(remaining output omitted)
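
Most of the scores above are very close to 1. With LOF, a score near 1 means the point has a density similar to its neighbors, while scores well above 1 suggest an outlier. To pick out the records that stand apart, the printed lines can be parsed and filtered as in this sketch. The regular expression is written against the output format shown above, and the tolerance of 0.01 is an arbitrary illustrative value, not a recommended threshold.

```python
import re

# Matches lines such as: id_with_score{id: 495, score: 1.459560361462793} normal.
LINE_RE = re.compile(r"id_with_score\{id: (\d+), score: ([0-9.eE+-]+)\} (\S+)")


def parse_result(line: str):
    """Return (id, score, label) for one output line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return match.group(1), float(match.group(2)), match.group(3)


def far_from_one(lines, tolerance: float = 0.01):
    """Keep parsed records whose score differs from 1.0 by more than tolerance."""
    parsed = (parse_result(line) for line in lines)
    return [r for r in parsed if r is not None and abs(r[1] - 1.0) > tolerance]
```

One way to use it is to redirect the client's output to a file and pass the opened file object to `far_from_one`.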