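Benefits of using Docker
- You can use Jubatus just by pulling the official image.
- It keeps your host environment clean, so you don't have to wrestle with conflicting middleware versions.
- It's easy to remove when you no longer need it.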
Docker environment
- Docker for Mac
% docker version
Client: Docker Engine - Community
Version: 19.03.4
API version: 1.40
Go version: go1.12.10
Git commit: 9013bf5
Built: Thu Oct 17 23:44:48 2019
OS/Arch: darwin/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 19.03.4
API version: 1.40 (minimum version 1.12)
Go version: go1.12.10
Git commit: 9013bf5
Built: Thu Oct 17 23:50:38 2019
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: v1.2.10
GitCommit: b34a5c8af56e510852c35414db4c1f4fa6172339
runc:
Version: 1.0.0-rc8+dev
GitCommit: 3e425f80a8c931f88e6d94a8c831b9d5aa481657
docker-init:
Version: 0.18.0
GitCommit: fec3683
Installing the Jubatus server
Follow the official Jubatus documentation and the Docker Hub page.
% docker pull jubatus/jubatus
Using default tag: latest
latest: Pulling from jubatus/jubatus
8284e13a281d: Pull complete
26e1916a9297: Pull complete
4102fc66d4ab: Pull complete
1cf2b01777b2: Pull complete
7f7a2d5e04ed: Pull complete
4cb7073bed3b: Pull complete
Digest: sha256:df700aa354604de1ae6b06d169e4203ac883385d803ec80a6e1fc5cf3fba118a
Status: Downloaded newer image for jubatus/jubatus:latest
docker.io/jubatus/jubatus:latest
Check the image size.
% docker images jubatus/jubatus
REPOSITORY TAG IMAGE ID CREATED SIZE
jubatus/jubatus latest 349bb156c215 11 months ago 415MB
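The image seems to have been pulled correctly, so let's start a Jubatus server inside a container. This time we start a classifier (jubaclassifier) using the example configuration file bundled in the image.
% docker run --expose 9199 jubatus/jubatus jubaclassifier -f /opt/jubatus/share/jubatus/example/config/classifier/pa.json
It appears to have started (log below), so stop it with Ctrl+C for now. Note that `--expose` does not publish the port to the host, which is fine for this startup check only.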
2020-01-13 12:29:03,261 1 INFO [server_util.cpp:429] starting jubaclassifier 1.1.1 RPC server at 172.17.0.2:9199
pid : 1
user : root
mode : standalone mode
timeout : 10
thread : 2
datadir : /tmp
logdir :
log config :
zookeeper :
name :
interval sec : 16
interval count : 512
zookeeper timeout : 10
interconnect timeout : 10
2020-01-13 12:29:03,325 1 INFO [server_util.cpp:165] load config from local file: /opt/jubatus/share/jubatus/example/config/classifier/pa.json
2020-01-13 12:29:03,325 1 INFO [classifier_serv.cpp:116] config loaded: {
"converter" : {
"string_filter_types" : {},
"string_filter_rules" : [],
"num_filter_types" : {},
"num_filter_rules" : [],
"string_types" : {},
"string_rules" : [
{ "key" : "*", "type" : "str", "sample_weight" : "bin", "global_weight" : "bin" }
],
"num_types" : {},
"num_rules" : [
{ "key" : "*", "type" : "num" }
]
},
"method" : "PA"
}
2020-01-13 12:29:03,326 1 INFO [server_helper.hpp:226] start listening at port 9199
2020-01-13 12:29:03,326 1 INFO [server_helper.hpp:233] jubaclassifier RPC server startup
Installing the Jubatus client
Client applications for Jubatus can be written in C++, Python, Ruby, or Java. This time we'll use Python.
% pip3 install jubatus
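Next, try outlier detection
Based on the official Jubatus documentation, create a training configuration file (anomaly_config.json) and a client application that uses Jubatus (anomaly.py), and put each file in a directory of your choice.
I put the configuration file (anomaly_config.json) in "/Users/katuemon/Documents/jubatus/conf/" and the client application (anomaly.py) in "/Users/katuemon/Documents/jubatus/".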
{
"method" : "lof",
"parameter" : {
"nearest_neighbor_num" : 10,
"reverse_nearest_neighbor_num" : 30,
"method" : "euclid_lsh",
"parameter" : {
"hash_num" : 8,
"table_num" : 16,
"probe_num" : 64,
"bin_width" : 10,
"seed" : 1234
}
},
"converter" : {
"string_filter_types": {},
"string_filter_rules": [],
"num_filter_types": {},
"num_filter_rules": [],
"string_types": {},
"string_rules": [{"key":"*", "type":"str", "global_weight" : "bin", "sample_weight" : "bin"}],
"num_types": {},
"num_rules": [{"key" : "*", "type" : "num"}]
}
}
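If you prefer to generate the configuration file instead of typing it by hand, a minimal sketch follows. The dict mirrors the JSON above. `write_config` is a hypothetical helper, not part of Jubatus, and it only writes the file into the directory that will later be mounted into the container.

```python
import json
from pathlib import Path

# Same content as anomaly_config.json above (LOF with euclid_lsh neighbour search)
ANOMALY_CONFIG = {
    "method": "lof",
    "parameter": {
        "nearest_neighbor_num": 10,
        "reverse_nearest_neighbor_num": 30,
        "method": "euclid_lsh",
        "parameter": {
            "hash_num": 8,
            "table_num": 16,
            "probe_num": 64,
            "bin_width": 10,
            "seed": 1234,
        },
    },
    "converter": {
        "string_filter_types": {},
        "string_filter_rules": [],
        "num_filter_types": {},
        "num_filter_rules": [],
        "string_types": {},
        "string_rules": [{"key": "*", "type": "str", "global_weight": "bin", "sample_weight": "bin"}],
        "num_types": {},
        "num_rules": [{"key": "*", "type": "num"}],
    },
}


def write_config(conf_dir, filename="anomaly_config.json"):
    """Hypothetical helper: write ANOMALY_CONFIG as JSON into conf_dir and return the path."""
    target_dir = Path(conf_dir)
    target_dir.mkdir(parents=True, exist_ok=True)  # create the conf directory if missing
    target = target_dir / filename
    target.write_text(json.dumps(ANOMALY_CONFIG, indent=2), encoding="utf-8")
    return target
```

For example, `write_config("/Users/katuemon/Documents/jubatus/conf")` would produce the same file used in this article.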
anomaly.py (the client application that uses Jubatus) appears to feed each record read from the CSV file to the Jubatus server, detect outliers, and print the results to standard output.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import signal
import sys

from jubatus.anomaly import client
from jubatus.common import Datum

NAME = "anom_kddcup"


# handle keyboard interruption
def do_exit(sig, stack):
    print('You pressed Ctrl+C.')
    print('Stop running the job.')
    sys.exit(0)


if __name__ == '__main__':
    # 0. set KeyboardInterrupt handler
    signal.signal(signal.SIGINT, do_exit)
    # 1. set jubatus server
    anom = client.Anomaly("127.0.0.1", 9199, NAME)
    # 2. prepare training data
    with open('kddcup.data_10_percent.txt', mode='r') as file:
        for line in file:
            duration, protocol_type, service, flag, src_bytes, dst_bytes, land, wrong_fragment, urgent, hot, num_failed_logins, logged_in, num_compromised, root_shell, su_attempted, num_root, num_file_creations, num_shells, num_access_files, num_outbound_cmds, is_host_login, is_guest_login, count, srv_count, serror_rate, srv_serror_rate, rerror_rate, srv_rerror_rate, same_srv_rate, diff_srv_rate, srv_diff_host_rate, dst_host_count, dst_host_srv_count, dst_host_same_srv_rate, dst_host_diff_srv_rate, dst_host_same_src_port_rate, dst_host_srv_diff_host_rate, dst_host_serror_rate, dst_host_srv_serror_rate, dst_host_rerror_rate, dst_host_srv_rerror_rate, label = line.rstrip("\n").split(",")
            datum = Datum()
            # categorical columns are added as string features
            for (k, v) in [
                    ["protocol_type", protocol_type],
                    ["service", service],
                    ["flag", flag],
                    ["land", land],
                    ["logged_in", logged_in],
                    ["is_host_login", is_host_login],
                    ["is_guest_login", is_guest_login],
            ]:
                datum.add_string(k, v)
            # the remaining columns are added as numeric features
            for (k, v) in [
                    ["duration", float(duration)],
                    ["src_bytes", float(src_bytes)],
                    ["dst_bytes", float(dst_bytes)],
                    ["wrong_fragment", float(wrong_fragment)],
                    ["urgent", float(urgent)],
                    ["hot", float(hot)],
                    ["num_failed_logins", float(num_failed_logins)],
                    ["num_compromised", float(num_compromised)],
                    ["root_shell", float(root_shell)],
                    ["su_attempted", float(su_attempted)],
                    ["num_root", float(num_root)],
                    ["num_file_creations", float(num_file_creations)],
                    ["num_shells", float(num_shells)],
                    ["num_access_files", float(num_access_files)],
                    ["num_outbound_cmds", float(num_outbound_cmds)],
                    ["count", float(count)],
                    ["srv_count", float(srv_count)],
                    ["serror_rate", float(serror_rate)],
                    ["srv_serror_rate", float(srv_serror_rate)],
                    ["rerror_rate", float(rerror_rate)],
                    ["srv_rerror_rate", float(srv_rerror_rate)],
                    ["same_srv_rate", float(same_srv_rate)],
                    ["diff_srv_rate", float(diff_srv_rate)],
                    ["srv_diff_host_rate", float(srv_diff_host_rate)],
                    ["dst_host_count", float(dst_host_count)],
                    ["dst_host_srv_count", float(dst_host_srv_count)],
                    ["dst_host_same_srv_rate", float(dst_host_same_srv_rate)],
                    ["dst_host_same_src_port_rate", float(dst_host_same_src_port_rate)],
                    ["dst_host_diff_srv_rate", float(dst_host_diff_srv_rate)],
                    ["dst_host_srv_diff_host_rate", float(dst_host_srv_diff_host_rate)],
                    ["dst_host_serror_rate", float(dst_host_serror_rate)],
                    ["dst_host_srv_serror_rate", float(dst_host_srv_serror_rate)],
                    ["dst_host_rerror_rate", float(dst_host_rerror_rate)],
                    ["dst_host_srv_rerror_rate", float(dst_host_srv_rerror_rate)],
            ]:
                datum.add_number(k, v)
            # 3. train data and update jubatus model
            ret = anom.add(datum)
            # 4. output results (skip infinite scores and scores of exactly 1.0)
            if (ret.score != float('Inf')) and (ret.score != 1.0):
                print(ret, label)
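The column handling in anomaly.py can be pulled out into a function that does not need a running server, which makes it easy to check on its own. Below is a minimal sketch. `parse_kdd_line`, `COLUMNS` and `STRING_KEYS` are hypothetical names introduced here. The column order and the split between string and numeric features are copied from anomaly.py above.

```python
# Column order of kddcup.data_10_percent, as unpacked in anomaly.py
COLUMNS = [
    "duration", "protocol_type", "service", "flag", "src_bytes", "dst_bytes",
    "land", "wrong_fragment", "urgent", "hot", "num_failed_logins", "logged_in",
    "num_compromised", "root_shell", "su_attempted", "num_root",
    "num_file_creations", "num_shells", "num_access_files", "num_outbound_cmds",
    "is_host_login", "is_guest_login", "count", "srv_count", "serror_rate",
    "srv_serror_rate", "rerror_rate", "srv_rerror_rate", "same_srv_rate",
    "diff_srv_rate", "srv_diff_host_rate", "dst_host_count", "dst_host_srv_count",
    "dst_host_same_srv_rate", "dst_host_diff_srv_rate",
    "dst_host_same_src_port_rate", "dst_host_srv_diff_host_rate",
    "dst_host_serror_rate", "dst_host_srv_serror_rate", "dst_host_rerror_rate",
    "dst_host_srv_rerror_rate", "label",
]

# Columns that anomaly.py passes to Datum.add_string; all others go to add_number
STRING_KEYS = {"protocol_type", "service", "flag", "land", "logged_in",
               "is_host_login", "is_guest_login"}


def parse_kdd_line(line):
    """Hypothetical helper: split one CSV line into (string features, numeric features, label)."""
    values = line.rstrip("\n").split(",")
    if len(values) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, got {len(values)}")
    record = dict(zip(COLUMNS, values))
    label = record.pop("label")  # the label is printed, not learned
    strings = {k: v for k, v in record.items() if k in STRING_KEYS}
    numbers = {k: float(v) for k, v in record.items() if k not in STRING_KEYS}
    return strings, numbers, label
```

With this helper, the two loops in anomaly.py could become `for k, v in strings.items(): datum.add_string(k, v)` and `for k, v in numbers.items(): datum.add_number(k, v)`.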
Running the sample program
Downloading the data
% cd /Users/katuemon/Documents/jubatus/
% curl -OL http://kdd.ics.uci.edu/databases/kddcup99/kddcup.data_10_percent.gz
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 2094k 100 2094k 0 0 1025k 0 0:00:02 0:00:02 --:--:-- 1025k
% gunzip kddcup.data_10_percent.gz
# Put kddcup.data_10_percent.txt in the same directory as anomaly.py.
% mv kddcup.data_10_percent kddcup.data_10_percent.txt
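The same download and decompress steps can also be done from Python with the standard library. Below is a minimal sketch. `download` and `gunzip_to_txt` are hypothetical helpers. Unlike `gunzip`, this sketch keeps the original .gz file.

```python
import gzip
import shutil
import urllib.request
from pathlib import Path

KDD_URL = "http://kdd.ics.uci.edu/databases/kddcup99/kddcup.data_10_percent.gz"


def download(url, dest):
    """Hypothetical helper: save the resource at url to dest (roughly `curl -O`)."""
    urllib.request.urlretrieve(url, dest)
    return Path(dest)


def gunzip_to_txt(gz_path, txt_path):
    """Hypothetical helper: decompress gz_path into txt_path (gunzip + mv, keeping the .gz)."""
    with gzip.open(gz_path, "rb") as src, open(txt_path, "wb") as dst:
        shutil.copyfileobj(src, dst)  # stream, so the whole file is never held in memory
    return Path(txt_path)
```

For example, you could run `gunzip_to_txt(download(KDD_URL, "kddcup.data_10_percent.gz"), "kddcup.data_10_percent.txt")` in the directory that holds anomaly.py.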
Start jubaanomaly
% docker run -p 9199:9199 -v /Users/katuemon/Documents/jubatus/conf:/tmp/config jubatus/jubatus jubaanomaly -f /tmp/config/anomaly_config.json
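If you want to launch the server from a script, you can build the same `docker run` arguments in Python. Below is a minimal sketch. `jubaanomaly_command` is a hypothetical helper. It maps a host port to port 9199 inside the container, on the assumption that jubaanomaly keeps its default port as in the log. It also mounts the host configuration directory at `/tmp/config`, exactly as the command above does.

```python
from pathlib import Path


def jubaanomaly_command(conf_dir, config_name="anomaly_config.json",
                        host_port=9199, image="jubatus/jubatus"):
    """Hypothetical helper: argument list equivalent to the docker run command above."""
    conf_dir = Path(conf_dir).resolve()  # docker -v needs an absolute host path
    return [
        "docker", "run",
        "-p", f"{host_port}:9199",            # publish the RPC port to the host
        "-v", f"{conf_dir}:/tmp/config",      # mount the host config directory
        image,
        "jubaanomaly", "-f", f"/tmp/config/{config_name}",
    ]
```

You could pass the result to `subprocess.run(jubaanomaly_command("/Users/katuemon/Documents/jubatus/conf"), check=True)`. A list avoids shell quoting issues if the path contains spaces.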
It seems to have started.
2020-01-13 12:57:16,401 1 INFO [server_util.cpp:429] starting jubaanomaly 1.1.1 RPC server at 172.17.0.2:9199
pid : 1
user : root
mode : standalone mode
timeout : 10
thread : 2
datadir : /tmp
logdir :
log config :
zookeeper :
name :
interval sec : 16
interval count : 512
zookeeper timeout : 10
interconnect timeout : 10
2020-01-13 12:57:16,464 1 INFO [server_util.cpp:165] load config from local file: /tmp/config/anomaly_config.json
2020-01-13 12:57:16,469 1 INFO [anomaly_serv.cpp:140] config loaded: {
"method" : "lof",
"parameter" : {
"nearest_neighbor_num" : 10,
"reverse_nearest_neighbor_num" : 30,
"method" : "euclid_lsh",
"parameter" : {
"hash_num" : 8,
"table_num" : 16,
"probe_num" : 64,
"bin_width" : 10,
"seed" : 1234
}
},
"converter" : {
"string_filter_types": {},
"string_filter_rules": [],
"num_filter_types": {},
"num_filter_rules": [],
"string_types": {},
"string_rules": [{"key":"*", "type":"str", "global_weight" : "bin", "sample_weight" : "bin"}],
"num_types": {},
"num_rules": [{"key" : "*", "type" : "num"}]
}
}
2020-01-13 12:57:16,470 1 INFO [server_helper.hpp:226] start listening at port 9199
2020-01-13 12:57:16,470 1 INFO [server_helper.hpp:233] jubaanomaly RPC server startup
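Before running the client, you can check from the host that port 9199 actually accepts connections. Below is a minimal sketch using only the standard library. `wait_for_port` is a hypothetical helper. It only proves that something accepts TCP connections on the port, not that the Jubatus RPC server is fully ready.

```python
import socket
import time


def wait_for_port(host, port, timeout=10.0, interval=0.5):
    """Hypothetical helper: return True once a TCP connection to host:port succeeds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True  # the connection was accepted; it is closed immediately
        except OSError:  # refused, unreachable, or timed out
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("127.0.0.1", 9199)` before `python3 anomaly.py` avoids a connection error when the container is still starting.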
Run the Jubatus client
% python3 anomaly.py
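Results
The scores are printed properly, so everything seems to be working. Only records whose score is neither infinite nor exactly 1.0 are printed, because anomaly.py filters out the rest.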
id_with_score{id: 190, score: 0.9999999999999419} normal.
id_with_score{id: 195, score: 1.0000313006855042} normal.
id_with_score{id: 308, score: 0.9999999999986386} normal.
id_with_score{id: 476, score: 0.9999999999999977} normal.
id_with_score{id: 485, score: 0.9999999999999997} normal.
id_with_score{id: 490, score: 0.9999999999999836} normal.
id_with_score{id: 495, score: 1.459560361462793} normal.
id_with_score{id: 498, score: 0.999999999999998} normal.
id_with_score{id: 642, score: 0.9999999999999577} normal.
id_with_score{id: 643, score: 0.999999999999923} normal.
id_with_score{id: 654, score: 0.9999999999999812} normal.
id_with_score{id: 657, score: 0.9999999999999506} normal.
id_with_score{id: 683, score: 0.9999999999999567} normal.
id_with_score{id: 696, score: 0.9999999999899615} normal.
id_with_score{id: 697, score: 0.999999999999993} normal.
id_with_score{id: 698, score: 0.9999999999996164} normal.
id_with_score{id: 704, score: 0.9999999999999855} normal.
id_with_score{id: 705, score: 0.9999999999999941} normal.
id_with_score{id: 711, score: 0.9999999999994069} normal.
id_with_score{id: 717, score: 0.9999999999999987} normal.
id_with_score{id: 835, score: 0.9999999999999999} normal.
id_with_score{id: 1123, score: 0.9999999999999933} normal.
id_with_score{id: 1128, score: 1.0641230203490077} normal.
id_with_score{id: 1129, score: 0.9999999999999983} normal.
id_with_score{id: 1149, score: 1.0401485737893268} normal.
id_with_score{id: 1150, score: 0.9999999999996244} normal.
id_with_score{id: 1166, score: 0.9999999999999964} normal.
id_with_score{id: 1173, score: 0.9999999999999908} normal.
id_with_score{id: 1175, score: 0.999999999999569} normal.
id_with_score{id: 1662, score: 0.9999999999999974} normal.
id_with_score{id: 1678, score: 0.9999999999999984} normal.
id_with_score{id: 1681, score: 0.9999999999999952} normal.
id_with_score{id: 1692, score: 0.9999999999992727} normal.
id_with_score{id: 1710, score: 1.2717678419359209} normal.
id_with_score{id: 1711, score: 0.9999999999999989} normal.
id_with_score{id: 1720, score: 0.9999999999999992} normal.
id_with_score{id: 1732, score: 0.9999999999999959} normal.
id_with_score{id: 1733, score: 0.9999999999999953} normal.
id_with_score{id: 1745, score: 0.9999999999999792} normal.
id_with_score{id: 1746, score: 0.9999999999999954} normal.
id_with_score{id: 1881, score: 0.9999999999999983} normal.
id_with_score{id: 2212, score: 0.9999999999999997} normal.
id_with_score{id: 2285, score: 0.9999999999999966} normal.
id_with_score{id: 2287, score: 0.9999999999999962} normal.
id_with_score{id: 2288, score: 0.9999999999999981} normal.
id_with_score{id: 2292, score: 1.388673102986125} normal.
id_with_score{id: 2337, score: 0.9999999999999991} normal.
id_with_score{id: 2347, score: 0.999999999999961} normal.
id_with_score{id: 2355, score: 0.9999999999999994} normal.
id_with_score{id: 2358, score: 1.0560593847508284} normal.
id_with_score{id: 2383, score: 0.9994229663750211} normal.
id_with_score{id: 2394, score: 0.9999999999999631} normal.
id_with_score{id: 2463, score: 0.9999999999999991} normal.
id_with_score{id: 2500, score: 0.7581637350497832} normal.
id_with_score{id: 2501, score: 0.0} normal.
id_with_score{id: 2509, score: 0.9999999999999997} normal.
id_with_score{id: 2529, score: 0.999999999999994} normal.
id_with_score{id: 2548, score: 0.9999999936138952} normal.
(rest omitted)
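To get a feel for what these scores mean, below is a minimal sketch of the textbook Local Outlier Factor (LOF). A score around 1.0 means a point is about as dense as its neighbours, and a score well above 1.0 suggests an outlier. This is not Jubatus's implementation. The sketch uses exact brute-force Euclidean neighbours instead of `euclid_lsh`, ignores `reverse_nearest_neighbor_num`, and only handles numeric features, so its numbers will not match the output above. In this sketch, exact duplicate points give an infinite density, so scores of 0.0, inf, or nan can appear. That the Inf filter in anomaly.py guards against a similar case in Jubatus is an assumption.

```python
import math


def k_nearest(points, i, k):
    """Indices of the k nearest neighbours of points[i] (brute force, excludes i)."""
    others = [j for j in range(len(points)) if j != i]
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:k]


def lof_scores(points, k):
    """Textbook LOF for every point, using exact Euclidean k-nearest neighbours."""
    if not 0 < k < len(points):
        raise ValueError("k must satisfy 0 < k < len(points)")
    neighbours = [k_nearest(points, i, k) for i in range(len(points))]
    # k-distance(p): distance from p to its k-th nearest neighbour
    k_dist = [math.dist(p, points[nb[-1]]) for p, nb in zip(points, neighbours)]

    def lrd(i):
        # local reachability density = 1 / mean reachability distance to the neighbours
        reach = [max(k_dist[o], math.dist(points[i], points[o])) for o in neighbours[i]]
        mean_reach = sum(reach) / len(reach)
        return math.inf if mean_reach == 0 else 1.0 / mean_reach

    lrds = [lrd(i) for i in range(len(points))]
    # LOF(p) = mean lrd of p's neighbours / lrd(p)
    return [sum(lrds[o] for o in neighbours[i]) / k / lrds[i] for i in range(len(points))]
```

For example, `lof_scores([(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)], k=2)` gives 1.0 for the four clustered points and a score of roughly 13 for the isolated point (10, 10).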