
Setting up an xgboost (Python) environment on EC2 spot instances with AWS Lambda

Last updated at 2016-05-28 / Posted at 2015-10-03

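I have started playing with xgboost, but honestly, building an environment for machine learning is a chore.

Even if you write down a procedure or a script, later on you can't remember where you put the file. On EC2 you might think you can save money with spot instances and run experiments in parallel on several machines with different parameters, but if the setup depends on a written procedure, you lose track of which terminal is running which parameter set.

If you build something that sets up the environment through AWS Lambda with a single button press or a single command, it becomes fairly easy to spin things up ad hoc, and I feel a lot of things go more smoothly.

Below, I describe what I tried: launching an EC2 spot instance and building an environment where xgboost runs, all in one shot via Lambda.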

Lambda function

Code

Paste the following over Lambda's hello-world template, then work through the settings described later.

It is long, so I also put it up as a gist.
https://gist.github.com/pyr-revs/d768984ed68500bdbeb9

console.log('Loading function');

var ec2Region = 'ap-northeast-1';
var s3Region = ec2Region;
var snsRegion = ec2Region;

var s3Bucket = 'mybucket';
var shellScriptS3Key = 'sh/launch_xgboost.sh';
var shellScriptS3Path = 's3://' + s3Bucket + '/' + shellScriptS3Key;

var iamInstanceProfile = 'my_ec2_role';

var availabilityZone = ec2Region + 'a';
var spotPrice = '0.1';
var imageId = 'ami-9a2fb89a';
var instanceType = 'c3.2xlarge';
var securityGroup = 'launch-wizard-1';
var keyName = 'my_ssh_keypair';

var userData = (function () {/*#!/bin/bash
tmp=/root/sudoers_tmp
cat /etc/sudoers > $tmp
cat >> $tmp <<EOF
Defaults:ec2-user !requiretty
EOF
cat $tmp > /etc/sudoers
yum -y update
yum groupinstall -y "Development tools"
yum -y install gcc-c++ python27-devel atlas-sse3-devel lapack-devel
pip install numpy
pip install scipy
pip install pandas
aws s3 cp %s /home/ec2-user/launch_xgboost.sh
chown ec2-user /home/ec2-user/launch_xgboost.sh
chmod +x /home/ec2-user/launch_xgboost.sh
su - ec2-user /home/ec2-user/launch_xgboost.sh
*/}).toString().match(/[^]*\/\*([^]*)\*\/\}$/)[1];

var shellScriptContents = (function () {/*#!/bin/bash
git clone --recursive https://github.com/dmlc/xgboost.git
cd xgboost
./build.sh > build.log 2>&1
cd python-package
sudo -s python setup.py install > setup.log 2>&1
export AWS_DEFAULT_REGION=%s
aws sns publish --topic-arn arn:aws:sns:ap-northeast-1:xxxxxxxxxxxx:My-Sns-Topic --subject "Launch xgboost Done" --message "Launch xgboost Done!!"
*/}).toString().match(/[^]*\/\*([^]*)\*\/\}$/)[1];

exports.handler = function(event, context) {
    var util = require('util');
    var AWS = require('aws-sdk');
    
    // Write sh file for xgboost launch to S3
    AWS.config.region = s3Region;
    var shellScriptContentsFormatted = util.format(shellScriptContents, snsRegion);
    var s3 = new AWS.S3();
    var s3Params = {Bucket: s3Bucket, Key: shellScriptS3Key, Body: shellScriptContentsFormatted};
    var s3Options = {partSize: 10 * 1024 * 1024, queueSize: 1};
    //console.log(shellScriptContentsFormatted);
    
    s3.upload(s3Params, s3Options, function(err, data) {
        if (err) {
            console.log(err, err.stack);
            context.fail('[Fail]');
        }
        else {
            console.log(data);
            
            // Launch EC2 Spot Instance with UserData
            var userDataFormatted = util.format(userData, shellScriptS3Path);
            var userDataBase64 = Buffer.from(userDataFormatted).toString('base64');
    
            var ec2LaunchParams = {
                SpotPrice: spotPrice, 
                LaunchSpecification : {
                    IamInstanceProfile: {
                      Name: iamInstanceProfile
                    },
                    ImageId: imageId,
                    InstanceType: instanceType,
                    KeyName: keyName,
                    Placement: {
                      AvailabilityZone: availabilityZone
                    },
                    SecurityGroups: [
                        securityGroup
                    ],
                    UserData: userDataBase64
                }
            };
            //console.log(ec2LaunchParams);
            
            AWS.config.region = ec2Region;
            var ec2 = new AWS.EC2();
            ec2.requestSpotInstances(ec2LaunchParams, function(err, data) {
                if (err) {
                    console.log(err, err.stack);
                    context.fail('[Fail]');
                }
                else {
                    console.log(data);
                    context.succeed('[Succeed]');
                }
            });
        }
    });
};

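What it does

1. Create the sh file that installs xgboost and save it to S3.
2. Request an EC2 spot instance, with the initial setup written in UserData.
3. When the instance starts, UserData runs yum and the rest. It then downloads the sh file created in step 1 from S3 and hands it over to ec2-user.
4. The sh file, running with ec2-user privileges, installs xgboost.
5. When everything is done, a notification is sent to SNS.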
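
Steps 1 and 2 both depend on how userData and shellScriptContents become strings. The script text sits inside a block comment in a function body, toString() returns the function's source, and the regular expression captures whatever lies between /* and */. util.format then fills in the %s placeholder, and the result is Base64-encoded for UserData. The sketch below isolates that mechanism. The heredoc helper name and the shortened template are illustrative assumptions, not part of the original code, and the trick only works if the runtime keeps comments in toString() and the script itself never contains /* or */.

```javascript
const util = require('util');

// Hypothetical helper: return the text of the block comment in fn's body.
// Uses the same regular expression as the Lambda code above.
function heredoc(fn) {
    return fn.toString().match(/[^]*\/\*([^]*)\*\/\}$/)[1];
}

// Shortened template; %s is filled in later by util.format.
const template = heredoc(function () {/*#!/bin/bash
aws s3 cp %s /home/ec2-user/launch_xgboost.sh
*/});

// Substitute the S3 path of the install script (placeholder bucket name).
const script = util.format(template, 's3://mybucket/sh/launch_xgboost.sh');

// The spot request takes UserData as a Base64-encoded string.
const userDataBase64 = Buffer.from(script).toString('base64');

console.log(script);
```

On a runtime that supports template literals, a plain backtick string would probably be a simpler way to hold the script, at the cost of having to escape any ${ or backtick that appears in it.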

This is basically an extension of the following article.

Creating a spot instance with AWS Lambda and running environment setup and long jobs automatically with UserData
http://qiita.com/pyr_revs/items/c7f33d1cdee3118590e3

Things to prepare and configure

Lambda execution timeout

Lambda's default timeout of 3 seconds is too short, so raise it to 60 seconds. The default 128 MB of memory should be fine.

IAM Role for Lambda

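The IAM role that Lambda runs under needs permission to create EC2 instances and to access S3. In principle, attaching managed policies such as "AmazonEC2FullAccess" and "AmazonS3FullAccess" should be enough, but for some reason writes to S3 failed with those, so I ended up assigning "AdministratorAccess"...

Creating an IAM Role for the user - Getting started with AWS Lambda (1): handling events from a user application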
http://dev.classmethod.jp/cloud/aws/getting-started-with-aws-lambda-1st-user-application/#prepare-iam-role
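
As a narrower alternative to AdministratorAccess, the sketch below builds a policy document covering only the calls the Lambda code makes: s3:PutObject for the script upload, ec2:RequestSpotInstances for the spot request, and iam:PassRole, which EC2 generally requires when a launch specification names an instance profile. It has not been tested against the setup in this article, so treat it as a starting point. The helper name and the account ID are placeholders, and it leaves out the CloudWatch Logs permissions Lambda needs for console.log output.

```javascript
// Hypothetical helper: build a minimal policy document for the Lambda role.
// bucket and key correspond to s3Bucket and shellScriptS3Key; roleArn is the
// role behind iamInstanceProfile. The action list is an assumption and may
// need additions.
function buildLambdaPolicy(bucket, key, roleArn) {
    return {
        Version: '2012-10-17',
        Statement: [
            { Effect: 'Allow', Action: 's3:PutObject',
              Resource: 'arn:aws:s3:::' + bucket + '/' + key },
            { Effect: 'Allow', Action: 'ec2:RequestSpotInstances', Resource: '*' },
            { Effect: 'Allow', Action: 'iam:PassRole', Resource: roleArn }
        ]
    };
}

// Placeholder values; replace the account ID and names with your own.
const policy = buildLambdaPolicy('mybucket', 'sh/launch_xgboost.sh',
    'arn:aws:iam::xxxxxxxxxxxx:role/my_ec2_role');

console.log(JSON.stringify(policy, null, 2));
```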

IAM Role for the EC2 instance profile

The role attached to the EC2 instance profile needs permission to access S3 and SNS. Attaching the managed policies "AmazonS3FullAccess" and "AmazonSNSFullAccess" should do. Once the role is created, rewrite the following line.

var iamInstanceProfile = 'my_ec2_role';

Using an IAM role to grant permissions to applications running on Amazon EC2 instances
http://docs.aws.amazon.com/ja_jp/IAM/latest/UserGuide/id_roles_use_switch-role-ec2.html

SNS Topic

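Create an SNS topic of your choice. It is where the notification goes once the xgboost installation finishes. Rewrite the ARN in the following line, which is the last line of the shell script section.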

aws sns publish --topic-arn arn:aws:sns:ap-northeast-1:xxxxxxxxxxxx:My-Sns-Topic --subject "Launch xgboost Done" --message "Launch xgboost Done!!"

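This is not strictly required, but if you subscribe to the topic by email you can track progress from your phone even when you are away from your PC, so I recommend it.

Getting started with Amazon Simple Notification Service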
https://docs.aws.amazon.com/ja_jp/sns/latest/dg/GettingStarted.html
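
Note that the publish command hard-codes ap-northeast-1 in the topic ARN, even though the script takes its region from snsRegion. One way to keep the two in sync is to build the ARN and the command from the same variables, as in the sketch below. The helper names are illustrative assumptions, and the account ID is left as the same placeholder used in the article.

```javascript
// Hypothetical helper: SNS topic ARNs have the form
// arn:aws:sns:<region>:<account-id>:<topic-name>.
function buildTopicArn(region, accountId, topicName) {
    return ['arn', 'aws', 'sns', region, accountId, topicName].join(':');
}

// Hypothetical helper: build the notification line of the shell script.
// text must not contain double quotes, since it is embedded in a quoted
// shell argument without escaping.
function buildPublishCommand(topicArn, text) {
    return 'aws sns publish --topic-arn ' + topicArn +
        ' --subject "' + text + '" --message "' + text + '!!"';
}

const snsRegion = 'ap-northeast-1';
const topicArn = buildTopicArn(snsRegion, 'xxxxxxxxxxxx', 'My-Sns-Topic');
const publishCommand = buildPublishCommand(topicArn, 'Launch xgboost Done');

console.log(publishCommand);
```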

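Other settings

The following are self-explanatory, so I will skip the details; change them as needed. The S3 bucket name and the key pair must be changed, though. The imageId is that of "Amazon Linux AMI 2015.09 / HVM (SSD) EBS-Backed 64-bit / Asia Pacific (Tokyo)", so it has to be changed if you switch regions or when an updated AMI is released.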

var s3Bucket = 'mybucket';

var availabilityZone = ec2Region + 'a';
var spotPrice = '0.1';
var imageId = 'ami-9a2fb89a';
var instanceType = 'c3.2xlarge';
var securityGroup = 'launch-wizard-1';
var keyName = 'my_ssh_keypair';
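
Since these values end up spread across the top of the file and the ec2LaunchParams literal, it may help to gather them in one settings object and build the request parameters from it. The sketch below shows one way to do that. The buildSpotRequestParams helper and the cfg field names are assumptions for illustration, and the resulting object has the same shape as ec2LaunchParams in the Lambda code.

```javascript
// Hypothetical helper: assemble the requestSpotInstances parameters from one
// settings object, so everything that needs editing lives in a single place.
function buildSpotRequestParams(cfg, userDataBase64) {
    return {
        SpotPrice: cfg.spotPrice,
        LaunchSpecification: {
            IamInstanceProfile: { Name: cfg.iamInstanceProfile },
            ImageId: cfg.imageId,
            InstanceType: cfg.instanceType,
            KeyName: cfg.keyName,
            // The availability zone is derived from the region, as in the article.
            Placement: { AvailabilityZone: cfg.region + cfg.zoneSuffix },
            SecurityGroups: [cfg.securityGroup],
            UserData: userDataBase64
        }
    };
}

// Values copied from the article; replace the key pair and AMI as needed.
const cfg = {
    region: 'ap-northeast-1', zoneSuffix: 'a', spotPrice: '0.1',
    imageId: 'ami-9a2fb89a', instanceType: 'c3.2xlarge',
    securityGroup: 'launch-wizard-1', keyName: 'my_ssh_keypair',
    iamInstanceProfile: 'my_ec2_role'
};

// A one-line script stands in for the real UserData here.
const params = buildSpotRequestParams(cfg, Buffer.from('#!/bin/bash\n').toString('base64'));

console.log(JSON.stringify(params, null, 2));
```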

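Verifying that it works

Running it from Lambda's Test button is the easiest. You can of course also invoke it from the aws cli.

Once you have confirmed that the spot instance request was made, wait a while. Roughly 15 to 20 minutes? Installing numpy, scipy, and pandas takes quite some time.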
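
To confirm the request from code rather than the console, the data object logged by the handler can be reduced to a few readable lines. The sketch below assumes the response shape used by the AWS SDK for JavaScript, where SpotInstanceRequests is an array whose entries carry SpotInstanceRequestId, State, and, once fulfilled, InstanceId. The helper name and the sample response are made up for illustration.

```javascript
// Hypothetical helper: reduce a spot request response to one line per request.
function summarizeSpotRequests(data) {
    return (data.SpotInstanceRequests || []).map(function (req) {
        return req.SpotInstanceRequestId + ' ' + req.State +
            ' ' + (req.InstanceId || '(no instance yet)');
    });
}

// Made-up sample in the assumed response shape; the IDs are placeholders.
const sample = {
    SpotInstanceRequests: [
        { SpotInstanceRequestId: 'sir-example1', State: 'open' },
        { SpotInstanceRequestId: 'sir-example2', State: 'active', InstanceId: 'i-example' }
    ]
};

const summary = summarizeSpotRequests(sample);
summary.forEach(function (line) { console.log(line); });
```

In the handler, the same helper could be called on the data passed to the requestSpotInstances callback, in place of console.log(data).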

Once the EC2 instance is up and xgboost is installed, log in over ssh. To check that it works, run the binary_classification demo.

cd /home/ec2-user/xgboost/demo/binary_classification
./runexp.sh

It came back immediately with the following.

6513x126 matrix with 143286 entries is loaded from agaricus.txt.train
6513x126 matrix with 143286 entries is saved to agaricus.txt.train.buffer
1611x126 matrix with 35442 entries is loaded from agaricus.txt.test
1611x126 matrix with 35442 entries is saved to agaricus.txt.test.buffer
boosting round 0, 0 sec elapsed
tree prunning end, 1 roots, 12 extra nodes, 0 pruned nodes ,max_depth=3
[0]     test-error:0.016139     train-error:0.014433
boosting round 1, 0 sec elapsed
tree prunning end, 1 roots, 10 extra nodes, 0 pruned nodes ,max_depth=3
[1]     test-error:0.000000     train-error:0.001228

updating end, 0 sec in all
1611x126 matrix with 35442 entries is loaded from agaricus.txt.test.buffer
start prediction...
writing prediction to pred.txt
...[omitted]...
booster[1]:
0:[odor=none] yes=2,no=1
        1:[bruises?=bruises] yes=4,no=3
                3:leaf=1.1457
                4:[gill-spacing=close] yes=8,no=7
                        7:leaf=-6.87558
                        8:leaf=-0.127376
        2:[spore-print-color=green] yes=6,no=5
                5:[gill-size=broad] yes=10,no=9
                        9:leaf=-0.0386054
                        10:leaf=-1.15275
                6:leaf=0.994744

I wanted to see something that runs a little longer, so let's try the kaggle-higgs demo. The data has to be downloaded separately from Kaggle and transferred to the EC2 instance.

cd /home/ec2-user/xgboost/demo/kaggle-higgs
mkdir data
# Copy training.csv and test.csv
./run.sh

The result. This one also finishes in under a minute, though.

[0]     train-auc:0.910911      train-ams@0.15:3.699574
[1]     train-auc:0.915308      train-ams@0.15:3.971228
[2]     train-auc:0.917743      train-ams@0.15:4.067463
...[omitted]...
[118]   train-auc:0.945648      train-ams@0.15:5.937291
[119]   train-auc:0.945800      train-ams@0.15:5.935622
finish training
finish loading from csv

So, as far as I can tell, it is working without problems.

Notes on where I got stuck

UserData Section

Dependencies installed with yum

yum groupinstall -y "Development tools"
yum -y install gcc-c++ python27-devel atlas-sse3-devel lapack-devel

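The build complained that development tools such as gcc, plus BLAS/LAPACK and the like, were required, so I install them. I used the following as a reference.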

Installing scikit-learn on Amazon EC2
http://dacamo76.com/blog/2012/12/07/installing-scikit-learn-on-amazon-ec2/

Dependencies installed with pip

pip install numpy
pip install scipy
pip install pandas

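numpy and scipy are required. pandas lengthens the install time, so it is optional, but the integration seems to have been improved in various ways, so I think it is worth installing.

Improvements to Python XGBoost + pandas integration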
http://sinhrks.hatenablog.com/entry/2015/10/03/080719

/etc/sudoers

tmp=/root/sudoers_tmp
cat /etc/sudoers > $tmp
cat >> $tmp <<EOF
Defaults:ec2-user !requiretty
EOF
cat $tmp > /etc/sudoers

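On EC2 (or rather, on this AMI), sudo by anyone other than root requires a tty by default, which means sudo does not work inside a shell script. That causes trouble later, so I append Defaults:ec2-user !requiretty to the end of the file so that ec2-user can also use sudo inside shell scripts.

Following the article below, I set this up crudely, on the assumption that the instance will be deleted promptly once the job is done anyway.

Skipping repeated password prompts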
http://qiita.com/yuku_t/items/5f995bb0dfc894c9f2df

ShellScript Section

Installing xgboost

git clone --recursive https://github.com/dmlc/xgboost.git
cd xgboost
./build.sh > build.log 2>&1
cd python-package
sudo -s python setup.py install > setup.log 2>&1

The install is from source. First run xgboost/build.sh to build the binary, then install it into Python with xgboost/python-package/setup.py.

python setup.py install needs sudo. Without the /etc/sudoers setting, this is where you hit "sudo: sorry, you must have a tty to run sudo" and get stuck.

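Closing

If you want the instance to terminate automatically when the job is done, or want to turn this into a nightly batch, the following may be useful.

Running Lambda on a cron schedule from AWS Data Pipeline (without starting EC2)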
http://qiita.com/pyr_revs/items/d2ec88a8fafeace7da4a

Creating a spot instance with AWS Lambda and running environment setup and long jobs automatically with UserData
http://qiita.com/pyr_revs/items/c7f33d1cdee3118590e3
