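Overview
I built an environment for continuously developing Kaggle logic with Jenkins.
Creating the AWS Batch resources is out of scope here.
(It works as long as the jobQueue and the jobDefinition share the same name!)
The workflow is to try out logic as many times as you like, then submit the version you are happy with.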
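Trying out logic (push to the repository)
- Push to the GitHub repository
- Push the image to ECR & launch AWS Batch
- Send the result to Slack
Submission (create a tag)
- Create a tag on the GitHub repository
- Push the image to ECR & launch AWS Batch
- Submit the result automatically with the kg command
- Send the result to Slack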
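The two flows above differ only in whether the result is submitted to Kaggle. A minimal Python sketch of that decision, assuming experiment branches share a prefix (the `feature/` prefix here is a hypothetical stand-in) and submission tags look like `v0.0.0`. The function name `kaggle_submit_flag` is made up for illustration and is not part of the pipeline itself:

```python
from fnmatch import fnmatchcase
from typing import Optional

# Assumed patterns: "feature/*" is a hypothetical stand-in for the branch
# prefix used for experiments; "v*" matches tags such as v0.0.0.
DEV_PATTERN = "feature/*"
TAG_PATTERN = "v*"


def kaggle_submit_flag(ref_name: str) -> Optional[str]:
    """Return the KAGGLE_SUBMIT value for a branch or tag name.

    'False' -> run the job and only notify Slack (trial run)
    'True'  -> run the job and submit the result to Kaggle
    None    -> the ref matches neither flow, so nothing is run
    """
    if fnmatchcase(ref_name, DEV_PATTERN):
        return "False"
    if fnmatchcase(ref_name, TAG_PATTERN):
        return "True"
    return None


if __name__ == "__main__":
    for ref in ("feature/new-model", "v0.0.1", "master"):
        print(ref, "->", kaggle_submit_flag(ref))
```

The flag is kept as the strings `'True'` / `'False'` because it travels through environment variables all the way into the container.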
Folder structure
tree
├── Jenkins
│   ├── Jenkinsfile
│   └── kick_batch.py
└── Program
    └── MODULE_NAME
        ├── Dockerfile
        └── main.sh
Main shell script that runs inside the container
main.sh
#!/usr/bin/env bash
kaggle_user='KAGGLE_USER_NAME'
kaggle_pass='KAGGLE_USER_PASS'
kaggle_comp="${KAGGLE_COMP}"
# Timestamp appended to the submission message
# (assumed format, same as the one used in kick_batch.py)
exec_time=$(date +%Y%m%d_%H%M%S)

# kaggle setting
kg config -c "${kaggle_comp}" -u "${kaggle_user}" -p "${kaggle_pass}"

# ==================================================
# DO SOMETHING & MAKE OUTPUT (ex. RESULT.csv)
# ==================================================

# Submit only when the build was triggered by a tag (KAGGLE_SUBMIT=True)
if [ "${KAGGLE_SUBMIT}" = 'True' ]; then
    echo "[INFO] submitting to kaggle"
    kg submit ./RESULT.csv -m "[${kaggle_user}] ${MODULE_NAME}_${exec_time}"
    echo "[INFO] end submit"
fi
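Note that `main.sh` compares `KAGGLE_SUBMIT` against the exact string `True`, so values such as `true` or `1` skip the submission. If the container entry point were written in Python instead, the same gate and the submit message could be sketched roughly as below. The helper names are hypothetical and the timestamp format is an assumption; only the `kg submit` arguments mirror the script above.

```python
import os
from datetime import datetime


def should_submit(env=None):
    """Mirror the shell test: submit only when KAGGLE_SUBMIT is exactly 'True'."""
    env = os.environ if env is None else env
    return env.get("KAGGLE_SUBMIT") == "True"


def build_submit_command(user, module_name, now=None, result_path="./RESULT.csv"):
    """Build the argument list for `kg submit` with a timestamped message."""
    # Assumed timestamp format: YYYYmmdd_HHMMSS
    exec_time = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    message = f"[{user}] {module_name}_{exec_time}"
    return ["kg", "submit", result_path, "-m", message]


if __name__ == "__main__":
    if should_submit():
        cmd = build_submit_command("KAGGLE_USER_NAME", os.environ.get("MODULE_NAME", ""))
        # Passing cmd to subprocess.run(cmd, check=True) would run it;
        # this sketch only prints the command.
        print(cmd)
    else:
        print("[INFO] KAGGLE_SUBMIT is not 'True'; skipping submission")
```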
Python script that kicks off AWS Batch
kick_batch.py
from datetime import datetime
import logging
import os
import sys

import boto3

# basicConfig attaches a handler so that INFO messages are actually printed
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger()


def kick_batch(project_name):
    """Submit an AWS Batch job; the job queue and job definition share project_name."""
    client_batch = boto3.client("batch")
    try:
        res = client_batch.submit_job(
            jobName=project_name + "_" + timestamp(),
            jobQueue=project_name,
            jobDefinition=project_name,
            containerOverrides={
                # Environment variables forwarded from Jenkins to the container
                'environment': [
                    {
                        'name': 'MODULE_NAME',
                        'value': os.environ['MODULE_NAME']
                    },
                    {
                        'name': 'BUILD_URL',
                        'value': os.environ['BUILD_URL']
                    },
                    {
                        'name': 'KAGGLE_SUBMIT',
                        'value': os.environ['KAGGLE_SUBMIT']
                    },
                    {
                        'name': 'KAGGLE_COMP',
                        'value': os.environ['KAGGLE_COMP']
                    }
                ]
            }
        )
        return True, res
    except Exception as e:
        # Keep the cause so the Jenkins log shows why the submission failed
        return False, 'batch kick ERROR: {}'.format(e)


def timestamp():
    return datetime.now().strftime("%Y%m%d_%H%M%S")


if __name__ == '__main__':
    _check, res = kick_batch("PROJECT_NAME")
    if _check:
        logger.info(str(res))
    else:
        logger.error(str(res))
        # Exit with a non-zero code so that the Jenkins stage is marked as failed
        sys.exit(1)
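The `containerOverrides` block in `kick_batch.py` repeats the same name/value pattern for each variable. One way to shorten it, shown here only as a sketch, is to build the list from the variable names. `build_container_overrides` and `FORWARDED_VARS` are hypothetical names, not part of the original script; the variable names themselves are taken from the script above.

```python
import os

# Variables forwarded from the Jenkins build into the Batch container,
# taken from the containerOverrides block in kick_batch.py.
FORWARDED_VARS = ("MODULE_NAME", "BUILD_URL", "KAGGLE_SUBMIT", "KAGGLE_COMP")


def build_container_overrides(env=None, names=FORWARDED_VARS):
    """Build the containerOverrides dict for submit_job from environment variables.

    Raises KeyError listing every missing variable, so a misconfigured
    Jenkins job fails with a clear message before any AWS call is made.
    """
    env = os.environ if env is None else env
    missing = [name for name in names if name not in env]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {"environment": [{"name": name, "value": env[name]} for name in names]}
```

The result could then be passed as `containerOverrides=build_container_overrides()` in the `submit_job` call. Checking for missing variables up front is a design choice for easier debugging, not something the Batch API requires.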
Jenkins pipeline that pushes to ECR and kicks off AWS Batch
Jenkinsfile
pipeline {
    agent any
    triggers {
        // Poll the repository for changes every minute
        pollSCM('*/1 * * * 1-7')
    }
    options {
        // Only keep the 10 most recent builds
        buildDiscarder(logRotator(numToKeepStr:'10'))
    }
    environment {
        AWS_DEFAULT_REGION = 'ap-northeast-1'
        PROJECT = 'kaggle-training'
        SLACK_TOKEN = 'JENKINS_SLACK_TOKEN'
        SLACK_CHANNEL = 'SLACK_CHANNEL'
        ECR_URL = 'https://*********.dkr.ecr.ap-northeast-1.amazonaws.com'
        REGION = 'ap-northeast-1'
        IMAGE_NAME = 'IMAGE_NAME'
        ENV = 'ENV_NAME'
        TAG = "latest"
        MODULE_NAME = "MODULE_NAME(DIR_NAME)"
        KAGGLE_COMP = "KAGGLE_COMPETITION_NAME"
    }
    stages {
        // ==================================================
        // change in repository
        // ==================================================
        stage('[dev]: push to repository') {
            // Trigger condition
            when {
                branch 'BRANCH_PREFIX*'
            }
            environment {
                KAGGLE_SUBMIT = 'False'
            }
            // Commands
            steps {
                ansiColor('xterm') {
                    script {
                        // login to ECR (`aws ecr get-login` is available in AWS CLI v1 only)
                        sh("echo 'test@test.test' | eval \$(aws ecr get-login --no-include-email --region ap-northeast-1 | sed 's|https://||')")
                        dir ("Program/${MODULE_NAME}") {
                            // Build the docker image using a Dockerfile
                            docker.build("${IMAGE_NAME}:${TAG}")
                            // Push the Docker image to ECR
                            docker.withRegistry(ECR_URL) {
                                docker.image("${IMAGE_NAME}").push(TAG)
                            }
                        }
                        sh("python ./Jenkins/kick_batch.py")
                    }
                }
            }
            // Post-build actions
            post {
                success {
                    slackSend(
                        channel: "${SLACK_CHANNEL}",
                        color: '#0e8a16',
                        message: "`${MODULE_NAME}` ${env.BRANCH_NAME} : <${env.BUILD_URL}|BUILD_URL> :white_check_mark: ${PROJECT} :aws_cloud: (${ENV})",
                        token: "${SLACK_TOKEN}"
                    )
                }
                failure {
                    slackSend(
                        channel: "${SLACK_CHANNEL}",
                        color: '#d93f0b',
                        message: "`${MODULE_NAME}` ${env.BRANCH_NAME} : <${env.BUILD_URL}|BUILD_URL> :x: ${PROJECT} :aws_cloud: (${ENV})",
                        token: "${SLACK_TOKEN}"
                    )
                }
            }
        }
        // ==================================================
        // tag at branch
        // ==================================================
        stage('[dev]: Tag') {
            // Trigger condition (e.g. v0.0.0)
            when {
                branch 'v*'
            }
            environment {
                KAGGLE_SUBMIT = 'True'
            }
            // Commands
            steps {
                ansiColor('xterm') {
                    script {
                        // login to ECR (`aws ecr get-login` is available in AWS CLI v1 only)
                        sh("echo 'test@test.test' | eval \$(aws ecr get-login --no-include-email --region ap-northeast-1 | sed 's|https://||')")
                        dir ("Program/${MODULE_NAME}") {
                            // Build the docker image using a Dockerfile
                            docker.build("${IMAGE_NAME}:${TAG}")
                            // Push the Docker image to ECR
                            docker.withRegistry(ECR_URL) {
                                docker.image("${IMAGE_NAME}").push(TAG)
                            }
                        }
                        sh("python ./Jenkins/kick_batch.py")
                    }
                }
            }
            // Post-build actions
            post {
                success {
                    slackSend(
                        channel: "${SLACK_CHANNEL}",
                        color: '#0e8a16',
                        message: "`${MODULE_NAME}` ${env.BRANCH_NAME} : <${env.BUILD_URL}|BUILD_URL> :white_check_mark: ${PROJECT} :aws_cloud: (${ENV})",
                        token: "${SLACK_TOKEN}"
                    )
                }
                failure {
                    slackSend(
                        channel: "${SLACK_CHANNEL}",
                        color: '#d93f0b',
                        message: "`${MODULE_NAME}` ${env.BRANCH_NAME} : <${env.BUILD_URL}|BUILD_URL> :x: ${PROJECT} :aws_cloud: (${ENV})",
                        token: "${SLACK_TOKEN}"
                    )
                }
            }
        }
    }
}
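For reference, the image pushed inside `docker.withRegistry(ECR_URL)` is addressed by the registry host plus the image name and tag, and the Batch job definition has to point at that same reference. A small sketch of how the reference is composed, assuming the usual ECR layout `<account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>`. The function name and the account ID in the usage line are made up for illustration:

```python
from urllib.parse import urlparse


def ecr_image_reference(registry_url, image_name, tag="latest"):
    """Compose '<registry-host>/<image>:<tag>' from a registry URL.

    Accepts the URL with or without a scheme, because image references
    used by `docker push` contain only the host, not 'https://'.
    """
    # Prefix schemeless input with '//' so urlparse treats it as a host.
    parsed = urlparse(registry_url if "://" in registry_url else "//" + registry_url)
    host = parsed.netloc
    if not host:
        raise ValueError(f"cannot extract a host from {registry_url!r}")
    return f"{host}/{image_name}:{tag}"


if __name__ == "__main__":
    # 123456789012 is a placeholder account ID.
    print(ecr_image_reference(
        "https://123456789012.dkr.ecr.ap-northeast-1.amazonaws.com", "IMAGE_NAME"))
```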
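Thoughts
I wanted to grind through Kaggle runs on AWS Batch and have submissions happen automatically as well, so I built this, and it turned out to be surprisingly handy.
How to evaluate the logic is the next thing to work out.
I'm a Kaggle beginner, so I want to keep trying things and learn as I go.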