
Building a Ray Cluster on AWS EC2

Last updated at 2022-01-21, posted at 2022-01-04

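Overview

Ray is a distributed execution framework for Python. It integrates with many libraries and (reportedly) makes it easy to adopt distributed computing.

Following the linked reference, this article builds a Ray cluster on AWS EC2 and then starts Jupyter notebook on the cluster, so that Ray can be tried out from a notebook.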

Prerequisites

Windows 10 Home
The WSL Ubuntu distribution (Ubuntu 20.04) is installed
An AWS IAM user account has been created (the access key and secret key are at hand)

Caution

This procedure launches EC2 instances, so be aware that charges will be incurred!

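Steps

Start the Docker container
Prepare the AWS IAM user credentials
Create the Ray cluster config file
Launch the Ray cluster (AWS EC2)
Run sample code on the Ray cluster
Allow the Jupyter notebook port in the AWS security group
Access the Ray head node via SSH
Start Jupyter notebook
Access Jupyter notebook
Analyze data on the Ray cluster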

1. Start the Docker container

Installing Ray directly on WSL clutters the WSL environment, so the environment for running Ray commands is prepared as a Docker container.

Create a working directory.

mkdir dockerfile

Create a Dockerfile inside that directory.

vi Dockerfile

The contents of the Dockerfile are as follows.

# Specify the base Docker image
FROM ubuntu:20.04

# Install the required tools and libraries
RUN apt-get update
RUN apt-get install -y sudo
RUN sudo apt-get install -y curl
RUN sudo apt-get install -y unzip
RUN apt-get install -y python3 python3-pip
RUN pip3 install -U "ray[default]" boto3
WORKDIR /root
RUN curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
RUN unzip awscliv2.zip
RUN sudo ./aws/install
RUN sudo apt-get install -y vim
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get install -y ssh
RUN apt-get install -y rsync
ENV DEBIAN_FRONTEND=interactive

# Install kubectl
ENV DEBIAN_FRONTEND=noninteractive
RUN curl -LO "https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl"
RUN chmod +x ./kubectl
RUN sudo mv ./kubectl /usr/local/bin/kubectl

Build the Docker image (ubuntu-ray:latest) from the Dockerfile.

docker build -t ubuntu-ray:latest .

Start the built Docker image in the background.

docker run -itd -v /home/ceg/dockerfiles/mountdir:/root ubuntu-ray:latest

Check the container ID of the running Docker container.

docker ps

Open a bash shell in the Docker container.

docker exec -it <container ID> /bin/bash

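2. Enter the IAM user credentials

An AWS Access Key ID and AWS Secret Access Key are required, so create an AWS IAM user beforehand. For a quick trial, an IAM user with Admin privileges is enough. To restrict the user to the minimum privileges, see the separate article "EC2でRayクラスターを作成するのに必要な最小限のIAM権限" (Minimum IAM permissions required to create a Ray cluster on EC2).

First, run the following in the Docker container's bash to create the credentials file under ~/.aws.

root@<container ID>:~# aws configure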

Enter the AWS Access Key ID, AWS Secret Access Key, Default region name, and Default output format interactively. An example input is shown below.

AWS Access Key ID [None]: AxxxxxxxxxxxxxxxxxxxI
AWS Secret Access Key [None]: exxxxxxxxxxxxxxxxxxxxxxxx
Default region name [None]: us-west-2
Default output format [None]: json

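Default region name and Default output format can be set to whatever you prefer.

For AWS Access Key ID and AWS Secret Access Key, enter the access key ID and secret access key of an IAM user with administrator privileges (these are not the console sign-in ID and password). The secret access key can only be viewed when the access key is created, so if you no longer have it, issue a new access key for the IAM user.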
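Before moving on, it can help to confirm that aws configure actually wrote both keys. The following is a minimal sketch, assuming the default file location ~/.aws/credentials and the default profile; the helper name missing_credential_keys is made up for this illustration and is not part of the AWS CLI or boto3. It only checks that the keys are present, not that AWS accepts them.

```python
import configparser
from pathlib import Path

# Keys that `aws configure` writes for each profile.
REQUIRED_KEYS = ("aws_access_key_id", "aws_secret_access_key")


def missing_credential_keys(path, profile="default"):
    """Return the required keys that are absent or empty for the profile.

    Hypothetical helper for illustration only.
    """
    # interpolation=None keeps '%' characters in values from being interpreted.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(path)  # a missing file simply yields an empty parser
    if not parser.has_section(profile):
        return list(REQUIRED_KEYS)
    section = parser[profile]
    return [key for key in REQUIRED_KEYS if not section.get(key, "").strip()]


if __name__ == "__main__":
    credentials_path = Path.home() / ".aws" / "credentials"
    missing = missing_credential_keys(credentials_path)
    if missing:
        print("missing keys:", ", ".join(missing))
    else:
        print("both keys are present")
```

An empty result means both keys were found for the profile.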

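3. Create the Ray cluster config file

Create config.yaml, the launch configuration file for the Ray cluster. The file below sets only cluster_name (here minimal) and the provider, so Ray falls back to its defaults for everything else, which gives a minimal Ray cluster.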

config.yaml

# A unique identifier for the head node and workers of this cluster.
cluster_name: minimal

# Cloud-provider specific configuration.
provider:
    type: aws
    region: us-west-2

With ap-northeast-1, an error of unknown cause occurs. If anyone knows the cause, please let me know.
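When trying different regions, it may be convenient to generate the file instead of editing it by hand. The following is a small sketch that renders the same two-key configuration as text; render_minimal_config is a hypothetical helper written for this article, not a Ray API, and it assumes the name and region contain no characters that need YAML quoting.

```python
def render_minimal_config(cluster_name="minimal", region="us-west-2"):
    """Return the text of a minimal Ray cluster config for AWS.

    Hypothetical helper: it only fills in the two values used in this
    article and leaves every other setting to Ray's defaults.
    """
    lines = [
        "# A unique identifier for the head node and workers of this cluster.",
        f"cluster_name: {cluster_name}",
        "",
        "# Cloud-provider specific configuration.",
        "provider:",
        "    type: aws",
        f"    region: {region}",
        "",
    ]
    return "\n".join(lines)


# Example: write the file used in the following steps.
#   from pathlib import Path
#   Path("config.yaml").write_text(render_minimal_config(region="us-west-2"))
```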

4. Launch the Ray cluster (AWS EC2)

Launch the Ray cluster from the config.yaml created above.

ray up -y config.yaml

Confirm in the AWS console that the EC2 instances are running.
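The same check can be done from Python instead of the console. The sketch below takes an EC2 client object (for example one created with boto3) and lists the instance IDs and states for the cluster. It assumes that Ray tags the instances it launches with the key ray-cluster-name; that tag key is an assumption here, so verify it against the tags shown in the console. The function name list_cluster_instances is made up for this example, and pagination is ignored for brevity.

```python
def list_cluster_instances(ec2_client, cluster_name="minimal"):
    """Return (instance_id, state) pairs for instances tagged with the cluster name.

    Hypothetical helper. Assumes the tag key "ray-cluster-name"; check the
    actual tags on your instances. Only the first page of results is read.
    """
    response = ec2_client.describe_instances(
        Filters=[{"Name": "tag:ray-cluster-name", "Values": [cluster_name]}]
    )
    return [
        (instance["InstanceId"], instance["State"]["Name"])
        for reservation in response.get("Reservations", [])
        for instance in reservation.get("Instances", [])
    ]


# Example (requires boto3 and the credentials configured earlier):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-west-2")
#   for instance_id, state in list_cluster_instances(ec2):
#       print(instance_id, state)
```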

5. Run sample code on the Ray cluster

Create script.py, the Python file to run on the Ray cluster.

script.py

from collections import Counter
import socket
import time

import ray

ray.init()

print('''This cluster consists of
    {} nodes in total
    {} CPU resources in total
'''.format(len(ray.nodes()), ray.cluster_resources()['CPU']))

@ray.remote
def f():
    time.sleep(0.001)
    # Return IP address.
    return socket.gethostbyname(socket.gethostname())

object_ids = [f.remote() for _ in range(10000)]
ip_addresses = ray.get(object_ids)

print('Tasks executed')
for ip_address, num_tasks in Counter(ip_addresses).items():
    print('    {} tasks on {}'.format(num_tasks, ip_address))

Use the ray submit command to run script.py on the Ray cluster defined by config.yaml.

ray submit config.yaml script.py

Running script.py reports the number of nodes and CPUs currently in the cluster, followed by how many tasks ran on each node.

This cluster consists of
    3 nodes in total
    6.0 CPU resources in total

Tasks executed
    3425 tasks on xxx.xxx.xxx.xxx
    3834 tasks on xxx.xxx.xxx.xxx
    2741 tasks on xxx.xxx.xxx.xxx

socket.gaierror error

ray.init() may fail with a socket.gaierror.

Reportedly, this socket.gaierror occurs with Ray versions earlier than 1.5. If you hit this error, upgrade Ray.
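To tell quickly whether an installed Ray falls into the affected range, the version string can be compared against 1.5. This is a rough sketch: parse_version and needs_upgrade are helper names invented for this article, and the comparison ignores pre-release suffixes such as rc1.

```python
import re


def parse_version(version):
    """Turn the leading numeric part of a version string into a tuple.

    For example "1.4.1" becomes (1, 4, 1). Suffixes such as "rc1" or
    ".dev0" are ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"not a version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def needs_upgrade(version, minimum=(1, 5)):
    """Return True if the version is older than the given minimum."""
    return parse_version(version) < minimum


# Example (on a machine where Ray is installed):
#   import ray
#   print(ray.__version__, needs_upgrade(ray.__version__))
```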

6. Allow the Jupyter notebook port in the AWS security group

In the security group assigned to the EC2 instances, allow inbound access to port 8899 from outside.
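The rule can also be added programmatically. The sketch below only builds the permission entry in the shape that the boto3 EC2 client method authorize_security_group_ingress expects in its IpPermissions argument; the helper name jupyter_ingress_rule, the example address 203.0.113.10/32, and the security group ID placeholder are all hypothetical. Limiting the source range to your own address is a suggestion, not something the original procedure requires.

```python
import ipaddress


def jupyter_ingress_rule(source_cidr, port=8899):
    """Build one inbound TCP rule for the given source range and port.

    Hypothetical helper. The returned dict is meant to be passed inside the
    IpPermissions list of authorize_security_group_ingress.
    """
    network = ipaddress.ip_network(source_cidr, strict=False)
    if network.version != 4:
        raise ValueError("this sketch only handles IPv4 ranges")
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "IpRanges": [
            {"CidrIp": str(network), "Description": "Jupyter notebook"}
        ],
    }


# Example (requires boto3; replace the placeholders with your own values):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-west-2")
#   ec2.authorize_security_group_ingress(
#       GroupId="<security group ID>",
#       IpPermissions=[jupyter_ingress_rule("203.0.113.10/32")],
#   )
```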

7. Access the Ray head node via SSH

Get the IP address of the Ray cluster's head node.

ray get-head-ip config.yaml

Connect to the head node via SSH.

ssh -o IdentitiesOnly=yes -i /root/.ssh/ray-autoscaler_1_us-west-2.pem ubuntu@<cluster head IP>

Alternatively, the following command also connects to the head node.

ray attach /root/config.yaml

8. Start Jupyter notebook

Start Jupyter notebook on the head node.

jupyter notebook --ip=* --no-browser --port=8899

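Note
In some cases the notebook cannot be reached from the browser.

If Jupyter notebook was started with the command in pattern 1 or pattern 2 below, it is probably accepting connections only from localhost. Starting it with jupyter notebook --ip=* makes it listen on all network interfaces, so it can be reached from outside.

Pattern 1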

jupyter notebook

Pattern 2

ray exec cluster.yaml --port-forward=8899 'source ~/anaconda3/bin/activate tensorflow_p36 && jupyter notebook --port=8899'
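When the notebook cannot be reached, a quick TCP probe from the client side can narrow down the cause. The sketch below is a generic check and not part of Ray or Jupyter; probe_port is a name made up for this article. As a rough guide, "refused" often means nothing is listening on that address and port (for example a server bound only to localhost), while "timeout" often means packets are being dropped on the way, for example by a security group, though other causes are possible.

```python
import socket


def probe_port(host, port, timeout=3.0):
    """Try one TCP connection and report "open", "refused", "timeout", or "error"."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError:
        # Covers name resolution failures, unreachable networks, and so on.
        return "error"


# Example (replace the placeholder with the head node address):
#   print(probe_port("<cluster head IP>", 8899))
```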

9. Access Jupyter notebook

Open the Jupyter notebook running on the cluster head in Chrome or any other browser. For the token, enter the one printed in the shell when Jupyter notebook started.

http://<cluster head IP>:8899/

After that, use Ray from the Jupyter notebook as you like.

x. Delete the Ray cluster (AWS EC2)

Finally, delete the cluster (EC2 instances) created on AWS. If you forget, charges keep accruing, so take care.

ray down -y config.yaml
