
Hitachi, Ltd. Service Computing Research Department Advent Calendar 2019

Using Both Fargate and a Separate GPU Node Group from AWS EKS

Posted at 2019-12-09, last updated at 2019-12-09

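Hello, I am Osaki from the Service Computing Research Department, Research & Development Group, Hitachi, Ltd. I am currently working at one of our North American sites.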

On December 3, a new EKS-related feature was announced at AWS re:Invent 2019: Fargate, the serverless managed container service, can now be used as the worker layer of an EKS cluster.
https://aws.amazon.com/blogs/aws/amazon-eks-on-aws-fargate-now-generally-available/

Fargate should reduce the effort of managing worker nodes, so I wanted to try it right away.

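Fargate, however, does not appear to support GPUs yet.

EKS does, on the other hand, let you add multiple node groups to a cluster.
In this article I build a single EKS cluster that

(1) keeps Fargate for running general workloads that need no GPU, and
(2) adds a node group of GPU-equipped EC2 instances so that GPU workloads can run as well.

I also describe the failure I hit on the first attempt and how I worked around it.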

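Installing `eksctl`

As of December 5, version 0.11.0 has been released, so download that. On a Mac, use the commands below. (Note that the rest of this article was done with 0.11.0-rc.0, which was current on December 3.)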

$ curl --silent --location "https://github.com/weaveworks/eksctl/releases/download/latest_release/eksctl_$(uname -s)_amd64.tar.gz" | tar xz -C /tmp
$ sudo mv /tmp/eksctl /usr/local/bin

For other installation methods, see the official repository.

Check the version.

$ eksctl version
[ℹ]  version.Info{BuiltAt:"", GitCommit:"", GitTag:"0.11.0-rc.0"}

First, build an EKS cluster with Fargate

$ eksctl create cluster --name test --region us-east-1 --fargate
[ℹ]  eksctl version 0.11.0-rc.0
[ℹ]  using region us-east-1
[ℹ]  setting availability zones to [us-east-1a us-east-1f]
[ℹ]  subnets for us-east-1a - public:192.168.0.0/19 private:192.168.64.0/19
[ℹ]  subnets for us-east-1f - public:192.168.32.0/19 private:192.168.96.0/19
[ℹ]  using Kubernetes version 1.14
[ℹ]  creating EKS cluster "test" in "us-east-1" region with Fargate profile
[ℹ]  if you encounter any issues, check CloudFormation console or try 'eksctl utils describe-stacks --region=us-east-1 --cluster=test'
[ℹ]  CloudWatch logging will not be enabled for cluster "test" in "us-east-1"
[ℹ]  you can enable it with 'eksctl utils update-cluster-logging --region=us-east-1 --cluster=test'
[ℹ]  Kubernetes API endpoint access will use default of {publicAccess=true, privateAccess=false} for cluster "test" in "us-east-1"
[ℹ]  1 task: { create cluster control plane "test" }
[ℹ]  building cluster stack "eksctl-test-cluster"
[ℹ]  deploying stack "eksctl-test-cluster"
[✔]  all EKS cluster resources for "test" have been created
[✔]  saved kubeconfig as "/Users/xxxxxx/.kube/config"
[ℹ]  creating Fargate profile "fp-default" on EKS cluster "test"
[ℹ]  created Fargate profile "fp-default" on EKS cluster "test"
[ℹ]  "coredns" is now schedulable onto Fargate
[ℹ]  "coredns" is now scheduled onto Fargate
[ℹ]  "coredns" pods are now scheduled onto Fargate
[ℹ]  kubectl command should work with "/Users/xxxxxx/.kube/config", try 'kubectl get nodes'
[✔]  EKS cluster "test" in "us-east-1" region is ready

A Fargate profile (fp-default in the log above) is now mapped to the EKS cluster.

Adding a GPU node group

The following command adds a node group consisting of one p3.2xlarge instance to the EKS cluster.

$ eksctl create nodegroup --cluster test gpu-node-groups --nodes 1 --node-type p3.2xlarge
[ℹ]  eksctl version 0.11.0-rc.0
[ℹ]  using region us-east-1
[ℹ]  will use version 1.14 for new nodegroup(s) based on control plane version
[ℹ]  nodegroup "gpu-node-groups" will use "ami-xxxxxxxxxxxxxxx" [AmazonLinux2/1.14]
[ℹ]  1 nodegroup (gpu-node-groups) was included (based on the include/exclude rules)
[ℹ]  will create a CloudFormation stack for each of 1 nodegroups in cluster "test"
[ℹ]  2 parallel tasks: { create nodegroup "gpu-node-groups", no tasks }
[ℹ]  building nodegroup stack "eksctl-test-nodegroup-gpu-node-groups"
[ℹ]  --nodes-min=1 was set automatically for nodegroup gpu-node-groups
[ℹ]  --nodes-max=1 was set automatically for nodegroup gpu-node-groups
[ℹ]  deploying stack "eksctl-test-nodegroup-gpu-node-groups"
[ℹ]  adding identity "arn:aws:iam::xxxxxxxxxx:role/eksctl-test-nodegroup-gpu-node-gr-NodeInstanceRole-xxxxxxxx" to auth ConfigMap
[ℹ]  nodegroup "gpu-node-groups" has 0 node(s)
[ℹ]  waiting for at least 1 node(s) to become ready in "gpu-node-groups"
[ℹ]  nodegroup "gpu-node-groups" has 1 node(s)
[ℹ]  node "ip-192-168-6-6.ec2.internal" is ready
[ℹ]  as you are using a GPU optimized instance type you will need to install NVIDIA Kubernetes device plugin.
[ℹ]  	 see the following page for instructions: https://github.com/NVIDIA/k8s-device-plugin
[✔]  created 1 nodegroup(s) in cluster "test"
[✔]  created 0 managed nodegroup(s) in cluster "test"
[ℹ]  checking security group configuration for all nodegroups
[ℹ]  all nodegroups have up-to-date configuration

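Another approach is to write a config file first and then run the command against it.

Next, as the eksctl output notes, the NVIDIA k8s-device-plugin is required. Run the command from the "Enabling GPU Support in Kubernetes" section of the repository (https://github.com/NVIDIA/k8s-device-plugin).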
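As a rough sketch of the config-file approach, the snippet below writes an eksctl ClusterConfig that describes the same cluster as this article. The file name cluster.yaml is my own choice, and the values are assumptions copied from the commands and logs above; check the fields against the eksctl schema for your version before relying on it.

```shell
# Write a hypothetical eksctl config file (cluster.yaml is an assumed name).
cat > cluster.yaml <<'EOF'
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: test
  region: us-east-1
# EC2 node group with a single GPU instance.
nodeGroups:
  - name: gpu-node-groups
    instanceType: p3.2xlarge
    desiredCapacity: 1
# Same selectors that --fargate generated for "fp-default".
fargateProfiles:
  - name: fp-default
    selectors:
      - namespace: default
      - namespace: kube-system
EOF

# Against a real cluster (not run here):
#   eksctl create nodegroup --config-file=cluster.yaml
```

Keeping the node group and the Fargate selectors in one file makes it easier to see at a glance which namespaces go to Fargate.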

$ kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/1.0.0-beta4/nvidia-device-plugin.yml

Check the state of the cluster.

$ kubectl get nodes \
"-o=custom-columns=NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu"
NAME                                     GPU
fargate-ip-192-168-108-92.ec2.internal   <none>
fargate-ip-192-168-98-49.ec2.internal    <none>
ip-192-168-35-0.ec2.internal             1

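The first two nodes are presumably the nodes that run the CoreDNS Pods, as described in an article by Aoyama-san of CyberAgent, Inc.

The last one is the GPU node, and one GPU was detected on it.

I borrowed this way of checking from an existing reference.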
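The Fargate-backed nodes in the listing above all carry a fargate- name prefix. Assuming that naming holds, a small helper can tally node types from `kubectl get nodes` output piped on stdin. This is only a sketch; count_node_types is a hypothetical name, not part of kubectl or eksctl.

```shell
# Tally Fargate vs. EC2 nodes from `kubectl get nodes`-style text on stdin.
# Assumes a header row and that Fargate node names start with "fargate-".
count_node_types() {
  awk 'NR > 1 && NF > 0 {
         if ($1 ~ /^fargate-/) fargate++; else ec2++
       }
       END { printf "fargate=%d ec2=%d\n", fargate, ec2 }'
}

# Sample input copied from the listing above; on a live cluster use:
#   kubectl get nodes | count_node_types
count_node_types <<'EOF'
NAME                                     GPU
fargate-ip-192-168-108-92.ec2.internal   <none>
fargate-ip-192-168-98-49.ec2.internal    <none>
ip-192-168-35-0.ec2.internal             1
EOF
# prints: fargate=2 ec2=1
```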

Deploying Pods

Now that the cluster exists, I run (1) a general Pod that needs no GPU and (2) a Pod that requires a GPU.

(1) Deploying a general Pod

$ kubectl create deploy nginx-test --image=nginx
deployment.apps/nginx-test created

This seems to be a Fargate characteristic: a node is added at the same time as the Pod. Interesting.

$ kubectl get nodes
NAME                                     STATUS   ROLES    AGE    VERSION
fargate-ip-192-168-108-92.ec2.internal   Ready    <none>   106m   v1.14.8-eks
fargate-ip-192-168-96-116.ec2.internal   Ready    <none>   36s    v1.14.8-eks
fargate-ip-192-168-98-49.ec2.internal    Ready    <none>   106m   v1.14.8-eks
ip-192-168-35-0.ec2.internal             Ready    <none>   98m    v1.14.7-eks-1861c5

(2) Deploying a Pod that uses the GPU

I start the Pod below. Ideally it is deployed to the GPU node and the output of nvidia-smi appears in its log.

gpupod.yaml

apiVersion: v1
kind: Pod
metadata:
  name: nvidia-smi
spec:
  restartPolicy: OnFailure
  containers:
  - name: nvidia-smi
    image: nvidia/cuda:latest
    args:
    - "nvidia-smi"
    resources:
      limits:
        nvidia.com/gpu: 1

$ kubectl apply -f gpupod.yaml

This step takes about six minutes.

First result: the deployment failed... what was the cause?

$ kubectl get po -w
NAME                         READY   STATUS    RESTARTS   AGE
nginx-test-9876c7b6c-fcpwj   1/1     Running   0          4m19s
nvidia-smi                   0/1     Pending   0          5s
NAME                         READY   STATUS    RESTARTS   AGE
nvidia-smi                   0/1     Pending   0          68s
nvidia-smi                   0/1     ContainerCreating   0          68s
nvidia-smi                   0/1     RunContainerError   0          6m1s
nvidia-smi                   0/1     RunContainerError   1          6m2s
nvidia-smi                   0/1     CrashLoopBackOff    1          6m3s

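Why did nvidia-smi fail to start? The Pod was deployed to Fargate, which has no GPU, rather than to the GPU node group I had prepared. Run in an environment without a GPU, the container cannot see the device and exits abnormally.

The cause was the namespace I had used. The Fargate profile created by eksctl create cluster --fargate lists the namespaces default and kube-system.

When you use Fargate from EKS, the Fargate profile must state the conditions that decide which Pods are assigned to Fargate, and by default those conditions are the namespaces default and kube-system. The re:Invent 2019 session "Deep dive: Fargate under the hood" explained that AWS adds its own webhook to EKS and that this webhook decides in advance which side a Pod is assigned to.

In my case, I deployed the GPU Pod to default without thinking about it, so it landed on Fargate.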
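To make the rule concrete, here is a deliberately simplified model of the namespace check. It is not how EKS implements it: the real decision is made inside EKS, and profile selectors can also match on labels, which this sketch ignores. The names placement_for and FARGATE_NAMESPACES, and the namespace batch, are hypothetical and used only for illustration.

```shell
# Namespaces selected by the default profile that --fargate created.
FARGATE_NAMESPACES="default kube-system"

# Print "fargate" if the namespace matches a profile selector,
# otherwise "nodegroup". Simplified: label selectors are ignored.
placement_for() {
  for ns in $FARGATE_NAMESPACES; do
    if [ "$1" = "$ns" ]; then
      echo fargate
      return 0
    fi
  done
  echo nodegroup
}

placement_for default   # prints: fargate
placement_for batch     # prints: nodegroup
```

Under this model, a Pod that requests nvidia.com/gpu but lives in default is still sent to Fargate, which matches the failure above.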

Solution

Because Pods are assigned to Fargate based on their namespace, create a new namespace that falls outside that condition and deploy the Pod there.

$ kubectl create namespace gpu-jobs
$ kubectl apply -f gpupod.yaml --namespace gpu-jobs

$ kubectl logs nvidia-smi -n gpu-jobs
Thu Dec  5 01:25:18 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla V100-SXM2...  On   | 00000000:00:1E.0 Off |                    0 |
| N/A   30C    P0    24W / 300W |      0MiB / 16130MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

The Pod started successfully, and the GPU is visible from inside the container.
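One way to avoid forgetting the --namespace flag is to pin the namespace inside the manifest itself. This is a sketch and was not part of the run described here. The file name gpupod-ns.yaml is hypothetical; everything else mirrors gpupod.yaml above.

```shell
# Write a variant of gpupod.yaml with the namespace set in metadata.
cat > gpupod-ns.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: nvidia-smi
  namespace: gpu-jobs   # not matched by the default Fargate profile
spec:
  restartPolicy: OnFailure
  containers:
  - name: nvidia-smi
    image: nvidia/cuda:latest
    args:
    - "nvidia-smi"
    resources:
      limits:
        nvidia.com/gpu: 1
EOF

# On a live cluster (not run here):
#   kubectl apply -f gpupod-ns.yaml
#   kubectl get pod nvidia-smi -n gpu-jobs -o wide   # NODE column shows the placement
```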

Cleanup

The following command deletes the Fargate profile and the node group together with the cluster.

$ eksctl delete cluster --name test

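Summary

I made GPUs usable from Amazon EKS while still relying on the Fargate container service for scalability. Specifically, I attached both Fargate and an EC2 node group of p3.2xlarge instances to one EKS cluster and confirmed that Pods can be deployed to each. Placement was decided by namespace and labels. This is a scheduling factor to consider in addition to plain resource requests in the Pod spec. Anyone deploying Helm charts may need to be aware of which side their Pods will land on.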

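AWS and EC2 are registered trademarks or trademarks of Amazon Web Services, Inc. in the United States or other countries.
Kubernetes is a registered trademark or trademark of The Linux Foundation in the United States or other countries.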
