
Installing kubeflow, a machine learning platform for k8s

More than 1 year has passed since last update.

Setup notes for "kubeflow", a machine learning platform for Kubernetes.

Reference:
https://www.kubeflow.org/docs/started/getting-started/

Preparation

Requirements

・ksonnet version 0.11.0 or later
・Kubernetes 1.8 or later
・kubectl
・At least 0.6 CPU in the cluster
・At least 10 GB of storage per node
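The tool requirements above can be checked before starting. The following is a minimal sketch, not part of the original procedure: the helper names version_ge and check_command are hypothetical, the version string 0.13.1 is only a sample value standing in for whatever `ks version` reports on your machine, and `sort -V` assumes GNU coreutils.

```shell
#!/usr/bin/env bash
# Succeeds when version $1 is greater than or equal to version $2.
# Relies on `sort -V` (GNU coreutils) for version-aware ordering.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Reports whether a command is available on PATH (hypothetical helper).
check_command() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "ok: $1 found"
  else
    echo "missing: $1"
  fi
}

check_command kubectl
check_command ks

# Sample comparison against the ksonnet minimum listed above.
# Replace 0.13.1 with the version your own installation reports.
if version_ge "0.13.1" "0.11.0"; then
  echo "ksonnet version is new enough"
fi
```

The same version_ge helper can be reused for the Kubernetes 1.8 minimum once you have the server version as a plain string.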

① Build the k8s cluster

https://qiita.com/suzukihi724/items/389f1a51bd89672697b3

② Install ksonnet

https://qiita.com/suzukihi724/items/4bd7f36ef0dc0538683c

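Installing kubeflow

① Prepare the directories

・KUBEFLOW_SRC : $HOME/kubeflow (where the kubeflow source is downloaded)
・KFAPP : /etc/kubeflow (where the kubeflow application configuration is stored)

② Add the required paths.

Add the following to $HOME/.bash_profile, then reload the file so the variables take effect.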

export KUBEFLOW_SRC="$HOME/kubeflow"
export KFAPP="/etc/kubeflow"
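If you script this step, it helps to avoid appending the same lines twice. The following is a rough sketch under stated assumptions: add_line_once is a hypothetical helper, and the script writes to a temporary file unless you pass a profile path as the first argument, so running it as-is is only a dry run.

```shell
#!/usr/bin/env bash
# Append a line to a file only if that exact line is not already present.
add_line_once() {
  local file="$1" line="$2"
  touch "$file"
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Target profile: pass "$HOME/.bash_profile" as the first argument for real use.
# Without an argument a temporary file is used, so nothing permanent changes.
profile="${1:-$(mktemp)}"

add_line_once "$profile" 'export KUBEFLOW_SRC="$HOME/kubeflow"'
add_line_once "$profile" 'export KFAPP="/etc/kubeflow"'

# Load the variables into the current shell and show them.
. "$profile"
echo "KUBEFLOW_SRC=${KUBEFLOW_SRC}"
echo "KFAPP=${KFAPP}"
```

Running the script a second time against the same file leaves it unchanged, because grep -qxF matches the whole line literally.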

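③ Download kubeflow

https://github.com/kubeflow/kubeflow

Clone the repository directly into ${KUBEFLOW_SRC}, so that the scripts end up under ${KUBEFLOW_SRC}/scripts as the following steps expect.

git clone https://github.com/kubeflow/kubeflow.git ${KUBEFLOW_SRC}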

④ Deploy kubeflow

${KUBEFLOW_SRC}/scripts/kfctl.sh init ${KFAPP} --platform none
cd ${KFAPP}
${KUBEFLOW_SRC}/scripts/kfctl.sh generate k8s
${KUBEFLOW_SRC}/scripts/kfctl.sh apply k8s

Grant permissions on the $KFAPP directory so that non-root users can access it as well.
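The deploy commands above can be wrapped so that a failed step stops the sequence. This is a minimal sketch, assuming the same paths as in this article: the run and deploy_kubeflow helpers and the DRY_RUN switch are hypothetical names introduced here, and the script only prints the commands unless DRY_RUN=0 is set. It does not handle the directory permissions mentioned above.

```shell
#!/usr/bin/env bash
# Dry run by default: commands are printed, not executed.
# Set DRY_RUN=0 to actually run them.
DRY_RUN="${DRY_RUN:-1}"
KUBEFLOW_SRC="${KUBEFLOW_SRC:-$HOME/kubeflow}"
KFAPP="${KFAPP:-/etc/kubeflow}"

# Print the command in dry-run mode, otherwise execute it in this shell.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Same four steps as in the article; && stops at the first failure.
deploy_kubeflow() {
  run "${KUBEFLOW_SRC}/scripts/kfctl.sh" init "${KFAPP}" --platform none &&
    run cd "${KFAPP}" &&
    run "${KUBEFLOW_SRC}/scripts/kfctl.sh" generate k8s &&
    run "${KUBEFLOW_SRC}/scripts/kfctl.sh" apply k8s
}

deploy_kubeflow
```

Because run is a shell function rather than a subshell, the cd step really changes the working directory when DRY_RUN=0, so generate and apply run from inside ${KFAPP}.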

⑤ Verification

If everything completes without errors, a number of pods are created in a namespace named "kubeflow".

$ kubectl get pod --namespace kubeflow

NAME                                                      READY   STATUS             RESTARTS   AGE
ambassador-7b8477f667-cdqd7                               1/1     Running            0          49m
ambassador-7b8477f667-klkxb                               1/1     Running            0          49m
ambassador-7b8477f667-x5jvs                               1/1     Running            0          49m
argo-ui-fb67b6bc8-282v4                                   1/1     Running            0          49m
centraldashboard-798f8d68d5-7wz2c                         1/1     Running            0          49m
jupyter-0                                                 1/1     Running            0          49m
katib-ui-5dd9b4967b-lq8p5                                 1/1     Running            0          48m
metacontroller-0                                          1/1     Running            0          49m
minio-869c7c66cf-gh2qh                                    0/1     Running            0          48m
ml-pipeline-564fffbbcb-xmr4n                              1/1     Running            10         48m
ml-pipeline-persistenceagent-68bcdfddb9-rflgr             0/1     Running            9          48m
ml-pipeline-scheduledworkflow-b67c47f7f-s95ff             1/1     Running            0          48m
ml-pipeline-ui-7959bb7f47-j9flp                           1/1     Running            0          48m
mysql-5d5b5475c4-tn4h7                                    0/1     Pending            0          48m
spartakus-volunteer-64447775fb-2rht4                      1/1     Running            0          49m
studyjob-controller-8699fdffc4-2tlb5                      1/1     Running            0          48m
tf-job-dashboard-9b466bbcf-szsk8                          1/1     Running            0          49m
tf-job-operator-v1beta1-54784b9575-w7hvw                  1/1     Running            0          49m
vizier-core-64d57f5646-p84n5                              1/1     Running            10         48m
vizier-core-rest-79cfd59cfb-8vwqq                         1/1     Running            0          48m
vizier-db-6bd6c6fdd5-qs5b6                                0/1     Running            0          48m
vizier-suggestion-bayesianoptimization-6bff58f988-4grx8   1/1     Running            0          48m
vizier-suggestion-grid-5fdf88445d-hsbrq                   1/1     Running            0          48m
vizier-suggestion-hyperband-85bffb65cd-qc284              1/1     Running            0          48m
vizier-suggestion-random-ff54d4bd8-6lc6h                  1/1     Running            0          48m
workflow-controller-d5cb6468d-zff6x                       1/1     Running            0          49m
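With this many pods it is easy to miss the ones that are not ready yet. The following sketch filters the table for them; not_ready is a hypothetical helper, and the sample rows are copied from the output above. Note that it also flags pods that finished normally (for example a Completed job), which may or may not be what you want.

```shell
#!/usr/bin/env bash
# Reads `kubectl get pod` table output on stdin and prints the pods whose
# READY count is below the desired count or whose STATUS is not Running.
not_ready() {
  awk '$1 != "NAME" && NF >= 3 {
    split($2, r, "/")
    if (r[1] != r[2] || $3 != "Running") print $1, $2, $3
  }'
}

# Sample rows taken from the listing in this article.
sample='NAME                          READY   STATUS    RESTARTS   AGE
ambassador-7b8477f667-cdqd7   1/1     Running   0          49m
minio-869c7c66cf-gh2qh        0/1     Running   0          48m
mysql-5d5b5475c4-tn4h7        0/1     Pending   0          48m'

printf '%s\n' "$sample" | not_ready
```

Against a live cluster the same filter would be used as: kubectl get pod --namespace kubeflow | not_ready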

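Miscellaneous

Pitfalls I ran into

・Running kfctl.sh fails with "env.sh: No such file or directory".
→ Cause: the command was not run from inside ${KFAPP}.

・Running kfctl.sh fails with a connection timeout error.
→ The server was underpowered. Switching from a t3.small-class instance to an m5.large-class instance resolved it.

・Running kfctl.sh fails with "patching object from cluster: merging object with existing state: unable to recognize".
→ Several failed attempts had left stale resources in the namespace. Running kubectl delete namespace kubeflow resolved it.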
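Two of the pitfalls above (running outside ${KFAPP}, and a leftover namespace) can be guarded against with small helpers. This is a rough sketch, not the author's procedure: in_kfapp, wait_namespace_gone and clean_namespace are hypothetical names, and it assumes env.sh lives in the app directory after kfctl.sh init. The wait helper treats any failure of kubectl get namespace as "gone", which is also what an unreachable cluster would look like.

```shell
#!/usr/bin/env bash
# Succeeds only when the current directory is the app directory
# and an env.sh file is present there.
in_kfapp() {
  local app="${1:-${KFAPP:-/etc/kubeflow}}"
  local want
  want="$(cd "$app" 2>/dev/null && pwd -P)" || {
    echo "cannot enter ${app}" >&2
    return 1
  }
  if [ "$(pwd -P)" != "$want" ]; then
    echo "not in ${app}; run: cd ${app}" >&2
    return 1
  fi
  if [ ! -f env.sh ]; then
    echo "env.sh not found in ${app}; was kfctl.sh init run?" >&2
    return 1
  fi
}

# Polls until the namespace can no longer be fetched.
# $1: namespace, $2: max attempts (default 60), $3: seconds between attempts (default 5)
wait_namespace_gone() {
  local ns="$1" tries="${2:-60}" delay="${3:-5}" i=0
  while [ "$i" -lt "$tries" ]; do
    if ! kubectl get namespace "$ns" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Deletes the namespace, then waits for the deletion to finish.
clean_namespace() {
  kubectl delete namespace "$1" && wait_namespace_gone "$1"
}

# Demonstration of the directory guard against a throwaway directory.
demo="$(mktemp -d)"
touch "${demo}/env.sh"
( cd "$demo" && in_kfapp "$demo" && echo "guard passed in ${demo}" )
```

In practice you would call in_kfapp before kfctl.sh generate or apply, and clean_namespace kubeflow before retrying after a failed attempt.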
