
Trying out various ways to investigate Pod problems

Last updated at 2018-12-26, posted at 2018-12-26

Working notes.
I check which investigation methods are available when a Pod has a problem.

These are notes from trying things out on my own cluster while reading Kubernetes完全ガイド (impress top gear series).

In the official documentation, the following looks useful.

Environment

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.3", GitCommit:"2bba0127d85d5a46ab4b778548be28623b32d0b0", GitTreeState:"clean", BuildDate:"2018-05-28T20:03:09Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"11+", GitVersion:"v1.11.5-eks-6bad6d", GitCommit:"6bad6d9c768dc0864dab48a11653aa53b5a47043", GitTreeState:"clean", BuildDate:"2018-12-06T23:13:14Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}

Checking logs with kubectl logs

Interacting with running Pods

kubectl logs my-pod # dump pod logs (stdout)
kubectl logs my-pod --previous # dump pod logs (stdout) for a previous instantiation of a container
kubectl logs my-pod -c my-container # dump pod container logs (stdout, multi-container case)
kubectl logs my-pod -c my-container --previous # dump pod container logs (stdout, multi-container case) for a previous instantiation of a container
kubectl logs -f my-pod # stream pod logs (stdout)
kubectl logs -f my-pod -c my-container # stream pod container logs (stdout, multi-container case)

kubectl logs shows the logs of a Pod or of a container.

# View the logs of the Redis master Pod
$ kubectl logs redis-master-99dx5
[1] 20 Dec 02:43:13.590 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
                _._
           _.-``__ ''-._
      _.-``    `.  `_.  ''-._           Redis 2.8.23 (00000000/0) 64 bit
  .-`` .-```.  ```\/    _.,_ ''-._
 (    '      ,       .-`  | `,    )     Running in stand alone mode
 |`-._`-...-` __...-.``-._|'` _.-'|     Port: 6379
 |    `-._   `._    /     _.-'    |     PID: 1
  `-._    `-._  `-./  _.-'    _.-'
 |`-._`-._    `-.__.-'    _.-'_.-'|
 |    `-._`-._        _.-'_.-'    |           http://redis.io
  `-._    `-._`-.__.-'_.-'    _.-'
 |`-._`-._    `-.__.-'    _.-'_.-'|
 |    `-._`-._        _.-'_.-'    |
  `-._    `-._`-.__.-'_.-'    _.-'
      `-._    `-.__.-'    _.-'
          `-._        _.-'
              `-.__.-'

When a Pod contains multiple containers and you only want the logs of one of them, add the -c [container name] option to select that container.

The -f option streams the logs (similar to tail -f).
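When a container has been running for a while, the full log can be long. The following is a minimal sketch of limiting the output with the --tail and --since flags of kubectl logs. The helper name recent_logs and the values 20 and 1h are assumptions chosen for illustration.

```shell
# recent_logs is a hypothetical helper name, not part of kubectl.
# Usage: recent_logs POD [CONTAINER]
# Prints at most the last 20 log lines written during the past hour.
recent_logs() {
  pod="$1"
  container="${2:-}"
  if [ -n "$container" ]; then
    # Multi-container case: select the container with -c.
    kubectl logs "$pod" -c "$container" --tail=20 --since=1h
  else
    kubectl logs "$pod" --tail=20 --since=1h
  fi
}
```

For example, recent_logs redis-master-99dx5 would show only the most recent lines of the Redis master Pod used above.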

Checking Node events and resource allocation with kubectl describe

Passing a Node or a Pod to kubectl describe shows its details.

Pod

Look at a Pod right after it is created.

# List pods
$ kubectl get pods
NAME                 READY     STATUS    RESTARTS   AGE
1-sample-pod         1/1       Running   0          2h
guestbook-9lxmq      1/1       Running   0          5d
guestbook-ddskb      1/1       Running   0          5d
guestbook-vrvnr      1/1       Running   0          5d
redis-master-99dx5   1/1       Running   0          5d
redis-slave-lwtzm    1/1       Running   0          5d
redis-slave-xbfb2    1/1       Running   0          5d

# Show the details of a pod
$ kubectl describe pod 1-sample-pod
Name:               1-sample-pod
Namespace:          default
Priority:           0
PriorityClassName:  <none>
Node:               ip-172-31-19-51.ap-northeast-1.compute.internal/172.31.19.51
Start Time:         Tue, 25 Dec 2018 17:05:09 +0900
Labels:             <none>
Annotations:        kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"v1","kind":"Pod","metadata":{"annotations":{},"name":"1-sample-pod","namespace":"default"},"spec":{"containers":[{"image":"nginx:1.12","...
Status:             Running
IP:                 172.31.31.101
Containers:
  nginx-container:
    Container ID:   docker://4f7117a8ce6adb26e2b82e9f4b419d2541f311d5bc20566cf658f886f7e9321a
    Image:          nginx:1.12
    Image ID:       docker-pullable://nginx@sha256:72daaf46f11cc753c4eab981cbf869919bd1fee3d2170a2adeac12400f494728
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Tue, 25 Dec 2018 17:05:09 +0900
    Ready:          True
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-9shpr (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             True
  ContainersReady   True
  PodScheduled      True
Volumes:
  default-token-9shpr:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-9shpr
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type    Reason     Age   From                                                      Message
  ----    ------     ----  ----                                                      -------
  Normal  Scheduled  23s   default-scheduler                                         Successfully assigned default/1-sample-pod to ip-172-31-19-51.ap-northeast-1.compute.internal
  Normal  Pulled     23s   kubelet, ip-172-31-19-51.ap-northeast-1.compute.internal  Container image "nginx:1.12" already present on machine
  Normal  Created    23s   kubelet, ip-172-31-19-51.ap-northeast-1.compute.internal  Created container
  Normal  Started    23s   kubelet, ip-172-31-19-51.ap-northeast-1.compute.internal  Started container

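The Events show that the nginx container image was already present on the node, and that the container was then created and started.

To see the behavior on error, create a Pod that specifies a nonexistent image.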
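The manifest itself is not shown in these notes. The following is a minimal sketch of what bad-pod.yaml could look like, using the values visible in the describe output of this article (Pod name bad-pod, container name nginx-container, image nnnnnnnnnnginx:1.12). The exact contents of the original file are an assumption.

```shell
# Write a Pod manifest whose image name does not exist in the registry.
# The field values come from the describe output in this article;
# the original file may have differed.
cat > bad-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: bad-pod
spec:
  containers:
    - name: nginx-container
      image: nnnnnnnnnnginx:1.12
EOF
```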

# Apply a Pod manifest file that specifies a nonexistent image
$ kubectl apply -f bad-pod.yaml
pod "bad-pod" created

# The pod is created, but the image pull fails and no container is running
$ kubectl get pod | grep bad-pod
bad-pod              0/1       ErrImagePull   0          34s

# Describe it
$ kubectl describe pod bad-pod
Name:               bad-pod
Namespace:          default
Priority:           0
PriorityClassName:  <none>
Node:               ip-172-31-23-75.ap-northeast-1.compute.internal/172.31.23.75
Start Time:         Wed, 26 Dec 2018 09:38:43 +0900
Labels:             <none>
Annotations:        kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"v1","kind":"Pod","metadata":{"annotations":{},"name":"bad-pod","namespace":"default"},"spec":{"containers":[{"image":"nnnnnnnnnnginx:1.1...
Status:             Pending
IP:                 172.31.24.230
Containers:
  nginx-container:
    Container ID:   
    Image:          nnnnnnnnnnginx:1.12
    Image ID:       
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       ImagePullBackOff
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-9shpr (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
Volumes:
  default-token-9shpr:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-9shpr
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute for 300s
                 node.kubernetes.io/unreachable:NoExecute for 300s
Events:
  Type     Reason     Age               From                                                      Message
  ----     ------     ----              ----                                                      -------
  Normal   Scheduled  1m                default-scheduler                                         Successfully assigned default/bad-pod to ip-172-31-23-75.ap-northeast-1.compute.internal
  Normal   BackOff    28s (x4 over 1m)  kubelet, ip-172-31-23-75.ap-northeast-1.compute.internal  Back-off pulling image "nnnnnnnnnnginx:1.12"
  Warning  Failed     28s (x4 over 1m)  kubelet, ip-172-31-23-75.ap-northeast-1.compute.internal  Error: ImagePullBackOff
  Normal   Pulling    15s (x4 over 1m)  kubelet, ip-172-31-23-75.ap-northeast-1.compute.internal  pulling image "nnnnnnnnnnginx:1.12"
  Warning  Failed     13s (x4 over 1m)  kubelet, ip-172-31-23-75.ap-northeast-1.compute.internal  Failed to pull image "nnnnnnnnnnginx:1.12": rpc error: code = Unknown desc = Error response from daemon: repository nnnnnnnnnnginx not found: does not exist or no pull access
  Warning  Failed     13s (x4 over 1m)  kubelet, ip-172-31-23-75.ap-northeast-1.compute.internal  Error: ErrImagePull

The Events show what happened.
The image "nnnnnnnnnnginx:1.12" does not exist, which causes the error.
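When only the events are of interest, they can also be listed without the rest of the describe output. The following is a minimal sketch using kubectl get events with a field selector on the involved object. The helper name pod_events is hypothetical, and whether the selector is available depends on the kubectl and cluster versions in use.

```shell
# pod_events is a hypothetical helper name, not part of kubectl.
# Usage: pod_events POD [NAMESPACE]
# Lists only the events whose involved object is the given Pod.
pod_events() {
  pod="$1"
  namespace="${2:-default}"
  kubectl get events -n "$namespace" \
    --field-selector "involvedObject.kind=Pod,involvedObject.name=$pod"
}
```

For example, pod_events bad-pod would be expected to list the Scheduled, Pulling, Failed and BackOff events shown above.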

Node

Passing a Node to kubectl describe shows the Node's events and resource allocation.

# List nodes
$ kubectl get nodes
NAME                                              STATUS    ROLES     AGE       VERSION
ip-172-31-0-56.ap-northeast-1.compute.internal    Ready     <none>    5d        v1.11.5
ip-172-31-19-51.ap-northeast-1.compute.internal   Ready     <none>    5d        v1.11.5
ip-172-31-23-75.ap-northeast-1.compute.internal   Ready     <none>    5d        v1.11.5

# Describe one Node
$ kubectl describe node ip-172-31-0-56.ap-northeast-1.compute.internal

Name:               ip-172-31-0-56.ap-northeast-1.compute.internal
Roles:              <none>
Labels:             beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=t3.medium
                    beta.kubernetes.io/os=linux
                    failure-domain.beta.kubernetes.io/region=ap-northeast-1
                    failure-domain.beta.kubernetes.io/zone=ap-northeast-1c
                    kubernetes.io/hostname=ip-172-31-0-56.ap-northeast-1.compute.internal
Annotations:        node.alpha.kubernetes.io/ttl=0
                    volumes.kubernetes.io/controller-managed-attach-detach=true
CreationTimestamp:  Thu, 20 Dec 2018 11:41:56 +0900
Taints:             <none>
Unschedulable:      false
Conditions:
  Type             Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----             ------  -----------------                 ------------------                ------                       -------
  OutOfDisk        False   Tue, 25 Dec 2018 17:00:43 +0900   Thu, 20 Dec 2018 11:41:56 +0900   KubeletHasSufficientDisk     kubelet has sufficient disk space available
  MemoryPressure   False   Tue, 25 Dec 2018 17:00:43 +0900   Thu, 20 Dec 2018 11:41:56 +0900   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure     False   Tue, 25 Dec 2018 17:00:43 +0900   Thu, 20 Dec 2018 11:41:56 +0900   KubeletHasNoDiskPressure     kubelet has no disk pressure
  PIDPressure      False   Tue, 25 Dec 2018 17:00:43 +0900   Thu, 20 Dec 2018 11:41:56 +0900   KubeletHasSufficientPID      kubelet has sufficient PID available
  Ready            True    Tue, 25 Dec 2018 17:00:43 +0900   Thu, 20 Dec 2018 11:42:16 +0900   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:   172.31.0.56
  ExternalIP:   13.113.212.75
  InternalDNS:  ip-172-31-0-56.ap-northeast-1.compute.internal
  ExternalDNS:  ec2-13-113-212-75.ap-northeast-1.compute.amazonaws.com
  Hostname:     ip-172-31-0-56.ap-northeast-1.compute.internal
Capacity:
 cpu:                2
 ephemeral-storage:  20959212Ki
 hugepages-1Gi:      0
 hugepages-2Mi:      0
 memory:             3980356Ki
 pods:               17
Allocatable:
 cpu:                2
 ephemeral-storage:  19316009748
 hugepages-1Gi:      0
 hugepages-2Mi:      0
 memory:             3877956Ki
 pods:               17
System Info:
 Machine ID:                 ec2343c167f56384eea0f03087ffca0d
 System UUID:                EC2343C1-67F5-6384-EEA0-F03087FFCA0D
 Boot ID:                    857b4e69-b79c-4039-a8d0-e50db7fa36bd
 Kernel Version:             4.14.77-81.59.amzn2.x86_64
 OS Image:                   Amazon Linux 2
 Operating System:           linux
 Architecture:               amd64
 Container Runtime Version:  docker://17.6.2
 Kubelet Version:            v1.11.5
 Kube-Proxy Version:         v1.11.5
ExternalID:                  ip-172-31-0-56.ap-northeast-1.compute.internal
ProviderID:                  aws:///ap-northeast-1c/i-0d28342987f4771f7
Non-terminated Pods:         (6 in total)
  Namespace                  Name                        CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ---------                  ----                        ------------  ----------  ---------------  -------------
  default                    guestbook-ddskb             0 (0%)        0 (0%)      0 (0%)           0 (0%)
  default                    redis-slave-lwtzm           0 (0%)        0 (0%)      0 (0%)           0 (0%)
  kube-system                aws-node-kwjjw              10m (0%)      0 (0%)      0 (0%)           0 (0%)
  kube-system                coredns-7774b7957b-cxkst    100m (5%)     0 (0%)      70Mi (1%)        170Mi (4%)
  kube-system                coredns-7774b7957b-pjsp2    100m (5%)     0 (0%)      70Mi (1%)        170Mi (4%)
  kube-system                kube-proxy-dm2qm            100m (5%)     0 (0%)      0 (0%)           0 (0%)
Allocated resources:
  (Total limits may be over 100 percent, i.e., overcommitted.)
  CPU Requests  CPU Limits  Memory Requests  Memory Limits
  ------------  ----------  ---------------  -------------
  310m (15%)    0 (0%)      140Mi (3%)       340Mi (8%)
Events:         <none>

This confirms the resources allocated to the Pods.
It also appears to show which Pods are running on the target Node.
SSH into the Node to check.

# The Pods listed above are running
$ docker ps
CONTAINER ID        IMAGE                                                                   COMMAND                  CREATED             STATUS              PORTS               NAMES
c3e0e1a93e68        k8s.gcr.io/guestbook                                                    "./guestbook"            5 days ago          Up 5 days                               k8s_guestbook_guestbook-ddskb_default_0b96445b-0401-11e9-b090-0ae6cc179478_0
c388aba9348c        kubernetes/redis-slave                                                  "/bin/sh -c /run.sh"     5 days ago          Up 5 days                               k8s_redis-slave_redis-slave-lwtzm_default_05b988f0-0401-11e9-b090-0ae6cc179478_0
67637892c25b        602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/pause-amd64:3.1   "/pause"                 5 days ago          Up 5 days                               k8s_POD_guestbook-ddskb_default_0b96445b-0401-11e9-b090-0ae6cc179478_0
55a53cfdcf8e        602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/pause-amd64:3.1   "/pause"                 5 days ago          Up 5 days                               k8s_POD_redis-slave-lwtzm_default_05b988f0-0401-11e9-b090-0ae6cc179478_0
e07e3eb18c04        cfebd7b9d0f4                                                            "/coredns -conf /e..."   5 days ago          Up 5 days                               k8s_coredns_coredns-7774b7957b-pjsp2_kube-system_c6bff361-03fb-11e9-b090-0ae6cc179478_0
2c8f984ff0aa        602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/pause-amd64:3.1   "/pause"                 5 days ago          Up 5 days                               k8s_POD_coredns-7774b7957b-pjsp2_kube-system_c6bff361-03fb-11e9-b090-0ae6cc179478_1
26196975bdc8        602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/coredns           "/coredns -conf /e..."   5 days ago          Up 5 days                               k8s_coredns_coredns-7774b7957b-cxkst_kube-system_c6c13cf5-03fb-11e9-b090-0ae6cc179478_0
b88e35d9201d        602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/pause-amd64:3.1   "/pause"                 5 days ago          Up 5 days                               k8s_POD_coredns-7774b7957b-cxkst_kube-system_c6c13cf5-03fb-11e9-b090-0ae6cc179478_3
5e3b3d3f86b8        602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/amazon-k8s-cni        "/bin/sh -c /app/i..."   5 days ago          Up 5 days                               k8s_aws-node_aws-node-kwjjw_kube-system_d105948b-0400-11e9-b090-0ae6cc179478_1
cfd84580c44d        602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/kube-proxy        "/bin/sh -c 'kube-..."   5 days ago          Up 5 days                               k8s_kube-proxy_kube-proxy-dm2qm_kube-system_d105696e-0400-11e9-b090-0ae6cc179478_0
ee17b851d4ff        602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/pause-amd64:3.1   "/pause"                 5 days ago          Up 5 days                               k8s_POD_aws-node-kwjjw_kube-system_d105948b-0400-11e9-b090-0ae6cc179478_0
a8e4be169511        602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/pause-amd64:3.1   "/pause"                 5 days ago          Up 5 days                               k8s_POD_kube-proxy-dm2qm_kube-system_d105696e-0400-11e9-b090-0ae6cc179478_0

# guestbook containers
$ docker ps | grep guest
c3e0e1a93e68        k8s.gcr.io/guestbook                                                    "./guestbook"            5 days ago          Up 5 days                               k8s_guestbook_guestbook-ddskb_default_0b96445b-0401-11e9-b090-0ae6cc179478_0
67637892c25b        602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/pause-amd64:3.1   "/pause"                 5 days ago          Up 5 days                               k8s_POD_guestbook-ddskb_default_0b96445b-0401-11e9-b090-0ae6cc179478_0

# redis-slave containers
$ docker ps | grep slave
c388aba9348c        kubernetes/redis-slave                                                  "/bin/sh -c /run.sh"     5 days ago          Up 5 days                               k8s_redis-slave_redis-slave-lwtzm_default_05b988f0-0401-11e9-b090-0ae6cc179478_0
55a53cfdcf8e        602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/pause-amd64:3.1   "/pause"                 5 days ago          Up 5 days                               k8s_POD_redis-slave-lwtzm_default_05b988f0-0401-11e9-b090-0ae6cc179478_0
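The same check can be done without SSH access to the Node. The following is a minimal sketch that asks the API server for the Pods scheduled on one Node, using a field selector on spec.nodeName. The helper name pods_on_node is hypothetical.

```shell
# pods_on_node is a hypothetical helper name, not part of kubectl.
# Usage: pods_on_node NODE_NAME
# Lists Pods in all namespaces that are scheduled on the given Node.
pods_on_node() {
  node="$1"
  kubectl get pods --all-namespaces -o wide \
    --field-selector "spec.nodeName=$node"
}
```

For example, pods_on_node ip-172-31-0-56.ap-northeast-1.compute.internal should list the same Pods as the "Non-terminated Pods" section of the describe output.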

Running commands in a Pod with kubectl exec

Interacting with running Pods

kubectl exec my-pod -- ls / # Run command in existing pod (1 container case)
kubectl exec my-pod -c my-container -- ls / # Run command in existing pod (multi-container case)

Try it out.

$ kubectl get pods | grep sample
1-sample-pod         1/1       Running   0          17h
2-sample-pod         1/1       Running   0          17h

# Run a command (/bin/sh) in 1-sample-pod. A shell starts; create a file with touch
$ kubectl exec -it 1-sample-pod -- /bin/sh
# touch /tmp/test.txt
# exit

# Run exec again, this time with ls
$ kubectl exec 1-sample-pod -- ls /tmp/
test.txt

When a Pod contains multiple containers, specify -c [container name] to run the command in that container.

Checking the resources used by a Pod's containers with kubectl top

kubectl describe only shows the resources reserved for a Pod.
To see the resources actually in use, use the kubectl top command.

Interacting with running Pods

kubectl top pod POD_NAME --containers # Show metrics for a given pod and its containers

I tried it, but it failed...

$ kubectl top pod 1-sample-pod
Error from server (NotFound): the server could not find the requested resource (get services http:heapster:)

$ kubectl top nodes
Error from server (NotFound): the server could not find the requested resource (get services http:heapster:)

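The error message says that kubectl could not find the heapster service it queries for metrics. This may be an EKS limitation, but I will set it aside for now.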
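Since the error names the heapster service, one way to narrow this down is to check which metrics backend, if any, the cluster exposes. The following is a minimal sketch, assuming the conventional names: a heapster Service in the kube-system namespace, or the v1beta1.metrics.k8s.io APIService that metrics-server registers. The helper name check_metrics_backend is hypothetical, and this was not run against the cluster in these notes.

```shell
# check_metrics_backend is a hypothetical helper name, not part of kubectl.
# It reports which metrics backend appears to be present, based on the
# conventional resource names (an assumption; clusters may differ).
check_metrics_backend() {
  if kubectl get service heapster -n kube-system >/dev/null 2>&1; then
    echo "heapster service found"
  elif kubectl get apiservice v1beta1.metrics.k8s.io >/dev/null 2>&1; then
    echo "metrics API (metrics-server) registered"
  else
    echo "no metrics backend found"
  fi
}
```

If neither is found, kubectl top has nothing to query, which would be consistent with the NotFound error above.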

Accessing a Pod from the local machine with kubectl port-forward

Interacting with running Pods

kubectl port-forward my-pod 5000:6000 # Listen on port 5000 on the local machine and forward to port 6000 on my-pod

Use port-forward to make the nginx container running in the Pod reachable from the local machine.

# Forward local port 8888 to port 80 of the Pod
$ kubectl port-forward 1-sample-pod 8888:80
Forwarding from 127.0.0.1:8888 -> 80
Forwarding from [::1]:8888 -> 80
Handling connection for 8888

# In a separate shell, access with curl succeeds
$ curl http://localhost:8888
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
    body {
        width: 35em;
        margin: 0 auto;
        font-family: Tahoma, Verdana, Arial, sans-serif;
    }
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>

# Check the access log with kubectl logs; the request is recorded
$ kubectl logs 1-sample-pod
127.0.0.1 - - [26/Dec/2018:02:45:58 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.43.0" "-"
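The two-shell procedure above can also be done from a single shell by running the forward in the background. The following is a minimal sketch. The helper name probe_via_port_forward is hypothetical, and the fixed two-second wait is a simplification that may be too short on a slow connection.

```shell
# probe_via_port_forward is a hypothetical helper name, not part of kubectl.
# Usage: probe_via_port_forward POD LOCAL_PORT REMOTE_PORT
# Prints the HTTP status code returned through the forwarded port.
probe_via_port_forward() {
  pod="$1"
  local_port="$2"
  remote_port="$3"
  # Start the forward in the background and remember its PID.
  kubectl port-forward "$pod" "$local_port:$remote_port" >/dev/null 2>&1 &
  pf_pid=$!
  # Give the tunnel a moment to come up (fixed delay, see the note above).
  sleep 2
  rc=0
  curl -s -o /dev/null -w '%{http_code}\n' "http://localhost:$local_port" || rc=$?
  # Stop the forward regardless of the curl result.
  kill "$pf_pid" 2>/dev/null || true
  wait "$pf_pid" 2>/dev/null || true
  return "$rc"
}
```

For example, probe_via_port_forward 1-sample-pod 8888 80 corresponds to the port-forward and curl steps shown above.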

Investigating from a container shell with kubectl run

Interacting with running Pods

kubectl run -i --tty busybox --image=busybox -- sh # Run pod as interactive shell

For example, if a container starts briefly when the Pod is created but exits after a while, kubectl exec cannot be used.
kubectl run can start a container much like docker run.

# Start a Pod with kubectl run
$ kubectl run -i --tty busybox --image=busybox -- sh
If you don't see a command prompt, try pressing enter.
/ #


# Check from another shell. The container started by kubectl run is running as a Pod
$ kubectl get pods | grep busy
busybox-74db8b6768-x4sfv   1/1       Running   0          54s
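The generated name busybox-74db8b6768-x4sfv suggests that this kubectl version created the Pod through a Deployment, which stays behind after the shell exits. The following is a minimal sketch of a throwaway variant, using the --restart=Never and --rm flags of kubectl run. The helper name debug_shell and the default Pod name busybox-debug are assumptions for illustration.

```shell
# debug_shell is a hypothetical helper name, not part of kubectl.
# Usage: debug_shell [POD_NAME]
# Starts an interactive busybox shell in a temporary Pod.
debug_shell() {
  name="${1:-busybox-debug}"
  # --restart=Never creates a bare Pod instead of a managed workload;
  # --rm deletes the Pod when the interactive session ends.
  kubectl run "$name" -i --tty --rm --image=busybox --restart=Never -- sh
}
```

With this form there should be nothing left to clean up once the shell is closed.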
