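Trying out various ways to investigate Pod problems

Working notes.

This post checks what investigation methods are available when a Pod has a problem.

These are notes from trying things hands-on while reading Kubernetes完全ガイド (impress top gear series).

In the official documentation, the following page looks useful.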

Interacting with running Pods


Environment

$ kubectl version

Client Version: version.Info{Major:"1", Minor:"10", GitVersion:"v1.10.3", GitCommit:"2bba0127d85d5a46ab4b778548be28623b32d0b0", GitTreeState:"clean", BuildDate:"2018-05-28T20:03:09Z", GoVersion:"go1.9.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"11+", GitVersion:"v1.11.5-eks-6bad6d", GitCommit:"6bad6d9c768dc0864dab48a11653aa53b5a47043", GitTreeState:"clean", BuildDate:"2018-12-06T23:13:14Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}


Checking logs with kubectl logs

Interacting with running Pods


kubectl logs my-pod # dump pod logs (stdout)

kubectl logs my-pod --previous # dump pod logs (stdout) for a previous instantiation of a container

kubectl logs my-pod -c my-container # dump pod container logs (stdout, multi-container case)

kubectl logs my-pod -c my-container --previous # dump pod container logs (stdout, multi-container case) for a previous instantiation of a container

kubectl logs -f my-pod # stream pod logs (stdout)

kubectl logs -f my-pod -c my-container # stream pod container logs (stdout, multi-container case)


kubectl logs shows the logs of a Pod or of a container in it.

# View the logs of the Redis master Pod

$ kubectl logs redis-master-99dx5
[1] 20 Dec 02:43:13.590 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
                _._
           _.-``__ ''-._
      _.-``    `.  `_.  ''-._           Redis 2.8.23 (00000000/0) 64 bit
  .-`` .-```.  ```\/    _.,_ ''-._
 (    '      ,       .-`  | `,    )     Running in stand alone mode
 |`-._`-...-` __...-.``-._|'` _.-'|     Port: 6379
 |    `-._   `._    /     _.-'    |     PID: 1
  `-._    `-._  `-./  _.-'    _.-'
 |`-._`-._    `-.__.-'    _.-'_.-'|
 |    `-._`-._        _.-'_.-'    |           http://redis.io
  `-._    `-._`-.__.-'_.-'    _.-'
 |`-._`-._    `-.__.-'    _.-'_.-'|
 |    `-._`-._        _.-'_.-'    |
  `-._    `-._`-.__.-'_.-'    _.-'
      `-._    `-.__.-'    _.-'
          `-._        _.-'
              `-.__.-'

When a Pod contains multiple containers and you only want the logs of one of them, specify the container with the -c [container name] option.

The -f option streams the logs, similar to tail -f.
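For a multi-container Pod, it can be handy to dump the logs of every container in one go rather than repeating -c by hand. The following is a minimal sketch, not something from the book or the official docs: the function name dump_all_container_logs is made up for illustration, and it assumes kubectl is already configured for the cluster and that the Pod lives in the current namespace.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the logs of every container in one Pod.
dump_all_container_logs() {
  local pod="$1"
  local containers c
  # Read the container names declared in the Pod spec (space separated).
  containers="$(kubectl get pod "$pod" -o jsonpath='{.spec.containers[*].name}')"
  # Word splitting on $containers is intentional: one iteration per container name.
  for c in $containers; do
    echo "=== ${pod}/${c} ==="
    kubectl logs "$pod" -c "$c"
  done
}
```

Usage would look like dump_all_container_logs my-pod. Init containers are not covered by this sketch.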


Checking Node events and resource allocation with kubectl describe

Running kubectl describe against a Node or a Pod shows its details.


Pod

Look at a Pod right after creating it.

# List pods

$ kubectl get pods
NAME                 READY   STATUS    RESTARTS   AGE
1-sample-pod         1/1     Running   0          2h
guestbook-9lxmq      1/1     Running   0          5d
guestbook-ddskb      1/1     Running   0          5d
guestbook-vrvnr      1/1     Running   0          5d
redis-master-99dx5   1/1     Running   0          5d
redis-slave-lwtzm    1/1     Running   0          5d
redis-slave-xbfb2    1/1     Running   0          5d

# Show the details of a pod
$ kubectl describe pod 1-sample-pod
Name: 1-sample-pod
Namespace: default
Priority: 0
PriorityClassName: <none>
Node: ip-172-31-19-51.ap-northeast-1.compute.internal/172.31.19.51
Start Time: Tue, 25 Dec 2018 17:05:09 +0900
Labels: <none>
Annotations: kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"v1","kind":"Pod","metadata":{"annotations":{},"name":"1-sample-pod","namespace":"default"},"spec":{"containers":[{"image":"nginx:1.12","...
Status: Running
IP: 172.31.31.101
Containers:
nginx-container:
Container ID: docker://4f7117a8ce6adb26e2b82e9f4b419d2541f311d5bc20566cf658f886f7e9321a
Image: nginx:1.12
Image ID: docker-pullable://nginx@sha256:72daaf46f11cc753c4eab981cbf869919bd1fee3d2170a2adeac12400f494728
Port: <none>
Host Port: <none>
State: Running
Started: Tue, 25 Dec 2018 17:05:09 +0900
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-9shpr (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
default-token-9shpr:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-9shpr
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 23s default-scheduler Successfully assigned default/1-sample-pod to ip-172-31-19-51.ap-northeast-1.compute.internal
Normal Pulled 23s kubelet, ip-172-31-19-51.ap-northeast-1.compute.internal Container image "nginx:1.12" already present on machine
Normal Created 23s kubelet, ip-172-31-19-51.ap-northeast-1.compute.internal Created container
Normal Started 23s kubelet, ip-172-31-19-51.ap-northeast-1.compute.internal Started container

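The Events show that the nginx container image was already present on the Node, and that the container was then created and started.

To see what an error looks like, create a Pod that specifies a nonexistent image.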
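The original bad-pod.yaml is not shown in these notes. The sketch below reconstructs a plausible minimal version from the describe output that follows (Pod name bad-pod, container name nginx-container, image nnnnnnnnnnginx:1.12); the real file may have contained more fields, so treat this as an assumption.

```shell
#!/usr/bin/env bash
# Write a minimal Pod manifest whose image name does not exist in the registry.
# Field values are taken from the kubectl describe output in this post;
# anything else about the original file is unknown.
cat <<'EOF' > bad-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: bad-pod
spec:
  containers:
    - name: nginx-container
      image: nnnnnnnnnnginx:1.12   # deliberately nonexistent image name
EOF
```

Running this writes bad-pod.yaml into the current directory, ready for kubectl apply -f bad-pod.yaml.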

# Apply: the manifest file defines a Pod that specifies a nonexistent image

$ kubectl apply -f bad-pod.yaml
pod "bad-pod" created

# The pod is created, but the image pull fails, so no container is running
$ kubectl get pod | grep bad-pod
bad-pod 0/1 ErrImagePull 0 34s

# Describe it
$ kubectl describe pod bad-pod
Name: bad-pod
Namespace: default
Priority: 0
PriorityClassName: <none>
Node: ip-172-31-23-75.ap-northeast-1.compute.internal/172.31.23.75
Start Time: Wed, 26 Dec 2018 09:38:43 +0900
Labels: <none>
Annotations: kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"v1","kind":"Pod","metadata":{"annotations":{},"name":"bad-pod","namespace":"default"},"spec":{"containers":[{"image":"nnnnnnnnnnginx:1.1...
Status: Pending
IP: 172.31.24.230
Containers:
nginx-container:
Container ID:
Image: nnnnnnnnnnginx:1.12
Image ID:
Port: <none>
Host Port: <none>
State: Waiting
Reason: ImagePullBackOff
Ready: False
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-9shpr (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
default-token-9shpr:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-9shpr
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 1m default-scheduler Successfully assigned default/bad-pod to ip-172-31-23-75.ap-northeast-1.compute.internal
Normal BackOff 28s (x4 over 1m) kubelet, ip-172-31-23-75.ap-northeast-1.compute.internal Back-off pulling image "nnnnnnnnnnginx:1.12"
Warning Failed 28s (x4 over 1m) kubelet, ip-172-31-23-75.ap-northeast-1.compute.internal Error: ImagePullBackOff
Normal Pulling 15s (x4 over 1m) kubelet, ip-172-31-23-75.ap-northeast-1.compute.internal pulling image "nnnnnnnnnnginx:1.12"
Warning Failed 13s (x4 over 1m) kubelet, ip-172-31-23-75.ap-northeast-1.compute.internal Failed to pull image "nnnnnnnnnnginx:1.12": rpc error: code = Unknown desc = Error response from daemon: repository nnnnnnnnnnginx not found: does not exist or no pull access
Warning Failed 13s (x4 over 1m) kubelet, ip-172-31-23-75.ap-northeast-1.compute.internal Error: ErrImagePull

The Events show what happened.

The image nnnnnnnnnnginx:1.12 does not exist, which is why the Pod ends up in an error state.
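When the Pod list is longer, a small filter makes it easier to spot Pods like bad-pod before running describe on them. This is only a sketch: not_running_pods is a hypothetical helper name, and it assumes the default column layout of kubectl get pods shown earlier (NAME, READY, STATUS, RESTARTS, AGE). With --all-namespaces the columns shift and the field numbers would need adjusting.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `kubectl get pods` output on stdin and
# print NAME and STATUS for every Pod whose STATUS is not "Running".
not_running_pods() {
  # NR > 1 skips the header row; $3 is the STATUS column in the default layout.
  awk 'NR > 1 && $3 != "Running" { print $1, $3 }'
}
```

It would be used as kubectl get pods | not_running_pods. Pods that finished normally (STATUS Completed) are also listed, since the filter only compares against Running.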


Node

Running kubectl describe against a Node shows the Node's events and resource allocation.

# List nodes

$ kubectl get nodes
NAME                                              STATUS   ROLES    AGE   VERSION
ip-172-31-0-56.ap-northeast-1.compute.internal    Ready    <none>   5d    v1.11.5
ip-172-31-19-51.ap-northeast-1.compute.internal   Ready    <none>   5d    v1.11.5
ip-172-31-23-75.ap-northeast-1.compute.internal   Ready    <none>   5d    v1.11.5

# Describe one of the Nodes
$ kubectl describe node ip-172-31-0-56.ap-northeast-1.compute.internal

Name: ip-172-31-0-56.ap-northeast-1.compute.internal
Roles: <none>
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/instance-type=t3.medium
beta.kubernetes.io/os=linux
failure-domain.beta.kubernetes.io/region=ap-northeast-1
failure-domain.beta.kubernetes.io/zone=ap-northeast-1c
kubernetes.io/hostname=ip-172-31-0-56.ap-northeast-1.compute.internal
Annotations: node.alpha.kubernetes.io/ttl=0
volumes.kubernetes.io/controller-managed-attach-detach=true
CreationTimestamp: Thu, 20 Dec 2018 11:41:56 +0900
Taints: <none>
Unschedulable: false
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
OutOfDisk False Tue, 25 Dec 2018 17:00:43 +0900 Thu, 20 Dec 2018 11:41:56 +0900 KubeletHasSufficientDisk kubelet has sufficient disk space available
MemoryPressure False Tue, 25 Dec 2018 17:00:43 +0900 Thu, 20 Dec 2018 11:41:56 +0900 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Tue, 25 Dec 2018 17:00:43 +0900 Thu, 20 Dec 2018 11:41:56 +0900 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Tue, 25 Dec 2018 17:00:43 +0900 Thu, 20 Dec 2018 11:41:56 +0900 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Tue, 25 Dec 2018 17:00:43 +0900 Thu, 20 Dec 2018 11:42:16 +0900 KubeletReady kubelet is posting ready status
Addresses:
InternalIP: 172.31.0.56
ExternalIP: 13.113.212.75
InternalDNS: ip-172-31-0-56.ap-northeast-1.compute.internal
ExternalDNS: ec2-13-113-212-75.ap-northeast-1.compute.amazonaws.com
Hostname: ip-172-31-0-56.ap-northeast-1.compute.internal
Capacity:
cpu: 2
ephemeral-storage: 20959212Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 3980356Ki
pods: 17
Allocatable:
cpu: 2
ephemeral-storage: 19316009748
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 3877956Ki
pods: 17
System Info:
Machine ID: ec2343c167f56384eea0f03087ffca0d
System UUID: EC2343C1-67F5-6384-EEA0-F03087FFCA0D
Boot ID: 857b4e69-b79c-4039-a8d0-e50db7fa36bd
Kernel Version: 4.14.77-81.59.amzn2.x86_64
OS Image: Amazon Linux 2
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://17.6.2
Kubelet Version: v1.11.5
Kube-Proxy Version: v1.11.5
ExternalID: ip-172-31-0-56.ap-northeast-1.compute.internal
ProviderID: aws:///ap-northeast-1c/i-0d28342987f4771f7
Non-terminated Pods: (6 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits
--------- ---- ------------ ---------- --------------- -------------
default guestbook-ddskb 0 (0%) 0 (0%) 0 (0%) 0 (0%)
default redis-slave-lwtzm 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system aws-node-kwjjw 10m (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system coredns-7774b7957b-cxkst 100m (5%) 0 (0%) 70Mi (1%) 170Mi (4%)
kube-system coredns-7774b7957b-pjsp2 100m (5%) 0 (0%) 70Mi (1%) 170Mi (4%)
kube-system kube-proxy-dm2qm 100m (5%) 0 (0%) 0 (0%) 0 (0%)
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
CPU Requests CPU Limits Memory Requests Memory Limits
------------ ---------- --------------- -------------
310m (15%) 0 (0%) 140Mi (3%) 340Mi (8%)
Events: <none>

This confirms the resources allocated to each Pod.

The output also appears to show which Pods are running on the Node.

SSH into the Node to check.

# The Pods listed above are running

$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
c3e0e1a93e68 k8s.gcr.io/guestbook "./guestbook" 5 days ago Up 5 days k8s_guestbook_guestbook-ddskb_default_0b96445b-0401-11e9-b090-0ae6cc179478_0
c388aba9348c kubernetes/redis-slave "/bin/sh -c /run.sh" 5 days ago Up 5 days k8s_redis-slave_redis-slave-lwtzm_default_05b988f0-0401-11e9-b090-0ae6cc179478_0
67637892c25b 602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/pause-amd64:3.1 "/pause" 5 days ago Up 5 days k8s_POD_guestbook-ddskb_default_0b96445b-0401-11e9-b090-0ae6cc179478_0
55a53cfdcf8e 602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/pause-amd64:3.1 "/pause" 5 days ago Up 5 days k8s_POD_redis-slave-lwtzm_default_05b988f0-0401-11e9-b090-0ae6cc179478_0
e07e3eb18c04 cfebd7b9d0f4 "/coredns -conf /e..." 5 days ago Up 5 days k8s_coredns_coredns-7774b7957b-pjsp2_kube-system_c6bff361-03fb-11e9-b090-0ae6cc179478_0
2c8f984ff0aa 602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/pause-amd64:3.1 "/pause" 5 days ago Up 5 days k8s_POD_coredns-7774b7957b-pjsp2_kube-system_c6bff361-03fb-11e9-b090-0ae6cc179478_1
26196975bdc8 602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/coredns "/coredns -conf /e..." 5 days ago Up 5 days k8s_coredns_coredns-7774b7957b-cxkst_kube-system_c6c13cf5-03fb-11e9-b090-0ae6cc179478_0
b88e35d9201d 602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/pause-amd64:3.1 "/pause" 5 days ago Up 5 days k8s_POD_coredns-7774b7957b-cxkst_kube-system_c6c13cf5-03fb-11e9-b090-0ae6cc179478_3
5e3b3d3f86b8 602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/amazon-k8s-cni "/bin/sh -c /app/i..." 5 days ago Up 5 days k8s_aws-node_aws-node-kwjjw_kube-system_d105948b-0400-11e9-b090-0ae6cc179478_1
cfd84580c44d 602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/kube-proxy "/bin/sh -c 'kube-..." 5 days ago Up 5 days k8s_kube-proxy_kube-proxy-dm2qm_kube-system_d105696e-0400-11e9-b090-0ae6cc179478_0
ee17b851d4ff 602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/pause-amd64:3.1 "/pause" 5 days ago Up 5 days k8s_POD_aws-node-kwjjw_kube-system_d105948b-0400-11e9-b090-0ae6cc179478_0
a8e4be169511 602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/pause-amd64:3.1 "/pause" 5 days ago Up 5 days k8s_POD_kube-proxy-dm2qm_kube-system_d105696e-0400-11e9-b090-0ae6cc179478_0

# guestbook containers
$ docker ps | grep guest
c3e0e1a93e68 k8s.gcr.io/guestbook "./guestbook" 5 days ago Up 5 days k8s_guestbook_guestbook-ddskb_default_0b96445b-0401-11e9-b090-0ae6cc179478_0
67637892c25b 602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/pause-amd64:3.1 "/pause" 5 days ago Up 5 days k8s_POD_guestbook-ddskb_default_0b96445b-0401-11e9-b090-0ae6cc179478_0

# redis-slave containers
$ docker ps | grep slave
c388aba9348c kubernetes/redis-slave "/bin/sh -c /run.sh" 5 days ago Up 5 days k8s_redis-slave_redis-slave-lwtzm_default_05b988f0-0401-11e9-b090-0ae6cc179478_0
55a53cfdcf8e 602401143452.dkr.ecr.ap-northeast-1.amazonaws.com/eks/pause-amd64:3.1 "/pause" 5 days ago Up 5 days k8s_POD_redis-slave-lwtzm_default_05b988f0-0401-11e9-b090-0ae6cc179478_0
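The container names in the docker ps output above look like k8s_<container>_<pod>_<namespace>_<uid>_<attempt>, with a separate POD entry for each pause container. Assuming that pattern holds (it is inferred from this output only, not from a documented contract), the Pods on a Node can be listed from the container names. The helper name pods_from_container_names is made up for this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read container names (one per line) on stdin and
# print "namespace/pod" once per Pod, ignoring the pause ("POD") containers.
pods_from_container_names() {
  # Fields after splitting on "_": 1=k8s 2=container 3=pod 4=namespace 5=uid 6=attempt
  awk -F_ '$1 == "k8s" && $2 != "POD" { print $4 "/" $3 }' | sort -u
}
```

On the Node this would be fed with docker ps --format '{{.Names}}' | pods_from_container_names, and the result can be compared with the Non-terminated Pods list from kubectl describe node.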


Running commands in a Pod with kubectl exec

Interacting with running Pods


kubectl exec my-pod -- ls / # Run command in existing pod (1 container case)

kubectl exec my-pod -c my-container -- ls / # Run command in existing pod (multi-container case)


Try it out.

$ kubectl get pods | grep sample

1-sample-pod 1/1 Running 0 17h
2-sample-pod 1/1 Running 0 17h

# Run a command (/bin/sh) in 1-sample-pod. A shell starts; create a file with touch
$ kubectl exec -it 1-sample-pod -- /bin/sh
# touch /tmp/test.txt
# exit

# Run exec again, this time with the ls command
$ kubectl exec 1-sample-pod -- ls /tmp/
test.txt

When a Pod contains multiple containers, adding -c [container name] runs the command in the specified container.


Checking the resources used by a Pod's containers with kubectl top

kubectl describe only shows the resources reserved for a Pod.

To see the resources actually in use, use the kubectl top command.

Interacting with running Pods


kubectl top pod POD_NAME --containers # Show metrics for a given pod and its containers


I tried it, but it failed...

$ kubectl top pod 1-sample-pod

Error from server (NotFound): the server could not find the requested resource (get services http:heapster:)

$ kubectl top nodes
Error from server (NotFound): the server could not find the requested resource (get services http:heapster:)

The error says the server could not find a heapster service. This may be an EKS limitation, but I am setting it aside for now.
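Since the error message mentions get services http:heapster:, one quick follow-up is to check whether a Service named heapster exists at all. The sketch below assumes it would live in the kube-system namespace, which is an assumption rather than something verified in these notes; heapster_status is a made-up helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a Service named "heapster" exists.
# The kube-system namespace is an assumption, not confirmed by this post.
heapster_status() {
  if kubectl get service heapster --namespace kube-system >/dev/null 2>&1; then
    echo "heapster service found"
  else
    echo "heapster service not found"
  fi
}
```

If it reports that the service is not found, that would be consistent with the NotFound errors above, though it does not by itself explain whether EKS is the cause.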


Accessing a Pod from the local machine with kubectl port-forward

Interacting with running Pods


kubectl port-forward my-pod 5000:6000 # Listen on port 5000 on the local machine and forward to port 6000 on my-pod


Use port-forward to make the nginx container running in the Pod reachable from the local machine.

# Forward local port 8888 to port 80 of the Pod

$ kubectl port-forward 1-sample-pod 8888:80
Forwarding from 127.0.0.1:8888 -> 80
Forwarding from [::1]:8888 -> 80
Handling connection for 8888

# In a separate shell: curl can reach nginx
$ curl http://localhost:8888
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
<style>
body {
width: 35em;
margin: 0 auto;
font-family: Tahoma, Verdana, Arial, sans-serif;
}
</style>
</head>
<body>
<h1>Welcome to nginx!</h1>
<p>If you see this page, the nginx web server is successfully installed and
working. Further configuration is required.</p>

<p>For online documentation and support please refer to
<a href="http://nginx.org/">nginx.org</a>.<br/>
Commercial support is available at
<a href="http://nginx.com/">nginx.com</a>.</p>

<p><em>Thank you for using nginx.</em></p>
</body>
</html>

# Check the access log with kubectl logs; the request from curl is recorded
$ kubectl logs 1-sample-pod
127.0.0.1 - - [26/Dec/2018:02:45:58 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/7.43.0" "-"


Checking from a container shell with kubectl run

Interacting with running Pods


kubectl run -i --tty busybox --image=busybox -- sh # Run pod as interactive shell


For example, if a container starts briefly when the Pod is created but exits after a while, kubectl exec cannot be used.

kubectl run can start a container in much the same way as docker run.

# Start a Pod with kubectl run

$ kubectl run -i --tty busybox --image=busybox -- sh
If you don't see a command prompt, try pressing enter.
/ #

# Check from another shell: the container started by kubectl run is running as a Pod
$ kubectl get pods |grep busy
busybox-74db8b6768-x4sfv 1/1 Running 0 54s
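The Pod name above (busybox-74db8b6768-x4sfv) suggests that kubectl run left a longer-lived object behind in this environment. For a throwaway debugging shell, a variation with --rm and --restart=Never may be more convenient: it starts a single Pod and deletes it when the shell exits. This is a sketch under that assumption; debug_shell and the default name debug-shell are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: open a temporary busybox shell in the cluster.
# --restart=Never creates a bare Pod; --rm deletes it when the session ends.
debug_shell() {
  local name="${1:-debug-shell}"
  kubectl run "$name" -i --tty --rm --restart=Never --image=busybox -- sh
}
```

Calling debug_shell with no argument uses the default Pod name; pass another name if a Pod called debug-shell already exists.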