
When the NVIDIA device plugin stops working after a PC reboot (nvidia-container-cli: initialization error: nvml error: driver not loaded)

After a reboot, the following error occurred in nvidia-device-plugin-daemonset:

Error: failed to start container "nvidia-device-plugin-ctr": 
Error response from daemon: OCI runtime create failed: container_linux.go:349:
starting container process caused "process_linux.go:449: 
container init caused \"process_linux.go:432: 
running prestart hook 0 caused \\\"error running hook:
 exit status 1, stdout: , stderr: 
nvidia-container-cli: initialization error: nvml error: driver not loaded\\\\n\\\"\"": unknown

nvidia-smi does not work either:

# nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
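Both "nvml error: driver not loaded" and the nvidia-smi failure point to the NVIDIA kernel module not being loaded after the reboot. A quick way to confirm this before reinstalling is to look for the module in /proc/modules (the list that lsmod prints). The function below is a minimal, hypothetical helper (the name nvidia_module_loaded is illustrative, not part of any NVIDIA tool); it assumes a Linux host and only checks for a module named exactly "nvidia".

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) if the "nvidia" kernel module is loaded.
# Reads /proc/modules by default; another file can be passed for testing.
nvidia_module_loaded() {
  local modules_file="${1:-/proc/modules}"
  # Each /proc/modules line starts with the module name followed by a space,
  # so '^nvidia ' matches "nvidia" but not "nvidia_uvm" or "nvidia_drm".
  grep -q '^nvidia ' "$modules_file"
}

# Usage:
#   if nvidia_module_loaded; then echo "nvidia module loaded"; else echo "nvidia module NOT loaded"; fi
```

If the module is missing, nvidia-smi and the device plugin's prestart hook fail exactly as shown above.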

Reinstall the driver:

# bash NVIDIA-Linux-x86_64-470.57.02.run
Verifying archive integrity... OK
Uncompressing NVIDIA Accelerated Graphics Driver for Linux-x86_64 470.57.02

Check nvidia-smi again: it now works.

# nvidia-smi
Wed Sep 29 01:08:37 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.57.02    Driver Version: 470.57.02    CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0 Off |                  N/A |
| 35%   32C    P8    N/A /  75W |      0MiB /  4040MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
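To confirm from a script that the loaded driver is the one just reinstalled (470.57.02 here), the sketch below pulls the "Driver Version" field out of the nvidia-smi header. It assumes the header layout shown above; the function name driver_version_from_smi is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read nvidia-smi output on stdin and print the value
# of the "Driver Version:" field (empty output if the field is absent,
# e.g. when nvidia-smi failed to talk to the driver).
driver_version_from_smi() {
  sed -n 's/.*Driver Version: \([0-9.]*\).*/\1/p' | head -n 1
}

# Usage:
#   [ "$(nvidia-smi | driver_version_from_smi)" = "470.57.02" ] && echo "driver OK"
```

nvidia-smi can also print the version directly with `nvidia-smi --query-gpu=driver_version --format=csv,noheader`, which avoids parsing the table at all.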

Redeploy the device plugin:

# kubectl apply -f nvidia-device-plugin.yml
daemonset.apps/nvidia-device-plugin-daemonset unchanged

# kubectl get pods --all-namespaces
NAMESPACE       NAME                                        READY   STATUS        RESTARTS   AGE
kube-system     nvidia-device-plugin-daemonset-tmn88        1/1     Running       10         32m
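Since kubectl apply reported "unchanged", the DaemonSet itself was not modified; the pod is back to Running, presumably after the kubelet's restarts (RESTARTS 10) succeeded once the driver was loaded again. To check this in a script, the sketch below reads `kubectl get pods --all-namespaces` output on stdin and succeeds only if at least one nvidia-device-plugin-daemonset pod exists and every such pod is Running with all containers ready. It assumes the default column layout shown above (NAMESPACE, NAME, READY, STATUS, ...); the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exit 0 only if all nvidia-device-plugin-daemonset pods
# in the piped `kubectl get pods --all-namespaces` output are Running and READY.
device_plugin_pods_ready() {
  awk '
    # $2 = NAME, $3 = READY (e.g. "1/1"), $4 = STATUS
    $2 ~ /^nvidia-device-plugin-daemonset-/ {
      found = 1
      split($3, r, "/")
      if ($4 != "Running" || r[1] != r[2]) bad = 1
    }
    END { if (found && !bad) exit 0; exit 1 }
  '
}

# Usage:
#   kubectl get pods --all-namespaces | device_plugin_pods_ready && echo "device plugin ready"
```

Alternatively, `kubectl -n kube-system rollout status daemonset/nvidia-device-plugin-daemonset` waits until the DaemonSet's pods are available, without any text parsing.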