Could no longer install the NVIDIA driver on Ubuntu

After rebooting the server, I got the following error.

$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the lates

I fixed it while following the article below.

Ubuntuの更新ついでにNVIDIAのドライバを更新しようとしたらハマった話 ("How I got stuck trying to update the NVIDIA driver while updating Ubuntu")

Check the Ubuntu version

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 22.04.3 LTS
Release:        22.04
Codename:       jammy

Check the kernel module status

$ dkms status
nvidia/510.108.03, 5.15.0-91-generic, x86_64: installed
$ uname -r   # kernel version
5.15.0-113-generic
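
In the output above, DKMS lists the nvidia module only for 5.15.0-91-generic, while the running kernel is 5.15.0-113-generic. That mismatch is a plausible reason for nvidia-smi failing after the reboot. The comparison can be scripted. This is a minimal sketch, assuming the `dkms status` line format shown in this post; `dkms_has_kernel` is a hypothetical helper name, not part of DKMS.

```shell
#!/usr/bin/env bash
# dkms_has_kernel is a hypothetical helper, not part of DKMS.
# It reads `dkms status` output on stdin and succeeds only if some module
# is listed as installed for the kernel release given as $1.
# Assumes lines shaped like "module/version, kernel, arch: installed".
dkms_has_kernel() {
    local kernel="$1"
    grep -F -- ", ${kernel}, " | grep -q ': installed'
}

# Intended use on a real machine (not run here):
#   dkms status | dkms_has_kernel "$(uname -r)" \
#       || echo "no DKMS module installed for $(uname -r)"
```

With the values from this post, the check would succeed for 5.15.0-91-generic and fail for 5.15.0-113-generic.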

Check which GPUs and NVIDIA packages are installed

$ lspci | grep -i nvidia
af:00.0 VGA compatible controller: NVIDIA Corporation GA102GL [RTX A6000] (rev a1)
af:00.1 Audio device: NVIDIA Corporation GA102 High Definition Audio Controller (rev a1)
d8:00.0 VGA compatible controller: NVIDIA Corporation GA102GL [RTX A6000] (rev a1)
d8:00.1 Audio device: NVIDIA Corporation GA102 High Definition Audio Controller (rev a1)
$ dpkg -l | grep nvidia
ii  libnvidia-cfg1-510:amd64               510.108.03-0ubuntu0.22.04.1             amd64        NVIDIA binary OpenGL/GLX configuration library
ii  libnvidia-common-510                   525.147.05-0ubuntu2.22.04.1             all          Transitional package for libnvidia-common-535
ii  libnvidia-common-535                   535.183.01-0ubuntu0.22.04.1             all          Shared files used by the NVIDIA libraries
ii  libnvidia-compute-495:amd64            510.108.03-0ubuntu0.22.04.1             amd64        Transitional package for libnvidia-compute-510
ii  libnvidia-compute-510:amd64            510.108.03-0ubuntu0.22.04.1             amd64        NVIDIA libcompute package
ii  libnvidia-container-tools              1.10.0-1                                amd64        NVIDIA container runtime library (command-line tools)
ii  libnvidia-container1:amd64             1.10.0-1                                amd64        NVIDIA container runtime library
ii  libnvidia-decode-510:amd64             510.108.03-0ubuntu0.22.04.1             amd64        NVIDIA Video Decoding runtime libraries
ii  libnvidia-egl-wayland1:amd64           1:1.1.9-1.1                             amd64        Wayland EGL External Platform library -- shared library
ii  libnvidia-encode-510:amd64             510.108.03-0ubuntu0.22.04.1             amd64        NVENC Video Encoding runtime library
ii  libnvidia-extra-510:amd64              510.108.03-0ubuntu0.22.04.1             amd64        Extra libraries for the NVIDIA driver
ii  libnvidia-fbc1-510:amd64               510.108.03-0ubuntu0.22.04.1             amd64        NVIDIA OpenGL-based Framebuffer Capture runtime library
ii  libnvidia-gl-510:amd64                 510.108.03-0ubuntu0.22.04.1             amd64        NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
ii  libnvidia-ml-dev:amd64                 11.5.50~11.5.1-1ubuntu1                 amd64        NVIDIA Management Library (NVML) development files
ii  nvidia-compute-utils-510               510.108.03-0ubuntu0.22.04.1             amd64        NVIDIA compute utilities
ii  nvidia-container-toolkit               1.10.0-1                                amd64        NVIDIA container runtime hook
ii  nvidia-cuda-dev:amd64                  11.5.1-1ubuntu1                         amd64        NVIDIA CUDA development files
ii  nvidia-cuda-gdb                        11.5.114~11.5.1-1ubuntu1                amd64        NVIDIA CUDA Debugger (GDB)
ii  nvidia-cuda-toolkit                    11.5.1-1ubuntu1                         amd64        NVIDIA CUDA development toolkit
ii  nvidia-cuda-toolkit-doc                11.5.1-1ubuntu1                         all          NVIDIA CUDA and OpenCL documentation
ii  nvidia-dkms-510                        510.108.03-0ubuntu0.22.04.1             amd64        NVIDIA DKMS package
ii  nvidia-driver-510                      510.108.03-0ubuntu0.22.04.1             amd64        NVIDIA driver metapackage
ii  nvidia-kernel-common-510               510.108.03-0ubuntu0.22.04.1             amd64        Shared files used with the kernel module
ii  nvidia-kernel-source-510               510.108.03-0ubuntu0.22.04.1             amd64        NVIDIA kernel source package
ii  nvidia-modprobe                        510.47.03-0ubuntu1                      amd64        Load the NVIDIA kernel driver and create device files
ii  nvidia-opencl-dev:amd64                11.5.1-1ubuntu1                         amd64        NVIDIA OpenCL development files
ii  nvidia-prime                           0.8.17.1                                all          Tools to enable NVIDIA's Prime
ii  nvidia-profiler                        11.5.114~11.5.1-1ubuntu1                amd64        NVIDIA Profiler for CUDA and OpenCL
ii  nvidia-settings                        510.47.03-0ubuntu1                      amd64        Tool for configuring the NVIDIA graphics driver
ii  nvidia-utils-510                       510.108.03-0ubuntu0.22.04.1             amd64        NVIDIA driver support binaries
ii  nvidia-visual-profiler                 11.5.114~11.5.1-1ubuntu1                amd64        NVIDIA Visual Profiler for CUDA and OpenCL
ii  screen-resolution-extra                0.18.2                                  all          Extension for the nvidia-settings control panel
ii  xserver-xorg-video-nvidia-510          510.108.03-0ubuntu0.22.04.1             amd64        NVIDIA binary Xorg driver
$ dpkg -l | grep cuda
ii  cuda                                   11.6.1-1                                amd64        CUDA meta-package
ii  cuda-11-6                              11.6.1-1                                amd64        CUDA 11.6 meta-package
ii  cuda-cccl-11-6                         11.6.55-1                               amd64        CUDA CCCL
ii  cuda-command-line-tools-11-6           11.6.1-1                                amd64        CUDA command-line tools
ii  cuda-compiler-11-6                     11.6.1-1                                amd64        CUDA compiler
ii  cuda-cudart-11-6                       11.6.55-1                               amd64        CUDA Runtime native Libraries
ii  cuda-cudart-dev-11-6                   11.6.55-1                               amd64        CUDA Runtime native dev links, headers
ii  cuda-cuobjdump-11-6                    11.6.112-1                              amd64        CUDA cuobjdump
ii  cuda-cupti-11-6                        11.6.112-1                              amd64        CUDA profiling tools runtime libs.
ii  cuda-cupti-dev-11-6                    11.6.112-1                              amd64        CUDA profiling tools interface.
ii  cuda-cuxxfilt-11-6                     11.6.112-1                              amd64        CUDA cuxxfilt
ii  cuda-demo-suite-11-6                   11.6.55-1                               amd64        Demo suite for CUDA
ii  cuda-documentation-11-6                11.6.112-1                              amd64        CUDA documentation
ii  cuda-driver-dev-11-6                   11.6.55-1                               amd64        CUDA Driver native dev stub library
ii  cuda-drivers                           510.47.03-1                             amd64        CUDA Driver meta-package, branch-agnostic
ii  cuda-drivers-510                       510.47.03-1                             amd64        CUDA Driver meta-package, branch-specific
ii  cuda-gdb-11-6                          11.6.112-1                              amd64        CUDA-GDB
ii  cuda-libraries-11-6                    11.6.1-1                                amd64        CUDA Libraries 11.6 meta-package
ii  cuda-libraries-dev-11-6                11.6.1-1                                amd64        CUDA Libraries 11.6 development meta-package
ii  cuda-memcheck-11-6                     11.6.112-1                              amd64        CUDA-MEMCHECK
ii  cuda-nsight-11-6                       11.6.112-1                              amd64        CUDA nsight
ii  cuda-nsight-compute-11-6               11.6.1-1                                amd64        NVIDIA Nsight Compute
ii  cuda-nsight-systems-11-6               11.6.1-1                                amd64        NVIDIA Nsight Systems
ii  cuda-nvcc-11-6                         11.6.112-1                              amd64        CUDA nvcc
ii  cuda-nvdisasm-11-6                     11.6.104-1                              amd64        CUDA disassembler
ii  cuda-nvml-dev-11-6                     11.6.55-1                               amd64        NVML native dev links, headers
ii  cuda-nvprof-11-6                       11.6.112-1                              amd64        CUDA Profiler tools
ii  cuda-nvprune-11-6                      11.6.112-1                              amd64        CUDA nvprune
ii  cuda-nvrtc-11-6                        11.6.112-1                              amd64        NVRTC native runtime libraries
ii  cuda-nvrtc-dev-11-6                    11.6.112-1                              amd64        NVRTC native dev links, headers
ii  cuda-nvtx-11-6                         11.6.112-1                              amd64        NVIDIA Tools Extension
ii  cuda-nvvp-11-6                         11.6.112-1                              amd64        CUDA Profiler tools
ii  cuda-repo-ubuntu1804-11-6-local        11.6.1-510.47.03-1                      amd64        cuda repository configuration files
ii  cuda-runtime-11-6                      11.6.1-1                                amd64        CUDA Runtime 11.6 meta-package
ii  cuda-samples-11-6                      11.6.101-1                              amd64        CUDA example applications
ii  cuda-sanitizer-11-6                    11.6.112-1                              amd64        CUDA Sanitizer
ii  cuda-toolkit-11-6                      11.6.1-1                                amd64        CUDA Toolkit 11.6 meta-package
ii  cuda-toolkit-11-6-config-common        11.6.55-1                               all          Common config package for CUDA Toolkit 11.6.
ii  cuda-toolkit-11-config-common          11.6.55-1                               all          Common config package for CUDA Toolkit 11.
ii  cuda-toolkit-config-common             11.6.55-1                               all          Common config package for CUDA Toolkit.
ii  cuda-tools-11-6                        11.6.1-1                                amd64        CUDA Tools meta-package
ii  cuda-visual-tools-11-6                 11.6.1-1                                amd64        CUDA visual tools
ii  libcudart11.0:amd64                    11.5.117~11.5.1-1ubuntu1                amd64        NVIDIA CUDA Runtime Library
ii  nvidia-cuda-dev:amd64                  11.5.1-1ubuntu1                         amd64        NVIDIA CUDA development files
ii  nvidia-cuda-gdb                        11.5.114~11.5.1-1ubuntu1                amd64        NVIDIA CUDA Debugger (GDB)
ii  nvidia-cuda-toolkit                    11.5.1-1ubuntu1                         amd64        NVIDIA CUDA development toolkit
ii  nvidia-cuda-toolkit-doc                11.5.1-1ubuntu1                         all          NVIDIA CUDA and OpenCL documentation

Remove the drivers

sudo apt-get --purge remove 'nvidia-*'
sudo apt-get --purge remove 'cuda-*'
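
Before or after purging, it can help to list exactly which installed packages a name pattern covers, since the listing earlier in this post also contains libnvidia-* packages. The following is a sketch only. `installed_pkgs` is a hypothetical helper name, and it assumes the usual `dpkg -l` column layout (status, name, version, ...).

```shell
#!/usr/bin/env bash
# installed_pkgs is a hypothetical helper, not a dpkg or apt command.
# It reads `dpkg -l` output on stdin and prints the names of installed
# ("ii") packages whose name matches the extended regex given as $1.
# The architecture suffix (for example ":amd64") is stripped from the name.
installed_pkgs() {
    awk -v re="$1" '$1 == "ii" && $2 ~ re { sub(/:.*/, "", $2); print $2 }'
}

# Intended use on a real machine (not run here):
#   dpkg -l | installed_pkgs '^(lib)?nvidia-|^cuda-'
```

If the command prints nothing after the purge, no matching package is still in the installed state.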

Check the recommended driver

$ ubuntu-drivers devices
ERROR:root:aplay command not found
== /sys/devices/pci0000:ae/0000:ae:00.0/0000:af:00.0 ==
modalias : pci:v000010DEd00002230sv000010DEsd00001459bc03sc00i00
vendor   : NVIDIA Corporation
model    : GA102GL [RTX A6000]
driver   : nvidia-driver-535 - distro non-free recommended
driver   : nvidia-driver-470-server - distro non-free
driver   : nvidia-driver-535-server-open - distro non-free
driver   : nvidia-driver-470 - distro non-free
driver   : nvidia-driver-535-server - distro non-free
driver   : nvidia-driver-545 - distro non-free
driver   : nvidia-driver-535-open - distro non-free
driver   : nvidia-driver-545-open - distro non-free
driver   : xserver-xorg-video-nouveau - distro free builtin
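
The recommended package can also be picked out of that listing automatically. This is a rough sketch, assuming the `driver : <package> - ... recommended` line format shown above; `recommended_driver` is a hypothetical helper name, not an ubuntu-drivers subcommand.

```shell
#!/usr/bin/env bash
# recommended_driver is a hypothetical helper, not part of ubuntu-drivers.
# It reads `ubuntu-drivers devices` output on stdin and prints the package
# name from the first "driver" line that is tagged "recommended".
recommended_driver() {
    awk '$1 == "driver" && /recommended/ { print $3; exit }'
}

# Intended use on a real machine (not run here):
#   ubuntu-drivers devices 2>/dev/null | recommended_driver
```

For the listing in this post, the helper would print nvidia-driver-535.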

Try installing it with the following command

sudo apt install nvidia-driver-535

Next, download CUDA from this page:
NVIDIA CUDA Toolkit 12.1 Downloads

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-ubuntu2204.pin
sudo mv cuda-ubuntu2204.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.5.1/local_installers/cuda-repo-ubuntu2204-12-5-local_12.5.1-555.42.06-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2204-12-5-local_12.5.1-555.42.06-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2204-12-5-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-5

So the latest version is 12.5 now. Things have moved on.

$ nvidia-smi
Wed Jul 10 15:13:05 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01             Driver Version: 535.183.01   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA RTX A6000               Off | 00000000:AF:00.0 Off |                  Off |
| 30%   56C    P8              27W / 300W |      1MiB / 49140MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA RTX A6000               Off | 00000000:D8:00.0 Off |                  Off |
| 30%   51C    P8               8W / 300W |      1MiB / 49140MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Thu_Nov_18_09:45:30_PST_2021
Cuda compilation tools, release 11.5, V11.5.119
Build cuda_11.5.r11.5/compiler.30672275_0

The output is displayed properly now.
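
Note that `nvcc -V` above still reports release 11.5 even though cuda-toolkit-12-5 was installed. One possible reason is that an older nvcc is found first on PATH. The sketch below assumes the new toolkit lives under /usr/local/cuda-12.5, which is NVIDIA's usual default prefix but is not confirmed in this post; `prepend_path` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# prepend_path is a hypothetical helper.
# It puts the directory given as $1 at the front of PATH, unless that
# directory already appears somewhere in PATH (in which case PATH is
# left unchanged, so an entry listed later will not be moved forward).
prepend_path() {
    case ":${PATH}:" in
        *":$1:"*) ;;
        *) PATH="$1${PATH:+:${PATH}}" ;;
    esac
    export PATH
}

# Assumed install prefix; adjust if the toolkit is elsewhere (not run here):
#   prepend_path /usr/local/cuda-12.5/bin
#   nvcc -V
```

Putting the call in a shell startup file such as ~/.bashrc would make it persistent across logins.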

Why does the driver sometimes stop working after a reboot, I wonder?
I have lost count of how many machines I have had to update.

This is something I will have to keep doing regularly.
