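Introduction
I got hold of an RTX 3080, so these are my notes from installing its driver.
I was glad I had set up SSH access over the network in advance, in case the display stopped working. I recommend proceeding on the assumption that you will make a mistake and lose the display at some point.
Separately from the main topic, I also struggled with apt-get and wget being blocked by proxy settings, but I leave that out here. I relied on many articles by people who did this before me.
Ubuntu version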
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.6 LTS
Release: 18.04
Codename: bionic
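Connecting the GeForce RTX 3080 with the auxiliary power cable
While the wrong auxiliary power cable was attached, the card was not recognized: only the existing GeForce GT 610 shows up.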
$ lspci | grep -i nvidia
01:00.0 VGA compatible controller: NVIDIA Corporation GF119 [GeForce GT 610] (rev a1)
01:00.1 Audio device: NVIDIA Corporation GF119 HDMI Audio Controller (rev a1)
$ ubuntu-drivers devices
WARNING:root:_pkg_get_support nvidia-driver-390: package has invalid Support Legacyheader, cannot determine support level
== /sys/devices/pci0000:00/0000:00:02.0/0000:01:00.0 ==
modalias : pci:v000010DEd0000104Asv00001462sd00002636bc03sc00i00
vendor : NVIDIA Corporation
model : GF119 [GeForce GT 610]
driver : nvidia-driver-390 - distro non-free recommended
driver : nvidia-340 - distro non-free
driver : xserver-xorg-video-nouveau - distro free builtin
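The power supply had no 8-pin connector, and the problem was that I had simply plugged in the 6-pin connector as it was. (The most basic of basics.)
I bought a proper 6-pin to 8-pin adapter and plugged in the auxiliary power cable.
For reference, this is what I bought: Cable Matters 6-pin PCIe to 8-pin PCIe power adapter cable for video graphics cards, 2-pack, 10 cm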
$ lspci | grep -i nvidia
01:00.0 VGA compatible controller: NVIDIA Corporation GF119 [GeForce GT 610] (rev a1)
01:00.1 Audio device: NVIDIA Corporation GF119 HDMI Audio Controller (rev a1)
02:00.0 VGA compatible controller: NVIDIA Corporation Device 2206 (rev a1)
02:00.1 Audio device: NVIDIA Corporation Device 1aef (rev a1)
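The following two devices are newly recognized (02:00.0 is the RTX 3080, listed here only by its device ID 2206):
02:00.0 VGA compatible controller: NVIDIA Corporation Device 2206 (rev a1)
02:00.1 Audio device: NVIDIA Corporation Device 1aef (rev a1)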
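A before/after comparison like this is easier to read if the two lspci outputs are saved and only the new lines are printed. The following is a minimal sketch, not something I ran at the time; the helper name new_pci_lines and the file names before.txt and after.txt are made up for illustration.

```shell
# new_pci_lines BEFORE AFTER
# Print the lines of AFTER that do not appear verbatim in BEFORE.
# (Hypothetical helper; the file names in the usage below are placeholders.)
new_pci_lines() {
    # -F: fixed strings, -x: whole-line match, -v: invert, -f: patterns from file
    grep -vxFf "$1" "$2"
}

# Typical use around a hardware change:
#   lspci | grep -i nvidia > before.txt   # before fixing the power cable
#   lspci | grep -i nvidia > after.txt    # after fixing it and rebooting
#   new_pci_lines before.txt after.txt
```

Matching whole lines as fixed strings keeps the square brackets in names such as [GeForce GT 610] from being treated as a regular expression.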
Checking for existing CUDA and nvidia-driver packages
$ dpkg -l | grep nvidia
$ dpkg -l | grep cuda
If anything is already installed, it apparently has to be removed first. Nothing was installed in my case, so no removal was needed.
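If something had been installed, a small filter over the dpkg -l output would show exactly which packages are affected. This is a sketch under assumptions: the helper name installed_packages is invented, and the purge command in the comment is my guess at a reasonable cleanup, not a step I actually ran.

```shell
# installed_packages PATTERN
# Read `dpkg -l` output on stdin and print the names of packages that are
# fully installed (state "ii") and whose name matches PATTERN (an awk regex).
# (Hypothetical helper name.)
installed_packages() {
    awk -v pat="$1" '$1 == "ii" && $2 ~ pat { print $2 }'
}

# Usage:
#   dpkg -l | installed_packages 'nvidia|cuda'
#
# Assumed cleanup if the list is not empty (not needed in this article):
#   sudo apt-get purge $(dpkg -l | installed_packages 'nvidia|cuda')
```

Filtering on the "ii" state skips packages in state "rc", where only configuration files remain.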
Checking which driver is needed
$ ubuntu-drivers devices
WARNING:root:_pkg_get_support nvidia-driver-510: package has invalid Support PBheader, cannot determine support level
WARNING:root:_pkg_get_support nvidia-driver-510-server: package has invalid Support PBheader, cannot determine support level
WARNING:root:_pkg_get_support nvidia-driver-390: package has invalid Support Legacyheader, cannot determine support level
== /sys/devices/pci0000:00/0000:00:03.0/0000:02:00.0 ==
modalias : pci:v000010DEd00002206sv00001462sd00003896bc03sc00i00
vendor : NVIDIA Corporation
driver : nvidia-driver-510 - distro non-free
driver : nvidia-driver-510-server - distro non-free
driver : nvidia-driver-470-server - distro non-free
driver : nvidia-driver-470 - distro non-free recommended
driver : xserver-xorg-video-nouveau - distro free builtin
== /sys/devices/pci0000:00/0000:00:02.0/0000:01:00.0 ==
modalias : pci:v000010DEd0000104Asv00001462sd00002636bc03sc00i00
vendor : NVIDIA Corporation
model : GF119 [GeForce GT 610]
driver : nvidia-driver-390 - distro non-free
driver : nvidia-340 - distro non-free
driver : xserver-xorg-video-nouveau - distro free builtin
driver : nvidia-driver-470 - distro non-free recommended
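This line shows that nvidia-driver-470 is the recommended driver.
Note: I realized later that, depending on its version, the software you want to use may not support the latest nvidia-driver, so check this against your intended use.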
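The recommended entry can also be picked out of the ubuntu-drivers output mechanically. A minimal sketch, assuming the output format shown above; the helper name recommended_drivers is made up.

```shell
# recommended_drivers
# Read `ubuntu-drivers devices` output on stdin and print the package name
# from every "driver : ..." line that ends with the word "recommended".
# (Hypothetical helper name; relies on the column layout shown above.)
recommended_drivers() {
    awk '$1 == "driver" && $NF == "recommended" { print $3 }'
}

# Usage (warnings go to stderr, so they are dropped here):
#   ubuntu-drivers devices 2>/dev/null | recommended_drivers
```

With more than one GPU installed, this can print one line per device, so the result still needs to be read with the hardware in mind.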
Disabling nouveau
With an NVIDIA graphics card, a driver called nouveau is used by default. It can conflict with the NVIDIA driver, so disable it by creating /etc/modprobe.d/blacklist-nouveau.conf with settings that blacklist the module.
$ lsmod | grep -i nouveau
$ sudo nano /etc/modprobe.d/blacklist-nouveau.conf
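The file contents are not recorded in these notes. The two lines below are the ones commonly used to blacklist nouveau, so treat them as an assumption rather than a copy of my actual file. The helper name write_nouveau_blacklist and the /tmp path are only for illustration.

```shell
# write_nouveau_blacklist TARGET
# Write the commonly used nouveau blacklist settings to TARGET.
# (Assumed contents and hypothetical helper name; not copied from the original file.)
write_nouveau_blacklist() {
    cat > "$1" <<'EOF'
blacklist nouveau
options nouveau modeset=0
EOF
}

# Usage: write to a scratch file first, inspect it, then copy it into place.
#   write_nouveau_blacklist /tmp/blacklist-nouveau.conf
#   sudo cp /tmp/blacklist-nouveau.conf /etc/modprobe.d/blacklist-nouveau.conf
```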
Regenerate the initramfs and reboot.
$ sudo update-initramfs -u
$ sudo reboot
$ lsmod | grep -i nouveau
Confirm that the command above prints nothing.
Installing nvidia-driver
$ sudo -E add-apt-repository ppa:graphics-drivers/ppa
Cannot add PPA: 'ppa:~graphics-drivers/ubuntu/ppa'.
ERROR: '~graphics-drivers' user or team does not exist.
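I tried several fixes but could not work out the cause, so I ignored the error and carried on. (The ubuntu-drivers output above lists nvidia-driver-470 as a distro package, so the PPA was not needed to install it.)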
$ sudo apt update
$ sudo apt install nvidia-driver-470
The Secure Boot configuration dialog appears. You have to set a password here.
Package configuration
┌────────────────────────┤ Configuring Secure Boot ├────────────────────────┐
│                                                                           │
│ Your system has UEFI Secure Boot enabled.                                 │
│                                                                           │
│ UEFI Secure Boot requires additional configuration to work with           │
│ third-party drivers.                                                      │
│                                                                           │
│ The system will assist you in configuring UEFI Secure Boot. To permit     │
│ the use of third-party drivers, a new Machine-Owner Key (MOK) has been    │
│ generated. This key now needs to be enrolled in your system's firmware.   │
│                                                                           │
│ To ensure that this change is being made by you as an authorized user,    │
│ and not by an attacker, you must choose a password now and then confirm   │
│ the change after reboot using the same password, in both the "Enroll      │
│ MOK" and "Change Secure Boot state" menus that will be presented to you   │
│                                                                           │
│                                   <Ok>                                    │
│                                                                           │
└───────────────────────────────────────────────────────────────────────────┘
┌────────────────────────┤ Configuring Secure Boot ├────────────────────────┐
│                                                                           │
│                                                                           │
│ Enter a password for Secure Boot. It will be asked again after a reboot.  │
│                                                                           │
│ _________________________________________________________________________ │
│                                                                           │
│                  <Ok>                              <Cancel>               │
│                                                                           │
└───────────────────────────────────────────────────────────────────────────┘
Connected the monitor to the new GPU and rebooted, but nothing was displayed
$ sudo reboot
$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
nvidia-smi does not work either, and the display has stopped working. The nvidia-driver packages do appear to be installed.
$ dpkg -l | grep nvidia
ii libnvidia-cfg1-470:amd64 470.103.01-0ubuntu0.18.04.1 amd64 NVIDIA binary OpenGL/GLX configuration library
ii libnvidia-common-470 470.103.01-0ubuntu0.18.04.1 all Shared files used by the NVIDIA libraries
ii libnvidia-compute-470:amd64 470.103.01-0ubuntu0.18.04.1 amd64 NVIDIA libcompute package
ii libnvidia-compute-470:i386 470.103.01-0ubuntu0.18.04.1 i386 NVIDIA libcompute package
ii libnvidia-decode-470:amd64 470.103.01-0ubuntu0.18.04.1 amd64 NVIDIA Video Decoding runtime libraries
ii libnvidia-decode-470:i386 470.103.01-0ubuntu0.18.04.1 i386 NVIDIA Video Decoding runtime libraries
ii libnvidia-encode-470:amd64 470.103.01-0ubuntu0.18.04.1 amd64 NVENC Video Encoding runtime library
ii libnvidia-encode-470:i386 470.103.01-0ubuntu0.18.04.1 i386 NVENC Video Encoding runtime library
ii libnvidia-extra-470:amd64 470.103.01-0ubuntu0.18.04.1 amd64 Extra libraries for the NVIDIA driver
ii libnvidia-fbc1-470:amd64 470.103.01-0ubuntu0.18.04.1 amd64 NVIDIA OpenGL-based Framebuffer Capture runtime library
ii libnvidia-fbc1-470:i386 470.103.01-0ubuntu0.18.04.1 i386 NVIDIA OpenGL-based Framebuffer Capture runtime library
ii libnvidia-gl-470:amd64 470.103.01-0ubuntu0.18.04.1 amd64 NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
ii libnvidia-gl-470:i386 470.103.01-0ubuntu0.18.04.1 i386 NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
ii libnvidia-ifr1-470:amd64 470.103.01-0ubuntu0.18.04.1 amd64 NVIDIA OpenGL-based Inband Frame Readback runtime library
ii libnvidia-ifr1-470:i386 470.103.01-0ubuntu0.18.04.1 i386 NVIDIA OpenGL-based Inband Frame Readback runtime library
ii nvidia-compute-utils-470 470.103.01-0ubuntu0.18.04.1 amd64 NVIDIA compute utilities
ii nvidia-dkms-470 470.103.01-0ubuntu0.18.04.1 amd64 NVIDIA DKMS package
ii nvidia-driver-470 470.103.01-0ubuntu0.18.04.1 amd64 NVIDIA driver metapackage
ii nvidia-kernel-common-470 470.103.01-0ubuntu0.18.04.1 amd64 Shared files used with the kernel module
ii nvidia-kernel-source-470 470.103.01-0ubuntu0.18.04.1 amd64 NVIDIA kernel source package
ii nvidia-prime 0.8.16~0.18.04.1 all Tools to enable NVIDIA's Prime
ii nvidia-settings 470.57.01-0ubuntu0.18.04.1 amd64 Tool for configuring the NVIDIA graphics driver
ii nvidia-utils-470 470.103.01-0ubuntu0.18.04.1 amd64 NVIDIA driver support binaries
ii xserver-xorg-video-nvidia-470 470.103.01-0ubuntu0.18.04.1 amd64 NVIDIA binary Xorg driver
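Installed packages do not mean the kernel module is actually loaded, and that is worth checking when nvidia-smi cannot talk to the driver. A small sketch over lsmod output; the helper name nvidia_module_loaded is invented, and I did not run this exact check at the time.

```shell
# nvidia_module_loaded
# Read `lsmod` output on stdin; succeed (exit 0) only if a module named
# exactly "nvidia" is listed. (Hypothetical helper name.)
nvidia_module_loaded() {
    awk '$1 == "nvidia" { found = 1 } END { exit !found }'
}

# Usage:
#   if lsmod | nvidia_module_loaded; then
#       echo "nvidia module is loaded"
#   else
#       echo "nvidia module is NOT loaded; check dmesg for the reason"
#   fi
```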
Disabling Secure Boot
Let's check the Secure Boot status.
$ dmesg | grep Secure
[ 0.000000] secureboot: Secure boot enabled
[ 0.000000] Kernel is locked down from EFI Secure Boot mode; see man kernel_lockdown.7
[ 0.010203] secureboot: Secure boot enabled
[ 1.318494] Loaded X.509 cert 'Canonical Ltd. Secure Boot Signing: 61482aa2830d0ab2ad5af10b7250da9033ddcef0'
Secure Boot was enabled, so I disabled it in the firmware settings and rebooted.
Reference: https://freesoft.tvbok.com/tips/lga2011/asus_secure_boot.html
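Besides grepping dmesg, the Secure Boot state can be read with mokutil. The sketch below assumes mokutil is installed and that `mokutil --sb-state` prints a line containing "SecureBoot enabled" or "SecureBoot disabled"; the helper name secure_boot_state is made up.

```shell
# secure_boot_state
# Read the output of `mokutil --sb-state` on stdin and print
# "enabled", "disabled" or "unknown". (Hypothetical helper name.)
secure_boot_state() {
    sb_out=$(cat)
    case "$sb_out" in
        *"SecureBoot enabled"*)  echo enabled ;;
        *"SecureBoot disabled"*) echo disabled ;;
        *)                       echo unknown ;;
    esac
}

# Usage, before and after changing the firmware setting:
#   mokutil --sb-state | secure_boot_state
```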
Confirm with nvidia-smi that the driver is now working.
$ nvidia-smi
Thu Apr 21 02:53:47 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.103.01 Driver Version: 470.103.01 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA GeForce ... Off | 00000000:02:00.0 Off | N/A |
| 0% 53C P8 9W / 320W | 22MiB / 10018MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1484 G /usr/lib/xorg/Xorg 9MiB |
| 0 N/A N/A 1516 G /usr/bin/gnome-shell 11MiB |
+-----------------------------------------------------------------------------+
The no-display problem
The installation got this far, yet nothing is displayed.
$ lspci |grep NVI | grep VGA
01:00.0 VGA compatible controller: NVIDIA Corporation GF119 [GeForce GT 610] (rev a1)
02:00.0 VGA compatible controller: NVIDIA Corporation Device 2206 (rev a1)
$ nvidia-smi -q | grep -A 1 "^GPU"
GPU 00000000:02:00.0
Product Name : NVIDIA GeForce RTX 3080
Confirmed that the RTX 3080 is also configured in /etc/X11/xorg.conf.
$ sudo nano /etc/X11/xorg.conf
Section "Device"
Identifier "Device0"
Driver "nvidia"
VendorName "NVIDIA Corporation"
BoardName "NVIDIA GeForce RTX 3080"
EndSection
Restarting the X server brought the display back.
$ sudo startx
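Installing CUDA
I want to run TensorFlow. Apparently TensorFlow has been reported to work with both CUDA 11 and CUDA 10.
nvidia-smi showed CUDA Version: 11.4, so the nvidia-driver 470 installed here appears to support CUDA up to 11.4. I install 11.4 this time. Select the matching options on NVIDIA's CUDA download site and run the commands it gives: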
sudo wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-ubuntu1804.pin
sudo mv cuda-ubuntu1804.pin /etc/apt/preferences.d/cuda-repository-pin-600
sudo wget https://developer.download.nvidia.com/compute/cuda/11.4.4/local_installers/cuda-repo-ubuntu1804-11-4-local_11.4.4-470.82.01-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu1804-11-4-local_11.4.4-470.82.01-1_amd64.deb
sudo apt-key add /var/cuda-repo-ubuntu1804-11-4-local/7fa2af80.pub
sudo apt-get update
sudo apt-get -y install cuda
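The reasoning used to pick the version (the toolkit must not be newer than the CUDA version nvidia-smi reports) can be written down as a small check. This is a sketch under assumptions: the helper names driver_cuda_version and version_le are invented, and it relies on GNU `sort -V` and on the nvidia-smi header format shown earlier.

```shell
# driver_cuda_version
# Read `nvidia-smi` output on stdin and print the value after "CUDA Version:".
# (Hypothetical helper name.)
driver_cuda_version() {
    sed -n 's/.*CUDA Version: *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# version_le A B
# Succeed if version A is lower than or equal to version B (uses GNU sort -V).
version_le() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$1" ]
}

# Usage:
#   max=$(nvidia-smi | driver_cuda_version)
#   if version_le 11.4 "$max"; then
#       echo "CUDA 11.4 is within what the driver reports ($max)"
#   fi
```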
Checking after the CUDA installation
The installation completed without problems.
$ dpkg -l | grep cuda
ii cuda 11.4.4-1 amd64 CUDA meta-package
ii cuda-11-4 11.4.4-1 amd64 CUDA 11.4 meta-package
ii cuda-cccl-11-4 11.4.122-1 amd64 CUDA CCCL
ii cuda-command-line-tools-11-4 11.4.4-1 amd64 CUDA command-line tools
ii cuda-compiler-11-4 11.4.4-1 amd64 CUDA compiler
ii cuda-cudart-11-4 11.4.148-1 amd64 CUDA Runtime native Libraries
ii cuda-cudart-dev-11-4 11.4.148-1 amd64 CUDA Runtime native dev links, headers
ii cuda-cuobjdump-11-4 11.4.120-1 amd64 CUDA cuobjdump
ii cuda-cupti-11-4 11.4.120-1 amd64 CUDA profiling tools runtime libs.
ii cuda-cupti-dev-11-4 11.4.120-1 amd64 CUDA profiling tools interface.
ii cuda-cuxxfilt-11-4 11.4.120-1 amd64 CUDA cuxxfilt
ii cuda-demo-suite-11-4 11.4.100-1 amd64 Demo suite for CUDA
ii cuda-documentation-11-4 11.4.126-1 amd64 CUDA documentation
ii cuda-driver-dev-11-4 11.4.148-1 amd64 CUDA Driver native dev stub library
ii cuda-drivers 470.82.01-1 amd64 CUDA Driver meta-package, branch-agnostic
ii cuda-drivers-470 470.82.01-1 amd64 CUDA Driver meta-package, branch-specific
ii cuda-gdb-11-4 11.4.120-1 amd64 CUDA-GDB
ii cuda-libraries-11-4 11.4.4-1 amd64 CUDA Libraries 11.4 meta-package
ii cuda-libraries-dev-11-4 11.4.4-1 amd64 CUDA Libraries 11.4 development meta-package
ii cuda-memcheck-11-4 11.4.120-1 amd64 CUDA-MEMCHECK
ii cuda-nsight-11-4 11.4.120-1 amd64 CUDA nsight
ii cuda-nsight-compute-11-4 11.4.4-1 amd64 NVIDIA Nsight Compute
ii cuda-nsight-systems-11-4 11.4.4-1 amd64 NVIDIA Nsight Systems
ii cuda-nvcc-11-4 11.4.152-1 amd64 CUDA nvcc
ii cuda-nvdisasm-11-4 11.4.152-1 amd64 CUDA disassembler
ii cuda-nvml-dev-11-4 11.4.120-1 amd64 NVML native dev links, headers
ii cuda-nvprof-11-4 11.4.120-1 amd64 CUDA Profiler tools
ii cuda-nvprune-11-4 11.4.120-1 amd64 CUDA nvprune
ii cuda-nvrtc-11-4 11.4.152-1 amd64 NVRTC native runtime libraries
ii cuda-nvrtc-dev-11-4 11.4.152-1 amd64 NVRTC native dev links, headers
ii cuda-nvtx-11-4 11.4.120-1 amd64 NVIDIA Tools Extension
ii cuda-nvvp-11-4 11.4.193-1 amd64 CUDA Profiler tools
ii cuda-repo-ubuntu1804-11-4-local 11.4.4-470.82.01-1 amd64 cuda repository configuration files
ii cuda-runtime-11-4 11.4.4-1 amd64 CUDA Runtime 11.4 meta-package
ii cuda-samples-11-4 11.4.120-1 amd64 CUDA example applications
ii cuda-sanitizer-11-4 11.4.120-1 amd64 CUDA Sanitizer
ii cuda-toolkit-11-4 11.4.4-1 amd64 CUDA Toolkit 11.4 meta-package
ii cuda-toolkit-11-4-config-common 11.4.148-1 all Common config package for CUDA Toolkit 11.4.
ii cuda-toolkit-11-config-common 11.4.148-1 all Common config package for CUDA Toolkit 11.
ii cuda-toolkit-config-common 11.4.148-1 all Common config package for CUDA Toolkit.
ii cuda-tools-11-4 11.4.4-1 amd64 CUDA Tools meta-package
ii cuda-visual-tools-11-4 11.4.4-1 amd64 CUDA visual tools
The /usr/local/cuda-11.4 directory has been created, and /usr/local/cuda is a symbolic link that leads to it by way of /etc/alternatives.
$ ls -l /usr/local | grep cuda
lrwxrwxrwx 1 root root 22 Apr 21 22:36 cuda -> /etc/alternatives/cuda
lrwxrwxrwx 1 root root 25 Apr 21 22:36 cuda-11 -> /etc/alternatives/cuda-11
drwxr-xr-x 16 root root 4096 Apr 21 22:35 cuda-11.4
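Because the link goes through /etc/alternatives, ls only shows the first hop. Following the whole chain with `readlink -f` shows where /usr/local/cuda really ends up. A minimal sketch; the helper name resolve_link is made up.

```shell
# resolve_link [PATH]
# Follow every symbolic link in PATH (default: /usr/local/cuda) and print
# the final target. (Hypothetical helper name; uses GNU readlink -f.)
resolve_link() {
    readlink -f "${1:-/usr/local/cuda}"
}

# Usage:
#   resolve_link    # with the layout listed above, this should end at /usr/local/cuda-11.4
```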
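Setting the path
Reference: Ubuntu 18.04LTSへのCUDA 11.0環境構築・確認手順
To put the CUDA Toolkit confirmed above on the path, add the following lines to ~/.bashrc. This makes nvcc and the other commands used for CUDA development available.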
export PATH="/usr/local/cuda/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda/lib64:$LD_LIBRARY_PATH"
$ nvcc -V
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2021 NVIDIA Corporation
Built on Mon_Oct_11_21:27:02_PDT_2021
Cuda compilation tools, release 11.4, V11.4.152
Build cuda_11.4.r11.4/compiler.30521435_0
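When redoing this setup, it is easy to append the same export lines to ~/.bashrc more than once. The sketch below adds a line only if it is not already present; the helper name append_once is made up, and the lines in the usage are the two exports shown above.

```shell
# append_once FILE LINE
# Append LINE to FILE unless an identical line is already there.
# (Hypothetical helper name.)
append_once() {
    grep -qxF -e "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Usage (single quotes keep $PATH unexpanded until .bashrc is read):
#   append_once ~/.bashrc 'export PATH="/usr/local/cuda/bin:$PATH"'
#   append_once ~/.bashrc 'export LD_LIBRARY_PATH="/usr/local/cuda/lib64:$LD_LIBRARY_PATH"'
#   . ~/.bashrc
```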
Running a sample program
$ mkdir test
$ cp /usr/local/cuda/samples/1_Utilities/deviceQuery/deviceQuery.cpp test/
$ cp -r /usr/local/cuda/samples/common/inc/ test/
$ cd test
$ ls
deviceQuery.cpp inc
$ nvcc -I./inc deviceQuery.cpp -o deviceQuery
$ ./deviceQuery
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "NVIDIA GeForce RTX 3080"
CUDA Driver Version / Runtime Version 11.4 / 11.4
CUDA Capability Major/Minor version number: 8.6
Total amount of global memory: 10018 MBytes (10504699904 bytes)
(068) Multiprocessors, (128) CUDA Cores/MP: 8704 CUDA Cores
GPU Max Clock rate: 1740 MHz (1.74 GHz)
Memory Clock rate: 9501 Mhz
Memory Bus Width: 320-bit
L2 Cache Size: 5242880 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total shared memory per multiprocessor: 102400 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 1536
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Managed Memory: Yes
Device supports Compute Preemption: Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 2 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.4, CUDA Runtime Version = 11.4, NumDevs = 1
Result = PASS
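When it runs correctly, the device information is printed over many lines and the last line reads "Result = PASS".
(If there is a problem with execution, it apparently reports "Result = FAIL".)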
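For use in a script, the PASS line can be checked instead of read by eye. A minimal sketch, assuming the result line has the form shown above; the helper name device_query_passed is made up.

```shell
# device_query_passed
# Read deviceQuery output on stdin; succeed only if it contains a line
# starting with "Result = PASS". (Hypothetical helper name.)
device_query_passed() {
    grep -q '^Result = PASS'
}

# Usage:
#   if ./deviceQuery | device_query_passed; then
#       echo "deviceQuery passed"
#   else
#       echo "deviceQuery did not report PASS"
#   fi
```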
Conclusion
I believe the installation is now complete. I may update this article if problems come up as I keep using the setup.