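Purpose
When I tried to install NVIDIA driver 535 and CUDA 12.1 on Ubuntu 22.04 LTS (22.04.4), the installation failed with the error below. These are my notes from tracking down a fix. Note that the log shows the 530-series driver packages, which are pulled in through the cuda-12-1 dependency chain.
This may well be fixed upstream by the time you read this.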
ERROR: Cannot create report: [Errno 17] File exists: '/var/crash/nvidia-kernel-source-530.0.crash'
Error! Bad return status for module build on kernel: 5.15.0-113-generic (x86_64)
Consult /var/lib/dkms/nvidia/530.30.02/build/make.log for more information.
dpkg: error processing package nvidia-dkms-530 (--configure):
installed nvidia-dkms-530 package post-installation script subprocess returned error exit status 10
dpkg: dependency problems prevent configuration of cuda-drivers-530:
cuda-drivers-530 depends on nvidia-dkms-530 (>= 530.30.02); however:
Package nvidia-dkms-530 is not configured yet.
dpkg: error processing package cuda-drivers-530 (--configure):
dependency problems - leaving unconfigured
No apport report written because the error message indicates its a followup error from a previous failure.
No apport report written because the error message indicates its a followup error from a previous failure.
dpkg: dependency problems prevent configuration of nvidia-driver-530:
nvidia-driver-530 depends on nvidia-dkms-530 (= 530.30.02-0ubuntu1); however:
Package nvidia-dkms-530 is not configured yet.
dpkg: error processing package nvidia-driver-530 (--configure):
dependency problems - leaving unconfigured
dpkg: dependency problems prevent configuration of cuda-drivers:
cuda-drivers depends on cuda-drivers-530 (= 530.30.02-1); however:
Package cuda-drivers-530 is not configured yet.
dpkg: error processing package cuda-drivers (--configure):
dependency problems - leaving unconfigured
dpkg: dependency problems prevent configuration of cuda-runtime-12-1:
cuda-runtime-12-1 depends on cuda-drivers (>= 530.30.02); however:
Package cuda-drivers is not configured yet.
dpkg: error processing package cuda-runtime-12-1 (--configure):
dependency problems - leaving unconfigured
dpkg: dependency problems prevent configuration of cuda-12-1:
cuda-12-1 depends on cuda-runtime-12-1 (>= 12.1.0); however:
Package cuda-runtime-12-1 is not configured yet.
No apport report written because MaxReports is reached already
No apport report written because MaxReports is reached already
No apport report written because MaxReports is reached already
No apport report written because MaxReports is reached already
dpkg: error processing package cuda-12-1 (--configure):
dependency problems - leaving unconfigured
dpkg: dependency problems prevent configuration of cuda-demo-suite-12-1:
cuda-demo-suite-12-1 depends on cuda-runtime-12-1; however:
Package cuda-runtime-12-1 is not configured yet.
dpkg: error processing package cuda-demo-suite-12-1 (--configure):
dependency problems - leaving unconfigured
Processing triggers for initramfs-tools (0.140ubuntu13.4) ...
update-initramfs: Generating /boot/initrd.img-5.15.0-113-generic
Errors were encountered while processing:
nvidia-dkms-530
cuda-drivers-530
nvidia-driver-530
cuda-drivers
cuda-runtime-12-1
cuda-12-1
cuda-demo-suite-12-1
needrestart is being skipped since dpkg has failed
E: Sub-process /usr/bin/dpkg returned an error code (1)
The same failure occurred on several machines with different GPUs and CPUs, so it does not appear to depend on a particular hardware configuration.
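In the log above, only one package fails on its own: `nvidia-dkms-530`, whose post-installation script (the DKMS module build) exits with status 10. Every later error is a dependency follow-up. The following is a minimal sketch for pulling that root failure out of a saved log; the function name `root_failures` and the file name `install.log` are hypothetical, and the matching relies only on the message wording seen in this log.

```shell
#!/usr/bin/env bash
# root_failures: read dpkg output on stdin and print only the packages whose
# own maintainer script failed. Packages that failed merely because a
# dependency was left unconfigured are skipped.
root_failures() {
    awk '
        # Remember the package named in the most recent error header.
        /^dpkg: error processing package / { pkg = $5; next }
        # A failing maintainer script marks a root cause.
        /subprocess returned error exit status/ && pkg != "" { print pkg; pkg = "" }
        # A dependency follow-up is not a root cause; forget the package.
        /dependency problems - leaving unconfigured/ { pkg = "" }
    '
}
```

With the apt output saved to a file, `root_failures < install.log` should print `nvidia-dkms-530` for a log like the one above. The actual compiler error is then in the file the log points to, for example `tail -n 50 /var/lib/dkms/nvidia/530.30.02/build/make.log`.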
Solution
- Install nvidia-driver-535 and CUDA 12.2
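This resolved the problem. PyTorch officially targets CUDA 12.1, but it appears to run without issues on 12.2. (CUDA is supposed to be backward compatible within a major version, so any 12.x should work.)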
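The "same major version" rule of thumb above can be written as a small check. This is only a sketch of that rule, not a guarantee of compatibility; the function name `same_cuda_major` is hypothetical.

```shell
#!/usr/bin/env bash
# same_cuda_major A B: succeed (exit 0) when two CUDA version strings,
# such as 12.1 and 12.2, share the same major version.
same_cuda_major() {
    # Strip everything from the first dot onward to keep the major number.
    local a="${1%%.*}" b="${2%%.*}"
    [ -n "$a" ] && [ "$a" = "$b" ]
}
```

For example, assuming PyTorch is installed, `same_cuda_major "$(python3 -c 'import torch; print(torch.version.cuda)')" 12.2 && echo "same major"` compares the CUDA version PyTorch was built against with the installed 12.2.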
I also tried changing only the driver version, but no matter which one I installed, it did not work.
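The working combination can be sketched as the commands below. The original notes do not say which packages were used, so the names `nvidia-driver-535` and `cuda-toolkit-12-2` are assumptions, as is the premise that NVIDIA's CUDA apt repository is already configured. The helper `install_cmds` is hypothetical and only prints the commands so they can be reviewed before running.

```shell
#!/usr/bin/env bash
# Sketch only. Package names are assumptions; confirm them with
# `apt-cache policy <package>` before installing.
DRIVER_PKG="nvidia-driver-535"
CUDA_PKG="cuda-toolkit-12-2"

# install_cmds: print the install commands instead of executing them.
install_cmds() {
    printf '%s\n' \
        "sudo apt-get update" \
        "sudo apt-get install -y ${DRIVER_PKG} ${CUDA_PKG}"
}
```

Run `install_cmds` to inspect the commands, then `install_cmds | sh` to execute them. After a reboot, `nvidia-smi` should report the 535 driver if the installation succeeded.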