
What to do when broken dependencies among apt-installed libraries stop CUDA from working

Posted at 2024-03-02

Steps for when CUDA has to be reinstalled :scream:

CUDA is not recognized (nvidia-smi and nvcc do not work)

$ nvidia-smi
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
$ nvcc -V
Command 'nvcc' not found, but can be installed with:
sudo apt install nvidia-cuda-toolkit
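You can check both symptoms at once with a small Bash function. This is a minimal sketch. The name `check_cuda_tools` is hypothetical, and the function assumes that `nvidia-smi` exits with a non-zero status when it cannot reach the driver, as in the error above.

```bash
#!/usr/bin/env bash
# check_cuda_tools (hypothetical helper): report whether nvidia-smi and nvcc work.
# Returns 0 only when both are found and nvidia-smi can talk to the driver.
check_cuda_tools() {
  local status=0

  if ! command -v nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi: not found (driver utilities missing)"
    status=1
  elif ! nvidia-smi >/dev/null 2>&1; then
    # Assumption: nvidia-smi exits non-zero when it cannot reach the driver.
    echo "nvidia-smi: found, but cannot communicate with the driver"
    status=1
  else
    echo "nvidia-smi: OK"
  fi

  if command -v nvcc >/dev/null 2>&1; then
    echo "nvcc: OK"
  else
    echo "nvcc: not found (CUDA Toolkit missing or not on PATH)"
    status=1
  fi

  return "$status"
}
```

For example, `check_cuda_tools || echo "CUDA is not usable yet"`.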

First things to try

  • Reboot the PC
  • sudo apt update
  • sudo apt upgrade

Other commands worth trying

  • sudo apt --fix-broken install
  • sudo dpkg --audit
  • sudo dpkg --configure --pending
  • sudo dpkg --configure <package name>
  • ls /var/lib/dpkg/info/<package name>* (check whether dpkg still has files for the package)
  • sudo dpkg-reconfigure <package name>
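The first three of these commands can be chained so the sequence stops at the first failure. This is a minimal sketch. The names `repair_apt_state` and `run` and the `DRY_RUN` switch are hypothetical, and the commands are exactly the ones in the list.

```bash
#!/usr/bin/env bash
# run: execute a command, or only print it when DRY_RUN is set to a non-empty value.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# repair_apt_state (hypothetical helper): try the standard repairs in order,
# stopping at the first command that fails.
repair_apt_state() {
  run sudo apt --fix-broken install &&
    run sudo dpkg --configure --pending &&
    run sudo dpkg --audit
}
```

`DRY_RUN=1 repair_apt_state` only prints the commands. Plain `repair_apt_state` actually runs them.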

If that doesn't help, CUDA needs to be reinstalled :ghost:

Check the model of the GPU in your PC and its compatible driver version (see NVIDIA's documentation).

2.1. Verify You Have a CUDA-Capable GPU

$ lspci | grep -i nvidia
01:00.0 VGA compatible controller: NVIDIA Corporation Device 2704 (rev a1)
01:00.1 Audio device: NVIDIA Corporation Device 22bb (rev a1)
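The lspci check above can be scripted as a small filter. This is a sketch with a hypothetical function name. It assumes the GPU is listed either as a "VGA compatible controller" (as on this PC) or as a "3D controller", which is the class some GPUs (often laptop or data-center models) report instead.

```bash
#!/usr/bin/env bash
# has_nvidia_gpu (hypothetical helper): succeed if the lspci output on stdin
# contains an NVIDIA display device (VGA compatible controller or 3D controller).
has_nvidia_gpu() {
  grep -Eiq '(VGA compatible controller|3D controller): NVIDIA'
}
```

For example, `lspci | has_nvidia_gpu && echo "NVIDIA GPU found"`.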

↓ Excerpt (my PC's GPU is an NVIDIA GeForce RTX 3070)
[Image: excerpt from NVIDIA's page listing the GeForce RTX 3070]

2.2. Verify You Have a Supported Version of Linux

$ uname -m && cat /etc/*release
x86_64
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=22.04
DISTRIB_CODENAME=jammy
DISTRIB_DESCRIPTION="Ubuntu 22.04.3 LTS"
PRETTY_NAME="Ubuntu 22.04.3 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.3 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
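You later substitute the distribution and architecture checked here into NVIDIA's repository URL (ubuntu2204 and x86_64 in this article). The sketch below derives the distro part from /etc/os-release. The function name is hypothetical. It assumes the repo directory is simply ID followed by VERSION_ID without the dot, which matches ubuntu2204. Other distributions may be named differently.

```bash
#!/usr/bin/env bash
# cuda_repo_distro (hypothetical helper): turn an os-release file into the
# directory name used in NVIDIA's repo URLs, e.g. ID=ubuntu, VERSION_ID=22.04 -> ubuntu2204.
# Assumption: the name is always ID + VERSION_ID with the dots removed.
cuda_repo_distro() {
  local file="${1:-/etc/os-release}"
  (
    # os-release is a shell-compatible KEY=value file, so it can be sourced
    # (in a subshell, so its variables do not leak into the caller).
    . "$file"
    printf '%s%s\n' "$ID" "${VERSION_ID//./}"
  )
}
```

Together with `uname -m`, this gives the two values you need later.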

2.3. Verify the System Has gcc Installed

$ gcc --version
gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

2.4. Verify the System has the Correct Kernel Headers and Development Packages Installed

$ uname -r
6.5.0-21-generic
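You can check the compiler and kernel headers together. This is a minimal sketch with a hypothetical function name. It assumes the usual Ubuntu layout, where /lib/modules/<kernel>/build points to the installed headers for that kernel.

```bash
#!/usr/bin/env bash
# check_build_prereqs (hypothetical helper): confirm gcc is on PATH and that
# headers for the given kernel (default: the running one) are installed.
# Assumes /lib/modules/<kernel>/build points to the headers (usual on Ubuntu).
check_build_prereqs() {
  local kernel="${1:-$(uname -r)}"
  local root="${2:-}"   # optional path prefix, only useful for testing
  local status=0

  if command -v gcc >/dev/null 2>&1; then
    echo "gcc: OK"
  else
    echo "gcc: not found"
    status=1
  fi

  if [ -d "$root/lib/modules/$kernel/build" ]; then
    echo "kernel headers for $kernel: OK"
  else
    echo "kernel headers for $kernel: missing (install linux-headers-$kernel)"
    status=1
  fi

  return "$status"
}
```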

2.7. Download the NVIDIA CUDA Toolkit
The NVIDIA CUDA Toolkit is available at https://developer.nvidia.com/cuda-downloads.

★ Choose the options you want
[Image: CUDA Toolkit download page with the platform options selected]

Saying goodbye to the broken CUDA

Uninstall every CUDA-related library that is currently installed.

  1. Removing CUDA Toolkit and Driver
    Ubuntu and Debian

apt is newer than apt-get and I recommend it! (Just replace apt-get with apt in NVIDIA's commands.)

To remove CUDA Toolkit:

sudo apt --purge remove "*cuda*" "*cublas*" "*cufft*" "*cufile*" "*curand*" \
 "*cusolver*" "*cusparse*" "*gds-tools*" "*npp*" "*nvjpeg*" "nsight*" "*nvvm*"
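Before purging, you can preview which installed packages the patterns above would hit. This is a sketch with a hypothetical function name. It matches names with the shell's own glob rules, and apt's pattern handling may differ slightly between versions, so treat the result only as a preview.

```bash
#!/usr/bin/env bash
# cuda_purge_candidates (hypothetical helper): read package names on stdin and
# print those matching the glob patterns used in the purge command above.
cuda_purge_candidates() {
  local name
  while IFS= read -r name; do
    case "$name" in
      *cuda*|*cublas*|*cufft*|*cufile*|*curand*|*cusolver*|*cusparse*|*gds-tools*|*npp*|*nvjpeg*|nsight*|*nvvm*)
        printf '%s\n' "$name" ;;
    esac
  done
}
```

For example, `dpkg-query -W -f='${Package}\n' | cuda_purge_candidates` lists matching packages known to dpkg without removing anything.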

↓ Output this time

Package 'cuda-drivers-fabricmanager-550' is not installed, so not removed
You might want to run 'apt --fix-broken install' to correct these.
The following packages have unmet dependencies:
 libnvjitlink-12-2 : Depends: cuda-toolkit-config-common but it is not going to be installed
                     Depends: cuda-toolkit-12-config-common but it is not going to be installed
                     Depends: cuda-toolkit-12-2-config-common but it is not going to be installed
 nvidia-dkms-535 : Depends: nvidia-kernel-common-535 (= 535.161.07-0ubuntu1) but 535.154.05-0ubuntu1 is to be installed
E: Unmet dependencies. Try 'apt --fix-broken install' with no packages (or specify a solution).
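In output like this, the packages whose dependencies are broken are the ones on lines of the form " <package> : Depends: ...". Here they are libnvjitlink-12-2 and nvidia-dkms-535. The sketch below extracts them. The function name is hypothetical, and it assumes apt's layout as shown above, where continuation lines start with "Depends:" alone.

```bash
#!/usr/bin/env bash
# unmet_dep_packages (hypothetical helper): from apt output on stdin, print each
# package named on a " <package> : Depends: ..." line, i.e. the packages whose
# dependencies are unmet. Continuation lines (starting with "Depends:") are skipped.
unmet_dep_packages() {
  awk '$2 == ":" && $3 == "Depends:" { print $1 }' | sort -u
}
```

For example, save the error output to a file and run `unmet_dep_packages < apt-error.log`.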

↓ I followed the suggestion

$ sudo apt --fix-broken install
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Correcting dependencies... Done
The following packages were automatically installed and are no longer required:
  nvidia-firmware-535-535.113.01 nvidia-firmware-535-535.146.02 nvidia-firmware-535-535.161.07
Use 'sudo apt autoremove' to remove them.
The following additional packages will be installed:
  nvidia-kernel-common-535
The following packages will be upgraded:
  nvidia-kernel-common-535
1 upgraded, 0 newly installed, 0 to remove and 112 not upgraded.
2 not fully installed or removed.
Need to get 0 B/38.3 MB of archives.
After this operation, 45.1 kB of additional disk space will be used.
Do you want to continue? [Y/n] Y
(Reading database ... 316795 files and directories currently installed.)
Preparing to unpack .../nvidia-kernel-common-535_535.161.07-0ubuntu1_amd64.deb ...
Unpacking nvidia-kernel-common-535 (535.161.07-0ubuntu1) over (535.154.05-0ubuntu1) ...
dpkg: error processing archive /var/cache/apt/archives/nvidia-kernel-common-535_535.161.07-0ubuntu1_amd64.deb (--unpack):
 trying to overwrite '/lib/firmware/nvidia/535.161.07/gsp_ga10x.bin', which is also in package nvidia-firmware-535-535.161.07 535.161.07-0ubuntu0.22.04.1
dpkg-deb: error: paste subprocess was killed by signal (Broken pipe)
Errors were encountered while processing:
 /var/cache/apt/archives/nvidia-kernel-common-535_535.161.07-0ubuntu1_amd64.deb
E: Sub-process /usr/bin/dpkg returned an error code (1)
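This error means the new nvidia-kernel-common-535 wants to install a firmware file that the separate nvidia-firmware-535-535.161.07 package already owns. That is why `apt --fix-broken install` could not finish. The sketch below pulls the file and its current owner out of such messages, so you know which package is in the way. The function name is hypothetical, and it expects each message on a single line, as dpkg prints it. (dpkg also has a `--force-overwrite` option for this situation, but this article removes packages instead.)

```bash
#!/usr/bin/env bash
# overwrite_conflicts (hypothetical helper): from dpkg output on stdin, print
# "<file> <package>" for every "trying to overwrite '<file>', which is also in
# package <package> <version>" message, i.e. which installed package owns the file.
overwrite_conflicts() {
  sed -n "s/.*trying to overwrite '\([^']*\)', which is also in package \([^ ]*\).*/\1 \2/p"
}
```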

To remove NVIDIA Drivers:

sudo apt --purge remove "*nvidia*" "libxnvctrl*"

↓ Output

Package 'nvidia-docker2' is not installed, so not removed
You might want to run 'apt --fix-broken install' to correct these.
The following packages have unmet dependencies:
 cuda-drivers-535 : Depends: libnvidia-common-535 (>= 535.104.12) but it is not going to be installed
                    Depends: libnvidia-compute-535 (>= 535.104.12) but it is not going to be installed
                    Depends: libnvidia-decode-535 (>= 535.104.12) but it is not going to be installed
                    Depends: libnvidia-encode-535 (>= 535.104.12) but it is not going to be installed
                    Depends: libnvidia-fbc1-535 (>= 535.104.12) but it is not going to be installed
                    Depends: libnvidia-gl-535 (>= 535.104.12) but it is not going to be installed
                    Depends: nvidia-compute-utils-535 (>= 535.104.12) but it is not going to be installed
                    Depends: nvidia-dkms-535 (>= 535.104.12)
                    Depends: nvidia-driver-535 (>= 535.104.12) but it is not going to be installed
                    Depends: nvidia-kernel-common-535 (>= 535.104.12) but it is not going to be installed
                    Depends: nvidia-kernel-source-535 (>= 535.104.12) but it is not going to be installed or
                             nvidia-kernel-open-535 (>= 535.104.12) but it is not going to be installed
                    Depends: nvidia-utils-535 (>= 535.104.12) but it is not going to be installed
                    Depends: xserver-xorg-video-nvidia-535 (>= 535.104.12) but it is not going to be installed
                    Depends: nvidia-modprobe (>= 535.104.12) but it is not going to be installed
                    Depends: nvidia-settings (>= 535.104.12) but it is not going to be installed
 libnvidia-gl-535:i386 : Depends: libnvidia-common-535:i386 (= 535.161.07-0ubuntu1)
E: Unmet dependencies. Try 'apt --fix-broken install' with no packages (or specify a solution).

To clean up the uninstall:

sudo apt autoremove

↓ Output

$ sudo apt autoremove
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
You might want to run 'apt --fix-broken install' to correct these.
The following packages have unmet dependencies:
 nvidia-dkms-535 : Depends: nvidia-kernel-common-535 (= 535.161.07-0ubuntu1) but 535.154.05-0ubuntu1 is installed
E: Unmet dependencies. Try 'apt --fix-broken install' with no packages (or specify a solution).

Since nothing worked, I reluctantly decided to forcibly remove the packages one at a time (T_T)
I was completely worn out :dizzy:

When even that fails

Here is what I did.

The culprits this time (from sudo dpkg --audit)

nvidia-dkms-535      NVIDIA DKMS package
nvidia-driver-535    NVIDIA driver metapackage
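To get just the package names from `dpkg --audit`, for example to pass them one at a time to `dpkg --configure` or, as a last resort, to the forced purge later in this section, a one-line filter is enough. This is a sketch with a hypothetical function name. It assumes dpkg's usual layout, where each package line is indented by one space and followed by its description, while the explanatory text lines are not indented.

```bash
#!/usr/bin/env bash
# audit_packages (hypothetical helper): print the package names listed by
# `dpkg --audit` on stdin. Assumes each package line starts with exactly one
# space, followed by the package name and its description.
audit_packages() {
  awk '/^ [^ ]/ { print $1 }'
}
```

For example, `sudo dpkg --audit | audit_packages`.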

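Warning
This breaks dependencies, so do it at your own risk.

↓ Commands that find every file whose path contains the name of the package you want to remove (nvidia-dkms-535 this time; substitute your own), both in dpkg's file list and under /etc, /var and /usr, and delete them all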

# 1) Delete the files dpkg lists for the package, but only paths containing the package name
#    (never pipe the whole dpkg -L list to rm: it includes directories such as /usr)
dpkg -L nvidia-dkms-535 | grep 'nvidia-dkms-535' | sudo xargs -r -d '\n' rm -rf
# 2) Delete everything else under /etc, /var and /usr whose name contains the package name
#    (-prune stops find from descending into a directory that is about to be deleted)
sudo find /etc /var /usr -name '*nvidia-dkms-535*' -prune -exec rm -rf {} +
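Because these deletions cannot be undone, it may be worth looking at what they would hit first. The sketch below is a dry run: it only prints the paths under /etc, /var and /usr whose names contain the package name. The function name is hypothetical. `sudo` cannot run a shell function directly, so call it from a root shell (for example `sudo bash`) if your user cannot read some of those directories.

```bash
#!/usr/bin/env bash
# find_leftovers (hypothetical helper): dry run of the deletion above.
# Prints (does not delete) paths under the given roots whose name contains PKG.
# Usage: find_leftovers PKG [ROOT...]   (roots default to /etc /var /usr)
find_leftovers() {
  local pkg="$1"
  shift
  [ "$#" -gt 0 ] || set -- /etc /var /usr
  # -prune: once a matching directory is printed, do not descend into it.
  find "$@" -name "*${pkg}*" -prune -print 2>/dev/null
}
```

For example, `find_leftovers nvidia-dkms-535`.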

↓ Forcibly reconfigure the package

sudo dpkg-reconfigure --force nvidia-dkms-535

↓ Following the error messages, repeat this forced purge until everything is clean

sudo dpkg --purge --force-all nvidia-dkms-535

In my case I ran this command about 10 times.

After the goodbye

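Go back to ★ and install the CUDA Toolkit.
The instructions are at the bottom of that page.
Follow the Base Installer steps (apt is a good choice).
[Image: Base Installer instructions on the CUDA Toolkit download page]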

This is where the real work starts! Installing the driver

Install CUDA by following NVIDIA's documentation.
*The commands and their output are shown below.

3.10. Ubuntu
3.10.1. Prepare Ubuntu

  1. Perform the pre-installation actions.
    Already done (aren't I diligent?)

  2. The kernel headers and development packages for the currently running kernel can be installed with:

$ sudo apt install linux-headers-$(uname -r)
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
linux-headers-6.5.0-21-generic is already the newest version (6.5.0-21.21~22.04.1).
linux-headers-6.5.0-21-generic set to manually installed.
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
  3. Remove Outdated Signing Key:
$ sudo apt-key del 7fa2af80
Warning: apt-key is deprecated. Manage keyring files in trusted.gpg.d instead (see apt-key(8)).
OK
  4. Choose an installation method: local repo or network repo.
    This time I chose the network repo.

3.10.3. Network Repo Installation for Ubuntu

  1. Install the new cuda-keyring package:
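    [Image: NVIDIA's cuda-keyring command with the parts to replace shown in red]
    Replace the parts shown in red (the distro and architecture; here ubuntu2204 and x86_64).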
$ wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
--2024-02-29 22:34:59--  https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/cuda-keyring_1.1-1_all.deb
Resolving www-proxy.waseda.jp (www-proxy.waseda.jp)... 133.9.4.20
Connecting to www-proxy.waseda.jp (www-proxy.waseda.jp)|133.9.4.20|:8080... connected.
Proxy request sent, awaiting response... 200 OK
Length: 4332 (4.2K) [application/x-deb]
Saving to: ‘cuda-keyring_1.1-1_all.deb.2’

cuda-keyring_1.1-1_all.de 100%[====================================>]   4.23K  --.-KB/s    in 0s      

2024-02-29 22:34:59 (12.5 MB/s) - ‘cuda-keyring_1.1-1_all.deb.2’ saved [4332/4332]
$ sudo dpkg -i cuda-keyring_1.1-1_all.deb
(Reading database ... 257830 files and directories currently installed.)
Preparing to unpack cuda-keyring_1.1-1_all.deb ...
Unpacking cuda-keyring (1.1-1) over (1.1-1) ...
Setting up cuda-keyring (1.1-1) ...
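The URL only differs in the distro and architecture parts. A sketch that fills them in is below. The function name is hypothetical, and the keyring file name and version (cuda-keyring_1.1-1_all.deb) are copied from this article and may change on NVIDIA's side.

```bash
#!/usr/bin/env bash
# cuda_keyring_url (hypothetical helper): build the cuda-keyring download URL by
# filling in the distro and architecture (the parts shown in red in NVIDIA's docs).
# The keyring version 1.1-1 is taken from this article and may change.
cuda_keyring_url() {
  local distro="$1" arch="${2:-$(uname -m)}"
  printf 'https://developer.download.nvidia.com/compute/cuda/repos/%s/%s/cuda-keyring_1.1-1_all.deb\n' \
    "$distro" "$arch"
}
```

For example: `wget -O cuda-keyring.deb "$(cuda_keyring_url ubuntu2204 x86_64)" && sudo dpkg -i cuda-keyring.deb`. The `-O` option matters here. In the output above, wget saved the new download as `cuda-keyring_1.1-1_all.deb.2` because older copies already existed, so `dpkg -i cuda-keyring_1.1-1_all.deb` installed an earlier download rather than the one just fetched.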

A little further down:
3.10.4. Common Installation Instructions for Ubuntu
These instructions apply to both local and network installation for Ubuntu.

  1. Update the Apt repository cache:
$ sudo apt update
Hit:1 https://download.docker.com/linux/ubuntu jammy InRelease
Hit:2 https://nvidia.github.io/libnvidia-container/stable/deb/amd64  InRelease    
Hit:3 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  InRelease
Hit:4 https://dl.google.com/linux/chrome/deb stable InRelease
Hit:5 http://security.ubuntu.com/ubuntu jammy-security InRelease                              
Hit:6 https://linux.teamviewer.com/deb stable InRelease                                       
Hit:7 https://ppa.launchpadcontent.net/graphics-drivers/ppa/ubuntu jammy InRelease                
Hit:8 http://jp.archive.ubuntu.com/ubuntu jammy InRelease     
Hit:9 http://jp.archive.ubuntu.com/ubuntu jammy-updates InRelease
Hit:10 http://jp.archive.ubuntu.com/ubuntu jammy-backports InRelease
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
All packages are up to date.
  2. Install CUDA SDK:
$ sudo apt install cuda-toolkit
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following NEW packages will be installed:
  cuda-toolkit
0 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
Need to get 2,722 B of archives.
After this operation, 9,216 B of additional disk space will be used.
Get:1 https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64  cuda-toolkit 12.3.2-1 [2,722 B]
Fetched 2,722 B in 0s (30.6 kB/s)        
Selecting previously unselected package cuda-toolkit.
(Reading database ... 257830 files and directories currently installed.)
Preparing to unpack .../cuda-toolkit_12.3.2-1_amd64.deb ...
Unpacking cuda-toolkit (12.3.2-1) ...
Setting up cuda-toolkit (12.3.2-1) ...
Setting alternatives

To include all GDS packages:

sudo apt install nvidia-gds

Verify that CUDA works

After that, just to be safe, I ran

  • sudo apt update
  • sudo apt upgrade

Once you've confirmed there are no errors, reboot.

Resurrection or death :eye:

Run nvidia-smi, nvcc -V and so on to confirm your PC has come back to life!!!!

Good luck :hugging:
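If nvidia-smi works after the reboot but `nvcc -V` still says the command is not found, the toolkit's bin directory may simply not be on PATH. NVIDIA's installation guide has a post-installation section about adding it. The sketch below looks for nvcc on PATH first and then under an install prefix. The function name is hypothetical, and the default prefix /usr/local/cuda is an assumption about where the apt packages put the toolkit.

```bash
#!/usr/bin/env bash
# find_nvcc (hypothetical helper): print the path of nvcc, looking first on PATH
# and then under a CUDA install prefix (default /usr/local/cuda, an assumption;
# adjust it if your toolkit lives elsewhere).
find_nvcc() {
  local prefix="${1:-/usr/local/cuda}"
  if command -v nvcc >/dev/null 2>&1; then
    command -v nvcc
  elif [ -x "$prefix/bin/nvcc" ]; then
    echo "$prefix/bin/nvcc"
    echo "hint: nvcc is installed but not on PATH; add $prefix/bin to PATH" >&2
  else
    echo "nvcc not found" >&2
    return 1
  fi
}
```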
