Ubuntu 18.04 -- installing CUDA toolkit

Posted at 2021-12-28

Preparation

Place the following files in

$HOME/Downloads/

NVIDIA-Linux-x86_64-470.94.run (optional, since this is only the driver)
cuda_11.4.3_470.82.01_linux.run
libcudnn8_8.2.4.15-1+cuda11.4_amd64.deb
libcudnn8-dev_8.2.4.15-1+cuda11.4_amd64.deb
libcudnn8-samples_8.2.4.15-1+cuda11.4_amd64.deb
TensorRT-8.0.3.4.Linux.x86_64-gnu.cuda-11.3.cudnn8.2.tar.gz
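
Before switching to the console, it can help to confirm that nothing is missing from the download directory. The following is a minimal sketch and not part of the original procedure: `missing_files` is a hypothetical helper name, and the file list is copied from the list above (the driver installer is left out because it is optional).

```shell
#!/bin/sh
# Print the names of the expected files that are NOT present in a directory.
# Usage: missing_files DIR FILE...
# (missing_files is a hypothetical helper, not an NVIDIA tool.)
missing_files() {
    dir=$1
    shift
    for name in "$@"; do
        # Only names that are not regular files in DIR are printed.
        [ -f "$dir/$name" ] || printf '%s\n' "$name"
    done
}

# Example: list whatever is still missing from $HOME/Downloads.
missing_files "$HOME/Downloads" \
    cuda_11.4.3_470.82.01_linux.run \
    libcudnn8_8.2.4.15-1+cuda11.4_amd64.deb \
    libcudnn8-dev_8.2.4.15-1+cuda11.4_amd64.deb \
    libcudnn8-samples_8.2.4.15-1+cuda11.4_amd64.deb \
    TensorRT-8.0.3.4.Linux.x86_64-gnu.cuda-11.3.cudnn8.2.tar.gz
```

If the function prints nothing, all listed files are in place.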

Step 1

Close the terminal.

Press Ctrl+Alt+F3.
This switches to a text console.

# on the console

sudo service gdm stop

cd Downloads

sudo sh cuda_11.4.3_470.82.01_linux.run

EULA --- accept

Uncheck Driver (for reasons I don't understand, it only works when the driver is installed separately)

Select Options

__ Select Samples Options
______ Change Writable Samples Install Path
__________ /usr/share/cuda
______ Done
____ Install

cd $HOME
nano .profile

===
export PATH=/usr/local/cuda-11.4/bin:$PATH:$HOME/bin
export LD_LIBRARY_PATH=/usr/local/cuda-11.4/lib64:/usr/local/cuda-11.4/extras/CUPTI/lib64:$LD_LIBRARY_PATH
===
Append the lines between the === markers to .profile.
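
Because .profile is edited more than once in this procedure, the same directory can end up in PATH several times. One way to avoid that is to prepend a directory only when it is not already listed. This is a sketch under that assumption, not what the original .profile does, and `prepend_path` is a hypothetical helper name.

```shell
# Prepend DIR to a colon-separated list unless it is already present.
# Usage: prepend_path DIR LIST   (prints the resulting list)
# (prepend_path is a hypothetical helper introduced for illustration.)
prepend_path() {
    case ":$2:" in
        *":$1:"*) printf '%s\n' "$2" ;;     # already present: list unchanged
        "::")     printf '%s\n' "$1" ;;     # empty list: result is just DIR
        *)        printf '%s\n' "$1:$2" ;;  # otherwise put DIR in front
    esac
}

# Same directories as in the .profile lines above.
PATH=$(prepend_path /usr/local/cuda-11.4/bin "$PATH")
LD_LIBRARY_PATH=$(prepend_path /usr/local/cuda-11.4/lib64 "${LD_LIBRARY_PATH:-}")
export PATH LD_LIBRARY_PATH
```

Sourcing a .profile written this way a second time leaves both variables unchanged.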

Reboot.
(After rebooting with sudo init 6, the following steps did not work. It may be better to power the machine off normally.)

DeviceQuery

$ cd $HOME
$ cp -r /usr/share/cuda/NVIDIA_CUDA-11.4_Samples .
$ cd $HOME/NVIDIA_CUDA-11.4_Samples/1_Utilities
$ cd deviceQuery
$ make

make: Nothing to be done for 'all'.
-- Probably only this message is shown because this was the second build.

$ ../../bin/x86_64/linux/release/deviceQuery 
../../bin/x86_64/linux/release/deviceQuery Starting...

 CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "NVIDIA GeForce RTX 3050 Ti Laptop GPU"
  CUDA Driver Version / Runtime Version          11.4 / 11.4
  CUDA Capability Major/Minor version number:    8.6
  Total amount of global memory:                 3911 MBytes (4100784128 bytes)
  (020) Multiprocessors, (128) CUDA Cores/MP:    2560 CUDA Cores
  GPU Max Clock rate:                            1485 MHz (1.49 GHz)
  Memory Clock rate:                             6001 Mhz
  Memory Bus Width:                              128-bit
  L2 Cache Size:                                 2097152 bytes
  Maximum Texture Dimension Size (x,y,z)         1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
  Maximum Layered 1D Texture Size, (num) layers  1D=(32768), 2048 layers
  Maximum Layered 2D Texture Size, (num) layers  2D=(32768, 32768), 2048 layers
  Total amount of constant memory:               65536 bytes
  Total amount of shared memory per block:       49152 bytes
  Total shared memory per multiprocessor:        102400 bytes
  Total number of registers available per block: 65536
  Warp size:                                     32
  Maximum number of threads per multiprocessor:  1536
  Maximum number of threads per block:           1024
  Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
  Max dimension size of a grid size    (x,y,z): (2147483647, 65535, 65535)
  Maximum memory pitch:                          2147483647 bytes
  Texture alignment:                             512 bytes
  Concurrent copy and kernel execution:          Yes with 2 copy engine(s)
  Run time limit on kernels:                     Yes
  Integrated GPU sharing Host Memory:            No
  Support host page-locked memory mapping:       Yes
  Alignment requirement for Surfaces:            Yes
  Device has ECC support:                        Disabled
  Device supports Unified Addressing (UVA):      Yes
  Device supports Managed Memory:                Yes
  Device supports Compute Preemption:            Yes
  Supports Cooperative Kernel Launch:            Yes
  Supports MultiDevice Co-op Kernel Launch:      Yes
  Device PCI Domain ID / Bus ID / location ID:   0 / 1 / 0
  Compute Mode:
     < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.4, CUDA Runtime Version = 11.4, NumDevs = 1
Result = PASS


$ cd ../bandwidthTest/

$ make
>>> GCC Version is greater or equal to 5.1.0 <<<
make: Nothing to be done for 'all'.

$ ../../bin/x86_64/linux/release/bandwidthTest -device=all
[CUDA Bandwidth Test] - Starting...

!!!!!Cumulative Bandwidth to be computed from all the devices !!!!!!

Running on...

 Device 0: NVIDIA GeForce RTX 3050 Ti Laptop GPU
 Quick Mode

 Host to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)	Bandwidth(GB/s)
   32000000			12.7

 Device to Host Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)	Bandwidth(GB/s)
   32000000			11.3

 Device to Device Bandwidth, 1 Device(s)
 PINNED Memory Transfers
   Transfer Size (Bytes)	Bandwidth(GB/s)
   32000000			178.0

Result = PASS

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
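
Both deviceQuery and bandwidthTest end their output with a "Result = PASS" line, so the check can be scripted rather than read by eye. The following is only a sketch: `sample_passed` is a hypothetical helper name, and the example feeds it a short captured log rather than a real sample binary.

```shell
# Succeed (exit status 0) if the text on stdin contains a line
# starting with "Result = PASS".
# (sample_passed is a hypothetical helper introduced for illustration.)
sample_passed() {
    grep -q '^Result = PASS'
}

# Example with a captured log. With the samples built, the printf could be
# replaced by the real binary, e.g.
#   ../../bin/x86_64/linux/release/deviceQuery | sample_passed
if printf 'Detected 1 CUDA Capable device(s)\nResult = PASS\n' | sample_passed; then
    echo "sample reported PASS"
fi
```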

Installing cuDNN

sudo apt install libfreeimage-dev
cd $HOME/Downloads

sudo dpkg -i libcudnn8_8.2.4.15-1+cuda11.4_amd64.deb
sudo dpkg -i libcudnn8-dev_8.2.4.15-1+cuda11.4_amd64.deb
sudo dpkg -i libcudnn8-samples_8.2.4.15-1+cuda11.4_amd64.deb

$ cd $HOME
$ cp -r /usr/src/cudnn_samples_v8/ $HOME
$ cd $HOME/cudnn_samples_v8/mnistCUDNN
$ make

CUDA_VERSION is 11040
Linking agains cublasLt = true
CUDA VERSION: 11040
TARGET ARCH: x86_64
HOST_ARCH: x86_64
TARGET OS: linux
SMS: 35 50 53 60 61 62 70 72 75 80 86
make: Nothing to be done for 'all'.


$ ./mnistCUDNN

Executing: mnistCUDNN
cudnnGetVersion() : 8204 , CUDNN_VERSION from cudnn.h : 8204 (8.2.4)
Host compiler version : GCC 7.5.0

There are 1 CUDA capable devices on your machine :
device 0 : sms 20  Capabilities 8.6, SmClock 1485.0 Mhz, MemSize (Mb) 3910, MemClock 6001.0 Mhz, Ecc=0, boardGroupID=0
Using device 0

Testing single precision
Loading binary file data/conv1.bin
Loading binary file data/conv1.bias.bin
Loading binary file data/conv2.bin
Loading binary file data/conv2.bias.bin
Loading binary file data/ip1.bin
Loading binary file data/ip1.bias.bin
Loading binary file data/ip2.bin
Loading binary file data/ip2.bias.bin
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.012288 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.015200 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.051200 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.681984 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 2.617088 time requiring 2057744 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 4.258816 time requiring 184784 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 128848 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 128000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 1433120 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.039872 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.104448 time requiring 128000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.106496 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.416768 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.430944 time requiring 1433120 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 1.399808 time requiring 128848 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Resulting weights from Softmax:
0.0000000 0.9999399 0.0000000 0.0000000 0.0000561 0.0000000 0.0000012 0.0000017 0.0000010 0.0000000 
Loading image data/three_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.010240 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.011168 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.012288 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.038912 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.042880 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.044032 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 128848 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 128000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 1433120 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.034816 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.045056 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.075776 time requiring 128848 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.095232 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.098272 time requiring 128000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.103424 time requiring 1433120 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 0.9999288 0.0000000 0.0000711 0.0000000 0.0000000 0.0000000 0.0000000 
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 0.9999820 0.0000154 0.0000000 0.0000012 0.0000006 

Result of classification: 1 3 5

Test passed!

Testing half precision (math in single precision)
Loading binary file data/conv1.bin
Loading binary file data/conv1.bias.bin
Loading binary file data/conv2.bin
Loading binary file data/conv2.bias.bin
Loading binary file data/ip1.bin
Loading binary file data/ip1.bias.bin
Loading binary file data/ip2.bin
Loading binary file data/ip2.bias.bin
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 28800 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.011264 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.012288 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.046080 time requiring 2057744 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.049952 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.054272 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.247808 time requiring 28800 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 1433120 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.039936 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.040096 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.046080 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.081728 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.095040 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.104448 time requiring 1433120 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Resulting weights from Softmax:
0.0000001 1.0000000 0.0000001 0.0000000 0.0000563 0.0000001 0.0000012 0.0000017 0.0000010 0.0000001 
Loading image data/three_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 28800 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.011296 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.012288 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.023552 time requiring 28800 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.035840 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.036864 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.044032 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 1433120 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.038912 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.040960 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.044032 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.077824 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.094208 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.103424 time requiring 1433120 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 1.0000000 0.0000000 0.0000714 0.0000000 0.0000000 0.0000000 0.0000000 
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 1.0000000 0.0000154 0.0000000 0.0000012 0.0000006 

Result of classification: 1 3 5

Test passed!
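
The log above reports the cuDNN version as the integer 8204 alongside 8.2.4. For cuDNN 8.x this integer appears to be MAJOR*1000 + MINOR*100 + PATCH, so it can be decoded as sketched below. `cudnn_version_string` is a hypothetical helper name, and the encoding should be treated as an assumption for other major versions.

```shell
# Decode a cuDNN 8.x integer version, assumed to be
# MAJOR*1000 + MINOR*100 + PATCH, into MAJOR.MINOR.PATCH.
# (cudnn_version_string is a hypothetical helper introduced for illustration.)
cudnn_version_string() {
    printf '%d.%d.%d\n' $(( $1 / 1000 )) $(( $1 % 1000 / 100 )) $(( $1 % 100 ))
}

cudnn_version_string 8204   # prints 8.2.4
```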

Installing TensorRT

$ cd /usr/local/cuda-11.4/
$ sudo tar xzvf ~/Downloads/TensorRT-8.0.3.4.Linux.x86_64-gnu.cuda-11.3.cudnn8.2.tar.gz


$ cd
$ nano .profile

===
# Kohei added the following (2021/12/28)
export PATH=/usr/local/cuda-11.4/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-11.4/TensorRT-8.0.3.4/lib:$LD_LIBRARY_PATH
export LIBRARY_PATH=/usr/local/cuda-11.4/TensorRT-8.0.3.4/lib:/usr/local/cuda-11.4/lib64:/usr/local/cuda-11.4/extras/CUPTI/lib64:$LIBRARY_PATH
export CPATH=/usr/local/cuda-11.4/TensorRT-8.0.3.4/include:$CPATH

===

$ source .profile


k@kage:~$ cp -r /usr/local/cuda-11.4/TensorRT-8.0.3.4/samples $HOME/TensorRT-8.0.3.4-samples
k@kage:~$ ls -l
total 44
drwxr-xr-x  7 k k 4096 12月 28 16:41 cudnn_samples_v8
drwxr-xr-x  2 k k 4096 12月 20 01:32 Desktop
drwxr-xr-x  2 k k 4096 12月 20 01:32 Documents
drwxr-xr-x  4 k k 4096 12月 28 15:27 Downloads
drwxr-xr-x  2 k k 4096 12月 20 01:32 Music
drwxr-xr-x 12 k k 4096 12月 28 16:27 NVIDIA_CUDA-11.4_Samples
drwxr-xr-x  2 k k 4096 12月 22 13:55 Pictures
drwxr-xr-x  2 k k 4096 12月 20 01:32 Public
drwxr-xr-x  2 k k 4096 12月 20 01:32 Templates
drwxr-xr-x 24 k k 4096 12月 28 17:36 TensorRT-8.0.3.4-samples
drwxr-xr-x  2 k k 4096 12月 20 01:32 Videos
k@kage:~$ cd TensorRT-8.0.3.4-samples/sampleMNIST
k@kage:~/TensorRT-8.0.3.4-samples/sampleMNIST$ make
../Makefile.config:11: CUDA_INSTALL_DIR variable is not specified, using /usr/local/cuda by default, use CUDA_INSTALL_DIR=<cuda_directory> to change.
../Makefile.config:16: CUDNN_INSTALL_DIR variable is not specified, using /usr/local/cuda by default, use CUDNN_INSTALL_DIR=<cudnn_directory> to change.
../Makefile.config:29: TRT_LIB_DIR is not specified, searching ../../lib, ../../lib, ../lib by default, use TRT_LIB_DIR=<trt_lib_directory> to change.
if [ ! -d ../../bin/chobj/sampleMNIST/sampleMNIST/../common ]; then mkdir -p ../../bin/dchobj/sampleMNIST/sampleMNIST/../common; fi && :
Compiling: sampleMNIST.cpp
if [ ! -d ../../bin/chobj/sampleMNIST/sampleMNIST/../common ]; then mkdir -p ../../bin/dchobj/sampleMNIST/sampleMNIST/../common; fi && :
Compiling: ../common/sampleInference.cpp
if [ ! -d ../../bin/chobj/sampleMNIST/sampleMNIST/../common ]; then mkdir -p ../../bin/dchobj/sampleMNIST/sampleMNIST/../common; fi && :
Compiling: ../common/sampleOptions.cpp
if [ ! -d ../../bin/chobj/sampleMNIST/sampleMNIST/../common ]; then mkdir -p ../../bin/dchobj/sampleMNIST/sampleMNIST/../common; fi && :
Compiling: ../common/crc32.cpp
if [ ! -d ../../bin/chobj/sampleMNIST/sampleMNIST/../common ]; then mkdir -p ../../bin/dchobj/sampleMNIST/sampleMNIST/../common; fi && :
Compiling: ../common/logger.cpp
if [ ! -d ../../bin/chobj/sampleMNIST/sampleMNIST/../common ]; then mkdir -p ../../bin/dchobj/sampleMNIST/sampleMNIST/../common; fi && :
Compiling: ../common/getOptions.cpp
if [ ! -d ../../bin/chobj/sampleMNIST/sampleMNIST/../common ]; then mkdir -p ../../bin/dchobj/sampleMNIST/sampleMNIST/../common; fi && :
Compiling: ../common/sampleReporting.cpp
if [ ! -d ../../bin/chobj/sampleMNIST/sampleMNIST/../common ]; then mkdir -p ../../bin/dchobj/sampleMNIST/sampleMNIST/../common; fi && :
Compiling: ../common/sampleEngines.cpp
Linking: ../../bin/sample_mnist_debug
if [ ! -d ../../bin/chobj/sampleMNIST/sampleMNIST/../common ]; then mkdir -p ../../bin/chobj/sampleMNIST/sampleMNIST/../common; fi && :
Compiling: sampleMNIST.cpp
if [ ! -d ../../bin/chobj/sampleMNIST/sampleMNIST/../common ]; then mkdir -p ../../bin/chobj/sampleMNIST/sampleMNIST/../common; fi && :
Compiling: ../common/sampleInference.cpp
if [ ! -d ../../bin/chobj/sampleMNIST/sampleMNIST/../common ]; then mkdir -p ../../bin/chobj/sampleMNIST/sampleMNIST/../common; fi && :
Compiling: ../common/sampleOptions.cpp
if [ ! -d ../../bin/chobj/sampleMNIST/sampleMNIST/../common ]; then mkdir -p ../../bin/chobj/sampleMNIST/sampleMNIST/../common; fi && :
Compiling: ../common/crc32.cpp
if [ ! -d ../../bin/chobj/sampleMNIST/sampleMNIST/../common ]; then mkdir -p ../../bin/chobj/sampleMNIST/sampleMNIST/../common; fi && :
Compiling: ../common/logger.cpp
if [ ! -d ../../bin/chobj/sampleMNIST/sampleMNIST/../common ]; then mkdir -p ../../bin/chobj/sampleMNIST/sampleMNIST/../common; fi && :
Compiling: ../common/getOptions.cpp
if [ ! -d ../../bin/chobj/sampleMNIST/sampleMNIST/../common ]; then mkdir -p ../../bin/chobj/sampleMNIST/sampleMNIST/../common; fi && :
Compiling: ../common/sampleReporting.cpp
if [ ! -d ../../bin/chobj/sampleMNIST/sampleMNIST/../common ]; then mkdir -p ../../bin/chobj/sampleMNIST/sampleMNIST/../common; fi && :
Compiling: ../common/sampleEngines.cpp
Linking: ../../bin/sample_mnist
k@kage:~/TensorRT-8.0.3.4-samples/sampleMNIST$ 
k@kage:~/TensorRT-8.0.3.4-samples/sampleMNIST$ cd ../../bin/
k@kage:~/bin$ ls -l
total 6624
drwxrwxr-x 3 k k    4096 12月 28 17:36 chobj
drwxrwxr-x 3 k k    4096 12月 28 17:36 dchobj
-rwxrwxr-x 1 k k 1539824 12月 28 17:36 sample_mnist
-rwxrwxr-x 1 k k 5234112 12月 28 17:36 sample_mnist_debug
k@kage:~/bin$ cp -r /usr/local/cuda-11.4/TensorRT-8.0.3.4/data .
k@kage:~/bin$ ./sample_mnist --datadir $HOME/bin/data/mnist/
&&&& RUNNING TensorRT.sample_mnist [TensorRT v8003] # ./sample_mnist --datadir /home/k/bin/data/mnist/
[12/28/2021-17:37:38] [I] Building and running a GPU inference engine for MNIST
[12/28/2021-17:37:39] [I] [TRT] [MemUsageChange] Init CUDA: CPU +536, GPU +0, now: CPU 542, GPU 328 (MiB)
[12/28/2021-17:37:39] [I] [TRT] [MemUsageSnapshot] Builder begin: CPU 543 MiB, GPU 328 MiB
[12/28/2021-17:37:40] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +792, GPU +340, now: CPU 1335, GPU 668 (MiB)
[12/28/2021-17:37:40] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +196, GPU +342, now: CPU 1531, GPU 1010 (MiB)
[12/28/2021-17:37:40] [W] [TRT] Detected invalid timing cache, setup a local cache instead
[12/28/2021-17:37:47] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[12/28/2021-17:37:48] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[12/28/2021-17:37:48] [I] [TRT] Total Host Persistent Memory: 6976
[12/28/2021-17:37:48] [I] [TRT] Total Device Persistent Memory: 1626624
[12/28/2021-17:37:48] [I] [TRT] Total Scratch Memory: 0
[12/28/2021-17:37:48] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 1 MiB, GPU 4 MiB
[12/28/2021-17:37:48] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 2506, GPU 1494 (MiB)
[12/28/2021-17:37:48] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 2506, GPU 1502 (MiB)
[12/28/2021-17:37:48] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 2506, GPU 1486 (MiB)
[12/28/2021-17:37:48] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 2506, GPU 1468 (MiB)
[12/28/2021-17:37:48] [I] [TRT] [MemUsageSnapshot] Builder end: CPU 2506 MiB, GPU 1468 MiB
[12/28/2021-17:37:48] [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 2507, GPU 1464 (MiB)
[12/28/2021-17:37:48] [I] [TRT] Loaded engine size: 1 MB
[12/28/2021-17:37:48] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine begin: CPU 2507 MiB, GPU 1464 MiB
[12/28/2021-17:37:48] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 2508, GPU 1474 (MiB)
[12/28/2021-17:37:48] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 2508, GPU 1482 (MiB)
[12/28/2021-17:37:48] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 2508, GPU 1466 (MiB)
[12/28/2021-17:37:48] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine end: CPU 2508 MiB, GPU 1466 MiB
[12/28/2021-17:37:48] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation begin: CPU 2504 MiB, GPU 1466 MiB
[12/28/2021-17:37:48] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 2504, GPU 1474 (MiB)
[12/28/2021-17:37:48] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 2504, GPU 1482 (MiB)
[12/28/2021-17:37:48] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation end: CPU 2504 MiB, GPU 1484 MiB
[12/28/2021-17:37:48] [I] Input:
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@%+-:  =@@@@@@@@@@@@
@@@@@@@%=      -@@@**@@@@@@@
@@@@@@@   :%#@-#@@@. #@@@@@@
@@@@@@*  +@@@@:*@@@  *@@@@@@
@@@@@@#  +@@@@ @@@%  @@@@@@@
@@@@@@@.  :%@@.@@@. *@@@@@@@
@@@@@@@@-   =@@@@. -@@@@@@@@
@@@@@@@@@%:   +@- :@@@@@@@@@
@@@@@@@@@@@%.  : -@@@@@@@@@@
@@@@@@@@@@@@@+   #@@@@@@@@@@
@@@@@@@@@@@@@@+  :@@@@@@@@@@
@@@@@@@@@@@@@@+   *@@@@@@@@@
@@@@@@@@@@@@@@: =  @@@@@@@@@
@@@@@@@@@@@@@@ :@  @@@@@@@@@
@@@@@@@@@@@@@@ -@  @@@@@@@@@
@@@@@@@@@@@@@# +@  @@@@@@@@@
@@@@@@@@@@@@@* ++  @@@@@@@@@
@@@@@@@@@@@@@*    *@@@@@@@@@
@@@@@@@@@@@@@#   =@@@@@@@@@@
@@@@@@@@@@@@@@. +@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@

[12/28/2021-17:37:48] [I] Output:
0: 
1: 
2: 
3: 
4: 
5: 
6: 
7: 
8: **********
9: 

[12/28/2021-17:37:48] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 2504, GPU 1466 (MiB)
&&&& PASSED TensorRT.sample_mnist [TensorRT v8003] # ./sample_mnist --datadir /home/k/bin/data/mnist/
k@kage:~/bin$

Appendix: where to get the files

Some of the files used here (the samples, for example) could only be downloaded after registering as a user with NVIDIA.

wget https://developer.download.nvidia.com/compute/cuda/11.4.3/local_installers/cuda_11.4.3_470.82.01_linux.run
sudo sh cuda_11.4.3_470.82.01_linux.run

libcudnn8-dev_8.2.4.15-1+cuda11.4_amd64.deb

libcudnn8-samples_8.2.4.15-1+cuda11.4_amd64.deb

-> TensorRT 8.0.3 GA for Linux x86_64 and CUDA 11.3 TAR package
