Preparation
Place the following files in $HOME/Downloads/:
- NVIDIA-Linux-x86_64-470.94.run (optional; this is only the driver)
- cuda_11.4.3_470.82.01_linux.run
- libcudnn8_8.2.4.15-1+cuda11.4_amd64.deb
- libcudnn8-dev_8.2.4.15-1+cuda11.4_amd64.deb
- libcudnn8-samples_8.2.4.15-1+cuda11.4_amd64.deb
- TensorRT-8.0.3.4.Linux.x86_64-gnu.cuda-11.3.cudnn8.2.tar.gz
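Before switching to the console it may help to confirm that every download is actually in place. The following is only a minimal sketch: the function name missing_files is made up for this memo and is not part of any NVIDIA tool, and the file names in the usage comment are the ones listed above (the optional driver installer is left out).

```shell
#!/bin/sh
# missing_files DIR FILE...
# Prints every FILE that is not a regular file under DIR and returns
# non-zero if at least one is missing.
# (Hypothetical helper written for this memo.)
missing_files() {
    dir=$1
    shift
    rc=0
    for f in "$@"; do
        if [ ! -f "$dir/$f" ]; then
            printf '%s\n' "$f"
            rc=1
        fi
    done
    return "$rc"
}

# Usage (prints whatever is still missing from $HOME/Downloads):
# missing_files "$HOME/Downloads" \
#     cuda_11.4.3_470.82.01_linux.run \
#     'libcudnn8_8.2.4.15-1+cuda11.4_amd64.deb' \
#     'libcudnn8-dev_8.2.4.15-1+cuda11.4_amd64.deb' \
#     'libcudnn8-samples_8.2.4.15-1+cuda11.4_amd64.deb' \
#     TensorRT-8.0.3.4.Linux.x86_64-gnu.cuda-11.3.cudnn8.2.tar.gz
```

No output and a zero exit status mean that all listed files were found.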
Step 1
Close the terminal.
Ctrl+Alt+F3
This switches to a text console.
# On the console
sudo service gdm stop
cd Downloads
sudo sh cuda_11.4.3_470.82.01_linux.run
EULA --- accept
Deselect Driver (I don't know why, but this only works when the driver is installed separately)
Select Options
__ Select Samples Options
______ Change Writable Samples Install Path
__________ /usr/share/cuda
______ Done
____ Install
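The same menu choices (no driver, toolkit plus samples, samples written to /usr/share/cuda) can probably be expressed with the runfile's silent-mode options instead of the interactive menu. I did not install it this way, so treat the following as an untested sketch: the flags are the installer's documented options as far as I understand them, and they should be confirmed with "sh cuda_11.4.3_470.82.01_linux.run --help" first. The helper name cuda_runfile_cmd is made up, and it only prints the command rather than running it.

```shell
#!/bin/sh
# cuda_runfile_cmd RUNFILE SAMPLES_DIR
# Prints (does NOT run) a non-interactive installer command that installs
# the toolkit and the samples but leaves the driver out, mirroring the
# menu choices above.
# Assumption: --silent, --toolkit, --samples and --samplespath= behave as
# described in the runfile's --help output.
cuda_runfile_cmd() {
    printf 'sudo sh %s --silent --toolkit --samples --samplespath=%s\n' "$1" "$2"
}

# Example (prints the command only):
# cuda_runfile_cmd cuda_11.4.3_470.82.01_linux.run /usr/share/cuda
```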
cd $HOME
nano .profile
===
export PATH=/usr/local/cuda-11.4/bin:$PATH:$HOME/bin
export LD_LIBRARY_PATH=/usr/local/cuda-11.4/lib64:/usr/local/cuda-11.4/extras/CUPTI/lib64:$LD_LIBRARY_PATH
===
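Add the lines above to .profile.
Reboot.
(When I rebooted with sudo init 6, the following steps did not work. A normal power-off and power-on may be the better choice.)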
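After logging in again, it may be worth checking that the two export lines really took effect before building the samples. A minimal sketch, with a helper name (in_path_list) invented for this memo:

```shell
#!/bin/sh
# in_path_list DIR LIST
# Succeeds if DIR is exactly one of the colon-separated entries of LIST.
# (Hypothetical helper written for this memo.)
in_path_list() {
    case ":$2:" in
        *":$1:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Usage:
# in_path_list /usr/local/cuda-11.4/bin "$PATH" \
#     && echo "CUDA bin is on PATH"
# in_path_list /usr/local/cuda-11.4/lib64 "$LD_LIBRARY_PATH" \
#     && echo "CUDA lib64 is on LD_LIBRARY_PATH"
```

Wrapping both sides in colons makes the match exact, so /usr/local/cuda does not accidentally match /usr/local/cuda-11.4/bin.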
DeviceQuery
$ cd $HOME
$ cp -r /usr/share/cuda/NVIDIA_CUDA-11.4_Samples .
$ cd $HOME/NVIDIA_CUDA-11.4_Samples/1_Utilities
$ cd deviceQuery
$ make
make: Nothing to be done for 'all'.
-- This was probably my second build, which may be why this is the only output.
$ ../../bin/x86_64/linux/release/deviceQuery
../../bin/x86_64/linux/release/deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "NVIDIA GeForce RTX 3050 Ti Laptop GPU"
CUDA Driver Version / Runtime Version 11.4 / 11.4
CUDA Capability Major/Minor version number: 8.6
Total amount of global memory: 3911 MBytes (4100784128 bytes)
(020) Multiprocessors, (128) CUDA Cores/MP: 2560 CUDA Cores
GPU Max Clock rate: 1485 MHz (1.49 GHz)
Memory Clock rate: 6001 Mhz
Memory Bus Width: 128-bit
L2 Cache Size: 2097152 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total shared memory per multiprocessor: 102400 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 1536
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device supports Managed Memory: Yes
Device supports Compute Preemption: Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 11.4, CUDA Runtime Version = 11.4, NumDevs = 1
Result = PASS
$ cd ../bandwidthTest/
$ make
>>> GCC Version is greater or equal to 5.1.0 <<<
make: Nothing to be done for 'all'.
$ ../../bin/x86_64/linux/release/bandwidthTest -device=all
[CUDA Bandwidth Test] - Starting...
!!!!!Cumulative Bandwidth to be computed from all the devices !!!!!!
Running on...
Device 0: NVIDIA GeForce RTX 3050 Ti Laptop GPU
Quick Mode
Host to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(GB/s)
32000000 12.7
Device to Host Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(GB/s)
32000000 11.3
Device to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(GB/s)
32000000 178.0
Result = PASS
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
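Both samples end their report with a "Result = PASS" line, so the check can be scripted instead of read by eye. A minimal sketch; the function name sample_passed is made up, and it assumes the result line starts at the beginning of a line, as in the logs above.

```shell
#!/bin/sh
# sample_passed
# Reads a CUDA sample's output on stdin and succeeds only if it contains
# a line starting with "Result = PASS".
# (Hypothetical helper written for this memo.)
sample_passed() {
    grep -q '^Result = PASS'
}

# Usage, from the 1_Utilities/deviceQuery directory:
# ../../bin/x86_64/linux/release/deviceQuery | sample_passed \
#     && echo "deviceQuery OK"
```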
Installing cuDNN
sudo apt install libfreeimage-dev
cd $HOME/Downloads
sudo dpkg -i libcudnn8_8.2.4.15-1+cuda11.4_amd64.deb
sudo dpkg -i libcudnn8-dev_8.2.4.15-1+cuda11.4_amd64.deb
sudo dpkg -i libcudnn8-samples_8.2.4.15-1+cuda11.4_amd64.deb
$ cd $HOME
$ cp -r /usr/src/cudnn_samples_v8/ $HOME
$ cd $HOME/cudnn_samples_v8/mnistCUDNN
$ make
CUDA_VERSION is 11040
Linking agains cublasLt = true
CUDA VERSION: 11040
TARGET ARCH: x86_64
HOST_ARCH: x86_64
TARGET OS: linux
SMS: 35 50 53 60 61 62 70 72 75 80 86
make: Nothing to be done for 'all'.
$ ./mnistCUDNN
Executing: mnistCUDNN
cudnnGetVersion() : 8204 , CUDNN_VERSION from cudnn.h : 8204 (8.2.4)
Host compiler version : GCC 7.5.0
There are 1 CUDA capable devices on your machine :
device 0 : sms 20 Capabilities 8.6, SmClock 1485.0 Mhz, MemSize (Mb) 3910, MemClock 6001.0 Mhz, Ecc=0, boardGroupID=0
Using device 0
Testing single precision
Loading binary file data/conv1.bin
Loading binary file data/conv1.bias.bin
Loading binary file data/conv2.bin
Loading binary file data/conv2.bias.bin
Loading binary file data/ip1.bin
Loading binary file data/ip1.bias.bin
Loading binary file data/ip2.bin
Loading binary file data/ip2.bias.bin
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.012288 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.015200 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.051200 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.681984 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 2.617088 time requiring 2057744 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 4.258816 time requiring 184784 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 128848 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 128000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 1433120 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.039872 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.104448 time requiring 128000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.106496 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.416768 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.430944 time requiring 1433120 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 1.399808 time requiring 128848 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Resulting weights from Softmax:
0.0000000 0.9999399 0.0000000 0.0000000 0.0000561 0.0000000 0.0000012 0.0000017 0.0000010 0.0000000
Loading image data/three_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.010240 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.011168 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.012288 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.038912 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.042880 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.044032 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 128848 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 128000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 1433120 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.034816 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.045056 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.075776 time requiring 128848 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.095232 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.098272 time requiring 128000 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.103424 time requiring 1433120 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 0.9999288 0.0000000 0.0000711 0.0000000 0.0000000 0.0000000 0.0000000
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 0.9999820 0.0000154 0.0000000 0.0000012 0.0000006
Result of classification: 1 3 5
Test passed!
Testing half precision (math in single precision)
Loading binary file data/conv1.bin
Loading binary file data/conv1.bias.bin
Loading binary file data/conv2.bin
Loading binary file data/conv2.bias.bin
Loading binary file data/ip1.bin
Loading binary file data/ip1.bias.bin
Loading binary file data/ip2.bin
Loading binary file data/ip2.bias.bin
Loading image data/one_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 28800 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.011264 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.012288 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.046080 time requiring 2057744 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.049952 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.054272 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.247808 time requiring 28800 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 1433120 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.039936 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.040096 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.046080 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.081728 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.095040 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.104448 time requiring 1433120 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Resulting weights from Softmax:
0.0000001 1.0000000 0.0000001 0.0000000 0.0000563 0.0000001 0.0000012 0.0000017 0.0000010 0.0000001
Loading image data/three_28x28.pgm
Performing forward propagation ...
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 28800 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.011296 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.012288 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.023552 time requiring 28800 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.035840 time requiring 178432 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.036864 time requiring 184784 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.044032 time requiring 2057744 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnGetConvolutionForwardAlgorithm_v7 ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: -1.000000 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: -1.000000 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: -1.000000 time requiring 1433120 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Testing cudnnFindConvolutionForwardAlgorithm ...
^^^^ CUDNN_STATUS_SUCCESS for Algo 0: 0.038912 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 1: 0.040960 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 4: 0.044032 time requiring 2450080 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 2: 0.077824 time requiring 0 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 5: 0.094208 time requiring 4656640 memory
^^^^ CUDNN_STATUS_SUCCESS for Algo 7: 0.103424 time requiring 1433120 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 6: -1.000000 time requiring 0 memory
^^^^ CUDNN_STATUS_NOT_SUPPORTED for Algo 3: -1.000000 time requiring 0 memory
Resulting weights from Softmax:
0.0000000 0.0000000 0.0000000 1.0000000 0.0000000 0.0000714 0.0000000 0.0000000 0.0000000 0.0000000
Loading image data/five_28x28.pgm
Performing forward propagation ...
Resulting weights from Softmax:
0.0000000 0.0000008 0.0000000 0.0000002 0.0000000 1.0000000 0.0000154 0.0000000 0.0000012 0.0000006
Result of classification: 1 3 5
Test passed!
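The log above prints version numbers in packed form: cudnnGetVersion() returns 8204 for cuDNN 8.2.4, and the Makefile reports CUDA_VERSION 11040 for CUDA 11.4. The sketch below unpacks both, assuming the encodings that match those two values (major*1000 + minor*100 + patch for cuDNN 8, and major*1000 + minor*10 for CUDA); other major versions may pack the number differently. The function names are made up for this memo.

```shell
#!/bin/sh
# decode_cudnn_version N
# Unpacks a cuDNN 8 style version number, e.g. 8204 -> 8.2.4
# (assumes major*1000 + minor*100 + patch).
decode_cudnn_version() {
    v=$1
    echo "$((v / 1000)).$((v % 1000 / 100)).$((v % 100))"
}

# decode_cuda_version N
# Unpacks a CUDA_VERSION number, e.g. 11040 -> 11.4
# (assumes major*1000 + minor*10).
decode_cuda_version() {
    v=$1
    echo "$((v / 1000)).$((v % 1000 / 10))"
}

# Usage:
# decode_cudnn_version 8204
# decode_cuda_version 11040
```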
Installing TensorRT
$ cd /usr/local/cuda-11.4/
$ sudo tar xzvf ~/Downloads/TensorRT-8.0.3.4.Linux.x86_64-gnu.cuda-11.3.cudnn8.2.tar.gz
$ cd
$ nano .profile
===
# Kohei added the following (2021/12/28)
export PATH=/usr/local/cuda-11.4/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-11.4/TensorRT-8.0.3.4/lib:$LD_LIBRARY_PATH
export LIBRARY_PATH=/usr/local/cuda-11.4/TensorRT-8.0.3.4/lib:/usr/local/cuda-11.4/lib64:/usr/local/cuda-11.4/extras/CUPTI/lib64:$LIBRARY_PATH
export CPATH=/usr/local/cuda-11.4/TensorRT-8.0.3.4/include:$CPATH
===
$ source .profile
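Since the TensorRT samples rely on these variables to find the headers and libraries, a quick lookup can show whether the paths point at the unpacked tarball. A minimal sketch with an invented helper name (find_in_path_list); libnvinfer.so and NvInfer.h are the TensorRT core library and header, and the helper assumes directory names without glob characters.

```shell
#!/bin/sh
# find_in_path_list NAME LIST
# Prints the first directory in the colon-separated LIST that contains a
# file called NAME; returns non-zero if none does.
# (Hypothetical helper written for this memo.)
find_in_path_list() {
    name=$1
    old_ifs=$IFS
    IFS=:
    for d in $2; do
        if [ -n "$d" ] && [ -e "$d/$name" ]; then
            IFS=$old_ifs
            printf '%s\n' "$d"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}

# Usage:
# find_in_path_list libnvinfer.so "$LD_LIBRARY_PATH"
# find_in_path_list libnvinfer.so "$LIBRARY_PATH"
# find_in_path_list NvInfer.h "$CPATH"
```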
k@kage:~$ cp -r /usr/local/cuda-11.4/TensorRT-8.0.3.4/samples $HOME/TensorRT-8.0.3.4-samples
k@kage:~$ ls -l
total 44
drwxr-xr-x 7 k k 4096 12月 28 16:41 cudnn_samples_v8
drwxr-xr-x 2 k k 4096 12月 20 01:32 Desktop
drwxr-xr-x 2 k k 4096 12月 20 01:32 Documents
drwxr-xr-x 4 k k 4096 12月 28 15:27 Downloads
drwxr-xr-x 2 k k 4096 12月 20 01:32 Music
drwxr-xr-x 12 k k 4096 12月 28 16:27 NVIDIA_CUDA-11.4_Samples
drwxr-xr-x 2 k k 4096 12月 22 13:55 Pictures
drwxr-xr-x 2 k k 4096 12月 20 01:32 Public
drwxr-xr-x 2 k k 4096 12月 20 01:32 Templates
drwxr-xr-x 24 k k 4096 12月 28 17:36 TensorRT-8.0.3.4-samples
drwxr-xr-x 2 k k 4096 12月 20 01:32 Videos
k@kage:~$ cd TensorRT-8.0.3.4-samples/sampleMNIST
k@kage:~/TensorRT-8.0.3.4-samples/sampleMNIST$ make
../Makefile.config:11: CUDA_INSTALL_DIR variable is not specified, using /usr/local/cuda by default, use CUDA_INSTALL_DIR=<cuda_directory> to change.
../Makefile.config:16: CUDNN_INSTALL_DIR variable is not specified, using /usr/local/cuda by default, use CUDNN_INSTALL_DIR=<cudnn_directory> to change.
../Makefile.config:29: TRT_LIB_DIR is not specified, searching ../../lib, ../../lib, ../lib by default, use TRT_LIB_DIR=<trt_lib_directory> to change.
if [ ! -d ../../bin/chobj/sampleMNIST/sampleMNIST/../common ]; then mkdir -p ../../bin/dchobj/sampleMNIST/sampleMNIST/../common; fi && :
Compiling: sampleMNIST.cpp
if [ ! -d ../../bin/chobj/sampleMNIST/sampleMNIST/../common ]; then mkdir -p ../../bin/dchobj/sampleMNIST/sampleMNIST/../common; fi && :
Compiling: ../common/sampleInference.cpp
if [ ! -d ../../bin/chobj/sampleMNIST/sampleMNIST/../common ]; then mkdir -p ../../bin/dchobj/sampleMNIST/sampleMNIST/../common; fi && :
Compiling: ../common/sampleOptions.cpp
if [ ! -d ../../bin/chobj/sampleMNIST/sampleMNIST/../common ]; then mkdir -p ../../bin/dchobj/sampleMNIST/sampleMNIST/../common; fi && :
Compiling: ../common/crc32.cpp
if [ ! -d ../../bin/chobj/sampleMNIST/sampleMNIST/../common ]; then mkdir -p ../../bin/dchobj/sampleMNIST/sampleMNIST/../common; fi && :
Compiling: ../common/logger.cpp
if [ ! -d ../../bin/chobj/sampleMNIST/sampleMNIST/../common ]; then mkdir -p ../../bin/dchobj/sampleMNIST/sampleMNIST/../common; fi && :
Compiling: ../common/getOptions.cpp
if [ ! -d ../../bin/chobj/sampleMNIST/sampleMNIST/../common ]; then mkdir -p ../../bin/dchobj/sampleMNIST/sampleMNIST/../common; fi && :
Compiling: ../common/sampleReporting.cpp
if [ ! -d ../../bin/chobj/sampleMNIST/sampleMNIST/../common ]; then mkdir -p ../../bin/dchobj/sampleMNIST/sampleMNIST/../common; fi && :
Compiling: ../common/sampleEngines.cpp
Linking: ../../bin/sample_mnist_debug
if [ ! -d ../../bin/chobj/sampleMNIST/sampleMNIST/../common ]; then mkdir -p ../../bin/chobj/sampleMNIST/sampleMNIST/../common; fi && :
Compiling: sampleMNIST.cpp
if [ ! -d ../../bin/chobj/sampleMNIST/sampleMNIST/../common ]; then mkdir -p ../../bin/chobj/sampleMNIST/sampleMNIST/../common; fi && :
Compiling: ../common/sampleInference.cpp
if [ ! -d ../../bin/chobj/sampleMNIST/sampleMNIST/../common ]; then mkdir -p ../../bin/chobj/sampleMNIST/sampleMNIST/../common; fi && :
Compiling: ../common/sampleOptions.cpp
if [ ! -d ../../bin/chobj/sampleMNIST/sampleMNIST/../common ]; then mkdir -p ../../bin/chobj/sampleMNIST/sampleMNIST/../common; fi && :
Compiling: ../common/crc32.cpp
if [ ! -d ../../bin/chobj/sampleMNIST/sampleMNIST/../common ]; then mkdir -p ../../bin/chobj/sampleMNIST/sampleMNIST/../common; fi && :
Compiling: ../common/logger.cpp
if [ ! -d ../../bin/chobj/sampleMNIST/sampleMNIST/../common ]; then mkdir -p ../../bin/chobj/sampleMNIST/sampleMNIST/../common; fi && :
Compiling: ../common/getOptions.cpp
if [ ! -d ../../bin/chobj/sampleMNIST/sampleMNIST/../common ]; then mkdir -p ../../bin/chobj/sampleMNIST/sampleMNIST/../common; fi && :
Compiling: ../common/sampleReporting.cpp
if [ ! -d ../../bin/chobj/sampleMNIST/sampleMNIST/../common ]; then mkdir -p ../../bin/chobj/sampleMNIST/sampleMNIST/../common; fi && :
Compiling: ../common/sampleEngines.cpp
Linking: ../../bin/sample_mnist
k@kage:~/TensorRT-8.0.3.4-samples/sampleMNIST$
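The first lines of the make output warn that CUDA_INSTALL_DIR and TRT_LIB_DIR were not specified and name the variables to set. The build worked for me with the defaults, so the following is only a sketch of how the directories used in this memo could be passed explicitly; the helper name trt_make_cmd is made up, and it prints the command instead of running it. CUDNN_INSTALL_DIR is left out because cuDNN was installed from .deb packages here and I have not checked which directory that variable would need.

```shell
#!/bin/sh
# trt_make_cmd CUDA_DIR TENSORRT_DIR
# Prints (does NOT run) a make command line that sets the two variables
# named in the Makefile.config warnings.
# (Hypothetical helper written for this memo.)
trt_make_cmd() {
    printf 'make CUDA_INSTALL_DIR=%s TRT_LIB_DIR=%s/lib\n' "$1" "$2"
}

# Example (prints the command only):
# trt_make_cmd /usr/local/cuda-11.4 /usr/local/cuda-11.4/TensorRT-8.0.3.4
```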
k@kage:~/TensorRT-8.0.3.4-samples/sampleMNIST$ cd ../../bin/
k@kage:~/bin$ ls -l
total 6624
drwxrwxr-x 3 k k 4096 12月 28 17:36 chobj
drwxrwxr-x 3 k k 4096 12月 28 17:36 dchobj
-rwxrwxr-x 1 k k 1539824 12月 28 17:36 sample_mnist
-rwxrwxr-x 1 k k 5234112 12月 28 17:36 sample_mnist_debug
k@kage:~/bin$ cp -r /usr/local/cuda-11.4/TensorRT-8.0.3.4/data .
k@kage:~/bin$ ./sample_mnist --datadir $HOME/bin/data/mnist/
&&&& RUNNING TensorRT.sample_mnist [TensorRT v8003] # ./sample_mnist --datadir /home/k/bin/data/mnist/
[12/28/2021-17:37:38] [I] Building and running a GPU inference engine for MNIST
[12/28/2021-17:37:39] [I] [TRT] [MemUsageChange] Init CUDA: CPU +536, GPU +0, now: CPU 542, GPU 328 (MiB)
[12/28/2021-17:37:39] [I] [TRT] [MemUsageSnapshot] Builder begin: CPU 543 MiB, GPU 328 MiB
[12/28/2021-17:37:40] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +792, GPU +340, now: CPU 1335, GPU 668 (MiB)
[12/28/2021-17:37:40] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +196, GPU +342, now: CPU 1531, GPU 1010 (MiB)
[12/28/2021-17:37:40] [W] [TRT] Detected invalid timing cache, setup a local cache instead
[12/28/2021-17:37:47] [I] [TRT] Some tactics do not have sufficient workspace memory to run. Increasing workspace size may increase performance, please check verbose output.
[12/28/2021-17:37:48] [I] [TRT] Detected 1 inputs and 1 output network tensors.
[12/28/2021-17:37:48] [I] [TRT] Total Host Persistent Memory: 6976
[12/28/2021-17:37:48] [I] [TRT] Total Device Persistent Memory: 1626624
[12/28/2021-17:37:48] [I] [TRT] Total Scratch Memory: 0
[12/28/2021-17:37:48] [I] [TRT] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 1 MiB, GPU 4 MiB
[12/28/2021-17:37:48] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 2506, GPU 1494 (MiB)
[12/28/2021-17:37:48] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 2506, GPU 1502 (MiB)
[12/28/2021-17:37:48] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 2506, GPU 1486 (MiB)
[12/28/2021-17:37:48] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 2506, GPU 1468 (MiB)
[12/28/2021-17:37:48] [I] [TRT] [MemUsageSnapshot] Builder end: CPU 2506 MiB, GPU 1468 MiB
[12/28/2021-17:37:48] [I] [TRT] [MemUsageChange] Init CUDA: CPU +0, GPU +0, now: CPU 2507, GPU 1464 (MiB)
[12/28/2021-17:37:48] [I] [TRT] Loaded engine size: 1 MB
[12/28/2021-17:37:48] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine begin: CPU 2507 MiB, GPU 1464 MiB
[12/28/2021-17:37:48] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 2508, GPU 1474 (MiB)
[12/28/2021-17:37:48] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 2508, GPU 1482 (MiB)
[12/28/2021-17:37:48] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 2508, GPU 1466 (MiB)
[12/28/2021-17:37:48] [I] [TRT] [MemUsageSnapshot] deserializeCudaEngine end: CPU 2508 MiB, GPU 1466 MiB
[12/28/2021-17:37:48] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation begin: CPU 2504 MiB, GPU 1466 MiB
[12/28/2021-17:37:48] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 2504, GPU 1474 (MiB)
[12/28/2021-17:37:48] [I] [TRT] [MemUsageChange] Init cuDNN: CPU +0, GPU +8, now: CPU 2504, GPU 1482 (MiB)
[12/28/2021-17:37:48] [I] [TRT] [MemUsageSnapshot] ExecutionContext creation end: CPU 2504 MiB, GPU 1484 MiB
[12/28/2021-17:37:48] [I] Input:
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@%+-: =@@@@@@@@@@@@
@@@@@@@%= -@@@**@@@@@@@
@@@@@@@ :%#@-#@@@. #@@@@@@
@@@@@@* +@@@@:*@@@ *@@@@@@
@@@@@@# +@@@@ @@@% @@@@@@@
@@@@@@@. :%@@.@@@. *@@@@@@@
@@@@@@@@- =@@@@. -@@@@@@@@
@@@@@@@@@%: +@- :@@@@@@@@@
@@@@@@@@@@@%. : -@@@@@@@@@@
@@@@@@@@@@@@@+ #@@@@@@@@@@
@@@@@@@@@@@@@@+ :@@@@@@@@@@
@@@@@@@@@@@@@@+ *@@@@@@@@@
@@@@@@@@@@@@@@: = @@@@@@@@@
@@@@@@@@@@@@@@ :@ @@@@@@@@@
@@@@@@@@@@@@@@ -@ @@@@@@@@@
@@@@@@@@@@@@@# +@ @@@@@@@@@
@@@@@@@@@@@@@* ++ @@@@@@@@@
@@@@@@@@@@@@@* *@@@@@@@@@
@@@@@@@@@@@@@# =@@@@@@@@@@
@@@@@@@@@@@@@@. +@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@@@@@@@@@@@@@@@@@@@@@@@@@@@@
[12/28/2021-17:37:48] [I] Output:
0:
1:
2:
3:
4:
5:
6:
7:
8: **********
9:
[12/28/2021-17:37:48] [I] [TRT] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +0, now: CPU 2504, GPU 1466 (MiB)
&&&& PASSED TensorRT.sample_mnist [TensorRT v8003] # ./sample_mnist --datadir /home/k/bin/data/mnist/
k@kage:~/bin$
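The "Output:" part of the log is a small histogram with one row per digit, where the row with the asterisks is the predicted class (8 in the run above). A minimal sketch for pulling that digit out of the output; the function name predicted_digit is made up, and it assumes the rows look like "8: **********" as shown.

```shell
#!/bin/sh
# predicted_digit
# Reads sample_mnist output on stdin and prints the digit whose histogram
# row has the most asterisks; returns non-zero if no row has any.
# (Hypothetical helper written for this memo.)
predicted_digit() {
    awk '
        {
            line = $0
            sub(/^[ \t]+/, "", line)            # tolerate leading blanks
            if (line ~ /^[0-9]:/) {
                n = gsub(/\*/, "*", line)       # count the asterisks
                if (n > best) { best = n; digit = substr(line, 1, 1) }
            }
        }
        END { if (best > 0) print digit; else exit 1 }
    '
}

# Usage:
# ./sample_mnist --datadir $HOME/bin/data/mnist/ | predicted_digit
```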
Appendix: Where to get the files
Some of the files used here (the samples, for example) can only be downloaded after registering as a user with NVIDIA.
wget https://developer.download.nvidia.com/compute/cuda/11.4.3/local_installers/cuda_11.4.3_470.82.01_linux.run
sudo sh cuda_11.4.3_470.82.01_linux.run
libcudnn8-dev_8.2.4.15-1+cuda11.4_amd64.deb
libcudnn8-samples_8.2.4.15-1+cuda11.4_amd64.deb
-> TensorRT 8.0.3 GA for Linux x86_64 and CUDA 11.3 TAR package