Dependencies installed only in a conda environment - no system-wide CUDA-Toolkit installation
Preface
I wanted to train a YOLOv4 model for fast object detection on an edge device and looked for the official YOLOv4 repository, which is a fork of the YOLOv3 Darknet repository.
Darknet is a neural network framework written in C and CUDA. The GPU setup is not explicitly guided but linked to the official OpenCV and CUDA-Toolkit websites, which mainly provide setup guides for system-wide installation of OpenCV and the CUDA-Toolkit. Because I use several neural network frameworks (e.g. PyTorch, TensorFlow 1 and 2), I separate the setups on my PC into environments using Anaconda to avoid system-wide installations, which can cause version conflicts. The uninstallation of system-wide installed packages is not unified - especially if they were not installed with a package manager - and you need to invest time in a correct cleanup afterwards. Since I could not find any guide about setting up Darknet for GPU training in a conda environment, I decided to write this guide to help people who might be in the same situation.
About conda
As described on the official webpage, conda is a package, dependency and environment management tool for many programming languages. In the case of Python - in contrast to pip - conda allows you to install not only Python libraries and their dependencies into an environment, but also dependent packages written in other languages, like the CUDA-Toolkit or the cuDNN library. Because the packages are installed into an environment, you can keep several versions of a package separated by environment and switch between them. Removing packages in conda is clean and fast, without the need for uninstallation scripts. Removing the specific environment with the following command is enough.
conda env remove -n <environment_name>
Setup
I tested the setup on Ubuntu 20.04 LTS with Miniconda installed. The setup is actually quite easy, takes only about 10 minutes - depending on your internet connection - and requires a minimum of 2 GB for the CUDA-Toolkit and the cuDNN library. The requirements for building the Darknet neural network framework include OpenCV, which we also install using conda.
Create conda environment
We create a conda environment `YOLOv4_conda` and install Python 3.7 as well as OpenCV 3.x with the following command in the terminal.
conda create -n YOLOv4_conda python=3.7 opencv=3
We install the CUDA-Toolkit and the cuDNN library from the `nvidia` channel with the following terminal command, as explained in the CUDA-Toolkit documentation.
conda install cuda cudnn -c nvidia -n YOLOv4_conda
The `cuda` package installs the CUDA-Toolkit and NVCC (the NVIDIA CUDA Compiler), which is needed to build the Darknet CUDA code to run on the GPU. There is also a `cuda-toolkit` package available in the `nvidia` channel, generally used for GPU training with Python-based machine learning frameworks like TensorFlow or PyTorch, but this package only contains the CUDA-Toolkit and does NOT include NVCC.
Finally we switch into our newly created environment.
conda activate YOLOv4_conda
Clone the Darknet YOLOv4 project
Clone and switch into the repository in the terminal.
git clone https://github.com/AlexeyAB/darknet.git
cd darknet
Check the commit we cloned in the terminal, so the setup can be reproduced in future environments.
git --no-pager log --oneline -1
Returns the following commit.
8a0bf84 (HEAD -> master, origin/master, origin/HEAD) various fixes (#8398)
Configure Darknet
Load the `Makefile` into a text editor and modify the following three parts.
- Line 1-4: Enable GPU build, OpenCV and cuDNN
GPU=1
CUDNN=1
CUDNN_HALF=1
OPENCV=1
- Line 27-58: Uncomment the compute capability of your GPU (e.g. to train with the RTX 3090 GPU, uncomment the following)
# GeForce RTX 3070, 3080, 3090
ARCH= -gencode arch=compute_86,code=[sm_86,compute_86]
- Line 80: Add the path to header files installed in the conda environment (path is stored in the CONDA_PREFIX variable)
COMMON= -Iinclude/ -I3rdparty/stb/include -I$(CONDA_PREFIX)/include
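If you prefer to script the setup, the same three edits can be applied non-interactively with sed. This is a sketch run from the repository root; the patterns assume the stock Makefile at the commit above, where the build flags default to 0 and the compute_86 ARCH line is commented out.

```shell
# Enable GPU build, cuDNN, half-precision cuDNN and OpenCV (lines 1-4)
sed -i 's/^GPU=0/GPU=1/; s/^CUDNN=0/CUDNN=1/; s/^CUDNN_HALF=0/CUDNN_HALF=1/; s/^OPENCV=0/OPENCV=1/' Makefile

# Uncomment the compute capability line for the RTX 3070/3080/3090 (compute_86)
sed -i 's/^# ARCH= -gencode arch=compute_86,code=\[sm_86,compute_86\]/ARCH= -gencode arch=compute_86,code=[sm_86,compute_86]/' Makefile

# Append the conda include path to the COMMON variable (line 80);
# $(CONDA_PREFIX) stays literal here and is expanded by make from the environment
sed -i 's|^COMMON= -Iinclude/ -I3rdparty/stb/include|& -I$(CONDA_PREFIX)/include|' Makefile
```

If the Makefile layout changes in a later commit, verify the three spots by hand instead of relying on these patterns.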
Build Darknet
In order for `pkg-config` to find the packages required to build Darknet in our environment, we temporarily add the environment to the search path in the terminal (only for the current terminal session), again using the CONDA_PREFIX variable.
export PKG_CONFIG_PATH=$CONDA_PREFIX/lib/pkgconfig:$PKG_CONFIG_PATH
Run the build command.
make
If an error occurs during build, the build process stops with the last line similar to the following.
make: *** [Makefile:176: darknet] Error 1
Please see the Troubleshoot section for possible solutions to fix build errors.
Prepare Dataset
According to the Darknet documentation, the dataset should be created in the `build/darknet/x64/data` folder, but it can actually be placed anywhere inside the `darknet` repository folder. There are some inconsistencies in the documentation explaining the location of the `./darknet` executable and the folder to store the dataset in. In my case the `./darknet` executable is located in the repository root folder, so I create my dataset in the folder `data` in the repository root. For testing purposes I use a tiny custom dataset `street_views_yolo`, with images collected from the COCO dataset.
Annotate
Create a file `classes.txt` inside the dataset folder containing all class names in the dataset. Our test dataset uses 3 classes, which results in a `classes.txt` file with the following content.
car
bus
person
The image annotations are stored in a file with the same filename as the image but with a `.txt` extension. Save the annotations in the same folder the images are stored in. The annotation format has the following structure, with 1 bounding box per line,
<object-class> <x_center> <y_center> <width> <height>
where `<object-class>` is the class index in the `classes.txt` file, starting from 0 (`car`). `<x_center>` and `<y_center>` contain the bounding box center point, normalized between 0 and 1 (divided by the image width and image height respectively). `<width>` and `<height>` are the bounding box width and height, also normalized between 0 and 1.
As an example, the annotations for an image `street1.jpg` with 2 bounding boxes (`car` and `bus`) are stored in the file `street1.txt` with the following content.
0 0.5072916666666667 0.54453125 0.43125 0.3484375
1 0.5135416666666667 0.2875 0.40625 0.26875
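To illustrate the normalization, here is a small sketch that converts a pixel bounding box into this format with awk. The pixel coordinates and the 1000x800 image size are made-up example values.

```shell
# Convert a pixel box "x_min y_min x_max y_max" inside a 1000x800 pixel image
# into the normalized Darknet format "x_center y_center width height"
echo "100 200 500 600" | awk -v img_w=1000 -v img_h=800 \
    '{ printf "%.6f %.6f %.6f %.6f\n", ($1+$3)/2/img_w, ($2+$4)/2/img_h, ($3-$1)/img_w, ($4-$2)/img_h }'
# → 0.300000 0.500000 0.400000 0.500000
```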
The annotation can be done using annotation tools like LabelImg, which already supports the Darknet annotation format. The installation is easy with conda, using the following command.
conda install labelimg -c conda-forge
Split Dataset
Training and validation images are stored in the folders `train` and `val`.
A listing file `train.txt` for all training images and a `val.txt` for all validation images are required. The listings (e.g. for JPEG images) can be created with the following commands.
ls train/*.jpg > train.txt
ls val/*.jpg > val.txt
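The split itself can also be scripted. Below is a minimal sketch that moves every 5th image, together with its annotation file, into `val/` and the rest into `train/`. It is demonstrated on generated dummy files in a temporary folder; on a real dataset, run only the loop and the listing commands inside your dataset folder.

```shell
# Demo on dummy files in a temporary folder; replace the generated files
# with your real images and annotations when splitting an actual dataset
cd "$(mktemp -d)"
for i in $(seq 1 50); do touch "street$i.jpg" "street$i.txt"; done

# Move every 5th image (and its annotation file) to val/, the rest to train/
mkdir -p train val
i=0
for img in *.jpg; do
    i=$((i + 1))
    if [ $((i % 5)) -eq 0 ]; then dest=val; else dest=train; fi
    mv "$img" "${img%.jpg}.txt" "$dest/"
done

# Recreate the listing files
ls train/*.jpg > train.txt
ls val/*.jpg > val.txt
```

For a random instead of a deterministic split, shuffle the file list first (e.g. with `shuf`) before distributing it.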
Create an index file `index.txt` summarizing all information about the dataset, with the following contents,
classes = 3
train = train.txt
valid = val.txt
names = classes.txt
backup = backup/
where `classes` contains the number of classes to detect, `train` and `valid` point to the training and validation image listings, `names` points to the file with the class names, and `backup` names the folder the model weights are stored in during training. Finally, our dataset folder structure should be similar to the following.
street_views_yolo/
├── classes.txt
├── index.txt
├── train/
│ ├── street1.jpg
│ ├── street1.txt
│ ├── ...
│ ├── street50.jpg
│ └── street50.txt
├── train.txt
├── val/
│ ├── street2.jpg
│ ├── street2.txt
│ ├── ...
│ ├── street10.jpg
│ └── street10.txt
├── val.txt
└── yolov4-custom.cfg
Train
We do not train the whole model from scratch but start from an already trained version and fine-tune that model for our use case. This is called transfer learning and reduces the training time a lot.
I want to train a YOLOv4 model with the above dataset. According to the training manual, some layers of the pretrained model need to be changed to fit the number of custom classes to detect. There is already a config file for training YOLOv4 with a custom dataset, `yolov4-custom.cfg`, inside the `cfg` folder, which we use and modify.
Configure the network
First copy the file `yolov4-custom.cfg` into the dataset folder with the following command.
cp cfg/yolov4-custom.cfg data/street_views_yolo/.
Then customize the copied `yolov4-custom.cfg` as shown in the training manual. Basically, just search for the keyword `yolo` to find the three `[yolo]` layers in the config file. In each of the three `[yolo]` layers, change `classes=80` to `classes=3` for our three-class dataset. Above each `[yolo]` layer is a `[convolutional]` layer, where we change `filters=255` to `filters=24` ((classes + 5) * 3). Finally, if you want, you can change the input `width` and `height` in the `[net]` layer from `width=608` and `height=608` to any size divisible by 32 (e.g. `width=416` and `height=416`).
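These edits can also be applied with sed. This is a sketch run inside the dataset folder; it assumes the stock `yolov4-custom.cfg`, where `filters=255` only occurs in the `[convolutional]` layers directly above the three `[yolo]` layers.

```shell
# Set classes=3 in all three [yolo] layers and filters=24 ((3 + 5) * 3)
# in the [convolutional] layers above them
sed -i 's/^classes=80/classes=3/; s/^filters=255/filters=24/' yolov4-custom.cfg
```

Afterwards, grep for `classes=` and `filters=24` to verify that exactly three occurrences of each were changed.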
Download the pretrained weights file
For YOLOv4, download the pretrained weights file `yolov4.conv.137` and save it in the parent `data` folder, so we can reuse it when training with other datasets as well. The file and folder structure should be similar to the following.
darknet/
├── build/
├── cfg/
│ ├── ...
│ └── yolov4-custom.cfg
├── darknet
├── ...
└── data/
├── ...
├── street_views_yolo/
│ ├── classes.txt
│ ├── index.txt
│ ├── train/
│ ├── train.txt
│ ├── val/
│ ├── val.txt
│ └── yolov4-custom.cfg
└── yolov4.conv.137
Start training
In order for `darknet` to find the CUDA and cuDNN libraries installed into the conda environment, we temporarily add the environment to the library path in the terminal (only for the current terminal session), again using the CONDA_PREFIX variable.
export LD_LIBRARY_PATH=$CONDA_PREFIX/lib:$LD_LIBRARY_PATH
The paths to the images in the dataset need to be relative to the `darknet` executable. Currently the paths are relative to the `street_views_yolo` dataset folder, but the `darknet` executable is located in the repository root. In order for the executable to find the dataset images listed in the dataset index file, we run it inside the dataset folder. First move into the dataset folder
cd data/street_views_yolo
and then start the training by running the executable relative to the dataset folder with the following command.
../../darknet detector train \
index.txt \
yolov4-custom.cfg \
../yolov4.conv.137 \
-dont_show -map -mjpeg_port 8090
If you get an error during training, please see the Troubleshoot section for possible solutions.
The parameter `-mjpeg_port` allows you to monitor the training progress by opening the URL `http://ip-address:8090` in a browser, which shows the following graph.
The blue line represents the loss and the red line the mean average precision.
Get the training results
After the training is finished, the results are stored in the `backup` folder inside the dataset folder.
street_views_yolo/
└── backup/
├── yolov4-custom_10000.weights
├── yolov4-custom_best.weights
├── yolov4-custom_final.weights
└── yolov4-custom_last.weights
`yolov4-custom_best.weights` should contain the trained weights with the highest mean average precision.
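To try out the trained weights, the same executable offers a detection mode. A sketch, run from inside the dataset folder like the training command; the image path and the threshold are example values.

```shell
# Run detection on a single validation image with the best weights;
# -ext_output additionally prints the bounding box coordinates
../../darknet detector test \
    index.txt \
    yolov4-custom.cfg \
    backup/yolov4-custom_best.weights \
    val/street2.jpg \
    -dont_show -ext_output -thresh 0.25
```

The detections are also written to `predictions.jpg` in the current folder, so you can inspect the result without a display attached.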
Cleanup
To clean up all the installed packages and libraries like CUDA, cuDNN and OpenCV, just remove the created conda environment with the following command.
conda env remove -n YOLOv4_conda
Then delete the locally cloned `darknet` repository folder.
That's it.
Troubleshoot
Building Darknet
Problem: Building Darknet stops with the following error.
No package 'opencv' found
/usr/bin/ld: cannot find -lcudart
/usr/bin/ld: cannot find -lcublas
/usr/bin/ld: cannot find -lcurand
collect2: error: ld returned 1 exit status
make: *** [Makefile:176: darknet] Error 1
Solution: Add the conda environment to the `PKG_CONFIG_PATH` with the following command. This setting disappears when the current terminal session is closed, so remember to run the command again in a new session.
export PKG_CONFIG_PATH=$CONDA_PREFIX/lib/pkgconfig:$PKG_CONFIG_PATH
Training Darknet
Problem: When starting the training, the following error occurs.
error while loading shared libraries: libopencv_dnn.so.<version>: cannot open shared object file: No such file or directory
Solution: Add the conda environment to the `LD_LIBRARY_PATH` with the following command. This setting disappears when the current terminal session is closed, so remember to run the command again in a new session.
export LD_LIBRARY_PATH=$CONDA_PREFIX/lib:$LD_LIBRARY_PATH
Problem: When starting the training, the following error occurs.
Error in load_data_detection() - OpenCV Cannot load image <path_to_image>
Solution: The image paths in `train.txt` or `val.txt` are not set correctly. Please verify that the folder structure you store the images in matches the image paths in both files. The paths need to be relative to the dataset folder.