
YOLOv4 - Quick setup with conda and GPU training

Posted at 2022-04-28

All dependencies are installed inside a conda environment - no system-wide CUDA-Toolkit installation required

Preface

I wanted to train a YOLOv4 model for fast object detection on an edge device, so I looked for the official YOLOv4 repository, which is a fork of the YOLOv3 Darknet repository.
Darknet is a neural network framework written in C and CUDA. Its GPU setup is not guided explicitly but linked to the official OpenCV and CUDA-Toolkit websites, which mainly provide setup guides for system-wide installation of OpenCV and the CUDA-Toolkit. Because I use several neural network frameworks (e.g. PyTorch, TensorFlow 1 and 2), I separate the setups on my PC into environments using Anaconda to avoid system-wide installations, which can cause version conflicts. The uninstallation of system-wide packages is not unified - especially if they were not installed with a package manager - and correct cleanup afterwards takes time. Since I could not find any guide on setting up Darknet for GPU training in a conda environment, I decided to write this one to help people who might be in the same situation.

About conda

As described on the official webpage, conda is a package, dependency and environment management tool for many programming languages. In the case of Python - in contrast to pip - conda lets you install not only Python libraries and their dependencies into an environment, but also packages written in other languages, such as the CUDA-Toolkit or the cuDNN library. Because packages are installed into an environment, you can keep several versions of a package separated by environment and switch between them. Removing packages with conda is clean and fast, without the need for uninstallation scripts; removing the specific environment with the following command is enough.

conda env remove -n <environment_name>

Setup

I tested the setup on Ubuntu 20.04 LTS with Miniconda installed. The setup is actually quite easy, takes only about 10 minutes - depending on your internet connection - and requires a minimum of 2 GB of disk space for the CUDA-Toolkit and cuDNN library. The requirements for building the Darknet neural network framework include OpenCV, which we also install using conda.

Create conda environment

We create a conda environment YOLOv4_conda and install python 3.7 as well as OpenCV 3.x with the following command in the terminal.

conda create -n YOLOv4_conda python=3.7 opencv=3

We install the CUDA-Toolkit and the cuDNN library from the nvidia channel with the following terminal command as explained in the CUDA-Toolkit documentation.

conda install cuda cudnn -c nvidia -n YOLOv4_conda

The cuda package installs the CUDA-Toolkit including NVCC (the NVIDIA CUDA Compiler), which is needed to build the Darknet CUDA code to run on the GPU.

There is a cuda-toolkit package available in the nvidia channel as well, generally used for GPU training with Python-based machine learning frameworks like TensorFlow or PyTorch, but that package does NOT include NVCC.

Finally we switch into our newly created environment.

conda activate YOLOv4_conda
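Before moving on, it is worth confirming that the compiler really comes from the environment and not from a system-wide installation:

```shell
which nvcc        # expected to resolve inside the environment, e.g. $CONDA_PREFIX/bin/nvcc
nvcc --version    # prints the CUDA release installed by conda
```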

Clone the Darknet YOLOv4 project

Clone and switch into the repository in the terminal.

git clone https://github.com/AlexeyAB/darknet.git
cd darknet

Note the commit we cloned in the terminal, for reference in future environment setups.

git --no-pager log --oneline -1

Returns the following commit.

8a0bf84 (HEAD -> master, origin/master, origin/HEAD) various fixes (#8398)

Configure Darknet

Open the Makefile in a text editor and modify the following three parts.

  1. Lines 1-4: Enable the GPU build, cuDNN, half precision and OpenCV

GPU=1
CUDNN=1
CUDNN_HALF=1
OPENCV=1
  2. Lines 27-58: Uncomment the compute capability of your GPU (e.g. to train on an RTX 3090 GPU, uncomment the following)

# GeForce RTX 3070, 3080, 3090
ARCH= -gencode arch=compute_86,code=[sm_86,compute_86]
  3. Line 80: Add the path to the header files installed in the conda environment (the path is stored in the CONDA_PREFIX variable)

COMMON= -Iinclude/ -I3rdparty/stb/include -I$(CONDA_PREFIX)/include
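The three modifications can also be applied non-interactively. The sed commands below are a sketch that assumes the default values (GPU=0 etc.) and the commented ARCH line exactly as they appear in the commit shown earlier; verify the result with git diff Makefile before building.

```shell
# Enable the GPU build, cuDNN, half precision and OpenCV (lines 1-4)
sed -i 's/^GPU=0/GPU=1/; s/^CUDNN=0/CUDNN=1/; s/^CUDNN_HALF=0/CUDNN_HALF=1/; s/^OPENCV=0/OPENCV=1/' Makefile
# Uncomment the compute capability 8.6 architecture line (RTX 3070/3080/3090)
sed -i 's|^# ARCH= -gencode arch=compute_86|ARCH= -gencode arch=compute_86|' Makefile
# Append the conda include path to the COMMON line
# (single quotes keep $(CONDA_PREFIX) literal, as the Makefile expects)
sed -i 's|^COMMON= -Iinclude/ -I3rdparty/stb/include|& -I$(CONDA_PREFIX)/include|' Makefile
```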

Build Darknet

In order for pkg-config to find the packages required to build Darknet in our environment, we temporarily add the environment to the search path (only for the current terminal session), again using the CONDA_PREFIX variable.

export PKG_CONFIG_PATH=$CONDA_PREFIX/lib/pkgconfig:$PKG_CONFIG_PATH

Run the build command.

make

If an error occurs during the build, the build process stops with a last line similar to the following.

make: *** [Makefile:176: darknet] Error 1

Please see the Troubleshoot section for possible solutions to fix build errors.

Prepare Dataset

According to the Darknet documentation, the dataset should be created in the build/darknet/x64/data folder, but it can actually be placed anywhere inside the darknet repository folder. There are some inconsistencies in the documentation explaining the location of the ./darknet executable and the folder to store the dataset in. In my case the ./darknet executable is located in the repository root folder, so I create my dataset in the data folder in the repository root. For testing purposes I use a tiny custom dataset street_views_yolo, with images collected from the COCO dataset.

Annotate

Create a file classes.txt inside the dataset folder containing all class names in the dataset. In our test dataset, we use 3 classes which results in a classes.txt file with the following content.

car
bus
person

The image annotations are stored in a file with the same filename as the image but with a .txt ending. Save the annotations in the same folder the images are stored in. The annotation format has the following structure, with 1 bounding box per line,

<object-class> <x_center> <y_center> <width> <height>

where <object-class> is the class number in the classes.txt file starting from 0 (car). <x_center> and <y_center> contain the bounding box center point normalized between 0 and 1 (divided by the image width or image height respectively). <width> and <height> are the bounding box width and height also normalized between 0 and 1.

(Figure bbox.png: bounding box center, width and height in the normalized YOLO annotation format)

As an example, the annotations for an image street1.jpg with 2 bounding boxes (car and bus) are stored in the file street1.txt with the following content

0 0.5072916666666667 0.54453125 0.43125 0.3484375
1 0.5135416666666667 0.2875 0.40625 0.26875
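To illustrate the arithmetic, the sketch below converts a hypothetical pixel-space box back into the normalized format. The image size (1920x1280) and box values (center (974, 697), size 828x446) are assumptions chosen to roughly reproduce the first line of street1.txt above.

```shell
awk 'BEGIN {
  img_w = 1920; img_h = 1280   # assumed image dimensions
  cls = 0                      # class index: 0 = car
  xc = 974; yc = 697           # box center in pixels
  w = 828; h = 446             # box width and height in pixels
  # normalize everything by the image dimensions
  printf "%d %.6f %.6f %.6f %.6f\n", cls, xc/img_w, yc/img_h, w/img_w, h/img_h
}'
```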

The annotation can be done with annotation tools like labelImg, which already provides the Darknet annotation format. It is easy to install with conda using the following command.

conda install labelimg -c conda-forge

Split Dataset

Training and validation images are stored in the folders train and val.
A listing file train.txt for all training images and a val.txt for all validation images are required. The listings (e.g. for JPEG images) can be created with the following commands.

ls train/*.jpg > train.txt
ls val/*.jpg > val.txt

Create an index file index.txt summarizing all information about the dataset with the following content,

classes = 3
train  = train.txt
valid  = val.txt
names = classes.txt
backup = backup/

where classes contains the number of classes to detect, train and valid point to the training and validation image listings, names points to the file with the class names, and backup names the folder in which the model weights are stored during training. Finally, our dataset folder structure should be similar to the following.

street_views_yolo/
├── classes.txt
├── index.txt
├── train/
│   ├── street1.jpg
│   ├── street1.txt
│   ├── ...
│   ├── street50.jpg
│   └── street50.txt
├── train.txt
├── val/
│   ├── street2.jpg
│   ├── street2.txt
│   ├── ...
│   ├── street10.jpg
│   └── street10.txt
├── val.txt
└── yolov4-custom.cfg
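The index file can also be written from the shell in one step. Note that the backup folder referenced in index.txt may need to exist before training starts (darknet does not always create it itself - worth verifying on your version), so it is created here as well:

```shell
# Write the dataset index file (run inside the dataset folder)
cat > index.txt <<'EOF'
classes = 3
train  = train.txt
valid  = val.txt
names = classes.txt
backup = backup/
EOF
# Create the folder where the training weights will be saved
mkdir -p backup
```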

Train

We do not train the whole model from scratch but start from an already trained version and fine-tune that model for our use case. This is called transfer learning and reduces the training time a lot.
I want to train a YOLOv4 model with the above dataset. According to the training manual, some layers of the pretrained model need to be changed to fit the number of custom classes to detect. There is already a config file for training YOLOv4 with a custom dataset, yolov4-custom.cfg, inside the cfg folder, which we use and modify.

Configure the network

First copy the file yolov4-custom.cfg into the dataset folder with the following command.

cp cfg/yolov4-custom.cfg data/street_views_yolo/.

Then customize the copied yolov4-custom.cfg as shown in the training manual. Basically, search for the keyword yolo to find the three [yolo] layers in the config file. In each of the three [yolo] layers, change classes=80 to classes=3 for our three-class dataset. Above each [yolo] layer is a [convolutional] layer, where we change filters=255 to filters=24 ((classes + 5) * 3). Finally, if you want, you can change the input width and height in the [net] layer from width=608 and height=608 to any size divisible by 32 (e.g. width=416 and height=416).
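If you prefer to script the edit, the following sketch assumes - as is the case in the stock yolov4-custom.cfg - that classes=80 appears only in the three [yolo] layers and filters=255 only in the [convolutional] layers directly above them; run it inside the dataset folder:

```shell
sed -i 's/^classes=80/classes=3/' yolov4-custom.cfg    # all three [yolo] layers
sed -i 's/^filters=255/filters=24/' yolov4-custom.cfg  # (3 classes + 5) * 3 = 24
grep -n 'classes=\|filters=24' yolov4-custom.cfg       # verify the changed lines
```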

Download the pretrained weights file

For YOLOv4, download the pretrained weights file yolov4.conv.137 and save it in the parent data folder, so we can reuse it when training on other datasets as well. The file and folder structure should then be similar to the following.

darknet/
├── build/
├── cfg/
│   ├── ...
│   └── yolov4-custom.cfg
├── darknet
├── ...
└── data/
    ├── ...
    ├── street_views_yolo/
    │   ├── classes.txt
    │   ├── index.txt
    │   ├── train/
    │   ├── train.txt
    │   ├── val/
    │   ├── val.txt
    │   └── yolov4-custom.cfg
    └── yolov4.conv.137
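The download itself can be done from the repository root. The URL below is the release asset linked from the AlexeyAB/darknet README at the time of writing; verify it is still current before relying on it:

```shell
# Download the pretrained convolutional weights into the parent data folder
wget -P data/ https://github.com/AlexeyAB/darknet/releases/download/darknet_yolov4_pre/yolov4.conv.137
```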

Start training

In order for darknet to find the CUDA and cuDNN libraries installed in the conda environment, we temporarily add the environment to the library path (only for the current terminal session), again using the CONDA_PREFIX variable.

export LD_LIBRARY_PATH=$CONDA_PREFIX/lib:$LD_LIBRARY_PATH

The paths to the images in the dataset need to be relative to the darknet executable's working directory. Currently the paths are relative to the street_views_yolo dataset folder, but the darknet executable is located in the repository root.
In order for the executable to find the dataset images listed in the index file, we run it from inside the dataset folder. First move into the dataset folder

cd data/street_views_yolo

and then start the training by running the executable relative to the dataset folder with the following command.

../../darknet detector train \
  index.txt \
  yolov4-custom.cfg \
  ../yolov4.conv.137 \
  -dont_show -map -mjpeg_port 8090

If you get an error during training, please see the Troubleshoot section for possible solutions.

The parameter -mjpeg_port allows you to monitor the training progress by opening the URL http://ip-address:8090 in a browser, which shows the following graph.

(Figure chart_yolov4-custom.png: training progress chart)

The blue line represents the loss and the red line the mean average precision.

Get the training results

After the training is finished, the results are stored in the backup folder inside the dataset.

street_views_yolo/
└── backup/
    ├── yolov4-custom_10000.weights
    ├── yolov4-custom_best.weights
    ├── yolov4-custom_final.weights
    └── yolov4-custom_last.weights

yolov4-custom_best.weights should contain the trained weights with the highest mean average precision.
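To try out the trained weights on a single image, a detection run could look like the following, executed from inside the dataset folder with a hypothetical test image street_test.jpg (darknet writes the annotated result to predictions.jpg):

```shell
../../darknet detector test \
  index.txt \
  yolov4-custom.cfg \
  backup/yolov4-custom_best.weights \
  street_test.jpg \
  -thresh 0.25
```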

Cleanup

To clean up all the installed packages and libraries like CUDA, cuDNN and OpenCV, just remove the created conda environment with the following command.

conda env remove -n YOLOv4_conda

Then delete the locally cloned darknet repository folder.
That's it.

Troubleshoot

Building Darknet

Problem: Building Darknet stops with the following error.

No package 'opencv' found
/usr/bin/ld: cannot find -lcudart
/usr/bin/ld: cannot find -lcublas
/usr/bin/ld: cannot find -lcurand
collect2: error: ld returned 1 exit status
make: *** [Makefile:176: darknet] Error 1

Solution: Add the conda environment to the PKG_CONFIG_PATH with the following command. This setting disappears when the current terminal session is closed; remember to run the command again in a new session.

export PKG_CONFIG_PATH=$CONDA_PREFIX/lib/pkgconfig:$PKG_CONFIG_PATH

Training Darknet

Problem: When starting training, the following error occurs.

error while loading shared libraries: libopencv_dnn.so.<version>: cannot open shared object file: No such file or directory

Solution: Add the conda environment to the LD_LIBRARY_PATH with the following command. This setting disappears when the current terminal session is closed; remember to run the command again in a new session.

export LD_LIBRARY_PATH=$CONDA_PREFIX/lib:$LD_LIBRARY_PATH

Problem: When starting training, the following error occurs.

 Error in load_data_detection() - OpenCV Cannot load image <path_to_image>

Solution: The image paths in train.txt or val.txt are not set correctly. Please verify that the folder structure where the images are stored matches the image paths in both files. The paths need to be relative to the dataset folder.
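A quick way to check this from inside the dataset folder is to resolve the first listed path manually:

```shell
head -n 3 train.txt               # inspect the stored image paths
ls -l "$(head -n 1 train.txt)"    # must resolve relative to the current folder
```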
