SBQ

Reading the YOLOv3 Source Code: detector train, Part 2

Posted at 2019-11-07, last updated at 2019-11-07

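Introduction

This is a continuation of "Reading the YOLOv3 Source Code: detector train, Part 1".
We pick up from the point where the network is loaded.

Source

This walkthrough is based on the source code at the following location (as of November 2, 2019).
https://github.com/pjreddie/darknet

train_detector (continued)

In Part 1 we read as far as cuda_set_device, which selects the GPU to use.
Next, load_network loads the network. It takes three arguments: the config file path, the weights file path, and clear.

detector.c(train_detector, continued)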

    for(i = 0; i < ngpus; ++i){
        srand(seed);
#ifdef GPU
        cuda_set_device(gpus[i]);
#endif
        nets[i] = load_network(cfgfile, weightfile, clear);
        nets[i]->learning_rate *= ngpus;
    }

load_network first calls parse_network_cfg, the function that parses the config file. Its argument is the config file path.

network.c(load_network)

network *load_network(char *cfg, char *weights, int clear)
{
    network *net = parse_network_cfg(cfg);

Let's take a look at what the config file looks like.

yolov3-voc.cfg

[net]
# Testing
 batch=1
 subdivisions=1
# Training
# batch=64
# subdivisions=16
width=416
height=416
channels=3
momentum=0.9
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

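The file is made up of sections, each containing key/value pairs.
parse_network_cfg is a fairly long function, so let's read through it step by step while keeping the structure of the config file in mind.
First, it calls read_cfg with the path of the config file. The return value is a pointer to a list struct.

parser.c(parse_network_cfg)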

network *parse_network_cfg(char *filename)
{
    list *sections = read_cfg(filename);

Part 1 covered read_data_cfg. The difference this time is that the file is divided into sections. First, fopen opens the config file. Next, make_list builds an empty list. Then fgetl reads the file one line at a time.

parser.c(read_cfg)

list *read_cfg(char *filename)
{
    FILE *file = fopen(filename, "r");
    if(file == 0) file_error(filename);
    char *line;
    int nu = 0;
    list *options = make_list();
    section *current = 0;
    while((line=fgetl(file)) != 0){

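fgetl is defined in utils.c. It reads with fgets into a buffer that starts at 512 bytes and grows as needed, and returns everything up to the newline as one line. Many modern languages provide this kind of function in their standard library, but standard C does not, so you have to write it yourself. The code is a useful reference, so have a look at it if you need to.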
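
A line reader of this kind can be sketched as follows. This is a simplified illustration, not darknet's fgetl itself: the name read_line and the initial buffer size of 16 bytes are assumptions chosen to keep the example small.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not darknet's fgetl): read one line of any length.
 * Returns a malloc'ed string without the trailing newline, or NULL at EOF.
 * The caller must free the result. */
char *read_line(FILE *fp)
{
    assert(fp != NULL);
    size_t size = 16;                    /* small initial buffer, for illustration */
    char *line = malloc(size);
    if (!line) return NULL;

    if (!fgets(line, (int)size, fp)) {   /* EOF or error before any data */
        free(line);
        return NULL;
    }

    size_t len = strlen(line);
    /* Keep reading while the buffer filled up without reaching a newline. */
    while (len > 0 && line[len - 1] != '\n' && !feof(fp)) {
        size *= 2;
        char *grown = realloc(line, size);
        if (!grown) { free(line); return NULL; }
        line = grown;
        if (!fgets(line + len, (int)(size - len), fp)) break;
        len = strlen(line);
    }

    /* Drop the newline so callers get just the line content. */
    if (len > 0 && line[len - 1] == '\n') line[len - 1] = '\0';
    return line;
}
```

Calling read_line in a loop until it returns NULL visits every line of the file, which is the same shape as the while loop in read_cfg.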

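Next, nu is incremented. nu is the line number reported in the error message when a line cannot be parsed.
strip removes whitespace from line, and the first character of the line, line[0], decides how the line is handled.

parser.c(read_cfg, continued)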

        ++ nu;
        strip(line);
        switch(line[0]){

The first case is a line that starts with [.
malloc allocates memory for a section struct, and list_insert adds the newly allocated current to the options list.

parser.c(read_cfg, continued)

            case '[':
                current = malloc(sizeof(section));
                list_insert(options, current);

The section struct is defined as follows. It holds a type and a list.

parser.c

typedef struct{
    char *type;
    list *options;
}section;

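type is set to the current line, that is, the section name (stored as written, so the brackets are included). options is set to a list created by make_list, which will hold the options of that section.

parser.c(read_cfg, continued)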

                current->options = make_list();
                current->type = line;
                break;

Blank lines and comment lines (those starting with # or ;) are skipped.

parser.c(read_cfg, continued)

            case '\0':
            case '#':
            case ';':
                free(line);
                break;

Anything else is passed to read_option, which splits the line into a key and a value and adds the pair to the options list of the current section.

parser.c(read_cfg, continued)

            default:
                if(!read_option(line, current->options)){
                    fprintf(stderr, "Config file error line %d, could parse: %s\n", nu, line);
                    free(line);
                }
                break;
        }
    }
    fclose(file);
    return options;
}
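
read_option itself is not shown in this article. As a rough sketch of how a key=value line might be split in place, assuming whitespace has already been removed by strip, consider the following. The name split_option and its signature are hypothetical and are not darknet's API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split "key=value" in place.
 * On success, *key and *val point into the original buffer and 1 is returned.
 * Returns 0 when the line contains no '='. */
int split_option(char *line, char **key, char **val)
{
    assert(line && key && val);
    char *eq = strchr(line, '=');
    if (!eq) return 0;
    *eq = '\0';          /* terminate the key where '=' used to be */
    *key = line;
    *val = eq + 1;       /* the value starts right after '=' */
    return 1;
}
```

Because the key and value point into the line buffer, the buffer must stay alive as long as the option is in use. That would explain why read_cfg frees line only when read_option fails.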

In other words, the result is hierarchical: a list of sections, each of which holds its own list of options.
That completes the reading of the config file.
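
To make the hierarchy concrete, here is a minimal sketch that walks a few already-stripped lines and groups option lines under the most recent section header, using the same first-character dispatch as read_cfg. The type section_summary and the function summarize_cfg are hypothetical names for this illustration. Unlike darknet, it only counts options instead of storing them in linked lists.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary type: one entry per section header. */
typedef struct {
    const char *type;   /* section header as written, e.g. "[net]" */
    int n_options;      /* number of key=value lines under it */
} section_summary;

/* Walk already-stripped lines and group options under the latest section.
 * Returns the number of sections stored; stops once max_sections is reached. */
int summarize_cfg(const char **lines, int n_lines,
                  section_summary *out, int max_sections)
{
    assert(lines && out);
    int count = 0;
    for (int i = 0; i < n_lines; ++i) {
        switch (lines[i][0]) {
            case '[':                        /* new section */
                if (count == max_sections) return count;
                out[count].type = lines[i];
                out[count].n_options = 0;
                ++count;
                break;
            case '\0':
            case '#':
            case ';':                        /* blank line or comment */
                break;
            default:                         /* option line */
                if (count > 0) ++out[count - 1].n_options;
                break;
        }
    }
    return count;
}
```

For the lines "[net]", "batch=1", "# Training", "width=416", "[convolutional]", "filters=32", this yields two sections with two options and one option respectively.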

Let's return to parse_network_cfg.
It takes the first of the sections read by read_cfg and assigns it to node *n.

parser.c(parse_network_cfg, continued)

    node *n = sections->front;
    if(!n) error("Config file has no sections");

make_network builds the network struct.

parser.c(parse_network_cfg, continued)

    network *net = make_network(sections->size - 1);

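make_network allocates the network struct and then allocates memory for several of its members. net->n is set to the value passed as the argument, sections->size - 1.
In the config file, the first section, [net], holds the network-wide settings, and every section after it describes one layer. So sections->size minus 1 is the number of layers in the network.

network.c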

network *make_network(int n)
{
    network *net = calloc(1, sizeof(network));
    net->n = n;
    net->layers = calloc(net->n, sizeof(layer));
    net->seen = calloc(1, sizeof(size_t));
    net->t    = calloc(1, sizeof(int));
    net->cost = calloc(1, sizeof(float));
    return net;
}

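net->gpu_index is set to the value of gpu_index. I wondered where this gpu_index was defined, and after some searching found that it is defined in cuda.c and declared extern in darknet.h. In other words, it is a global variable.

parser.c(parse_network_cfg, continued)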

    net->gpu_index = gpu_index;

Next, a size_params struct is declared.

parser.c(parse_network_cfg, continued)

    size_params params;

The size_params struct is defined as follows.

parser.c

typedef struct size_params{
    int batch;
    int inputs;
    int h;
    int w;
    int c;
    int index;
    int time_steps;
    network *net;
} size_params;

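params is passed as an argument to the functions called later when parsing each layer. Most of its members correspond to settings in the [net] section of the config file.

n->val gives the first section struct in the list, and its options list is taken from it.

parser.c(parse_network_cfg, continued)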

    section *s = (section *)n->val;
    list *options = s->options;
    if(!is_network(s)) error("First section must be [net] or [network]");
    parse_net_options(options, net);

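Then parse_net_options parses the options list and stores the values in net.
So let's look at parse_net_options. It is longer than I expected!
It looks up each option read from the [net] section by key and stores it in the net argument. If the config file does not specify a value, a default is used.

parser.c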

void parse_net_options(list *options, network *net)
{
    net->batch = option_find_int(options, "batch",1);
    net->learning_rate = option_find_float(options, "learning_rate", .001);
    net->momentum = option_find_float(options, "momentum", .9);
    net->decay = option_find_float(options, "decay", .0001);
    int subdivs = option_find_int(options, "subdivisions",1);
    net->time_steps = option_find_int_quiet(options, "time_steps",1);
    net->notruth = option_find_int_quiet(options, "notruth",0);
    net->batch /= subdivs;
    net->batch *= net->time_steps;
    net->subdivisions = subdivs;
    net->random = option_find_int_quiet(options, "random", 0);

    net->adam = option_find_int_quiet(options, "adam", 0);
    if(net->adam){
        net->B1 = option_find_float(options, "B1", .9);
        net->B2 = option_find_float(options, "B2", .999);
        net->eps = option_find_float(options, "eps", .0000001);
    }

    net->h = option_find_int_quiet(options, "height",0);
    net->w = option_find_int_quiet(options, "width",0);
    net->c = option_find_int_quiet(options, "channels",0);
    net->inputs = option_find_int_quiet(options, "inputs", net->h * net->w * net->c);
    net->max_crop = option_find_int_quiet(options, "max_crop",net->w*2);
    net->min_crop = option_find_int_quiet(options, "min_crop",net->w);
    net->max_ratio = option_find_float_quiet(options, "max_ratio", (float) net->max_crop / net->w);
    net->min_ratio = option_find_float_quiet(options, "min_ratio", (float) net->min_crop / net->w);
    net->center = option_find_int_quiet(options, "center",0);
    net->clip = option_find_float_quiet(options, "clip", 0);

    net->angle = option_find_float_quiet(options, "angle", 0);
    net->aspect = option_find_float_quiet(options, "aspect", 1);
    net->saturation = option_find_float_quiet(options, "saturation", 1);
    net->exposure = option_find_float_quiet(options, "exposure", 1);
    net->hue = option_find_float_quiet(options, "hue", 0);

    if(!net->inputs && !(net->h && net->w && net->c)) error("No input parameters supplied");

    char *policy_s = option_find_str(options, "policy", "constant");
    net->policy = get_policy(policy_s);
    net->burn_in = option_find_int_quiet(options, "burn_in", 0);
    net->power = option_find_float_quiet(options, "power", 4);
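
One detail that is easy to miss in the code above is that net->batch does not keep the value written in the config file: it is divided by subdivisions and multiplied by time_steps. The helper below restates those lines as a function so the arithmetic can be checked. The name per_pass_batch is hypothetical.

```c
#include <assert.h>

/* Hypothetical helper restating these lines of parse_net_options:
 *   net->batch /= subdivs;
 *   net->batch *= net->time_steps;
 * Integer division is used, as in the original. */
int per_pass_batch(int batch, int subdivisions, int time_steps)
{
    assert(subdivisions > 0);
    return batch / subdivisions * time_steps;
}
```

With the commented-out training values from yolov3-voc.cfg (batch=64, subdivisions=16) and the default time_steps of 1, per_pass_batch(64, 16, 1) gives 4. With the testing values (batch=1, subdivisions=1) it gives 1.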

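Several of the functions used here, such as option_find_int_quiet, end in quiet. They differ from the non-quiet versions in whether a log message is printed when the default value is used. My guess is that the quiet versions are used for items that are optional and are left at their defaults most of the time.

option_list.c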

int option_find_int_quiet(list *l, char *key, int def)
{
    char *v = option_find(l, key);
    if(v) return atoi(v);
    return def;
}
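
The difference between the two variants can be sketched with a plain array of key/value pairs in place of darknet's list. The type kv and the function find_int are hypothetical names for this illustration, and the quiet flag stands in for what darknet expresses as two separate functions.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical key/value pair; darknet stores these in a linked list. */
typedef struct { const char *key; const char *val; } kv;

/* Look up key and convert its value with atoi.
 * Falls back to def when the key is missing; when quiet is 0,
 * a note is written to stderr that the default was used. */
int find_int(const kv *opts, int n, const char *key, int def, int quiet)
{
    assert(opts && key);
    for (int i = 0; i < n; ++i) {
        if (strcmp(opts[i].key, key) == 0) return atoi(opts[i].val);
    }
    if (!quiet) fprintf(stderr, "%s: Using default '%d'\n", key, def);
    return def;
}
```

With options {"batch", "64"} and {"width", "416"}, looking up "batch" returns 64, while looking up "time_steps" quietly returns the default without printing anything.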

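The individual values should come up again when they are actually used, so I'll look at them in detail then. For the most part, the key names make it clear what each one does.

The second half reads different options depending on the value of net->policy. policy controls how the learning rate is scheduled: with constant the rate stays fixed, while with step it decays in stages. This should also come up again when it is actually used.

parser.c(parse_net_options, continued)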

    if(net->policy == STEP){
        net->step = option_find_int(options, "step", 1);
        net->scale = option_find_float(options, "scale", 1);
    } else if (net->policy == STEPS){
        char *l = option_find(options, "steps");
        char *p = option_find(options, "scales");
        if(!l || !p) error("STEPS policy must have steps and scales in cfg file");

        int len = strlen(l);
        int n = 1;
        int i;
        for(i = 0; i < len; ++i){
            if (l[i] == ',') ++n;
        }
        int *steps = calloc(n, sizeof(int));
        float *scales = calloc(n, sizeof(float));
        for(i = 0; i < n; ++i){
            int step    = atoi(l);
            float scale = atof(p);
            l = strchr(l, ',')+1;
            p = strchr(p, ',')+1;
            steps[i] = step;
            scales[i] = scale;
        }
        net->scales = scales;
        net->steps = steps;
        net->num_steps = n;
    } else if (net->policy == EXP){
        net->gamma = option_find_float(options, "gamma", 1);
    } else if (net->policy == SIG){
        net->gamma = option_find_float(options, "gamma", 1);
        net->step = option_find_int(options, "step", 1);
    } else if (net->policy == POLY || net->policy == RANDOM){
    }
    net->max_batches = option_find_int(options, "max_batches", 0);
}
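
In the STEPS branch above, the number of entries is found by counting commas, and each entry is then read with atoi or atof before strchr advances to the next comma. Note that on the last entry no comma remains, so strchr returns a null pointer there. As an alternative sketch, a comma-separated list of integers can be parsed with strtol, which reports where each number ends. The name parse_int_list is hypothetical and this is not darknet's code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: parse "40000,45000,..." into out[], up to max entries.
 * Returns the number of integers stored. */
int parse_int_list(const char *s, int *out, int max)
{
    assert(s && out);
    int n = 0;
    while (*s && n < max) {
        char *end;
        long v = strtol(s, &end, 10);
        if (end == s) break;       /* no digits where a number was expected */
        out[n++] = (int)v;
        if (*end != ',') break;    /* no more separators: end of list */
        s = end + 1;               /* skip the comma */
    }
    return n;
}
```

For example, parsing "40000,45000" stores two values and returns 2. The scales list could be handled the same way with strtof.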

Back to parse_network_cfg.
The values that parse_net_options parsed and stored in the net struct are now copied into the params struct.

parser.c(parse_network_cfg, continued)

    params.h = net->h;
    params.w = net->w;
    params.c = net->c;
    params.inputs = net->inputs;
    params.batch = net->batch;
    params.time_steps = net->time_steps;
    params.net = net;

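The parameters to pass to the layer parsing functions are now ready.
From here, the code follows n->next from section to section and parses the settings of each layer.
The while(n) loop repeats until there is no next node.

parser.c(parse_network_cfg, continued)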

    size_t workspace_size = 0;
    n = n->next;
    int count = 0;
    free_section(s);
    fprintf(stderr, "layer     filters    size              input                output\n");
    while(n){

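params.index is set to the index of the current layer, and options is set to the options under the current section. s->type holds the section name, so that string is converted to a LAYER_TYPE in preparation for the checks that follow.
Incidentally, LAYER_TYPE is defined as an enum in darknet.h.

parser.c(parse_network_cfg, continued)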

        params.index = count;
        fprintf(stderr, "%5d ", count);
        s = (section *)n->val;
        options = s->options;
        layer l = {0};
        LAYER_TYPE lt = string_to_layer_type(s->type);
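
string_to_layer_type is not shown in this article. Since s->type holds the line as written, brackets included, a conversion of this kind presumably compares against strings such as "[convolutional]". The sketch below shows the idea with a hypothetical enum layer_kind and a hypothetical function kind_from_section. The set of names it accepts is an assumption for illustration only.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical enum for this sketch; darknet's LAYER_TYPE has many more values. */
typedef enum { KIND_CONVOLUTIONAL, KIND_MAXPOOL, KIND_BLANK } layer_kind;

/* Map a section header (brackets included) to a layer kind.
 * Unknown names fall through to KIND_BLANK. */
layer_kind kind_from_section(const char *type)
{
    assert(type != 0);
    if (strcmp(type, "[conv]") == 0 || strcmp(type, "[convolutional]") == 0)
        return KIND_CONVOLUTIONAL;
    if (strcmp(type, "[max]") == 0 || strcmp(type, "[maxpool]") == 0)
        return KIND_MAXPOOL;
    return KIND_BLANK;
}
```

Doing the string comparison once per section and branching on an enum afterwards keeps the rest of the loop free of strcmp calls.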

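Having gone to the trouble of converting to an enum value, you might expect a switch statement, but the code uses if statements instead. The first check is for CONVOLUTIONAL, that is, a convolutional layer. For a convolutional layer, parse_convolutional is called.

parser.c(parse_network_cfg, continued)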

        if(lt == CONVOLUTIONAL){
            l = parse_convolutional(options, params);

What parse_convolutional does first is read the settings from the config file.

parser.c(parse_convolutional)

convolutional_layer parse_convolutional(list *options, size_params params)
{
    int n = option_find_int(options, "filters",1);
    int size = option_find_int(options, "size",1);
    int stride = option_find_int(options, "stride",1);
    int pad = option_find_int_quiet(options, "pad",0);
    int padding = option_find_int_quiet(options, "padding",0);
    int groups = option_find_int_quiet(options, "groups", 1);
    if(pad) padding = size/2;

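It reads the settings that are characteristic of a convolutional layer, such as the number of filters, the stride, and the padding. When pad is nonzero, padding is overridden with size/2. After that, it reads the name of the activation function, and get_activation converts it to a value of the ACTIVATION enum. It is gradually starting to look like deep learning. Exciting.

parser.c(parse_convolutional, continued)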

    char *activation_s = option_find_str(options, "activation", "logistic");
    ACTIVATION activation = get_activation(activation_s);

h, w, and so on are taken from the params argument, a few more values are read from the settings, and then make_convolutional_layer is called with everything gathered so far.

parser.c(parse_convolutional, continued)

    int batch,h,w,c;
    h = params.h;
    w = params.w;
    c = params.c;
    batch=params.batch;
    if(!(h && w && c)) error("Layer before convolutional layer must output image.");
    int batch_normalize = option_find_int_quiet(options, "batch_normalize", 0);
    int binary = option_find_int_quiet(options, "binary", 0);
    int xnor = option_find_int_quiet(options, "xnor", 0);

    convolutional_layer layer = make_convolutional_layer(batch,h,w,c,n,groups,size,stride,padding,activation, batch_normalize, binary, xnor, params.net->adam);

make_convolutional_layer builds the convolutional layer and returns a convolutional_layer.
convolutional_layer is defined in convolutional_layer.h as follows.

convolutional_layer.h

typedef layer convolutional_layer;

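In other words, it is really just the layer struct. The layer struct is defined in darknet.h. It is quite long, so I won't reproduce it here, but do have a look at its structure.

At the start of make_convolutional_layer, a convolutional_layer is declared and its type is set to CONVOLUTIONAL.

convolutional_layer.c(make_convolutional_layer)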

convolutional_layer make_convolutional_layer(int batch, int h, int w, int c, int n, int groups, int size, int stride, int padding, ACTIVATION activation, int batch_normalize, int binary, int xnor, int adam)
{
    int i;
    convolutional_layer l = {0};
    l.type = CONVOLUTIONAL;

Next, various values are set on the layer.
groups is the setting for grouped convolution.

convolutional_layer.c(make_convolutional_layer, continued)

    l.groups = groups;

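Grouped convolution splits the input channels into groups, convolves each group separately, and concatenates the results to form the output.
ResNeXt reportedly adopted grouped convolution and, compared with ResNet, improved the trade-off between representational power and parameter count, achieving higher accuracy with a comparable number of parameters.

Next, the image size and the number of channels are set on the layer. n is the number of filters.

convolutional_layer.c(make_convolutional_layer, continued)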

    l.h = h;
    l.w = w;
    l.c = c;
    l.n = n;

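binary enables convolution with binarized weights, and xnor enables XNOR-Net. Both default to 0.
batch is set to the batch value derived from the [net] section.
stride is the stride, size is the filter size, and pad is set from padding.
batch_normalize enables batch normalization. Its default is 0.

convolutional_layer.c(make_convolutional_layer, continued)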

    l.binary = binary;
    l.xnor = xnor;
    l.batch = batch;
    l.stride = stride;
    l.size = size;
    l.pad = padding;
    l.batch_normalize = batch_normalize;

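Next, the number of weights is computed and memory is allocated for them.
It is computed as
number of weights = channels ÷ groups × filters × filter size × filter size
A buffer of the same size is allocated at the same time for the weight updates.

convolutional_layer.c(make_convolutional_layer, continued)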

    l.weights = calloc(c/groups*n*size*size, sizeof(float));
    l.weight_updates = calloc(c/groups*n*size*size, sizeof(float));

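Memory for the biases is allocated as well, one bias per filter. A buffer for the bias updates is allocated at the same time.
Then the number of weights and the number of biases are stored on the layer.

convolutional_layer.c(make_convolutional_layer, continued)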

    l.biases = calloc(n, sizeof(float));
    l.bias_updates = calloc(n, sizeof(float));

    l.nweights = c/groups*n*size*size;
    l.nbiases = n;
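
To see what the weight formula means in numbers, the expression c/groups*n*size*size can be wrapped in a small function. The name conv_weight_count is hypothetical, and the check that c is divisible by groups is an assumption added for this sketch (the code above does not check it).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of weights in a convolutional layer,
 * following c/groups*n*size*size from make_convolutional_layer.
 * Assumes c is divisible by groups. */
size_t conv_weight_count(int c, int n, int size, int groups)
{
    assert(groups > 0 && c % groups == 0);
    return (size_t)(c / groups) * n * size * size;
}
```

For a layer with 3 input channels, 32 filters of size 3 and one group, this gives 864 weights. With 64 input channels and 128 filters of size 3, going from one group to two halves the count from 73728 to 36864, which is the parameter saving that grouped convolution offers.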

Next, the weights are initialized. scale sets the spread of the weights, and each weight is initialized to scale multiplied by a value from rand_normal.

convolutional_layer.c(make_convolutional_layer, continued)

    // float scale = 1./sqrt(size*size*c);
    float scale = sqrt(2./(size*size*c/l.groups));
    //printf("convscale %f\n", scale);
    //scale = .02;
    //for(i = 0; i < c*n*size*size; ++i) l.weights[i] = scale*rand_uniform(-1, 1);
    for(i = 0; i < l.nweights; ++i) l.weights[i] = scale*rand_normal();

For rand_normal, a comment in the source points to the following URL.
http://en.wikipedia.org/wiki/Box%E2%80%93Muller_transform

utils.c

// From http://en.wikipedia.org/wiki/Box%E2%80%93Muller_transform
float rand_normal()
{
...
}

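rand_normal uses the Box-Muller transform, which turns uniformly distributed random variables into a random variable that follows the standard normal distribution. This makes it possible to generate normally distributed pseudo-random numbers.
So the weights are initialized with normally distributed pseudo-random numbers produced by the Box-Muller transform, and scale acts as their standard deviation.
Judging from the commented-out lines, several choices of scale were tried, but in the end it is the square root of 2 divided by the number of inputs to each filter, size × size × c / groups. This looks like He initialization.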
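
Written out as formulas, the initialization looks like this. The symbols are introduced here for illustration: k is size, c the number of input channels, g is groups, and u_1, u_2 are independent uniform samples on (0, 1). The Box-Muller line is the textbook form from the linked page, not a transcription of rand_normal, whose body is elided above.

```latex
\sigma = \sqrt{\frac{2}{k^{2} c / g}}, \qquad
z = \sqrt{-2 \ln u_1}\,\cos(2\pi u_2), \qquad
w = \sigma z
```

For example, with k = 3, c = 3 and g = 1, each filter sees 27 inputs, so \sigma = \sqrt{2/27} \approx 0.272.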

Next, convolutional_out_width and convolutional_out_height compute the output size, and the results are set on the layer.

convolutional_layer.c(make_convolutional_layer, continued)

    int out_w = convolutional_out_width(l);
    int out_h = convolutional_out_height(l);
    l.out_h = out_h;
    l.out_w = out_w;
    l.out_c = n;
    l.outputs = l.out_h * l.out_w * l.out_c;
    l.inputs = l.w * l.h * l.c;

The output size is computed from the width or height, the padding, the filter size, and the stride, as shown below. This is the familiar formula for CNNs.

convolutional_layer.c

int convolutional_out_height(convolutional_layer l)
{
    return (l.h + 2*l.pad - l.size) / l.stride + 1;
}

int convolutional_out_width(convolutional_layer l)
{
    return (l.w + 2*l.pad - l.size) / l.stride + 1;
}
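
Both functions apply the same formula to one dimension, so it can be tried out with a single helper. The name conv_out_size is hypothetical. The integer division matches the code above.

```c
#include <assert.h>

/* Hypothetical helper: output length along one dimension,
 * (in + 2*pad - size) / stride + 1, using integer division. */
int conv_out_size(int in, int pad, int size, int stride)
{
    assert(stride > 0);
    return (in + 2 * pad - size) / stride + 1;
}
```

With a 416-pixel input, a 3x3 filter and padding 1 (which is size/2, the value chosen when pad=1 is set), stride 1 keeps the size at 416 and stride 2 halves it to 208.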

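Next, memory is allocated for the output and for the deltas. Then pointers to the forward, backward, and update functions are set on the layer. We'll read through these functions later, when they are actually used.

convolutional_layer.c(make_convolutional_layer, continued)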

    l.output = calloc(l.batch*l.outputs, sizeof(float));
    l.delta  = calloc(l.batch*l.outputs, sizeof(float));

    l.forward = forward_convolutional_layer;
    l.backward = backward_convolutional_layer;
    l.update = update_convolutional_layer;
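
Storing function pointers in the struct lets the caller run any kind of layer without knowing its type. The toy example below shows the pattern on its own. toy_layer, forward_add, forward_scale and run_layers are all hypothetical names, and the struct is far simpler than darknet's layer.

```c
#include <assert.h>

/* Hypothetical miniature of the pattern; not darknet's layer struct. */
typedef struct toy_layer toy_layer;
struct toy_layer {
    float bias;
    float (*forward)(const toy_layer *l, float x);  /* chosen at construction */
};

static float forward_add(const toy_layer *l, float x)   { return x + l->bias; }
static float forward_scale(const toy_layer *l, float x) { return x * l->bias; }

/* Run x through each layer in order via its stored function pointer. */
float run_layers(const toy_layer *layers, int n, float x)
{
    assert(layers != 0);
    for (int i = 0; i < n; ++i) x = layers[i].forward(&layers[i], x);
    return x;
}
```

With one layer built with forward_add and bias 1, followed by one built with forward_scale and bias 2, an input of 3 comes out as (3 + 1) × 2 = 8. The loop never checks which kind of layer it is calling.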

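Next, when binarized weights, XNOR-Net, batch normalization, or adam is specified, the memory each of them needs is allocated. Since this is only memory allocation, part of the source code is omitted here.

convolutional_layer.c(make_convolutional_layer, continued)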

    if(binary){
        l.binary_weights = calloc(l.nweights, sizeof(float));
        l.cweights = calloc(l.nweights, sizeof(char));
        l.scales = calloc(n, sizeof(float));
    }
    if(xnor){
        l.binary_weights = calloc(l.nweights, sizeof(float));
        l.binary_input = calloc(l.inputs*l.batch, sizeof(float));
    }

    if(batch_normalize){
        ・・・
    }
    if(adam){
        ・・・
    }

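From here on, the code is compiled only when GPU support is enabled.
First, inside #ifdef GPU, pointers to the GPU versions of the forward, backward, and update functions are set on the layer.

convolutional_layer.c(make_convolutional_layer, continued)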

#ifdef GPU
    l.forward_gpu = forward_convolutional_layer_gpu;
    l.backward_gpu = backward_convolutional_layer_gpu;
    l.update_gpu = update_convolutional_layer_gpu;

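Then, if gpu_index is 0 or greater, device memory is allocated for the weights, biases, deltas, and outputs, as well as for the buffers used by binarized weights, XNOR-Net, batch normalization, and adam when they are specified. On the CPU side the memory was allocated with calloc, but for the GPU a function called cuda_make_array is used.

convolutional_layer.c(make_convolutional_layer, continued)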

    if(gpu_index >= 0){
        if (adam) {
            l.m_gpu = cuda_make_array(l.m, l.nweights);
            l.v_gpu = cuda_make_array(l.v, l.nweights);
            l.bias_m_gpu = cuda_make_array(l.bias_m, n);
            l.bias_v_gpu = cuda_make_array(l.bias_v, n);
            l.scale_m_gpu = cuda_make_array(l.scale_m, n);
            l.scale_v_gpu = cuda_make_array(l.scale_v, n);
        }

        l.weights_gpu = cuda_make_array(l.weights, l.nweights);
        l.weight_updates_gpu = cuda_make_array(l.weight_updates, l.nweights);

        l.biases_gpu = cuda_make_array(l.biases, n);
        l.bias_updates_gpu = cuda_make_array(l.bias_updates, n);

        l.delta_gpu = cuda_make_array(l.delta, l.batch*out_h*out_w*n);
        l.output_gpu = cuda_make_array(l.output, l.batch*out_h*out_w*n);

        if(binary){
            l.binary_weights_gpu = cuda_make_array(l.weights, l.nweights);
        }
        if(xnor){
            ・・・
        }

        if(batch_normalize){
            ・・・
        }

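Internally, cuda_make_array calls cudaMalloc to allocate the memory. cudaMalloc is a function provided by the CUDA Toolkit that allocates memory on the CUDA device. If a host array is passed in, cudaMemcpy then copies its contents into the allocated area; otherwise fill_gpu fills the area with zeros.

cuda.c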

float *cuda_make_array(float *x, size_t n)
{
    float *x_gpu;
    size_t size = sizeof(float)*n;
    cudaError_t status = cudaMalloc((void **)&x_gpu, size);
    check_error(status);
    if(x){
        status = cudaMemcpy(x_gpu, x, size, cudaMemcpyHostToDevice);
        check_error(status);
    } else {
        fill_gpu(n, 0, x_gpu, 1);
    }
    if(!x_gpu) error("Cuda malloc failed\n");
    return x_gpu;
}

Next comes the code for the cuDNN case. cudnnCreateTensorDescriptor is called to create descriptors, which are stored in the corresponding members of the layer. cudnnCreateFilterDescriptor and others are called as well, and finally cudnn_convolutional_setup sets up each descriptor.

convolutional_layer.c(make_convolutional_layer, continued)

#ifdef CUDNN
        cudnnCreateTensorDescriptor(&l.normTensorDesc);
        cudnnCreateTensorDescriptor(&l.srcTensorDesc);
        cudnnCreateTensorDescriptor(&l.dstTensorDesc);
        cudnnCreateFilterDescriptor(&l.weightDesc);
        cudnnCreateTensorDescriptor(&l.dsrcTensorDesc);
        cudnnCreateTensorDescriptor(&l.ddstTensorDesc);
        cudnnCreateFilterDescriptor(&l.dweightDesc);
        cudnnCreateConvolutionDescriptor(&l.convDesc);
        cudnn_convolutional_setup(&l);

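In cudnn_convolutional_setup, cudnnSetTensor4dDescriptor sets the batch size, image size, number of channels, data type, and so on in each tensor descriptor. For the filters, cudnnSetFilter4dDescriptor sets the number of filters, the filter size, and so on.

At the end of make_convolutional_layer, workspace_size is computed and activation is set.
The function then prints a log line, returns the layer, and finishes.

convolutional_layer.c(make_convolutional_layer, continued)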

#endif
    }
#endif
    l.workspace_size = get_workspace_size(l);
    l.activation = activation;

    fprintf(stderr, "conv  %5d %2d x%2d /%2d  %4d x%4d x%4d   ->  %4d x%4d x%4d  %5.3f BFLOPs\n", n, size, size, stride, w, h, c, l.out_w, l.out_h, l.out_c, (2.0 * l.n * l.size*l.size*l.c/l.groups * l.out_h*l.out_w)/1000000000.);

    return l;
}
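
The last field of the log line is the layer's operation count in billions, taken directly from the expression in the fprintf call. Wrapping that expression in a function makes it easy to try out. The name conv_bflops is hypothetical, and reading the factor 2.0 as one multiply plus one add per weight is my interpretation rather than something stated in the source.

```c
#include <assert.h>

/* Hypothetical helper: the BFLOPs expression from the fprintf call,
 * 2 * n * size * size * c / groups * out_h * out_w / 1e9. */
double conv_bflops(int n, int size, int c, int groups, int out_h, int out_w)
{
    assert(groups > 0);
    return (2.0 * n * size * size * c / groups * out_h * out_w) / 1000000000.;
}
```

For 32 filters of size 3 over 3 input channels with a 416 x 416 output, the expression evaluates to 299040768 / 1e9, which is printed as 0.299 BFLOPs.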

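To be continued

The convolutional layer is done, and this is a good place to stop for now.
Reading the internals shows how things are actually implemented, which I found very instructive.
CUDA and cuDNN have made an appearance too, and things are getting interesting.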
