Running the MNIST Tutorial with Caffe@WSL

Posted at 2019-11-01

This post tries running the MNIST tutorial in a Caffe @ Ubuntu18.04 @ WSL environment.
Building the environment is covered in a separate article.

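Summary

  • On Ubuntu18.04 @ WSL, LMDB (the default DB of Caffe's MNIST tutorial) does not work
  • Switching the DB from LMDB to LevelDB makes the tutorial work on WSL as well
  • Note: for some reason the cifar10 tutorial works even with LMDB

Steps where I got stuck running the MNIST tutorial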

mkdir examples
cd ./examples
cp -a ${CAFFE_ROOT}/caffe/{build,data,examples,.build_release} .
./data/mnist/get_mnist.sh
./examples/mnist/create_mnist.sh
...
+ build/examples/mnist/convert_mnist_data.bin data/mnist/train-images-idx3-ubyte data/mnist/train-labels-idx1-ubyte examples/mnist/mnist_train_lmdb --backend=lmdb
I1031 23:51:00.930450 14783 db_lmdb.cpp:35] Opened lmdb examples/mnist/mnist_train_lmdb
I1031 23:51:00.932895 14783 convert_mnist_data.cpp:88] A total of 60000 items.
I1031 23:51:00.932911 14783 convert_mnist_data.cpp:89] Rows: 28 Cols: 28
F1031 23:51:00.953609 14783 db_lmdb.hpp:15] Check failed: mdb_status == 0 (-30796 vs. 0) MDB_CORRUPTED: Located page was wrong type
*** Check failure stack trace: ***
    @     0x7f31b661c0cd  google::LogMessage::Fail()
    @     0x7f31b661df33  google::LogMessage::SendToLog()
    @     0x7f31b661bc28  google::LogMessage::Flush()
    @     0x7f31b661e999  google::LogMessageFatal::~LogMessageFatal()
    @     0x7f31b6b4323d  caffe::db::LMDBTransaction::Commit()
    @     0x7f31b7203189  convert_dataset()
    @     0x7f31b720247a  main
    @     0x7f31b55f1b97  __libc_start_main
    @     0x7f31b72024ca  _start
./examples/mnist/create_mnist.sh: line 18: 14783 Aborted                 (core dumped) $BUILD/convert_mnist_data.bin $DATA/train-images-idx3-ubyte $DATA/train-labels-idx1-ubyte $EXAMPLE/mnist_train_${BACKEND} --backend=${BACKEND}

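It aborts with an error.
According to Caffe's GitHub, this error occurs on WSL and does not show up in other environments, but no workaround for WSL is mentioned...
Reading the code, the error appears to come from MDB_CHECK(put_rc) in LMDBTransaction::Commit() in ${CAFFE}/src/caffe/util/db_lmdb.cpp.
The behavior of mdb_put on WSL looks suspect. (Sources: here, here)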

void LMDBTransaction::Commit() {
  MDB_dbi mdb_dbi;
  MDB_val mdb_key, mdb_data;
  MDB_txn *mdb_txn;

  // Initialize MDB variables
  MDB_CHECK(mdb_txn_begin(mdb_env_, NULL, 0, &mdb_txn));
  MDB_CHECK(mdb_dbi_open(mdb_txn, NULL, 0, &mdb_dbi));

  for (int i = 0; i < keys.size(); i++) {
    mdb_key.mv_size = keys[i].size();
    mdb_key.mv_data = const_cast<char*>(keys[i].data());
    mdb_data.mv_size = values[i].size();
    mdb_data.mv_data = const_cast<char*>(values[i].data());

    // Add data to the transaction
    int put_rc = mdb_put(mdb_txn, mdb_dbi, &mdb_key, &mdb_data, 0);
    if (put_rc == MDB_MAP_FULL) {
      // Out of memory - double the map size and retry
      mdb_txn_abort(mdb_txn);
      mdb_dbi_close(mdb_env_, mdb_dbi);
      DoubleMapSize();
      Commit();
      return;
    }
    // May have failed for some other reason
    MDB_CHECK(put_rc);
  }
  // ... (the rest of Commit() is omitted from this excerpt)
}

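Since LMDB itself seems unreliable on Ubuntu 18.04 @ WSL, I tried using LevelDB instead of LMDB.

  • Change the DB from LMDB to LevelDB in create_mnist.sh and in lenet_train_test.prototxt (the network definition used by train_lenet.sh)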
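
If the same scripts are shared between WSL and non-WSL machines, the backend could be chosen automatically instead of being hard-coded. The following is only a minimal sketch: `choose_backend` is a hypothetical helper that is not part of Caffe, and it assumes that /proc/version mentions "Microsoft" when running under WSL. This post only confirms the LMDB failure on the WSL setup described here, so treating every such kernel as affected is an assumption.

```shell
# Hypothetical helper (not part of Caffe): pick a DB backend from a kernel version string.
choose_backend() {
  case "$1" in
    *[Mm]icrosoft*) echo "leveldb" ;;  # looks like WSL: avoid LMDB
    *)              echo "lmdb" ;;     # anything else: keep the tutorial default
  esac
}

# Assumption: /proc/version mentions "Microsoft" under WSL.
BACKEND="$(choose_backend "$(cat /proc/version 2>/dev/null)")"
echo "BACKEND=${BACKEND}"
```

The resulting value corresponds to the BACKEND variable in create_mnist.sh. The source paths and backend entries in the prototxt still need to match whichever value is chosen.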
diff examples.orig/mnist/create_mnist.sh examples/mnist/create_mnist.sh
10c10
< BACKEND="lmdb"
---
> BACKEND="leveldb"
diff examples.orig/mnist/lenet_train_test.prototxt examples/mnist/lenet_train_test.prototxt
14c14
<     source: "examples/mnist/mnist_train_lmdb"
---
>     source: "examples/mnist/mnist_train_leveldb"
16c16
<     backend: LMDB
---
>     backend: LEVELDB
31c31
<     source: "examples/mnist/mnist_test_lmdb"
---
>     source: "examples/mnist/mnist_test_leveldb"
33c33
<     backend: LMDB
---
>     backend: LEVELDB
  • Re-run
bash ./examples/mnist/create_mnist.sh
Creating leveldb...
I1101 22:44:16.040349  5976 db_leveldb.cpp:18] Opened leveldb examples/mnist/mnist_train_leveldb
I1101 22:44:16.042675  5976 convert_mnist_data.cpp:88] A total of 60000 items.
I1101 22:44:16.042707  5976 convert_mnist_data.cpp:89] Rows: 28 Cols: 28
I1101 22:44:21.799593  5976 convert_mnist_data.cpp:108] Processed 60000 files.
I1101 22:44:22.961354  5978 db_leveldb.cpp:18] Opened leveldb examples/mnist/mnist_test_leveldb
I1101 22:44:22.965518  5978 convert_mnist_data.cpp:88] A total of 10000 items.
I1101 22:44:22.965554  5978 convert_mnist_data.cpp:89] Rows: 28 Cols: 28
I1101 22:44:23.364717  5978 convert_mnist_data.cpp:108] Processed 10000 files.
Done.
bash ./examples/mnist/train_lenet.sh
...
I1101 22:43:35.833735  5949 solver.cpp:258]     Train net output #0: loss = 0.00705546 (* 1 = 0.00705546 loss)
I1101 22:43:35.833750  5949 sgd_solver.cpp:112] Iteration 9900, lr = 0.00596843
I1101 22:43:43.078027  5949 solver.cpp:464] Snapshotting to binary proto file examples/mnist/lenet_iter_10000.caffemodel
I1101 22:43:43.086050  5949 sgd_solver.cpp:284] Snapshotting solver state to binary proto file examples/mnist/lenet_iter_10000.solverstate
I1101 22:43:43.131003  5949 solver.cpp:327] Iteration 10000, loss = 0.00332859
I1101 22:43:43.131049  5949 solver.cpp:347] Iteration 10000, Testing net (#0)
I1101 22:43:47.667259  5951 data_layer.cpp:73] Restarting data prefetching from start.
I1101 22:43:47.848734  5949 solver.cpp:414]     Test net output #0: accuracy = 0.9919
I1101 22:43:47.848786  5949 solver.cpp:414]     Test net output #1: loss = 0.0263434 (* 1 = 0.0263434 loss)
I1101 22:43:47.848799  5949 solver.cpp:332] Optimization Done.
I1101 22:43:47.848809  5949 caffe.cpp:250] Optimization Done.

It worked!!
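
The manual edits shown in the diff above can also be applied non-interactively. The following is a minimal sketch, assuming GNU sed (as shipped with Ubuntu) and files that still contain exactly the lines shown in the diff. `switch_to_leveldb` is a hypothetical helper name, not something provided by Caffe.

```shell
# Hypothetical helper (not part of Caffe): switch the MNIST example from LMDB to LevelDB.
switch_to_leveldb() {
  local script="$1" prototxt="$2"
  # create_mnist.sh: BACKEND="lmdb" -> BACKEND="leveldb"
  sed -i 's/^BACKEND="lmdb"/BACKEND="leveldb"/' "$script"
  # lenet_train_test.prototxt: rename the DB paths and switch the backend enum
  sed -i -e 's/_lmdb"/_leveldb"/' -e 's/backend: LMDB/backend: LEVELDB/' "$prototxt"
}

# Run from the directory that contains examples/ (as in the steps above).
if [ -f examples/mnist/create_mnist.sh ] && [ -f examples/mnist/lenet_train_test.prototxt ]; then
  switch_to_leveldb examples/mnist/create_mnist.sh examples/mnist/lenet_train_test.prototxt
fi
```

Keeping an untouched copy of the examples directory (such as examples.orig in the diff) makes it easy to confirm with diff that only the intended lines changed.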

Running the CIFAR10 tutorial

mkdir examples
cd ./examples
cp -a ${CAFFE_ROOT}/caffe/{build,data,examples,.build_release} .
bash ./data/cifar10/get_cifar10.sh
bash ./examples/cifar10/create_cifar10.sh
bash ./examples/cifar10/train_quick.sh

With these steps it ran without any trouble.
