Trying out the MNIST example in a Caffe @ Ubuntu18.04 @ WSL environment.
How I built the environment is described in a separate article.
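Summary
- On Ubuntu 18.04 @ WSL, the LMDB backend that Caffe's MNIST tutorial uses by default does not work
- Switching the DB from LMDB to LevelDB makes the tutorial work on WSL as well
- Note: for some reason, the cifar10 tutorial works even with LMDB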
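If you want a script to follow the summary automatically, here is a minimal sketch. The helper name pick_backend is hypothetical. It prints leveldb when the kernel version string looks like WSL and lmdb otherwise. Detecting WSL by looking for "Microsoft" in /proc/version is an assumption. WSL1 kernels report such a string. The pattern also matches WSL2's lowercase "microsoft", and this article did not test WSL2.

```shell
#!/usr/bin/env bash
# Hypothetical helper: choose a Caffe DB backend for the current environment.
# Assumption: WSL kernels report "Microsoft" (WSL1) or "microsoft" (WSL2) in /proc/version.
# An optional argument overrides the version string (useful for testing).
pick_backend() {
  local version="${1:-$(cat /proc/version 2>/dev/null)}"
  case "$version" in
    *[Mm]icrosoft*) echo "leveldb" ;;  # WSL: LMDB writes failed in this article
    *)              echo "lmdb" ;;     # elsewhere: keep the tutorial default
  esac
}
```

For example, `BACKEND=$(pick_backend)`. Note that create_mnist.sh hardcodes BACKEND, so the value still has to be written into that script and into the prototxt.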
Steps in the MNIST tutorial where I got stuck
mkdir examples
cd ./examples
cp -a ${CAFFE_ROOT}/caffe/{build,data,examples,.build_release} .
./data/mnist/get_mnist.sh
./examples/mnist/create_mnist.sh
...
+ build/examples/mnist/convert_mnist_data.bin data/mnist/train-images-idx3-ubyte data/mnist/train-labels-idx1-ubyte examples/mnist/mnist_train_lmdb --backend=lmdb
I1031 23:51:00.930450 14783 db_lmdb.cpp:35] Opened lmdb examples/mnist/mnist_train_lmdb
I1031 23:51:00.932895 14783 convert_mnist_data.cpp:88] A total of 60000 items.
I1031 23:51:00.932911 14783 convert_mnist_data.cpp:89] Rows: 28 Cols: 28
F1031 23:51:00.953609 14783 db_lmdb.hpp:15] Check failed: mdb_status == 0 (-30796 vs. 0) MDB_CORRUPTED: Located page was wrong type
*** Check failure stack trace: ***
@ 0x7f31b661c0cd google::LogMessage::Fail()
@ 0x7f31b661df33 google::LogMessage::SendToLog()
@ 0x7f31b661bc28 google::LogMessage::Flush()
@ 0x7f31b661e999 google::LogMessageFatal::~LogMessageFatal()
@ 0x7f31b6b4323d caffe::db::LMDBTransaction::Commit()
@ 0x7f31b7203189 convert_dataset()
@ 0x7f31b720247a main
@ 0x7f31b55f1b97 __libc_start_main
@ 0x7f31b72024ca _start
./examples/mnist/create_mnist.sh: line 18: 14783 Aborted (core dumped) $BUILD/convert_mnist_data.bin $DATA/train-images-idx3-ubyte $DATA/train-labels-idx1-ubyte $EXAMPLE/mnist_train_${BACKEND} --backend=${BACKEND}
It crashes with an error.
According to Caffe's GitHub, this error occurs on WSL and does not appear in other environments, but no workaround for WSL is mentioned...
Reading the code, the error appears to come from MDB_CHECK(put_rc) in LMDBTransaction::Commit() in ${CAFFE}/src/caffe/util/db_lmdb.cpp.
Apparently mdb_put behaves suspiciously on WSL. (Sources: here and here)
void LMDBTransaction::Commit() {
  MDB_dbi mdb_dbi;
  MDB_val mdb_key, mdb_data;
  MDB_txn *mdb_txn;
  // Initialize MDB variables
  MDB_CHECK(mdb_txn_begin(mdb_env_, NULL, 0, &mdb_txn));
  MDB_CHECK(mdb_dbi_open(mdb_txn, NULL, 0, &mdb_dbi));
  for (int i = 0; i < keys.size(); i++) {
    mdb_key.mv_size = keys[i].size();
    mdb_key.mv_data = const_cast<char*>(keys[i].data());
    mdb_data.mv_size = values[i].size();
    mdb_data.mv_data = const_cast<char*>(values[i].data());
    // Add data to the transaction
    int put_rc = mdb_put(mdb_txn, mdb_dbi, &mdb_key, &mdb_data, 0);
    if (put_rc == MDB_MAP_FULL) {
      // Out of memory - double the map size and retry
      mdb_txn_abort(mdb_txn);
      mdb_dbi_close(mdb_env_, mdb_dbi);
      DoubleMapSize();
      Commit();
      return;
    }
    // May have failed for some other reason
    MDB_CHECK(put_rc);
  }
  // ... (rest of the function omitted)
}
LMDB itself seems to behave unreliably on Ubuntu 18.04 @ WSL, so I tried using LevelDB instead of LMDB.
- Change the DB from LMDB to LevelDB in create_mnist.sh and in lenet_train_test.prototxt (the network definition that train_lenet.sh uses)
diff examples.orig/mnist/create_mnist.sh examples/mnist/create_mnist.sh
10c10
< BACKEND="lmdb"
---
> BACKEND="leveldb"
diff examples.orig/mnist/lenet_train_test.prototxt examples/mnist/lenet_train_test.prototxt
14c14
< source: "examples/mnist/mnist_train_lmdb"
---
> source: "examples/mnist/mnist_train_leveldb"
16c16
< backend: LMDB
---
> backend: LEVELDB
31c31
< source: "examples/mnist/mnist_test_lmdb"
---
> source: "examples/mnist/mnist_test_leveldb"
33c33
< backend: LMDB
---
> backend: LEVELDB
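The edits in the diff above are mechanical substitutions, so you can script them. This is a minimal sketch under several assumptions. It requires GNU sed (for -i and -E). It assumes the files live under examples/mnist relative to the working directory created by the cp -a step. The function name switch_mnist_to_leveldb is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: switch the MNIST example from LMDB to LevelDB in place.
# Assumptions: GNU sed; files under examples/mnist unless a directory is given.
# Keep a copy of the originals first (e.g. cp -a examples examples.orig) if you want to diff.
switch_mnist_to_leveldb() {
  local dir="${1:-examples/mnist}"

  # create_mnist.sh selects the backend through a single variable.
  sed -i 's/^BACKEND="lmdb"/BACKEND="leveldb"/' "$dir/create_mnist.sh"

  # lenet_train_test.prototxt: DB paths and the backend enum of the TRAIN and TEST data layers.
  sed -i -E \
    -e 's/mnist_(train|test)_lmdb/mnist_\1_leveldb/' \
    -e 's/backend: LMDB/backend: LEVELDB/' \
    "$dir/lenet_train_test.prototxt"
}
```

After calling switch_mnist_to_leveldb, re-run create_mnist.sh so the LevelDB directories exist before training.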
- Re-run
bash ./examples/mnist/create_mnist.sh
Creating leveldb...
I1101 22:44:16.040349 5976 db_leveldb.cpp:18] Opened leveldb examples/mnist/mnist_train_leveldb
I1101 22:44:16.042675 5976 convert_mnist_data.cpp:88] A total of 60000 items.
I1101 22:44:16.042707 5976 convert_mnist_data.cpp:89] Rows: 28 Cols: 28
I1101 22:44:21.799593 5976 convert_mnist_data.cpp:108] Processed 60000 files.
I1101 22:44:22.961354 5978 db_leveldb.cpp:18] Opened leveldb examples/mnist/mnist_test_leveldb
I1101 22:44:22.965518 5978 convert_mnist_data.cpp:88] A total of 10000 items.
I1101 22:44:22.965554 5978 convert_mnist_data.cpp:89] Rows: 28 Cols: 28
I1101 22:44:23.364717 5978 convert_mnist_data.cpp:108] Processed 10000 files.
Done.
bash ./examples/mnist/train_lenet.sh
...
I1101 22:43:35.833735 5949 solver.cpp:258] Train net output #0: loss = 0.00705546 (* 1 = 0.00705546 loss)
I1101 22:43:35.833750 5949 sgd_solver.cpp:112] Iteration 9900, lr = 0.00596843
I1101 22:43:43.078027 5949 solver.cpp:464] Snapshotting to binary proto file examples/mnist/lenet_iter_10000.caffemodel
I1101 22:43:43.086050 5949 sgd_solver.cpp:284] Snapshotting solver state to binary proto file examples/mnist/lenet_iter_10000.solverstate
I1101 22:43:43.131003 5949 solver.cpp:327] Iteration 10000, loss = 0.00332859
I1101 22:43:43.131049 5949 solver.cpp:347] Iteration 10000, Testing net (#0)
I1101 22:43:47.667259 5951 data_layer.cpp:73] Restarting data prefetching from start.
I1101 22:43:47.848734 5949 solver.cpp:414] Test net output #0: accuracy = 0.9919
I1101 22:43:47.848786 5949 solver.cpp:414] Test net output #1: loss = 0.0263434 (* 1 = 0.0263434 loss)
I1101 22:43:47.848799 5949 solver.cpp:332] Optimization Done.
I1101 22:43:47.848809 5949 caffe.cpp:250] Optimization Done.
It works!!
Running the CIFAR10 tutorial
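You can check the result without scrolling through the log. The sketch below pulls the last test accuracy out of a saved training log. The function name final_test_accuracy and the log file name are hypothetical. The sketch makes two assumptions. First, the log was captured with `bash ./examples/mnist/train_lenet.sh 2>&1 | tee train_lenet.log`, because Caffe's glog output goes to stderr. Second, output #0 of the test net is accuracy, as in the LeNet example.

```shell
#!/usr/bin/env bash
# Sketch: print the last "Test net output #0: accuracy" value found in a Caffe log.
# Assumption: output #0 of the test net is the accuracy, as in lenet_train_test.prototxt.
final_test_accuracy() {
  # Split each line on "accuracy = " so $2 holds the number; keep the last match.
  awk -F'accuracy = ' '/Test net output #0: accuracy = /{acc=$2} END{if (acc != "") print acc}' "$1"
}
```

For the run above, `final_test_accuracy train_lenet.log` would print the value from the final test pass, 0.9919.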
mkdir examples
cd ./examples
cp -a ${CAFFE_ROOT}/caffe/{build,data,examples,.build_release} .
bash ./data/cifar10/get_cifar10.sh
bash ./examples/cifar10/create_cifar10.sh
bash ./examples/cifar10/train_quick.sh
With just these commands, it ran without any trouble.