I ran into a problem when running PointNet++, worked out a fix, and am sharing it here.
Environment
os : ubuntu18.04
tensorflow : 1.8
python : 3.5
cuda : 9.0
cudnn : 7.0
g++,gcc : 4.8
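#Conclusion
In short, the error went away after I switched gcc and g++ to 4.8, commented out "-D_GLIBCXX_USE_CXX11_ABI=0" in pointnet2/tf_ops/sampling/tf_sampling_compile.sh, and recompiled by running the script with bash.
Apply the same change to pointnet2/tf_ops/3d_interpolation/tf_interpolation_compile.sh and pointnet2/tf_ops/grouping/tf_grouping_compile.sh. The paths inside each script also need some adjusting to match your environment.
For now, here are the contents of tf_sampling_compile.sh only: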
#!/bin/bash
/usr/local/cuda/bin/nvcc tf_sampling_g.cu -o tf_sampling_g.cu.o -c -O2 -DGOOGLE_CUDA=1 -x cu -Xcompiler -fPIC
# TF1.2
# g++ -std=c++11 tf_sampling.cpp tf_sampling_g.cu.o -o tf_sampling_so.so -shared -fPIC -I /usr/local/lib/python2.7/dist-packages/tensorflow/include -I /usr/local/cuda-8.0/include -lcudart -L /usr/local/cuda-8.0/lib64/ -O2 -D_GLIBCXX_USE_CXX11_ABI=0
# TF1.4
g++ -std=c++11 tf_sampling.cpp tf_sampling_g.cu.o -o tf_sampling_so.so -shared -fPIC -I /usr/local/lib/python3.5/dist-packages/tensorflow/include -I /usr/local/cuda/include -I /usr/local/lib/python3.5/dist-packages/tensorflow/include/external/nsync/public -lcudart -L /usr/local/cuda/lib64/ -L/usr/local/lib/python3.5/dist-packages/tensorflow -ltensorflow_framework -O2 #-D_GLIBCXX_USE_CXX11_ABI=0
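Since the same edit has to be made in three scripts, it can also be scripted. The following is only a minimal sketch and not part of the pointnet2 repository: `comment_out_abi_flag` is a hypothetical helper, and it assumes the flag is the last argument on the g++ line, as in the script shown above. Lines that are already comments are left alone.

```python
ABI_FLAG = "-D_GLIBCXX_USE_CXX11_ABI=0"


def comment_out_abi_flag(script_text):
    """Return script_text with ABI_FLAG commented out on active command lines.

    Assumption: the flag is the last argument on its line, so a '#'
    placed in front of it disables the flag and nothing else.
    """
    out = []
    for line in script_text.splitlines(keepends=True):
        body = line.rstrip()
        tail = line[len(body):]  # trailing whitespace and newline, kept as-is
        active = not body.lstrip().startswith("#")
        already_done = body.endswith("#" + ABI_FLAG)
        if active and body.endswith(ABI_FLAG) and not already_done:
            body = body[: -len(ABI_FLAG)] + "#" + ABI_FLAG
        out.append(body + tail)
    return "".join(out)
```

To use it, read a compile script, pass its text through the function, and write the result back. Running it twice gives the same result as running it once.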
#The problem at run time
https://github.com/charlesq34/pointnet2
Clone the repository above and run train.py:
git clone https://github.com/charlesq34/pointnet2.git
python train.py
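I already knew this alone would fail: you first have to run the .sh script in each of the three directories under tf_ops to build a .so (a shared library, as far as I understand).
So I tweaked the paths a little and ran the scripts. The .so files were built, but running train.py produced the error below.
(output omitted)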
Traceback (most recent call last):
File "train.py", line 52, in <module>
MODEL = importlib.import_module(FLAGS.model) # import network module
File "/usr/lib/python3.5/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 986, in _gcd_import
File "<frozen importlib._bootstrap>", line 969, in _find_and_load
File "<frozen importlib._bootstrap>", line 958, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 673, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 665, in exec_module
File "<frozen importlib._bootstrap>", line 222, in _call_with_frames_removed
File "/home/share/practice/pointnet2/pointnet2_own/models/pointnet2_cls_ssg.py", line 13, in <module>
from pointnet_util import pointnet_sa_module
File "/home/share/practice/pointnet2/pointnet2_own/utils/pointnet_util.py", line 15, in <module>
from tf_sampling import farthest_point_sample, gather_point
File "/home/share/practice/pointnet2/pointnet2_own/tf_ops/sampling/tf_sampling.py", line 12, in <module>
sampling_module=tf.load_op_library(os.path.join(BASE_DIR, 'tf_sampling_so.so'))
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/load_library.py", line 56, in load_op_library
lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.NotFoundError: /home/share/practice/pointnet2/pointnet2_own/tf_ops/sampling/tf_sampling_so.so: undefined symbol: _ZN10tensorflow8internal21CheckOpMessageBuilder9NewStringB5cxx11Ev
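The GitHub issues for the repository show that many people have hit the same problem.
Searching turned up various suggestions, such as changing the TensorFlow version or the cmake version, but in my case the steps below fixed it.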
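One hint in the error above is the `B5cxx11` fragment inside the missing symbol. It is the mangled form of the `cxx11` ABI tag that libstdc++ has used since GCC 5, so the compiled .so is asking for a newer-ABI symbol that the installed TensorFlow library evidently does not provide. GCC 4.8 predates that ABI, which would explain why downgrading helps. The sketch below pulls the symbol out of such an error message and checks for the tag. The function names are hypothetical, and the substring check is a heuristic rather than a full demangler.

```python
import re

# Matches the "undefined symbol: <name>" part of a loader error message.
_UNDEFINED = re.compile(r"undefined symbol:\s*(\S+)")


def missing_symbol(error_text):
    """Return the symbol named in an 'undefined symbol' error, or None."""
    match = _UNDEFINED.search(error_text)
    return match.group(1) if match else None


def has_cxx11_abi_tag(symbol):
    """Heuristic: True if the mangled name carries the 'cxx11' ABI tag."""
    return "B5cxx11" in symbol


error = (
    "tf_sampling_so.so: undefined symbol: "
    "_ZN10tensorflow8internal21CheckOpMessageBuilder9NewStringB5cxx11Ev"
)
symbol = missing_symbol(error)
if symbol is not None and has_cxx11_abi_tag(symbol):
    print("The op library was probably built with the newer libstdc++ ABI.")
```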
#Solution
##1. Switch gcc and g++ to 4.8
apt install gcc-4.8 g++-4.8 -y
ln -sf /usr/bin/gcc-4.8 /usr/bin/gcc
ln -sf /usr/bin/g++-4.8 /usr/bin/g++
g++ -v
gcc -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/4.8/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 4.8.5-4ubuntu2' --with-bugurl=file:///usr/share/doc/gcc-4.8/README.Bugs --enable-languages=c,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.8 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.8 --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-gnu-unique-object --disable-libmudflap --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-4.8-amd64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-4.8-amd64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-4.8-amd64 --with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 4.8.5 (Ubuntu 4.8.5-4ubuntu2)
If the reported version has gone down to 4.8, the switch worked.
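The version check can also be done programmatically instead of reading the output by eye. This is a small sketch with hypothetical helper names; it relies on the "gcc version X.Y.Z" line shown in the output above and on the fact that `gcc -v` writes that report to stderr.

```python
import re
import subprocess


def parse_gcc_version(verbose_output):
    """Extract (major, minor[, patch]) from the text printed by `gcc -v`."""
    match = re.search(
        r"^gcc version (\d+)\.(\d+)(?:\.(\d+))?", verbose_output, re.MULTILINE
    )
    if match is None:
        return None
    return tuple(int(part) for part in match.groups() if part is not None)


def installed_gcc_version(compiler="gcc"):
    """Run `<compiler> -v` and parse its report (merged from stderr)."""
    result = subprocess.run(
        [compiler, "-v"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return parse_gcc_version(result.stdout)


sample = "Thread model: posix\ngcc version 4.8.5 (Ubuntu 4.8.5-4ubuntu2)\n"
print(parse_gcc_version(sample)[:2] == (4, 8))
```

`installed_gcc_version("g++")` can be used the same way to confirm that g++ was switched as well.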
##2. Edit the .sh files
This is already covered in the Conclusion above, so I will skip it here.
##3. Run the .sh files
Run the edited .sh files to build the .so files:
cd tf_ops/sampling
bash tf_sampling_compile.sh
cd ../grouping
bash tf_grouping_compile.sh
cd ../3d_interpolation
bash tf_interpolation_compile.sh
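If you end up rebuilding often, the three commands above can be wrapped in a short script. This is only a sketch under the directory layout described in this article; `build_steps` and `build_all` are hypothetical names and are not part of the repository.

```python
import os
import subprocess

# (directory under tf_ops, compile script) pairs, as listed in this article.
COMPILE_SCRIPTS = [
    ("sampling", "tf_sampling_compile.sh"),
    ("grouping", "tf_grouping_compile.sh"),
    ("3d_interpolation", "tf_interpolation_compile.sh"),
]


def build_steps(tf_ops_dir):
    """Return (working directory, argv) pairs for each compile script."""
    return [
        (os.path.join(tf_ops_dir, subdir), ["bash", script])
        for subdir, script in COMPILE_SCRIPTS
    ]


def build_all(tf_ops_dir):
    """Run every compile script in its own directory; stop on first failure."""
    for cwd, argv in build_steps(tf_ops_dir):
        subprocess.run(argv, cwd=cwd, check=True)
```

Calling `build_all("tf_ops")` from the repository root would run the scripts in order, with `check=True` raising an exception as soon as one of them fails.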
Then run train.py!
root@caaad6906910:/home/share/practice/pointnet2/pointnet2_own# python train.py
(output omitted)
**** EPOCH 000 ****
2019-12-01 09:32:19.674543
/home/share/practice/pointnet2/pointnet2_own/modelnet_h5_dataset.py:45: H5pyDeprecationWarning: The default file mode will change to 'r' (read-only) in h5py 3.0. To suppress this warning, pass the mode you need to h5py.File(), or set the global default h5.get_config().default_file_mode, or set the environment variable H5PY_DEFAULT_READONLY=1. Available modes are: 'r', 'r+', 'w', 'w-'/'x', 'a'. See the docs for details.
f = h5py.File(h5_filename)
Traceback (most recent call last):
File "train.py", line 284, in <module>
train()
File "train.py", line 177, in train
train_one_epoch(sess, ops, train_writer)
File "train.py", line 201, in train_one_epoch
batch_data, batch_label = TRAIN_DATASET.next_batch(augment=True)
File "/home/share/practice/pointnet2/pointnet2_own/modelnet_h5_dataset.py", line 117, in next_batch
if augment: data_batch = self._augment_batch_data(data_batch)
File "/home/share/practice/pointnet2/pointnet2_own/modelnet_h5_dataset.py", line 73, in _augment_batch_data
rotated_data = provider.rotate_point_cloud(batch_data)
File "/home/share/practice/pointnet2/pointnet2_own/utils/provider.py", line 41, in rotate_point_cloud
for k in xrange(batch_data.shape[0]):
NameError: name 'xrange' is not defined
So close!
At least the earlier error message no longer appears, so it looks like this will work out.
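The remaining `NameError` is unrelated to the build: `xrange` exists only in Python 2, and on Python 3 `range` plays the same role. I have not verified the following against the whole code base, but a common workaround is a small compatibility shim near the top of the affected module (here that would be utils/provider.py, which is my assumption based on the traceback). The `first_indices` function is a hypothetical stand-in for the loop in `rotate_point_cloud`.

```python
try:
    xrange  # already defined on Python 2
except NameError:
    xrange = range  # Python 3: range is lazy, like Python 2's xrange


def first_indices(n):
    """Hypothetical stand-in for a loop written as `for k in xrange(n)`."""
    return [k for k in xrange(n)]


print(first_indices(3))
```

Replacing each `xrange` call with `range` directly would work as well; the shim just avoids touching every loop.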