How to fix a runtime error when running pointnet++

Posted at 2019-12-01, last updated at 2019-12-02

I hit a problem when running pointnet++, searched around for a fix, and eventually solved it, so I am sharing what worked.

Environment
os : ubuntu18.04
tensorflow : 1.8
python : 3.5
cuda : 9.0
cudnn : 7.0
g++,gcc : 4.8

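Conclusion

In short: I switched gcc and g++ to 4.8, commented out "-D_GLIBCXX_USE_CXX11_ABI=0" in pointnet2/tf_ops/sampling/tf_sampling_compile.sh, and recompiled by running the script with bash. After that the error went away.
Apply the same change to pointnet2/tf_ops/3d_interpolation/tf_interpolation_compile.sh and pointnet2/tf_ops/grouping/tf_grouping_compile.sh. The paths inside each script also need a few adjustments to match your environment.
For now, here is just the content of tf_sampling_compile.sh.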

#!/bin/bash
/usr/local/cuda/bin/nvcc tf_sampling_g.cu -o tf_sampling_g.cu.o -c -O2 -DGOOGLE_CUDA=1 -x cu -Xcompiler -fPIC

# TF1.2
# g++ -std=c++11 tf_sampling.cpp tf_sampling_g.cu.o -o tf_sampling_so.so -shared -fPIC -I /usr/local/lib/python2.7/dist-packages/tensorflow/include -I /usr/local/cuda-8.0/include -lcudart -L /usr/local/cuda-8.0/lib64/ -O2 -D_GLIBCXX_USE_CXX11_ABI=0

# TF1.4
g++ -std=c++11 tf_sampling.cpp tf_sampling_g.cu.o -o tf_sampling_so.so -shared -fPIC -I /usr/local/lib/python3.5/dist-packages/tensorflow/include -I /usr/local/cuda/include -I /usr/local/lib/python3.5/dist-packages/tensorflow/include/external/nsync/public -lcudart -L /usr/local/cuda/lib64/ -L/usr/local/lib/python3.5/dist-packages/tensorflow -ltensorflow_framework -O2 #-D_GLIBCXX_USE_CXX11_ABI=0
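
The hard-coded dist-packages and CUDA paths are the parts that usually need adjusting. As a rough sketch (not part of the pointnet2 repository), the same g++ command can be assembled from parameters. The helper name `build_gxx_command` and its arguments are hypothetical. In a real environment the two TensorFlow directories can be obtained from `tf.sysconfig.get_include()` and `tf.sysconfig.get_lib()`; they are passed in here so the sketch runs without TensorFlow installed.

```python
def build_gxx_command(tf_include, tf_lib, cuda_root="/usr/local/cuda", cxx11_abi=None):
    """Return the g++ argument list that links tf_sampling_so.so.

    tf_include / tf_lib: TensorFlow include and library directories
    (normally tf.sysconfig.get_include() and tf.sysconfig.get_lib()).
    cxx11_abi: None omits the flag (as done in this article); 0 or 1
    appends -D_GLIBCXX_USE_CXX11_ABI=<value>.
    """
    cmd = [
        "g++", "-std=c++11",
        "tf_sampling.cpp", "tf_sampling_g.cu.o",
        "-o", "tf_sampling_so.so",
        "-shared", "-fPIC",
        "-I", tf_include,
        "-I", cuda_root + "/include",
        "-I", tf_include + "/external/nsync/public",
        "-lcudart", "-L", cuda_root + "/lib64/",
        "-L" + tf_lib, "-ltensorflow_framework",
        "-O2",
    ]
    if cxx11_abi is not None:
        cmd.append("-D_GLIBCXX_USE_CXX11_ABI=%d" % cxx11_abi)
    return cmd


if __name__ == "__main__":
    # Paths taken from the script above; adjust to your own install.
    tf_dir = "/usr/local/lib/python3.5/dist-packages/tensorflow"
    print(" ".join(build_gxx_command(tf_dir + "/include", tf_dir)))
```

Printing the command instead of running it makes it easy to compare against the .sh file before changing anything.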

The problem I hit at runtime

https://github.com/charlesq34/pointnet2
Clone the repository above and run train.py.

git clone https://github.com/charlesq34/pointnet2.git
cd pointnet2
python train.py

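I already knew this would fail as-is: you first have to run the .sh scripts in the three directories under tf_ops to build the .so files (shared libraries, as far as I understand).
So I tweaked the paths a little and ran them. The .so files were built, but running train.py then produced the following error.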

(snip)
Traceback (most recent call last):
  File "train.py", line 52, in <module>
    MODEL = importlib.import_module(FLAGS.model) # import network module
  File "/usr/lib/python3.5/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 986, in _gcd_import
  File "<frozen importlib._bootstrap>", line 969, in _find_and_load
  File "<frozen importlib._bootstrap>", line 958, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 673, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 665, in exec_module
  File "<frozen importlib._bootstrap>", line 222, in _call_with_frames_removed
  File "/home/share/practice/pointnet2/pointnet2_own/models/pointnet2_cls_ssg.py", line 13, in <module>
    from pointnet_util import pointnet_sa_module
  File "/home/share/practice/pointnet2/pointnet2_own/utils/pointnet_util.py", line 15, in <module>
    from tf_sampling import farthest_point_sample, gather_point
  File "/home/share/practice/pointnet2/pointnet2_own/tf_ops/sampling/tf_sampling.py", line 12, in <module>
    sampling_module=tf.load_op_library(os.path.join(BASE_DIR, 'tf_sampling_so.so'))
  File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/load_library.py", line 56, in load_op_library
    lib_handle = py_tf.TF_LoadLibrary(library_filename)
tensorflow.python.framework.errors_impl.NotFoundError: /home/share/practice/pointnet2/pointnet2_own/tf_ops/sampling/tf_sampling_so.so: undefined symbol: _ZN10tensorflow8internal21CheckOpMessageBuilder9NewStringB5cxx11Ev
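
A hint is hidden in the missing symbol itself. The `B5cxx11` fragment near its end is how GCC 5 and later encode the `[abi:cxx11]` tag, which marks functions built against the newer libstdc++ `std::string` ABI. So the .so appears to be asking for a new-ABI symbol that the installed TensorFlow library does not export. GCC 4.8 predates this dual ABI, which would explain why building with it avoids the mismatch. A minimal sketch for spotting the tag (the helper name `uses_cxx11_abi` is hypothetical, and this is a substring check, not a demangler):

```python
def uses_cxx11_abi(mangled_symbol):
    """Return True if a mangled C++ name carries the [abi:cxx11] tag.

    GCC encodes an ABI tag as 'B' + <length> + <tag>, so the cxx11 tag
    appears as the substring 'B5cxx11'.
    """
    return "B5cxx11" in mangled_symbol


if __name__ == "__main__":
    missing = "_ZN10tensorflow8internal21CheckOpMessageBuilder9NewStringB5cxx11Ev"
    print(uses_cxx11_abi(missing))
```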

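The GitHub issues for the repository show that many other people have hit the same problem.
Searching around turned up various suggestions, such as changing the tensorflow version or the cmake version, but in my case the following steps solved it.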

Solution

1. Switch gcc and g++ to 4.8

apt install gcc-4.8 g++-4.8 -y
ln -sf /usr/bin/gcc-4.8 /usr/bin/gcc 
ln -sf /usr/bin/g++-4.8 /usr/bin/g++
g++ -v
gcc -v

Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/4.8/lto-wrapper
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu 4.8.5-4ubuntu2' --with-bugurl=file:///usr/share/doc/gcc-4.8/README.Bugs --enable-languages=c,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr --program-suffix=-4.8 --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --with-gxx-include-dir=/usr/include/c++/4.8 --libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --enable-gnu-unique-object --disable-libmudflap --enable-plugin --with-system-zlib --disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-4.8-amd64/jre --enable-java-home --with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-4.8-amd64 --with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-4.8-amd64 --with-arch-directory=amd64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar --enable-objc-gc --enable-multiarch --disable-werror --with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32 --with-tune=generic --enable-checking=release --build=x86_64-linux-gnu --host=x86_64-linux-gnu --target=x86_64-linux-gnu
Thread model: posix
gcc version 4.8.5 (Ubuntu 4.8.5-4ubuntu2)

If the reported version has dropped to 4.8, this step succeeded.
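
If you would rather check the version from a script than read the output by eye, the "gcc version" line can be parsed. This is only a sketch: `parse_gcc_version` and `installed_gcc_version` are hypothetical helper names. `gcc -v` writes its report to stderr, which is why the second helper captures that stream.

```python
import re
import subprocess


def parse_gcc_version(version_output):
    """Extract (major, minor[, patch]) from the text printed by `gcc -v`."""
    match = re.search(r"gcc version (\d+)\.(\d+)(?:\.(\d+))?", version_output)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups() if part is not None)


def installed_gcc_version(compiler="gcc"):
    """Run `<compiler> -v` and parse its stderr (requires the compiler on PATH)."""
    result = subprocess.run([compiler, "-v"], stderr=subprocess.PIPE,
                            universal_newlines=True)
    return parse_gcc_version(result.stderr)


if __name__ == "__main__":
    sample = "Thread model: posix\ngcc version 4.8.5 (Ubuntu 4.8.5-4ubuntu2)"
    print(parse_gcc_version(sample))
```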

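2. Edit the .sh files

This is already described in the Conclusion above, so I will skip it here.

3. Run the .sh files

Run the edited .sh scripts to build the .so files.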

cd tf_ops/sampling
bash tf_sampling_compile.sh
cd ../grouping
bash tf_grouping_compile.sh
cd ../3d_interpolation
bash tf_interpolation_compile.sh
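
The same three builds can be driven from one place, which helps when you rebuild repeatedly while adjusting paths. A minimal sketch, assuming the directory and script names used in this article; `compile_all` and its `runner` parameter are hypothetical and exist only so the loop can be tried without actually invoking bash.

```python
import os
import subprocess

# (subdirectory under tf_ops, compile script inside it)
COMPILE_SCRIPTS = [
    ("sampling", "tf_sampling_compile.sh"),
    ("grouping", "tf_grouping_compile.sh"),
    ("3d_interpolation", "tf_interpolation_compile.sh"),
]


def compile_all(tf_ops_dir, runner=subprocess.check_call):
    """Run each compile script from inside its own directory.

    With the default runner, check_call raises CalledProcessError on a
    non-zero exit status, so the loop stops at the first failing build.
    """
    for subdir, script in COMPILE_SCRIPTS:
        workdir = os.path.join(tf_ops_dir, subdir)
        runner(["bash", script], cwd=workdir)


if __name__ == "__main__":
    # Dry run: print what would be executed instead of calling bash.
    compile_all("tf_ops", runner=lambda cmd, cwd: print(cwd, " ".join(cmd)))
```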

Then run train.py!

root@caaad6906910:/home/share/practice/pointnet2/pointnet2_own# python train.py 
(snip)
**** EPOCH 000 ****
2019-12-01 09:32:19.674543
/home/share/practice/pointnet2/pointnet2_own/modelnet_h5_dataset.py:45: H5pyDeprecationWarning: The default file mode will change to 'r' (read-only) in h5py 3.0. To suppress this warning, pass the mode you need to h5py.File(), or set the global default h5.get_config().default_file_mode, or set the environment variable H5PY_DEFAULT_READONLY=1. Available modes are: 'r', 'r+', 'w', 'w-'/'x', 'a'. See the docs for details.
  f = h5py.File(h5_filename)
Traceback (most recent call last):
  File "train.py", line 284, in <module>
    train()
  File "train.py", line 177, in train
    train_one_epoch(sess, ops, train_writer)
  File "train.py", line 201, in train_one_epoch
    batch_data, batch_label = TRAIN_DATASET.next_batch(augment=True)
  File "/home/share/practice/pointnet2/pointnet2_own/modelnet_h5_dataset.py", line 117, in next_batch
    if augment: data_batch = self._augment_batch_data(data_batch)
  File "/home/share/practice/pointnet2/pointnet2_own/modelnet_h5_dataset.py", line 73, in _augment_batch_data
    rotated_data = provider.rotate_point_cloud(batch_data)
  File "/home/share/practice/pointnet2/pointnet2_own/utils/provider.py", line 41, in rotate_point_cloud
    for k in xrange(batch_data.shape[0]):
NameError: name 'xrange' is not defined

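So close!
The earlier undefined symbol error no longer appears, so it looks like this will work out. The remaining NameError is a separate issue: xrange exists only in Python 2, and this environment runs Python 3.5.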
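
I did not go further than this in the article, but one common way to get past this kind of NameError is a small compatibility shim near the top of the affected module (here that would be utils/provider.py), or simply replacing `xrange` with `range`. A minimal sketch; `batch_indices` is a hypothetical stand-in for the loop in `rotate_point_cloud`, not code from the repository.

```python
try:
    xrange  # Python 2: the name already exists, nothing to do
except NameError:
    xrange = range  # Python 3: range is already lazy, so alias it


def batch_indices(batch_size):
    """Iterate the way provider.py does, using the (possibly aliased) xrange."""
    return [k for k in xrange(batch_size)]


if __name__ == "__main__":
    print(batch_indices(3))
```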
