More than 3 years have passed since last update.

Running the Theano implementation of LIFT on Python 3.8 + CUDA 10.2

Posted at 2021-04-03

Introduction

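In 2021, I tried running LIFT (Learned Invariant Feature Transform)¹ with Python 3.8.5 on Windows 10 (and Windows Server 2019) machines with an RTX 2080 Ti, and stumbled over many things, probably (?) because it is implemented in Theano, whose development has already ended. These are my notes up to the point where it ran.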

I tried many CUDA versions, and in the end confirmed that CUDA 10.2 + cuDNN 7.6.5 works.

So the three files I downloaded from NVIDIA's official site and installed were:

Base Installer: cuda_10.2.89_441.22_win10.exe
Patch 2 (Released Nov 17, 2020): cuda_10.2.2_win10.exe
cuDNN v7.6.5 (November 18th, 2019), for CUDA 10.2: cudnn-10.2-windows10-x64-v7.6.5.32.zip

The following, which are needed to build the SIFT wrapper, were already installed:

OpenCV – 4.5.1: opencv-4.5.1-vc14_vc15.exe
Visual Studio Community 2019: vs_community__501120973.1616119645.exe

LIFT

The public code repository is here:

The paper is available here, among other places:

Setting up the Theano environment

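These are just my notes from all that stumbling turned into an article, so please pick out only the information you need m(_ _)m
Searching only turned up advice to set up a separate, older environment or to give up, or fixes that did not work (at least not on their own) in my environment.
There was also plenty of information I could not find no matter how hard I searched...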

Sources I looked at include:

https://github.com/Theano/Theano/issues/5856
https://github.com/Theano/Theano/issues/6063
https://github.com/Theano/Theano/issues/6556
https://github.com/Theano/Theano/issues/6681
https://github.com/Theano/libgpuarray/issues/402
https://github.com/Theano/libgpuarray/issues/445
https://github.com/Theano/libgpuarray/issues/562
https://github.com/Theano/libgpuarray/issues/587
https://github.com/sevensgd/Install-Theano-with-GPU-support-on-Windows-10
https://groups.google.com/g/theano-users/c/6hWfVPMEzHs
https://groups.google.com/g/theano-users/c/GQJ8yojgJo4
https://theano-users.narkive.com/xC8ZIw3F/can-not-import-theano-gpuarrayexception-could-not-load-nvrtc64-70-dll
https://github.com/aigamedev/scikit-neuralnetwork/issues/235
http://barus.hatenadiary.jp/entry/2017/11/05/171730
https://teratail.com/questions/56469
https://stackoverflow.com/questions/48838386/g-not-available-if-using-conda-conda-install-m2w64-toolchain
https://stackoverflow.com/questions/46036919/pygpu-missing-blas-library-and-other-test-errors
https://stackoverflow.com/questions/39297995/getting-pygpu-was-configured-but-could-not-be-imported-error-while-trying-with
https://stackoverflow.com/questions/38215120/dll-load-failed-the-specified-module-could-not-be-found-for-pygpu-libgpuarray
https://www.programmersought.com/article/32836851797/
https://www.programmersought.com/article/21046187280/
https://www.programmersought.com/article/5003679709/
https://machinelearningmastery.com/introduction-python-deep-learning-library-theano/
https://www.codetd.com/en/article/7405218
https://docs.nvidia.com/deeplearning/frameworks/theano-release-notes/rel_17.10.html

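http://deeplearning.net/ is no longer accessible.
(I never saw it myself, so I don't even know what was there. Unrelated, but it looks like a domain that would sell for a good price...)
The official discontinuation announcement is here:
https://groups.google.com/g/theano-users/c/7Poq8BZutbY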

Installing the libraries listed in requirements

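First, install the libraries listed in the repository's requirements.
The environment I wanted to run it in was an existing one that I already used for PyTorch and other things:

Existing virtual environment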

>conda list
# packages in environment at xxx\[ENV NAME]:
#
# Name                    Version                   Build  Channel
blas                      1.0                         mkl
ca-certificates           2021.1.19            haa95532_0
certifi                   2020.12.5        py38haa95532_0
cudatoolkit               11.0.221             h74a9793_0
cycler                    0.10.0                     py_2    conda-forge
freeglut                  3.2.1                h0e60522_0    conda-forge
freetype                  2.10.4               h546665d_0    conda-forge
icc_rt                    2019.0.0             h0cc432a_1
icu                       67.1                 h33f27b4_0    conda-forge
intel-openmp              2020.3             h57928b3_311    conda-forge
jasper                    2.0.14               hdc05fd1_1    conda-forge
joblib                    1.0.0              pyhd3eb1b0_0
jpeg                      9d                   h8ffe710_0    conda-forge
kiwisolver                1.3.1            py38hbd9d945_0    conda-forge
libblas                   3.8.0                    21_mkl    conda-forge
libcblas                  3.8.0                    21_mkl    conda-forge
libclang                  10.0.1          default_hf44288c_1    conda-forge
liblapack                 3.8.0                    21_mkl    conda-forge
liblapacke                3.8.0                    21_mkl    conda-forge
libopencv                 4.5.0                    py38_4    conda-forge
libpng                    1.6.37               h1d00b33_2    conda-forge
libtiff                   4.2.0                hc10be44_0    conda-forge
libuv                     1.40.0               he774522_0
libwebp-base              1.1.0                h8ffe710_3    conda-forge
lz4-c                     1.9.2                h62dcd97_2    conda-forge
matplotlib                3.3.3            py38haa244fe_0    conda-forge
matplotlib-base           3.3.3            py38h34ddff4_0    conda-forge
mkl                       2020.4             hb70f87d_311    conda-forge
mkl-service               2.3.0            py38h196d8e1_0
ninja                     1.10.2           py38h6d14046_0
numpy                     1.19.4           py38h0cc643e_1    conda-forge
olefile                   0.46                       py_0
openssl                   1.1.1i               h2bbff1b_0
pandas                    1.2.0            py38hf11a4ad_0
pillow                    8.0.1            py38h4fa10fc_0
pip                       20.3.3           py38haa95532_0
py-opencv                 4.5.0            py38hc5df569_4    conda-forge
pyparsing                 2.4.7              pyh9f0ad1d_0    conda-forge
pyqt                      5.12.3           py38haa244fe_6    conda-forge
pyqt-impl                 5.12.3           py38h885f38d_6    conda-forge
pyqt5-sip                 4.19.18          py38h885f38d_6    conda-forge
pyqtchart                 5.12             py38h885f38d_6    conda-forge
pyqtwebengine             5.12.1           py38h885f38d_6    conda-forge
python                    3.8.5                h5fd99cc_1
python-dateutil           2.8.1                      py_0    conda-forge
python_abi                3.8                      1_cp38    conda-forge
pytorch                   1.7.1           py3.8_cuda110_cudnn8_0    pytorch
pytz                      2020.5             pyhd3eb1b0_0
qt                        5.12.9               hb2cf2c5_0    conda-forge
scikit-learn              0.23.2           py38h47e9c7a_0
scipy                     1.5.2            py38h14eb087_0
setuptools                51.0.0           py38haa95532_2
six                       1.15.0           py38haa95532_0
sqlite                    3.33.0               h2a8f88b_0
threadpoolctl             2.1.0              pyh5ca1d4c_0
tk                        8.6.10               he774522_0
torchaudio                0.7.2                      py38    pytorch
torchvision               0.2.2                      py_3    pytorch
tornado                   6.1              py38h294d835_0    conda-forge
tqdm                      4.55.1             pyhd3eb1b0_0
typing_extensions         3.7.4.3                    py_0
vc                        14.2                 h21ff451_1
vs2015_runtime            14.27.29016          h5e58377_2
wheel                     0.36.2             pyhd3eb1b0_0
wincertstore              0.2                      py38_0
xz                        5.2.5                h62dcd97_1    conda-forge
zlib                      1.2.11               h62dcd97_4
zstd                      1.4.5                h1f3a1b7_2    conda-forge

The new packages are added on top of this. So rather than building from scratch, the starting point already contains roughly

python 3.8.5
pytorch 1.7.1
py-opencv 4.5.0

among others.

requirements.txt

flufl.lock
https://github.com/Theano/Theano/archive/master.zip
https://github.com/Lasagne/Lasagne/archive/master.zip
numpy
parse
h5py
scipy

I installed these one by one.
The commands were:

conda install -c conda-forge flufl.lock
pip install https://github.com/Lasagne/Lasagne/archive/master.zip
pip install https://github.com/Theano/Theano/archive/master.zip
conda install numpy
conda install -c conda-forge parse
conda install h5py
conda install scipy

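(numpy and scipy were already installed, so I did not actually run those two.)
After this, the environment looked like this:

Resulting virtual environment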

>conda list
# packages in environment at xxx\[ENV NAME]:
#
# Name                    Version                   Build  Channel
atpublic                  1.0                        py_0    conda-forge
blas                      1.0                         mkl
ca-certificates           2021.1.19            haa95532_1
certifi                   2020.12.5        py38haa95532_0
cudatoolkit               11.0.221             h74a9793_0
cycler                    0.10.0                     py_2    conda-forge
flufl.lock                3.2                        py_0    conda-forge
freeglut                  3.2.1                h0e60522_0    conda-forge
freetype                  2.10.4               h546665d_0    conda-forge
h5py                      2.10.0           py38h5e291fa_0
hdf5                      1.10.4               h7ebc959_0
icc_rt                    2019.0.0             h0cc432a_1
icu                       67.1                 h33f27b4_0    conda-forge
intel-openmp              2020.3             h57928b3_311    conda-forge
jasper                    2.0.14               hdc05fd1_1    conda-forge
joblib                    1.0.0              pyhd3eb1b0_0
jpeg                      9d                   h8ffe710_0    conda-forge
kiwisolver                1.3.1            py38hbd9d945_0    conda-forge
lasagne                   0.2.dev1                 pypi_0    pypi
libblas                   3.8.0                    21_mkl    conda-forge
libcblas                  3.8.0                    21_mkl    conda-forge
libclang                  10.0.1          default_hf44288c_1    conda-forge
liblapack                 3.8.0                    21_mkl    conda-forge
liblapacke                3.8.0                    21_mkl    conda-forge
libopencv                 4.5.0                    py38_4    conda-forge
libpng                    1.6.37               h1d00b33_2    conda-forge
libtiff                   4.2.0                hc10be44_0    conda-forge
libuv                     1.40.0               he774522_0
libwebp-base              1.1.0                h8ffe710_3    conda-forge
lz4-c                     1.9.2                h62dcd97_2    conda-forge
matplotlib                3.3.3            py38haa244fe_0    conda-forge
matplotlib-base           3.3.3            py38h34ddff4_0    conda-forge
mkl                       2020.4             hb70f87d_311    conda-forge
mkl-service               2.3.0            py38h196d8e1_0
ninja                     1.10.2           py38h6d14046_0
numpy                     1.19.4           py38h0cc643e_1    conda-forge
olefile                   0.46                       py_0
openssl                   1.1.1j               h2bbff1b_0
pandas                    1.2.0            py38hf11a4ad_0
parse                     1.19.0             pyh44b312d_0    conda-forge
pillow                    8.0.1            py38h4fa10fc_0
pip                       20.3.3           py38haa95532_0
py-opencv                 4.5.0            py38hc5df569_4    conda-forge
pyparsing                 2.4.7              pyh9f0ad1d_0    conda-forge
pyqt                      5.12.3           py38haa244fe_6    conda-forge
pyqt-impl                 5.12.3           py38h885f38d_6    conda-forge
pyqt5-sip                 4.19.18          py38h885f38d_6    conda-forge
pyqtchart                 5.12             py38h885f38d_6    conda-forge
pyqtwebengine             5.12.1           py38h885f38d_6    conda-forge
pyreadline                2.1                      py38_1
python                    3.8.5                h5fd99cc_1
python-dateutil           2.8.1                      py_0    conda-forge
python_abi                3.8                      1_cp38    conda-forge
pytorch                   1.7.1           py3.8_cuda110_cudnn8_0    pytorch
pytz                      2020.5             pyhd3eb1b0_0
qt                        5.12.9               hb2cf2c5_0    conda-forge
scikit-learn              0.23.2           py38h47e9c7a_0
scipy                     1.5.2            py38h14eb087_0
setuptools                51.0.0           py38haa95532_2
six                       1.15.0           py38haa95532_0
sqlite                    3.33.0               h2a8f88b_0
theano                    1.0.5+unknown            pypi_0    pypi
threadpoolctl             2.1.0              pyh5ca1d4c_0
tk                        8.6.10               he774522_0
torchaudio                0.7.2                      py38    pytorch
torchvision               0.2.2                      py_3    pytorch
tornado                   6.1              py38h294d835_0    conda-forge
tqdm                      4.55.1             pyhd3eb1b0_0
typing_extensions         3.7.4.3                    py_0
vc                        14.2                 h21ff451_1
vs2015_runtime            14.27.29016          h5e58377_2
wheel                     0.36.2             pyhd3eb1b0_0
wincertstore              0.2                      py38_0
xz                        5.2.5                h62dcd97_1    conda-forge
zlib                      1.2.11               h62dcd97_4
zstd                      1.4.5                h1f3a1b7_2    conda-forge

That is, theano, lasagne, h5py, flufl.lock and parse (plus their dependencies) were added.

Building gpuarray.dll

With the requirements satisfied, I tried a quick

import theano

and got the following:

>>> import theano
WARNING (theano.configdefaults): g++ not available, if using conda: `conda install m2w64-toolchain`
xxx\[ENV NAME]\lib\site-packages\theano\configdefaults.py:560: UserWarning: DeprecationWarning: there is no c++ compiler.This is deprecated and with Theano 0.11 a c++ compiler will be mandatory
  warnings.warn("DeprecationWarning: there is no c++ compiler."
WARNING (theano.configdefaults): g++ not detected ! Theano will be unable to execute optimized C-implementations (for both CPU and GPU) and will default to Python implementations. Performance will be severely degraded. To remove this warning, set Theano flags cxx to an empty string.
WARNING (theano.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
Traceback (most recent call last):
  File "yyy.py", line 717, in <module>
    from aaa import bbb
  File "zzz.py", line 16, in <module>
    import h5py
  File "xxx\[ENV NAME]\lib\site-packages\h5py\__init__.py", line 26, in <module>
    from . import _errors
ImportError: DLL load failed while importing _errors: 指定されたモジュールが見つかりません。

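so it does not work ("指定されたモジュールが見つかりません。" means "The specified module could not be found.").
After some research, the culprit seems to be gpuarray.dll, so I built it myself.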

Creating the project

In a working directory of your choice, run

git clone https://github.com/Theano/libgpuarray.git

or download it from
https://github.com/Theano/libgpuarray
and then run

cd libgpuarray
mkdir build
cd build
cmake .. -DCMAKE_BUILD_TYPE=Release -G "Visual Studio 16"

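I already had Visual Studio 2019 installed, hence "Visual Studio 16". Change the generator to match your environment:

Visual Studio Community 2019: cmake .. -DCMAKE_BUILD_TYPE=Release -G "Visual Studio 16"
Visual Studio Community 2017 (version 15.9): cmake .. -DCMAKE_BUILD_TYPE=Release -G "Visual Studio 15 Win64"
Visual Studio Community 2015: cmake .. -DCMAKE_BUILD_TYPE=Release -G "Visual Studio 14 Win64"

(From VS 2019 on, the Win64 suffix is not used: the VS 2019 generator targets the host platform by default, and -A x64 makes that explicit. Visual Studio generators are also multi-configuration, so Release vs. Debug is actually chosen at build time with /p:Configuration=Release.)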
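If you script the configure step, a small helper can pick the generator by Visual Studio version. This is a hypothetical sketch (`GENERATORS` and `cmake_configure_cmd` are not part of libgpuarray); it uses CMake's full generator names and, for VS 2019, `-A x64` instead of a suffix, which differs slightly from the shorter "Visual Studio 16" used above.

```python
# Hypothetical helper: CMake configure command per Visual Studio release.
GENERATORS = {
    2015: ["-G", "Visual Studio 14 2015 Win64"],
    2017: ["-G", "Visual Studio 15 2017 Win64"],
    # The VS 2019 generator takes no Win64 suffix; the platform goes in -A.
    2019: ["-G", "Visual Studio 16 2019", "-A", "x64"],
}


def cmake_configure_cmd(vs_year, source_dir=".."):
    """Return the argv list for configuring libgpuarray from its build dir."""
    if vs_year not in GENERATORS:
        raise ValueError(f"unsupported Visual Studio year: {vs_year}")
    # CMAKE_BUILD_TYPE is ignored by multi-config VS generators, but it is
    # kept to mirror the commands above.
    return ["cmake", source_dir, "-DCMAKE_BUILD_TYPE=Release", *GENERATORS[vs_year]]
```

For example, `subprocess.run(cmake_configure_cmd(2019), cwd="build", check=True)` would run the configure step from the build directory.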

Handling multiple coexisting CUDA versions

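A gpuarray.dll built from the unmodified code insists on using the newest CUDA on the system.
(It picks the version not at build time but at run time.)
It does not work with CUDA 11.x, so it has to be made to use 10.x, as described at the beginning.

Searching turned up nothing at all (apart from "uninstall the newer version"), but digging through the code showed that it looks up nvcuda.dll (the driver) on the search path, calls cuDriverGetVersion through loaders\libcuda.c, and matches the required libraries to the version it reports. cuDriverGetVersion returns the CUDA version supported by the installed driver, which is why the newest one wins.
nvcuda.dll lives in C:\Windows\System32, so keeping it from being loaded is hard.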
So, in gpuarray_buffer_cuda.c, find this comment at line 176:

gpuarray_buffer_cuda.c

/* Let's try to load a nvrtc corresponding to detected CUDA version. */

and, just above or below it, force-override the detected version like this:

gpuarray_buffer_cuda.c

    major = 10;
    minor = 2;
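(You could override it in many places, including other files, but changing just this one spot should propagate everywhere it is used.)
Naturally, for an older CUDA version, use its numbers instead.
It is also possible to stop it from looking up nvcuda.dll and set the version automatically some other way, but when the target version is fixed, hard-coding it is simpler.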
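To make the version logic concrete, here is a small Python model of it. It is an illustration, not libgpuarray's C code: `decode_driver_version` follows the CUDA Driver API convention that cuDriverGetVersion reports `1000 * major + 10 * minor`, and the `nvrtc64_{major}{minor}.dll` name pattern is inferred from the `nvrtc64_102.dll` error that shows up once 10.2 is forced.

```python
# Illustrative model (not libgpuarray's actual code) of how the driver's
# CUDA version becomes the nvrtc DLL name, and how a fixed override wins.


def decode_driver_version(version):
    """cuDriverGetVersion reports 1000 * major + 10 * minor, e.g. 10020 -> (10, 2)."""
    return version // 1000, (version % 1000) // 10


def nvrtc_dll_name(driver_version, forced=None):
    """Return the nvrtc DLL name for the detected (or forced) CUDA version."""
    if forced is not None:
        major, minor = forced  # the hard-coded major = 10; minor = 2; override
    else:
        major, minor = decode_driver_version(driver_version)
    return f"nvrtc64_{major}{minor}.dll"
```

With a CUDA 11.2 driver, `nvrtc_dll_name(11020)` gives `nvrtc64_112.dll`, while `nvrtc_dll_name(11020, forced=(10, 2))` gives `nvrtc64_102.dll`, which is why the override alone redirects the lookup.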

Installing the libraries needed at build and run time, and building

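Because Python is involved, the build is done inside the earlier virtual environment, with it activated.
The build asks for several more things, so first install pygpu into that environment.
Also install m2w64-toolchain, which the earlier WARNING mentioned, and libpython.

conda install -c conda-forge pygpu
conda install m2w64-toolchain
conda install libpython

Then, with this environment activated, build as follows: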

MSBuild libgpuarray.sln /t:Rebuild /p:Configuration=Release

Replacing gpuarray.dll

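Take the newly built

lib\Release\gpuarray.dll

and use it to replace the existing library in the virtual environment,

xxx\[ENV NAME]\Library\bin\gpuarray.dll

(Just in case, keep the original by renaming it to something like gpuarray.dll.bak.)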
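Doing the swap from a script makes it easy to repeat after each rebuild. A minimal sketch; `replace_with_backup` is a hypothetical helper, and it only creates the `.bak` once, so a second run does not overwrite the original DLL with an already-replaced one.

```python
import shutil
from pathlib import Path


def replace_with_backup(new_file, target, backup_suffix=".bak"):
    """Copy new_file over target, keeping the original as target + suffix."""
    new_file, target = Path(new_file), Path(target)
    backup = target.with_name(target.name + backup_suffix)
    # Back up only once so the pristine original is never overwritten.
    if target.exists() and not backup.exists():
        target.rename(backup)  # e.g. gpuarray.dll -> gpuarray.dll.bak
    shutil.copy2(new_file, target)  # copy the freshly built DLL into place
    return backup
```

Usage would look like `replace_with_backup(r"lib\Release\gpuarray.dll", r"xxx\[ENV NAME]\Library\bin\gpuarray.dll")`, with the placeholders filled in for your environment.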

With this, at least

import theano

no longer complains.

Replacing the CUDA DLLs

Using the check code from
http://may46onez.hatenablog.com/entry/2015/11/21/150621
as a Theano sample to see whether it works gives

pygpu.gpuarray.GpuArrayException: b'Could not load "nvrtc64_102.dll": \x8ew\x92\xe8\x82\xb3\x82\xea\x82\xbd\x83\x82\x83W\x83\x85\x81[\x83\x8b\x82\xaa\x8c\xa9\x82\xc2\x82\xa9\x82\xe8\x82\xdc\x82\xb9\x82\xf1\x81B\r\n'

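(the escaped bytes are the Japanese Windows message "The specified module could not be found."), so copy

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\bin\nvrtc64_102_0.dll

into

xxx\[ENV NAME]\Library\bin\

under the name nvrtc64_102.dll (i.e. drop the _0).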
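The message is hard to read because pygpu prints the Windows error text as raw bytes. Assuming it is in cp932 (the usual code page on Japanese Windows), decoding the bytes recovers it; this is just a reading aid, not part of the fix.

```python
# The bytes copied from the GpuArrayException above; the Windows part of the
# message is assumed to be in the cp932 (Japanese) code page.
raw = (b'Could not load "nvrtc64_102.dll": '
       b'\x8ew\x92\xe8\x82\xb3\x82\xea\x82\xbd\x83\x82\x83W\x83\x85\x81['
       b'\x83\x8b\x82\xaa\x8c\xa9\x82\xc2\x82\xa9\x82\xe8\x82\xdc\x82\xb9'
       b'\x82\xf1\x81B\r\n')

# Decode and drop the trailing CRLF; the result ends with the same
# "指定されたモジュールが見つかりません。" text as the h5py error earlier.
message = raw.decode("cp932").strip()
```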

Trying again, it now complains with

pygpu.gpuarray.GpuArrayException: (b'Missing Blas library', 5)

so, likewise, copy cublas64_10.dll from the same CUDA 10.2 bin folder as cublas64_102.dll (i.e. add the minor version 2).
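Both copies can be scripted. A sketch assuming the CUDA 10.2 and environment paths above; `copy_cuda_dll_aliases` and `DLL_ALIASES` are hypothetical names, and the mapping covers only the two DLLs that failed here, so other CUDA versions need their own entries.

```python
import shutil
from pathlib import Path

# CUDA 10.2 bin file name -> name gpuarray.dll looks for (from the errors above).
DLL_ALIASES = {
    "nvrtc64_102_0.dll": "nvrtc64_102.dll",  # drop the "_0" suffix
    "cublas64_10.dll": "cublas64_102.dll",   # add the minor version "2"
}


def copy_cuda_dll_aliases(cuda_bin, env_bin, aliases=DLL_ALIASES):
    """Copy each CUDA DLL into env_bin under the name gpuarray.dll expects."""
    cuda_bin, env_bin = Path(cuda_bin), Path(env_bin)
    copied = []
    for src_name, dst_name in aliases.items():
        src = cuda_bin / src_name
        if not src.is_file():
            raise FileNotFoundError(src)  # fail loudly instead of half-copying
        dst = env_bin / dst_name
        shutil.copy2(src, dst)
        copied.append(dst)
    return copied
```

For example: `copy_cuda_dll_aliases(r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\bin", r"xxx\[ENV NAME]\Library\bin")`.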

That takes care of the DLLs. With a .theanorc in your home directory, running the test again now finishes normally.

Example .theanorc (or .theanorc.txt)

.theanorc

# !sh
[global]
device = cuda
floatX = float32

[nvcc]
flags = -LC:\Users\[USER NAME]\Anaconda3\libs --machine=64
compiler_bindir = C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.28.29910\bin\Hostx64\x64
fastmath = True

[dnn]
enabled = True
include_path = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\include
library_path = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\lib\x64
base_path = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2
bin_path = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\bin

[cuda]
root = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2
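Since the [dnn] and [cuda] paths all derive from one CUDA root, they can be generated from a single value. A sketch using Python's configparser; `make_theanorc` is a hypothetical helper, it assumes cuDNN was copied into the CUDA directory (as the paths above imply), and it leaves out the machine-specific [nvcc] section.

```python
import configparser
from pathlib import PureWindowsPath


def make_theanorc(cuda_root, device="cuda", float_x="float32"):
    """Build the [global], [dnn] and [cuda] sections from one CUDA root."""
    root = PureWindowsPath(cuda_root)
    cfg = configparser.ConfigParser(interpolation=None)
    cfg.optionxform = str  # keep "floatX" as written instead of lowercasing it
    cfg["global"] = {"device": device, "floatX": float_x}
    cfg["dnn"] = {
        "enabled": "True",
        "include_path": str(root / "include"),
        "library_path": str(root / "lib" / "x64"),
        "base_path": str(root),
        "bin_path": str(root / "bin"),
    }
    cfg["cuda"] = {"root": str(root)}
    return cfg
```

Writing it with `cfg.write(f)` to `.theanorc` in the home directory produces `key = value` lines like the example above; switching CUDA versions then means changing only the argument.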

Even with the GPU specified, the script prints "Used the cpu". However, in these timings

Looping 1000 times took 0.29122257232666016 seconds
Looping 1000 times took 9.404838800430298 seconds

the first line is from a GPU run and the second from a CPU run, so the Theano environment is probably working.
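A likely explanation for the misleading "Used the cpu" (not verified here) is that the sample decides CPU vs. GPU with an `isinstance(x.op, T.Elemwise)` check, as Theano's GPU test script of that era did, while in the newer gpuarray backend `GpuElemwise` is itself a subclass of `Elemwise`. Checking the exact class name avoids that; `used_gpu` is a hypothetical helper.

```python
def used_gpu(apply_nodes):
    """Return True if no CPU Elemwise op is left in a compiled Theano graph.

    Compares the exact class name instead of using isinstance(op, Elemwise),
    because a GPU subclass of Elemwise would also pass an isinstance check.
    """
    return not any(type(node.op).__name__ == "Elemwise" for node in apply_nodes)
```

With the compiled function `f` from the sample, `used_gpu(f.maker.fgraph.toposort())` should then return True for the GPU run.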

Running LIFT

Even with Theano working, LIFT does not run as-is (partly because this is Windows).
This part was a lot of hassle too...

Wrapper for SIFT

LIFT needs a SIFT wrapper, so build the c-code part of the LIFT code.
(The following assumes the OPENCV_DIR environment variable points to opencv\build.)

cd [LIFT PATH]\c-code\build
cmake .. -DCMAKE_BUILD_TYPE=Release -G "Visual Studio 16"
MSBuild SIFT.sln /t:Rebuild /p:Configuration=Release

Running these fails with the following error (C2065: 'CV_StsBadArg' is an undeclared identifier):

[LIFT PATH]\c-code\sift.cpp(1008,3): error C2065: 'CV_StsBadArg': 定義されていない識別子です。 [[LIFT PATH]\c-code\build\SIFT.vcxproj]

According to
https://docs.opencv.org/3.4/d2/df8/group__core__c.html
its value is -5, so patch the relevant part of sift.hpp roughly like this:

sift.hpp

    double *pfOutAngle = (double *)out_angle;
    auto StsBadArg = -5;

    if (image.empty() || image.depth() != CV_8U)
        CV_Error(StsBadArg, "image is empty or has incorrect depth (!=CV_8U)");
//        CV_Error(CV_StsBadArg, "image is empty or has incorrect depth (!=CV_8U)");

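In other words, just hard-code the value. (OpenCV 4 also provides the same code as cv::Error::StsBadArg.)
Also, the functions have to be callable from outside the DLL, so add the following near the top of sift.hpp: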

sift.hpp

# ifdef _WINDLL
# define SIFT_API extern "C" __declspec(dllexport)
# else
# define SIFT_API extern "C"
# endif

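(_WINDLL is defined when Visual Studio builds a DLL project, so functions marked SIFT_API are then exported with C linkage.)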

Rebuilding with MSBuild then produces the wrapper.

Specifying the DLL path (Windows only)

Edit \Utils\sift_tools.py as follows:

sift_tools.py

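from ctypes import cdll  # needed for LoadLibrary (likely already imported in the file)
import platform

ps = platform.system()
if ps == "Darwin":
    libSIFT = cdll.LoadLibrary("../c-code/libSIFT.dylib")
elif ps == "Windows":
    # DLL built with MSBuild above; relative to the current working directory
    libSIFT = cdll.LoadLibrary('../c-code/Release/SIFT.dll')
else:
    libSIFT = cdll.LoadLibrary('../c-code/libSIFT.so')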

(The relative path to the DLL depends on where you built it, so adjust it as needed.)
With this, it works.
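Because `'../c-code/...'` is resolved against the current working directory, loading fails when a script is started from somewhere else. A sketch of an alternative, assuming you pass the absolute path of the c-code directory; `sift_library_path` and `load_sift` are hypothetical helpers, and the file names are the ones used above.

```python
import platform
from ctypes import cdll
from pathlib import Path

# Wrapper file per OS, relative to the c-code directory (names from above).
LIB_NAMES = {
    "Darwin": "libSIFT.dylib",
    "Windows": "Release/SIFT.dll",
}


def sift_library_path(c_code_dir, system=None):
    """Return the SIFT wrapper path for the given (or current) OS."""
    system = system or platform.system()
    return Path(c_code_dir) / LIB_NAMES.get(system, "libSIFT.so")


def load_sift(c_code_dir):
    # An explicit directory makes the load independent of the working dir.
    return cdll.LoadLibrary(str(sift_library_path(c_code_dir)))
```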

Other fixes

I haven't reverted these to check, so some may be unnecessary, but I made the following changes for my environment.

\Utils\custom_types.py
Change the beginning of def setupTrain to

custom_types.py

    def setupTrain(self, param, setID):

        import platform
        ps = platform.system()
        is_Windows = ps == "Windows"

        if not os.path.exists(".locks"):
            os.makedirs(".locks")

In the same function, change the part that sets self.volatile_temp to

custom_types.py

        if self.volatile_temp == '':
            if is_Windows:
                self.volatile_temp = "./scratch/" + "fararrow" + "/Temp"
            else:
                self.volatile_temp = "/scratch/" + os.getenv('USER') + "/Temp"

I hard-coded my user name here, but os.getenv('USERNAME') should give it on Windows.
Likewise for testing, change the beginning of def setupTest in \Utils\custom_types.py to
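A sketch of that, assuming Windows provides the user name in `USERNAME` and other systems in `USER`, as the original code does; `volatile_temp_dir` is a hypothetical helper that takes the environment as an argument so it is easy to test, and the `"unknown"` fallback is an arbitrary choice.

```python
import os
import platform


def volatile_temp_dir(system=None, environ=None):
    """Return the scratch directory used for self.volatile_temp."""
    system = system or platform.system()
    environ = os.environ if environ is None else environ
    if system == "Windows":
        # USER is usually unset on Windows; USERNAME holds the login name.
        return "./scratch/" + environ.get("USERNAME", "unknown") + "/Temp"
    return "/scratch/" + environ["USER"] + "/Temp"
```

In setupTrain and setupTest this would replace both branches with `self.volatile_temp = volatile_temp_dir()`.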

custom_types.py

    def setupTest(self, param, testDataName):

        import platform
        ps = platform.system()
        is_Windows = ps == "Windows"

and change its self.volatile_temp assignment in the same way as above.

\Utils\solvers.py
time.clock() was deprecated in Python 3.3 and removed in Python 3.8, so replace it (here with time.process_time()):

solvers.py

def TestImage(pathconf, param, image, verbose=True, network_weights=None):

#    start_time = time.clock()
    start_time = time.process_time()
    # ------------------------------------------------------------------------
    # Create the Network
    myNet = CreateNetwork4Image(pathconf, param, image, verbose=verbose)
#    end_time = time.clock()
    end_time = time.process_time()
    compile_time = (end_time - start_time) * 1000.0
    # ... (code in between omitted) ...
    # ------------------------------------------------------------------------
    # Create the Network
#    start_time = time.clock()
    start_time = time.process_time()
    myNet = CreateNetwork(pathconf, param, test_data_in, test_data_in,
                          test_data_in)
#    end_time = time.clock()
    end_time = time.process_time()
    compile_time = (end_time - start_time) * 1000.0
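One caveat: on Windows, time.clock() returned wall-clock time, while time.process_time() counts only the CPU time of the current process, so compile times measured with it may come out smaller (Theano compiles through an external compiler running in child processes). time.perf_counter() is closer to the old behaviour. A minimal sketch with a hypothetical `elapsed_ms` context manager:

```python
import time
from contextlib import contextmanager


@contextmanager
def elapsed_ms(result):
    """Store the wall-clock milliseconds spent in the with-block in result["ms"].

    perf_counter() also counts time spent waiting on child processes and I/O,
    which process_time() leaves out.
    """
    start = time.perf_counter()
    try:
        yield result
    finally:
        result["ms"] = (time.perf_counter() - start) * 1000.0
```

In TestImage, `timing = {}` followed by `with elapsed_ms(timing):` around the `CreateNetwork4Image(...)` call would give `compile_time = timing["ms"]`.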

OpenCV version

When running the test script compute_descriptor.py in an OpenCV 4.x environment, as in this article,
note that its internal function draw_XYZS_to_img contains

compute_descriptor.py

if cv2.__version__[0] == '3':

so add 4, e.g.

compute_descriptor.py

if cv2.__version__[0] == '3' or cv2.__version__[0] == '4':

or compare the major version as a number (3 or higher).
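A sketch of the numeric variant, with hypothetical helper names. Parsing the whole major number also keeps working for a two-digit major version, which `cv2.__version__[0]` would misread.

```python
def opencv_major(version):
    """Parse the major version from a cv2.__version__ string such as '4.5.0'."""
    return int(version.split(".")[0])


def uses_opencv3_api(version):
    # True for OpenCV 3.x, 4.x and later, which share the API used here.
    return opencv_major(version) >= 3
```

In draw_XYZS_to_img the condition would then read `if uses_opencv3_api(cv2.__version__):`.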

Asides

Some other notes.
By the way, although I have used (or used to use) PyTorch, TensorFlow + Keras, Chainer and Caffe (just a little), I had never touched Theano, so some of this may be things nobody else would trip over m(_ _)m

Environments tested

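As stated at the beginning, everything so far, apart from the conda environment, was set up with CUDA 10.2 + cuDNN 7.6.5 + OpenCV 4.5.1.
I initially tried CUDA 10.1 + cuDNN 7.6.0 + OpenCV 4.2.0, though, and that combination also works.
(The OpenCV installer for that was opencv-4.2.0-vc14_vc15.exe.)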

BLAS

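At first I did not understand what the DLL it complained about meant, so I built OpenBLAS and tried various other things following pages such as
https://stackoverflow.com/questions/45722188/tutorial-for-installing-numpy-with-openblas-on-windows
but in the end none of that was needed.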

Theano configuration file

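While tracking down why it would not run, I tried many combinations of VS and CUDA versions, and switched settings in .theanorc like this as I went. Theano reads the [dnn] and [cuda] sections, so renaming a section (e.g. to [dnn101]) takes it out of use.

Example of a meandering .theanorc (or .theanorc.txt)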

.theanorc

# !sh
[global]
# device = gpu
# device = cuda
device = cuda2
floatX = float32

mode = FAST_RUN
exception_verbosity=high

[lib]
# cnmem = 1
cnmem = 0.8

[nvcc]
# flags = C:\Users\[USER NAME]\Anaconda3\libs
flags = -LC:\Users\[USER NAME]\Anaconda3\libs --machine=64
# compiler_bindir = C:\Program Files (x86)\Microsoft Visual Studio 14.0\VC\bin
# compiler_bindir = C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\bin\Hostx64\x64
compiler_bindir = C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.28.29910\bin\Hostx64\x64
# flags=-m32 # we have this hard coded for now
fastmath = True

[dnn9]
enabled = True
include_path = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.2\include
library_path = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.2\lib\x64
base_path = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.2
bin_path = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.2\bin

[dnn101]
enabled = True
include_path = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\include
library_path = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\lib\x64
base_path = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1
bin_path = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1\bin

[dnn]
enabled = True
include_path = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\include
library_path = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\lib\x64
base_path = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2
bin_path = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2\bin

[dnn110]
enabled = True
include_path = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\include
library_path = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\lib\x64
base_path = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0
bin_path = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0\bin

[dnn112]
enabled = True
include_path = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\include
library_path = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\lib\x64
base_path = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2
bin_path = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2\bin

[cuda92]
root = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.2

[cuda101]
root = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1

[cuda]
root = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.2

[cuda110]
root = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.0

[cuda112]
root = C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.2

[blas]
# ldflags =
# ldflags = -LC:\Lib\OpenBLAS_0.3.13\xianyi-OpenBLAS-d2b11c4 -lopenblas
# ldflags = -LC:\Lib\OpenBLAS -lopenblas
# ldflags = -lopenblas # placeholder for openblas support

[gpuarray]
# preallocate = 1

An older example of Windows settings is https://www.codeproject.com/Articles/1158306/Theano-Machine-Learning-on-a-GPU-on-Windows, and a somewhat newer one is https://gist.github.com/sachin-kmr/1e70262255dfa378288d19c9657950f9, among others.
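Instead of renaming sections, Theano settings can also be overridden per run with the THEANO_FLAGS environment variable, which takes comma-separated `key=value` pairs (`section.option` for options that live in a section) and takes precedence over .theanorc. A sketch that builds such a string for one CUDA root; `theano_flags` is a hypothetical helper, and the variable must be set before `import theano`.

```python
def theano_flags(cuda_root, device="cuda", **extra):
    """Build a THEANO_FLAGS string pointing [dnn] and [cuda] at one CUDA root."""
    root = cuda_root.rstrip("\\")
    flags = {
        "device": device,
        "floatX": "float32",
        "cuda.root": root,
        "dnn.include_path": root + "\\include",
        "dnn.library_path": root + "\\lib\\x64",
        "dnn.base_path": root,
        "dnn.bin_path": root + "\\bin",
    }
    flags.update(extra)  # e.g. mode="FAST_RUN"
    # Commas separate the pairs, so they cannot appear inside a value.
    if any("," in str(v) for v in flags.values()):
        raise ValueError("THEANO_FLAGS values must not contain commas")
    return ",".join(f"{k}={v}" for k, v in flags.items())
```

For example, `os.environ["THEANO_FLAGS"] = theano_flags(r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.1")` selects CUDA 10.1 for that run without editing .theanorc.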

LIFT results

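Having barely gotten it running after all this trouble, I chained compute_detector.py, compute_orientation.py and compute_descriptor.py to extract features.
But... (even on the RTX 2080 Ti where I got it running, not the erratically behaving RTX 3080 mentioned above) it was extremely slow, perhaps because of the compilation that happens over and over, and took about 70 times as long as ASIFT (Affine-SIFT), which is itself known for being slow (naturally)...
(This might change with CUDA 9.0, which most of the available information covers, or with an older cuDNN, or with different build options for each DLL. I also did not look into it in detail, so configuring .theanorc properly might help.)
On top of that, the accuracy was low.
I only wanted to compare it against handcrafted local features and more recent learned ones, so any result was fine, really (^^;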

¹ K. M. Yi, E. Trulls, V. Lepetit, and P. Fua, "LIFT: Learned Invariant Feature Transform," ECCV 2016.
