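import cv2

# img is an image already loaded as a NumPy array (for example with cv2.imread)
detector = cv2.AKAZE_create()
img_umat = cv2.UMat(img)  # wrap the image in a UMat so OpenCL can be used
kp1, des1 = detector.detectAndCompute(img_umat, None)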
I ran this once with cv2.ocl.setUseOpenCL(True) and once with cv2.ocl.setUseOpenCL(False):
OpenCL on: 700 ms
OpenCL off: 300 ms
Turning OpenCL on made it more than twice as slow. Why??
Answer: it was using the Intel onboard GPU!!
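The on/off comparison above could be reproduced with a small timing harness. This is a minimal sketch, not the measurement code actually used for the numbers in this post: the helper names `median_ms` and `compare_opencl`, the repeat count, the warm-up call, and the choice of the median are all assumptions made for illustration. Only `cv2.AKAZE_create`, `cv2.UMat`, `cv2.ocl.setUseOpenCL` and `detectAndCompute` come from the snippet above.

```python
import statistics
import time


def median_ms(fn, repeats=5):
    """Call fn() `repeats` times and return the median wall-clock time in ms."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def compare_opencl(img, repeats=5):
    """Time AKAZE on `img` (a NumPy image array) with OpenCL on and off.

    Hypothetical helper; requires the opencv-python package.
    """
    import cv2  # imported here so median_ms stays usable without OpenCV

    detector = cv2.AKAZE_create()
    results = {}
    for use_ocl in (True, False):
        cv2.ocl.setUseOpenCL(use_ocl)

        def run():
            # Build the UMat inside the timed call so the upload cost is included.
            kp, des = detector.detectAndCompute(cv2.UMat(img), None)
            # Copy the descriptors back to host memory so the work is really done.
            des.get()

        run()  # warm-up: the first call may include one-time setup cost
        results["on" if use_ocl else "off"] = median_ms(run, repeats)
    return results
```

Calling `compare_opencl(img)` should return a dict such as `{"on": ..., "off": ...}` in milliseconds; the actual values depend entirely on the machine and on which OpenCL device OpenCV picked.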
Specs
- Image size: about 2500 x 1500
- CPU: Core i7 2.7GHz 6cores (I7-8559U)
- GPU: Radeon Pro 560X 4 GB, 2056 GFLOPS
- GPU: Intel(R) UHD Graphics 630, 106 GFLOPS
The two GPUs differ by about 20x in raw performance, so no wonder ;;
Would it come down to about 35 ms with the Radeon??
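The back-of-the-envelope estimate above can be written out as follows. It is only a sketch of the arithmetic and assumes that runtime scales inversely with peak GFLOPS, which is a very rough assumption; the variable names are made up for this example, while the numbers are the ones quoted in this post.

```python
radeon_gflops = 2056.0  # Radeon Pro 560X, from the spec list
intel_gflops = 106.0    # Intel UHD Graphics 630, from the spec list
intel_time_ms = 700.0   # measured with OpenCL on (Intel GPU)

# Ratio of peak throughput between the two GPUs.
ratio = radeon_gflops / intel_gflops

# Naive estimate: assume the runtime shrinks by the same ratio.
estimated_radeon_ms = intel_time_ms / ratio

print(f"GFLOPS ratio: {ratio:.1f}x")                           # GFLOPS ratio: 19.4x
print(f"Naive Radeon estimate: {estimated_radeon_ms:.0f} ms")  # Naive Radeon estimate: 36 ms
```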
I don't know how to change which GPU is used. Information welcome.
Added 2019/06/20
It worked!
Set the environment variable OPENCV_OPENCL_DEVICE=:dgpu
I succeeded in using the Radeon GPU on a MacBook Pro with Python cv2!!
export OPENCV_OPENCL_DEVICE=:dgpu
import cv2
device = cv2.ocl.Device_getDefault()
print(f"Vendor ID: {device.vendorID()}")
print(f"Vendor name: {device.vendorName()}")
print(f"Name: {device.name()}")
Name: AMD Radeon Pro 560X Compute Engine
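As an alternative to the shell `export` above, the variable can probably also be set from inside the Python process. The sketch below assumes that it has to be in place before OpenCV initializes OpenCL, so it is set before `import cv2` to be safe; I have not checked whether setting it later still works. `request_opencl_device` is a hypothetical helper name, not part of OpenCV.

```python
import os
import sys


def request_opencl_device(spec=":dgpu"):
    """Set OPENCV_OPENCL_DEVICE for this process (hypothetical helper)."""
    if "cv2" in sys.modules:
        # Assumption: the setting is safest when applied before cv2 is imported.
        print("warning: cv2 is already imported; the setting may be ignored")
    os.environ["OPENCV_OPENCL_DEVICE"] = spec


request_opencl_device(":dgpu")

# Import OpenCV only after the variable is set, then check the device name
# in the same way as the snippet above:
# import cv2
# print(cv2.ocl.Device_getDefault().name())
```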
How to check which GPU is in use
import cv2
device = cv2.ocl.Device_getDefault()
print(f"Vendor ID: {device.vendorID()}")
print(f"Vendor name: {device.vendorName()}")
print(f"Name: {device.name()}")
print(f"Driver version: {device.driverVersion()}")
print(f"available: {device.available()}")
print(f"Is an AMD device {device.isAMD()}")
print(f"Is a Intel device {device.isIntel()}")
print(f"Global Memory size: {device.globalMemSize()}")
print(f"Memory cache size: {device.globalMemCacheSize()}")
print(f"Memory cache type: {device.globalMemCacheType()}")
print(f"Local Memory size: {device.localMemSize()}")
print(f"Local Memory type: {device.localMemType()}")
print(f"Max Clock frequency: {device.maxClockFrequency()}")
Vendor ID: 0
Vendor name: Intel Inc.
Name: Intel(R) UHD Graphics 630
Driver version: 1.2(Mar 11 2019 21:25:40)
available: True
Is an AMD device False
Is a Intel device False
Global Memory size: 1610612736
Memory cache size: 0
Memory cache type: 0
Local Memory size: 65536
Local Memory type: 1
Max Clock frequency: 1100
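The long run of `print` calls above could be folded into a small loop. This is a sketch under assumptions: `describe_device` and `FakeDevice` are names invented for this example, and `FakeDevice` only stands in for the object returned by `cv2.ocl.Device_getDefault()` so that the code runs without OpenCV. Its values are copied from the output above.

```python
def describe_device(device, fields=("vendorName", "name", "driverVersion", "available")):
    """Return {field: value} by calling each zero-argument method on `device`."""
    info = {}
    for field in fields:
        getter = getattr(device, field, None)
        # Record None for names this object does not expose instead of raising.
        info[field] = getter() if callable(getter) else None
    return info


class FakeDevice:
    """Stand-in device so the sketch runs without OpenCV."""

    def vendorName(self):
        return "Intel Inc."

    def name(self):
        return "Intel(R) UHD Graphics 630"

    def available(self):
        return True


for key, value in describe_device(FakeDevice()).items():
    print(f"{key}: {value}")
```

With OpenCV installed, passing `cv2.ocl.Device_getDefault()` instead of `FakeDevice()` should print the same kind of summary for the real device.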
Reference
https://github.com/opencv/opencv/issues/13380
While I'm at it, here is what cv2.ocl contains
DEVICE_EXEC_KERNEL
DEVICE_EXEC_NATIVE_KERNEL
DEVICE_FP_CORRECTLY_ROUNDED_DIVIDE_SQRT
DEVICE_FP_DENORM
DEVICE_FP_FMA
DEVICE_FP_INF_NAN
DEVICE_FP_ROUND_TO_INF
DEVICE_FP_ROUND_TO_NEAREST
DEVICE_FP_ROUND_TO_ZERO
DEVICE_FP_SOFT_FLOAT
DEVICE_LOCAL_IS_GLOBAL
DEVICE_LOCAL_IS_LOCAL
DEVICE_NO_CACHE
DEVICE_NO_LOCAL_MEM
DEVICE_READ_ONLY_CACHE
DEVICE_READ_WRITE_CACHE
DEVICE_TYPE_ACCELERATOR
DEVICE_TYPE_ALL
DEVICE_TYPE_CPU
DEVICE_TYPE_DEFAULT
DEVICE_TYPE_DGPU
DEVICE_TYPE_GPU
DEVICE_TYPE_IGPU
DEVICE_UNKNOWN_VENDOR
DEVICE_VENDOR_AMD
DEVICE_VENDOR_INTEL
DEVICE_VENDOR_NVIDIA
Device_EXEC_KERNEL
Device_EXEC_NATIVE_KERNEL
Device_FP_CORRECTLY_ROUNDED_DIVIDE_SQRT
Device_FP_DENORM
Device_FP_FMA
Device_FP_INF_NAN
Device_FP_ROUND_TO_INF
Device_FP_ROUND_TO_NEAREST
Device_FP_ROUND_TO_ZERO
Device_FP_SOFT_FLOAT
Device_LOCAL_IS_GLOBAL
Device_LOCAL_IS_LOCAL
Device_NO_CACHE
Device_NO_LOCAL_MEM
Device_READ_ONLY_CACHE
Device_READ_WRITE_CACHE
Device_TYPE_ACCELERATOR
Device_TYPE_ALL
Device_TYPE_CPU
Device_TYPE_DEFAULT
Device_TYPE_DGPU
Device_TYPE_GPU
Device_TYPE_IGPU
Device_UNKNOWN_VENDOR
Device_VENDOR_AMD
Device_VENDOR_INTEL
Device_VENDOR_NVIDIA
Device_getDefault
KERNEL_ARG_CONSTANT
KERNEL_ARG_LOCAL
KERNEL_ARG_NO_SIZE
KERNEL_ARG_PTR_ONLY
KERNEL_ARG_READ_ONLY
KERNEL_ARG_READ_WRITE
KERNEL_ARG_WRITE_ONLY
KernelArg_CONSTANT
KernelArg_LOCAL
KernelArg_NO_SIZE
KernelArg_PTR_ONLY
KernelArg_READ_ONLY
KernelArg_READ_WRITE
KernelArg_WRITE_ONLY
OCL_VECTOR_DEFAULT
OCL_VECTOR_MAX
OCL_VECTOR_OWN
__doc__
__loader__
__name__
__package__
__spec__
finish
haveAmdBlas
haveAmdFft
haveOpenCL
setUseOpenCL
useOpenCL
Attributes of the object returned by cv2.ocl.Device_getDefault()
OpenCLVersion
OpenCL_C_Version
__class__
__delattr__
__dir__
__doc__
__eq__
__format__
__ge__
__getattribute__
__gt__
__hash__
__init__
__init_subclass__
__le__
__lt__
__ne__
__new__
__reduce__
__reduce_ex__
__repr__
__setattr__
__sizeof__
__str__
__subclasshook__
addressBits
available
compilerAvailable
deviceVersionMajor
deviceVersionMinor
doubleFPConfig
driverVersion
endianLittle
errorCorrectionSupport
executionCapabilities
extensions
getDefault
globalMemCacheLineSize
globalMemCacheSize
globalMemCacheType
globalMemSize
halfFPConfig
hostUnifiedMemory
image2DMaxHeight
image2DMaxWidth
image3DMaxDepth
image3DMaxHeight
image3DMaxWidth
imageFromBufferSupport
imageMaxArraySize
imageMaxBufferSize
imageSupport
intelSubgroupsSupport
isAMD
isExtensionSupported
isIntel
isNVidia
linkerAvailable
localMemSize
localMemType
maxClockFrequency
maxComputeUnits
maxConstantArgs
maxConstantBufferSize
maxMemAllocSize
maxParameterSize
maxReadImageArgs
maxSamplers
maxWorkGroupSize
maxWorkItemDims
maxWriteImageArgs
memBaseAddrAlign
name
nativeVectorWidthChar
nativeVectorWidthDouble
nativeVectorWidthFloat
nativeVectorWidthHalf
nativeVectorWidthInt
nativeVectorWidthLong
nativeVectorWidthShort
preferredVectorWidthChar
preferredVectorWidthDouble
preferredVectorWidthFloat
preferredVectorWidthHalf
preferredVectorWidthInt
preferredVectorWidthLong
preferredVectorWidthShort
printfBufferSize
profilingTimerResolution
singleFPConfig
type
vendorID
vendorName
version
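Listings like the two above look like the output of Python's built-in `dir()`, although that is an assumption, since the post does not record how they were produced. The sketch below shows one way to get a similar listing without the double-underscore entries. `public_names` is a made-up helper name, and it is demonstrated on the standard-library `math` module so that it runs without OpenCV.

```python
import math


def public_names(obj):
    """Return the attribute names of `obj` that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]


# Demonstrated on a standard-library module so it runs without OpenCV.
for name in public_names(math)[:5]:
    print(name)
```

With OpenCV installed, `public_names(cv2.ocl)` and `public_names(cv2.ocl.Device_getDefault())` should give listings comparable to the ones above.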