##About this article
In my article on GPU acceleration with OpenCV, I tried to speed up image resizing on the GPU, but the speed did not improve, so instead I ran the work in parallel with multiprocessing.
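For reference, a minimal sketch of the GPU resize path I mean (assuming an OpenCV build with the CUDA modules; the file name is a placeholder). The per-image upload/download between host and GPU is pure overhead for a small resize, which is one likely reason it was not faster.

```python
import cv2 as cv

# Assumes an OpenCV build with CUDA support (plain pip opencv-python does not include cv2.cuda).
gpu_src = cv.cuda_GpuMat()
gpu_src.upload(cv.imread('sample.jpg'))      # host -> device copy
gpu_dst = cv.cuda.resize(gpu_src, (64, 64))  # resize runs on the GPU
small = gpu_dst.download()                   # device -> host copy
```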
##Environment
pyenv 1.2.13-2-g0aeeb6fd
python 3.6.0
OS: Ubuntu 18.04.3 LTS (Bionic Beaver)
##What I did
I resized 500,000 images from 1920x1080 down to 64x64.
Summary Version
```python
import multiprocessing as mp

def job(arg: str):
    # the OpenCV processing you want to run
    ...

if __name__ == '__main__':
    mp.set_start_method('spawn')
    p = mp.Pool(cpu_num)   # cpu_num: number of worker processes
    p.map(job, arg_list)   # arg_list: list of arguments passed to job
```
Full Version (the input/output paths and target size defined near the top are placeholders; set them for your environment)
```python
import cv2 as cv
import os
import sys
import time
import logging
import configparser
from tqdm import tqdm
import multiprocessing as mp

### Settings (placeholder values -- set these for your environment)
base_in = '/path/to/input'    # root directory of the original images
base_out = '/path/to/output'  # root directory for the resized images
resize_size = 64              # target edge length (64x64)

### Logger
logger_db = logging.getLogger('CompressImage')
logger_db.setLevel(logging.DEBUG)
ch = logging.StreamHandler()
ch.setLevel(logging.DEBUG)
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)8s - %(message)s', datefmt='%m/%d/%Y %I:%M %p')
ch.setFormatter(formatter)
logger_db.addHandler(ch)

def resize_job(image_file_path: str):
    image = cv.imread(image_file_path)
    image = cv.resize(image, dsize=(resize_size, resize_size))
    path = image_file_path.replace(base_in, base_out)
    cv.imwrite(path, image)
    return image_file_path

if __name__ == '__main__':
    ### 0. Preparation
    cpu_num = mp.cpu_count()

    ### 1. Create Path List
    start_time1 = time.time()
    path_list = [x[0] for x in os.walk(base_in)]  # directory paths, needed again in step 3
    logger_db.debug('1. Time of creating path list: {}'.format(time.time() - start_time1))

    ### 2. Create File List
    start_time2 = time.time()
    file_list = [os.path.join(dirpath, filename) for dirpath, _, filenames in os.walk(base_in) for filename in filenames]
    total_num = len(file_list)
    logger_db.debug('2. Time of creating file list: {}'.format(time.time() - start_time2))
    logger_db.debug(' File Number is {}'.format(total_num))

    ### 3. Create New Directory
    start_time3 = time.time()
    new_path_list = [x.replace(base_in, base_out) for x in path_list]
    for path in tqdm(new_path_list):
        os.makedirs(path, exist_ok=True)
    logger_db.info('3. Time of Creating Directory: {}'.format(time.time() - start_time3))

    ### 4. Resize
    start_time4 = time.time()
    mp.set_start_method('spawn')
    p = mp.Pool(cpu_num)
    # result = p.map(resize_job, tqdm(file_list))
    for _ in tqdm(p.imap_unordered(resize_job, file_list), total=total_num):
        pass
    p.close()
    p.join()
    logger_db.debug('4. Time of resizing: {}'.format(time.time() - start_time4))
```
##What I came up with
The parallel-processing part itself was already written out in full in an OpenCV GitHub issue, so the part I actually came up with is the slightly unusual way of writing tqdm so that progress can be monitored (see the sketch below).
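Concretely, `Pool.imap_unordered` returns a lazy iterator whose length tqdm cannot know in advance, so the trick is to pass `total=` explicitly. A minimal self-contained sketch of the pattern (the `job` function here is a stand-in for `resize_job`):

```python
import multiprocessing as mp
from tqdm import tqdm

def job(x):
    return x * x  # stand-in for the real OpenCV work

if __name__ == '__main__':
    items = list(range(10000))
    with mp.Pool() as p:
        # imap_unordered yields results as workers finish them, so the bar
        # advances in real time; total= lets tqdm draw a proper progress bar.
        for _ in tqdm(p.imap_unordered(job, items), total=len(items)):
            pass
```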
##What I want to try next
I would like to push the speed further by using not only multiprocessing but also multithreading (a rough sketch follows).
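A rough, untested sketch of what that could look like with `concurrent.futures.ThreadPoolExecutor`, reusing `resize_job`, `file_list`, `cpu_num`, and `total_num` from the full version above. How much threads help depends on how much of the time is spent in OpenCV's C++ code and in file I/O, which release the GIL.

```python
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm

# Threads skip the process start-up and argument pickling costs of multiprocessing.
with ThreadPoolExecutor(max_workers=cpu_num) as executor:
    for _ in tqdm(executor.map(resize_job, file_list), total=total_num):
        pass
```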