##About this article
In my article on GPU acceleration with OpenCV, I tried to speed up image resizing on the GPU, but the speed did not improve, so instead I ran the work in parallel with multiprocessing.
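For reference, a minimal sketch of the GPU resize path I mean (assuming an OpenCV build with the CUDA modules; the file name is a placeholder). The per-image upload/download between host and GPU is pure overhead for a small resize, which is one likely reason it was not faster.

```python
import cv2 as cv

# Assumes an OpenCV build with CUDA support (plain pip opencv-python does not include cv2.cuda).
gpu_src = cv.cuda_GpuMat()
gpu_src.upload(cv.imread('sample.jpg'))      # host -> device copy
gpu_dst = cv.cuda.resize(gpu_src, (64, 64))  # resize runs on the GPU
small = gpu_dst.download()                   # device -> host copy
```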
##Environment
pyenv 1.2.13-2-g0aeeb6fd
python 3.6.0
OS: Ubuntu 18.04.3 LTS (Bionic Beaver)
##What I did
I resized 500,000 images from 1920x1080 down to 64x64.
Summary Version
```python
import multiprocessing as mp

def job(arg: str):
    # the OpenCV processing you want to run
    ...

if __name__ == '__main__':
    mp.set_start_method('spawn')
    p = mp.Pool(cpu_num)   # cpu_num: number of worker processes
    p.map(job, arg_list)   # arg_list: list of arguments passed to job
```
Full Version (the input/output paths and target size defined near the top are placeholders; set them for your environment)
```python
import cv2 as cv
import os
import sys
import time
import logging
import configparser
from tqdm import tqdm
import multiprocessing as mp

### Settings (placeholder values -- set these for your environment)
base_in = '/path/to/input'    # root directory of the original images
base_out = '/path/to/output'  # root directory for the resized images
resize_size = 64              # target edge length (64x64)

### Logger
logger_db = logging.getLogger('CompressImage')
logger_db.setLevel(logging.DEBUG)
ch = logging.StreamHandler()
ch.setLevel(logging.DEBUG)
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)8s - %(message)s', datefmt='%m/%d/%Y %I:%M %p')
ch.setFormatter(formatter)
logger_db.addHandler(ch)

def resize_job(image_file_path: str):
    image = cv.imread(image_file_path)
    image = cv.resize(image, dsize=(resize_size, resize_size))
    path = image_file_path.replace(base_in, base_out)
    cv.imwrite(path, image)
    return image_file_path

if __name__ == '__main__':
    ### 0. Preparation
    cpu_num = mp.cpu_count()

    ### 1. Create Path List
    start_time1 = time.time()
    path_list = [x[0] for x in os.walk(base_in)]  # directory paths, needed again in step 3
    logger_db.debug('1. Time of creating path list: {}'.format(time.time() - start_time1))

    ### 2. Create File List
    start_time2 = time.time()
    file_list = [os.path.join(dirpath, filename) for dirpath, _, filenames in os.walk(base_in) for filename in filenames]
    total_num = len(file_list)
    logger_db.debug('2. Time of creating file list: {}'.format(time.time() - start_time2))
    logger_db.debug(' File Number is {}'.format(total_num))

    ### 3. Create New Directory
    start_time3 = time.time()
    new_path_list = [x.replace(base_in, base_out) for x in path_list]
    for path in tqdm(new_path_list):
        os.makedirs(path, exist_ok=True)
    logger_db.info('3. Time of Creating Directory: {}'.format(time.time() - start_time3))

    ### 4. Resize
    start_time4 = time.time()
    mp.set_start_method('spawn')
    p = mp.Pool(cpu_num)
    # result = p.map(resize_job, tqdm(file_list))
    for _ in tqdm(p.imap_unordered(resize_job, file_list), total=total_num):
        pass
    p.close()
    p.join()
    logger_db.debug('4. Time of resizing: {}'.format(time.time() - start_time4))
```
##What I came up with
The parallel-processing part itself was already written out in full in an OpenCV GitHub issue, so the part I actually came up with is the slightly unusual way of writing tqdm so that progress can be monitored (see the sketch below).
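Concretely, `Pool.imap_unordered` returns a lazy iterator whose length tqdm cannot know in advance, so the trick is to pass `total=` explicitly. A minimal self-contained sketch of the pattern (the `job` function here is a stand-in for `resize_job`):

```python
import multiprocessing as mp
from tqdm import tqdm

def job(x):
    return x * x  # stand-in for the real OpenCV work

if __name__ == '__main__':
    items = list(range(10000))
    with mp.Pool() as p:
        # imap_unordered yields results as workers finish them, so the bar
        # advances in real time; total= lets tqdm draw a proper progress bar.
        for _ in tqdm(p.imap_unordered(job, items), total=len(items)):
            pass
```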
##What I want to try next
I would like to push the speed further by using not only multiprocessing but also multithreading (a rough sketch follows).
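A rough, untested sketch of what that could look like with `concurrent.futures.ThreadPoolExecutor`, reusing `resize_job`, `file_list`, `cpu_num`, and `total_num` from the full version above. How much threads help depends on how much of the time is spent in OpenCV's C++ code and in file I/O, which release the GIL.

```python
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm

# Threads skip the process start-up and argument pickling costs of multiprocessing.
with ThreadPoolExecutor(max_workers=cpu_num) as executor:
    for _ in tqdm(executor.map(resize_job, file_list), total=total_num):
        pass
```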