Parallelizing OpenCV image processing with multiprocessing

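About this article

In an earlier article on GPU acceleration in OpenCV, I tried to speed up image resizing on the GPU, but the speed did not change. So this time I tried running the work in parallel with multiprocessing instead.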

Environment

pyenv 1.2.13-2-g0aeeb6fd
python 3.6.0
OS: Ubuntu 18.04.3 LTS (Bionic Beaver)

What I did

I resized 500,000 images from 1920x1080 to 64x64.

Summary Version

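import multiprocessing as mp

def job(arg: str):
  # Put the OpenCV processing you want to run here.
  return arg

if __name__=='__main__':
  mp.set_start_method('spawn')
  cpu_num = mp.cpu_count()
  args = []  # list of arguments, one per call to job
  p = mp.Pool(cpu_num)
  p.map(job, args)
  p.close()
  p.join()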

Full Version

import cv2 as cv
import os
import sys
import time
import logging
import configparser
from tqdm import tqdm
import multiprocessing as mp

### Logger
logger_db = logging.getLogger('CompressImage')
logger_db.setLevel(logging.DEBUG)
ch = logging.StreamHandler()
ch.setLevel(logging.DEBUG)
formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)8s - %(message)s', datefmt='%m/%d/%Y %I:%M %p')
ch.setFormatter(formatter)
logger_db.addHandler(ch)

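### Settings (placeholder paths; the article does not show these definitions).
### Kept at module level so 'spawn' workers, which re-import this file, see them.
base_in = '/path/to/input'
base_out = '/path/to/output'
resize_size = 64

def resize_job(image_file_path:str):
    image  = cv.imread(image_file_path)
    image  = cv.resize(image, dsize=(resize_size, resize_size))
    path = image_file_path.replace(base_in, base_out)
    cv.imwrite(path, image)
    return image_file_path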

if __name__=='__main__':
  ### 0. Preparation
  cpu_num = mp.cpu_count()

  ### 1. Create Path List (needed by step 3, so it must not be commented out)
  start_time1 = time.time()
  path_list = [ x[0] for x in os.walk(base_in)]
  logger_db.debug('1. Time of creating path list: {}'.format(time.time()-start_time1))

  ### 2. Create File List
  start_time2 = time.time()
  file_list = [ os.path.join(dirpath, filename) for dirpath, _, filenames in os.walk(base_in) for filename in filenames]
  total_num = len(file_list)
  logger_db.debug('2. Time of creating file list: {}'.format(time.time() - start_time2))
  logger_db.debug('   File Number is {}'.format(total_num))

  ### 3. Create New Directory
  start_time3 = time.time()
  new_path_list  = [ x.replace(base_in, base_out) for x in path_list]
  for path in tqdm(new_path_list):
    os.makedirs(path, exist_ok=True)
  logger_db.info('3. Time of Creating Directory: {}'.format(time.time() - start_time3))

  ### 4. Resize
  start_time4 = time.time()
  mp.set_start_method('spawn')
  p = mp.Pool(cpu_num)
  # result = p.map(resize_job, tqdm(file_list))
  for _ in tqdm(p.imap_unordered(resize_job, file_list), total=total_num):
    pass
  p.close()
  p.join()
  logger_db.debug('4. Time of resizing: {}'.format(time.time()-start_time4))
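
One detail of the code above is worth a note: the output path is built with `str.replace(base_in, base_out)`, which rewrites every occurrence of the substring and not only the leading directory. A minimal sketch of a stricter mapping, assuming the input path always lies under the source root; `mirror_path` is a hypothetical helper and is not part of the original script:

```python
import os

def mirror_path(path, src_root, dst_root):
    """Map a path under src_root to the same relative location under dst_root."""
    relative = os.path.relpath(path, src_root)
    return os.path.join(dst_root, relative)

# str.replace rewrites every occurrence of the substring, not just the prefix,
# so a directory named 'cabin' under 'in' is silently renamed.
example = os.path.join('in', 'cabin', 'x.png')
naive = example.replace('in', 'out')
safe = mirror_path(example, 'in', 'out')
```

With absolute, distinctive roots such as the ones used for a 500,000-image dataset, the two approaches usually give the same result; the difference only shows up when the root string reappears deeper in the path.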

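What I paid attention to

The parallelization itself is taken almost verbatim from an issue on OpenCV's GitHub, so the only part I had to work out was the slightly unusual way of calling tqdm so that progress is visible: it wraps the iterator returned by imap_unordered and is given total explicitly.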
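
The reason `total` has to be passed is that `imap_unordered` returns an iterator, and an iterator has no length for a progress display to read. Wrapping the input list instead (as in the commented-out `p.map` line) would most likely track tasks being handed to the pool, not results coming back. A minimal standard-library sketch of the idea; `report_progress` is a hypothetical stand-in for `tqdm(..., total=...)`, and `fake_results` stands in for `p.imap_unordered(resize_job, file_list)`:

```python
def report_progress(results, total, every=1):
    """Yield each result while printing 'done/total' as results arrive."""
    for done, result in enumerate(results, start=1):
        if done % every == 0 or done == total:
            print('{}/{}'.format(done, total))
        yield result

# A generator, like the iterator returned by imap_unordered, has no len(),
# so the total must be supplied by the caller.
fake_results = (n * n for n in range(5))
has_len = hasattr(fake_results, '__len__')
collected = list(report_progress(fake_results, total=5, every=2))
```

Because the counter advances only when a result is pulled out of the iterator, the display reflects completed work, which is what the `for _ in tqdm(...): pass` loop in the full version relies on.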

What I want to try next

I would like to speed this up further by using multithreading in addition to multiprocessing.
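
As a starting point for that experiment, the standard library's `multiprocessing.pool.ThreadPool` offers the same `map` / `imap_unordered` interface as `mp.Pool`, so the resize loop could be switched to threads with few changes. The sketch below is untested against the real workload: `fake_resize_job` is a hypothetical stand-in for `resize_job`, the worker count is arbitrary, and any speedup depends on how much of the time is spent in code that releases the GIL.

```python
from multiprocessing.pool import ThreadPool

def fake_resize_job(image_file_path):
    # Stand-in for resize_job: the real function would read, resize, and
    # write the image, then return the input path.
    return image_file_path.upper()

def run_threaded(paths, workers=4):
    """Run the job over all paths on a thread pool; results arrive in any order."""
    with ThreadPool(workers) as pool:
        return list(pool.imap_unordered(fake_resize_job, paths))

done = run_threaded(['a.png', 'b.png', 'c.png'])
```

Threads need neither a start method nor picklable arguments, so `mp.set_start_method('spawn')` would not apply to this variant.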

Shintaro0920
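M2 (second-year master's student). Interested in Graph Neural Networks.
eaglys
"EAGLYS supports companies in safely analyzing their dormant, not-yet-fully-utilized data assets and in building and operating AI on them."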
https://eaglys.co.jp