
I made a CLI tool that turns the images in each directory into PDFs

Posted at 2020-10-28

GitHub

https://github.com/ikota3/image_utilities

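Overview

To make self-scanning books (jisui) more comfortable, I built a tool that converts the images stored in each directory into one PDF per directory.
It also walks subdirectories recursively and creates a PDF for each of them.

I tested it on Windows 10.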

Requirements

Library
- fire
- img2pdf

Usage

$ git clone https://github.com/ikota3/image_utilities
$ cd image_utilities
$ pip install -r requirements.txt
$ python src/images_to_pdf.py convert -i "path/to/input" -o "path/to/output" -e "jpg,jpeg,png"

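Implementation

Building a CLI tool entirely from scratch seemed like a lot of work, so I used fire, a library that had been getting a lot of attention a while ago.

It really was very easy to use.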

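Making the command callable

First, build a skeleton that can be run as a command and receive input values.
Here I create a class and expose it with fire.
Besides classes, fire can expose functions, modules, objects, and more.
See the official documentation for details.
https://github.com/google/python-fire/blob/master/docs/guide.md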

images_to_pdf.py

import fire
from typing import Tuple, Union


class PDFConverter(object):
    """Class for converting images to PDF."""

    def __init__(
            self,
            input_dir: str = "",
            output_dir: str = "",
            extensions: Union[str, Tuple[str, ...]] = None,
            force_write: bool = False,
            yes: bool = False
    ):
        """Initialize

        Args:
            input_dir (str): Input directory. Defaults to "".
            output_dir (str): Output directory. Defaults to "".
            extensions (Union[str, Tuple[str, ...]]): Extensions. Defaults to ('jpg', 'png').
            force_write (bool): Flag for overwriting an existing PDF. Defaults to False.
            yes (bool): Flag for asking to execute or not. Defaults to False.
        """
        self.input_dir: str = input_dir
        self.output_dir: str = output_dir
        if not extensions:
            extensions = ('jpg', 'png')
        # fire passes "-e jpg,png" as a tuple but "-e jpg" as a plain str
        self.extensions: Union[str, Tuple[str, ...]] = extensions
        self.force_write: bool = force_write
        self.yes: bool = yes

    # Indented with spaces like the rest of the class (mixing in tabs raises a TabError)
    def convert(self):
        print("Hello World!")


if __name__ == '__main__':
    fire.Fire(PDFConverter)

That is all the skeleton needs for now.
Running the command in this state should print Hello World!.

$ python src/images_to_pdf.py convert
Hello World!

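Note that input_dir and the other parameters are given default values (input_dir = "" and so on). If an argument has no default and no value is passed on the command line, fire raises an error.

To pass a value, write the argument name defined in __init__ prefixed with hyphens, followed by the value: either the full name with two hyphens (--input_dir) or its first letter with a single hyphen (-i).

The following commands are written differently but give the same result.

$ # Example for self.input_dir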
$ python src/images_to_pdf.py convert -i "path/to/input"
$ python src/images_to_pdf.py convert -i="path/to/input"
$ python src/images_to_pdf.py convert --input_dir "path/to/input"
$ python src/images_to_pdf.py convert --input_dir="path/to/input"

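One thing that tripped me up was passing a list.
fire turns a comma-separated value such as jpg,png into a tuple, but an unquoted space after the comma makes the shell split it into separate arguments, so png is no longer part of the -e value.

$ # Example for self.extensions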
$ python src/images_to_pdf.py convert -e jpg,png # OK
$ python src/images_to_pdf.py convert -e "jpg,png" # OK
$ python src/images_to_pdf.py convert -e "jpg, png" # OK
$ python src/images_to_pdf.py convert -e jpg, png # NG
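To see why the last form fails, the sketch below uses Python's standard shlex module to show the argument list each variant produces before fire ever sees it. This assumes a POSIX-style shell such as bash; Windows shells have their own quoting rules, so treat it as an illustration of word splitting rather than an exact model of every shell.

```python
import shlex

# The four -e variants from above (the "python src/images_to_pdf.py convert" prefix omitted)
commands = [
    '-e jpg,png',
    '-e "jpg,png"',
    '-e "jpg, png"',
    '-e jpg, png',
]

for command in commands:
    # shlex.split applies POSIX shell word splitting and quote removal
    print(f'{command!r:16} -> {shlex.split(command)}')
```

The first three hand fire a single token after -e; the last one produces the tokens jpg, and png, so png ends up as a separate argument.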

Input validation

Before converting anything, the tool checks argument types with isinstance() and verifies, among other things, that the specified paths exist.

images_to_pdf.py

# Method of PDFConverter (requires `import os` at module level)
def _input_is_valid(self) -> bool:
    """Validator for input.

    Returns:
        bool: True if is valid, False otherwise.
    """
    is_valid = True

    # Check input_dir
    if not isinstance(self.input_dir, str) or \
            not os.path.isdir(self.input_dir):
        print('[ERROR] You must type a valid directory for input directory.')
        is_valid = False

    # Check output_dir
    if not isinstance(self.output_dir, str) or \
            not os.path.isdir(self.output_dir):
        print('[ERROR] You must type a valid directory for output directory.')
        is_valid = False

    # Check extensions
    if not isinstance(self.extensions, tuple) and \
            not isinstance(self.extensions, str):
        print('[ERROR] You must type at least one extension.')
        is_valid = False

    # Check force_write
    if not isinstance(self.force_write, bool):
        print('[ERROR] You must just type -f flag. No need to type a parameter.')
        is_valid = False

    # Check yes
    if not isinstance(self.yes, bool):
        print('[ERROR] You must just type -y flag. No need to type a parameter.')
        is_valid = False

    return is_valid
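_input_is_valid prints each problem and returns a single bool. A variant that returns the list of messages instead can make the checks easier to test. The sketch below is hypothetical (collect_input_errors is not part of the repository) and only covers the directory and extension checks, using the same isinstance() and os.path.isdir() tests.

```python
import os
import tempfile
from typing import List


def collect_input_errors(input_dir, output_dir, extensions) -> List[str]:
    """Hypothetical helper: return every validation error instead of a bool."""
    errors: List[str] = []
    if not isinstance(input_dir, str) or not os.path.isdir(input_dir):
        errors.append('input_dir must be an existing directory')
    if not isinstance(output_dir, str) or not os.path.isdir(output_dir):
        errors.append('output_dir must be an existing directory')
    # fire gives a tuple for "jpg,png" and a str for a single "jpg"
    if not isinstance(extensions, (tuple, str)):
        errors.append('at least one extension is required')
    return errors


with tempfile.TemporaryDirectory() as tmp:
    print(collect_input_errors(tmp, tmp, ('jpg', 'png')))
    print(collect_input_errors(tmp, os.path.join(tmp, 'missing'), 'jpg'))
```

Collecting every message at once also lets the user fix all mistakes in one pass, which matches how _input_is_valid keeps checking after the first failure.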

Walking the directories, collecting images, and converting them to PDF

To walk the directory tree under the given input_dir, I use os.walk().
https://docs.python.org/ja/3/library/os.html?highlight=os%20walk#os.walk
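A small sketch of how os.walk() traverses a tree, built in a temporary directory (the names vol1, vol1/extra and vol2 are made up for illustration). By default os.walk() is top-down: it yields (dirpath, dirnames, filenames) for a directory before visiting its subdirectories, which is what lets one loop produce a PDF for every subdirectory. Sorting dirnames in place, which the documentation allows when topdown=True, fixes the visiting order; convert() does not do this, so its log order may follow the filesystem.

```python
import os
import tempfile

# Build a throwaway tree with one image per directory
with tempfile.TemporaryDirectory() as root:
    for sub in ('vol1', os.path.join('vol1', 'extra'), 'vol2'):
        os.makedirs(os.path.join(root, sub))
        open(os.path.join(root, sub, '001.jpg'), 'wb').close()

    visited = []
    for current_dir, dirs, files in os.walk(root):
        dirs.sort()  # in-place sort controls which subdirectory is visited next
        visited.append(os.path.relpath(current_dir, root))
        print(os.path.relpath(current_dir, root), sorted(files))
```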

The code below walks the directories, collects the images, and converts them into one PDF per directory.

images_to_pdf.py

# Method of PDFConverter. Module-level imports needed:
#   import os
#   from typing import Tuple
#   import img2pdf
#   from sort_key import natural_keys
def convert(self):
    # Normalize the extensions into a tuple such as ('.jpg', '.png').
    # fire passes "-e jpg,png" as a tuple and "-e jpg" as a plain str.
    raw_extensions: Tuple[str, ...] = (
        self.extensions if isinstance(self.extensions, tuple)
        else (self.extensions,)
    )
    extensions: Tuple[str, ...] = tuple(
        f'.{extension}'.lower() for extension in raw_extensions
    )

    # Walk the directories and turn the images in each one into a PDF
    for current_dir, _dirs, files in os.walk(self.input_dir):
        print(f'[INFO] Watching {current_dir}.')

        # Paths of the target images in current_dir
        images = []

        # files is the list of files in current_dir.
        # Plain sorting misorders names whose numbers have different digit counts
        # (https://github.com/ikota3/image_utilities#note),
        # so a natural-sort key function is used instead (see below).
        for filename in sorted(files, key=natural_keys):
            # Compare case-insensitively so that e.g. IMG_001.JPG also matches
            if filename.lower().endswith(extensions):
                images.append(os.path.join(current_dir, filename))

        # No matching images in this directory
        if not images:
            print(
                f'[INFO] There are no {", ".join(raw_extensions).upper()} files at {current_dir}.'
            )
            continue

        # Name the PDF after the directory (normpath strips a trailing separator)
        pdf_filename = os.path.join(
            self.output_dir, f'{os.path.basename(os.path.normpath(current_dir))}.pdf'
        )

        # Without -f, skip directories whose PDF already exists
        if os.path.exists(pdf_filename) and not self.force_write:
            print(f'[ERROR] {pdf_filename} already exists!')
            continue

        with open(pdf_filename, 'wb') as f:
            f.write(img2pdf.convert(images))
        print(f'[INFO] Created {pdf_filename}!')
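The extension handling at the top of convert() relies on two details: fire hands over a single extension as a plain str but several as a tuple, and str.endswith() accepts a tuple of suffixes and returns True if any of them matches. A minimal sketch isolating that step; to_suffixes is a hypothetical helper, not part of the repository.

```python
from typing import Tuple, Union


def to_suffixes(extensions: Union[str, Tuple[str, ...]]) -> Tuple[str, ...]:
    """Hypothetical helper: turn fire's -e value into a tuple of '.ext' suffixes."""
    if isinstance(extensions, str):  # "-e jpg" arrives as a plain str
        extensions = (extensions,)
    return tuple(f'.{ext}'.lower() for ext in extensions)


suffixes = to_suffixes(('jpg', 'PNG'))
print(suffixes)  # ('.jpg', '.png')

# str.endswith accepts a tuple: True if the name ends with any suffix in it
for name in ['001.jpg', '002.PNG', 'notes.txt', 'backup_jpg']:
    print(name, name.lower().endswith(suffixes))
```

Prefixing the dot is what keeps a name like backup_jpg from matching.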

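When collecting the images in a directory, the sort order was sometimes not what I expected, so I wrote a key function for natural sorting.
It is based on the following Stack Overflow question.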
https://stackoverflow.com/questions/5967500/how-to-correctly-sort-a-string-with-a-number-inside

sort_key.py

import re
from typing import Union, List


def atoi(text: str) -> Union[int, str]:
    """Convert ascii to integer.

    Args:
        text (str): string.

    Returns:
        Union[int, str]: integer if number, string otherwise.
    """
    # isdecimal() rather than isdigit(): isdigit() accepts characters such as '²' that int() rejects
    return int(text) if text.isdecimal() else text


def natural_keys(text: str) -> List[Union[int, str]]:
    """Key for natural sorting

    Args:
        text (str): string

    Returns:
        List[Union[int, str]]: A list of mixed integers and strings.
    """
    return [atoi(c) for c in re.split(r'(\d+)', text)]
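To see what the key changes, the sketch below repeats the same logic inline (so it runs on its own) and compares it with plain sorted(). Python compares the key lists element by element, and because re.split() with a capturing group always starts with a text chunk and then alternates digit runs and text, ints are only ever compared with ints.

```python
import re
from typing import List, Union


def natural_keys(text: str) -> List[Union[int, str]]:
    # Same approach as sort_key.py: digit runs become ints, everything else stays str
    return [int(c) if c.isdecimal() else c for c in re.split(r'(\d+)', text)]


files = ['10.jpg', '2.jpg', '1.jpg']
print(sorted(files))                    # ['1.jpg', '10.jpg', '2.jpg']
print(sorted(files, key=natural_keys))  # ['1.jpg', '2.jpg', '10.jpg']
print(natural_keys('page10.jpg'))       # ['page', 10, '.jpg']
```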

Wrap-up

I had previously built a CLI tool without any library, and compared with that, fire made the implementation much easier!

Why not try building a CLI tool yourself?
