Batch-binarizing the PDFs in a directory with ImageMagick

Posted at 2019-03-16

Run on Cygwin. Requires ImageMagick and Ghostscript.
It uses an enormous amount of memory. 32 GB was barely enough, so I made it
possible to set a cap, for example limit_memory_size=10280MB.
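
To see what a given cap actually changes, ImageMagick can print the resource limits it will apply. The following is a minimal sketch, assuming an ImageMagick 6 style `convert` on the PATH; `show_limits` is a hypothetical helper name that does not appear in the script.

```shell
#!/bin/bash
# Sketch: print the resource limits ImageMagick applies under a memory cap.
# show_limits is a hypothetical helper, not part of the original script.
show_limits() {
    local memory_cap=$1
    # -limit sets the cap for this invocation only; -list resource prints
    # the resulting limits (memory, map, disk, ...) without reading an image.
    convert -limit memory "$memory_cap" -list resource
}

# Example call (commented out so that sourcing this file has no side effects):
# show_limits 5140MB
```

When the pixel cache no longer fits under the memory cap, ImageMagick falls back to memory-mapped files and then to disk, so a lower cap usually means a slower run rather than a failed one.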

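convert appears to use a hellish amount of memory when rendering images from
a PDF, so I switched that step to pdftoppm. As a result, Poppler is now
required as well. convert is still used for binarization and for building the
output PDF, so binary_threshold and limit_memory_size can be specified as before.
It is a good idea to tune the threshold value beforehand on an image produced
with pdftoppm (see the pre-test comment in the script).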
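
The pre-test can be sketched as a small loop that binarizes one page image at several candidate thresholds so the results can be compared by eye. This is only an illustration: `pretest_thresholds`, the file names and the candidate values are assumptions, not part of the original script.

```shell
#!/bin/bash
# Sketch: binarize one page image at several thresholds to choose
# binary_threshold. pretest_thresholds is a hypothetical helper name.
pretest_thresholds() {
    local page_png=$1
    shift
    local threshold
    for threshold in "$@"; do
        # Read the image first, then apply -threshold to it.
        # Output name: <input without .png>_t<threshold>.png
        convert "$page_png" -threshold "$threshold" "${page_png%.png}_t${threshold}.png"
    done
}

# Example (commented out; input.pdf and pretest are placeholder names):
# pdftoppm -png -f 1 -singlefile input.pdf pretest   # writes pretest.png
# pretest_thresholds pretest.png 20000 30000 40000 50000
```

With `-singlefile`, pdftoppm writes a single page without a page-number suffix, which keeps the pre-test independent of the zero-padding used in the batch script.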


#!/bin/bash

# pre-test: try a threshold on a single page image before running the batch, e.g.
# convert seihi_even.pdf-001.png -threshold 30000 seihi_even_test_005_bin.png

binary_threshold=40000
limit_memory_size=5140MB

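# Iterate with a glob instead of parsing the output of ls.
for processing_pdfname in *.pdf ; do

    # Skip the literal pattern when the directory contains no PDF.
    [ -e "$processing_pdfname" ] || continue

    # Ask Ghostscript for the page count.
    pagenumber=$(gs -q -dNODISPLAY -c "($processing_pdfname)  (r) file runpdfbegin pdfpagecount = quit")
    echo "processing ${processing_pdfname} ..."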

    # Pages are numbered 1..pagenumber, so the last page must be included.
    for ((i=1; i <= pagenumber; i++)); do

        # pdftoppm zero-pads the page number to the width of the total page
        # count (e.g. 007 for a 120-page PDF); ZERO holds the leading zeros.
        printf -v padded_page '%0*d' "${#pagenumber}" "$i"
        ZERO=${padded_page%"$i"}
    
        pdftoppm -png -f "$i" -l "$i" "$processing_pdfname" "$processing_pdfname"

        #echo "making binary " ${processing_pdfname}-${ZERO}${i}_bin.png " ..."
        # Read the page image first, then apply -threshold to it.
        convert -limit memory "$limit_memory_size" "${processing_pdfname}-${ZERO}${i}.png" -threshold "$binary_threshold" "${processing_pdfname}-${ZERO}${i}_bin.png"
        rm "${processing_pdfname}-${ZERO}${i}.png"

    done

    #echo "making pdf " ${processing_pdfname}_bin.pdf " ..."
    # Combine the binarized pages, in glob (page) order, into a single PDF.
    convert -limit memory "$limit_memory_size" "${processing_pdfname}"-*_bin.png "${processing_pdfname}_bin.pdf"
    rm "${processing_pdfname}"-*_bin.png

done
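
The zero-padding step can be isolated into a small helper, which makes it easy to check against the file names pdftoppm actually produces. This is a sketch under the assumption that pdftoppm pads the page number to the number of digits in the total page count; `page_suffix` is a hypothetical name.

```shell
#!/bin/bash
# Sketch: build the page-number suffix pdftoppm is assumed to append,
# i.e. the page number zero-padded to the width of the total page count.
# page_suffix is a hypothetical helper, not part of the original script.
page_suffix() {
    local page=$1 total=$2
    # ${#total} is the number of digits in the page count.
    printf '%0*d' "${#total}" "$page"
}

page_suffix 7 8;    echo   # prints 7
page_suffix 7 42;   echo   # prints 07
page_suffix 7 120;  echo   # prints 007
page_suffix 42 120; echo   # prints 042
```

Because the width comes from the page count itself, the same helper also covers PDFs with fewer than 10 or more than 999 pages. If the installed pdftoppm names its files differently, the width argument is the only part that needs to change.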


For related techniques, see the following:

https://qiita.com/NickTominaga/items/1040b5a15074ace9fcc5
https://qiita.com/NickTominaga/items/25dad0749d3a4be59d8e
https://qiita.com/NickTominaga/items/23a054e0b42735a6d99c
