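A very simple formatting filter for Western-language text
When working with plain text in Western languages, something like this occasionally comes in handy, so I'm recording it here as a note to myself.
It reads text from standard input and writes it back out reflowed to a line width (in characters) given as a command-line argument. Japanese is not supported, but Japanese is written without spaces between words and doesn't need this kind of word wrapping in the first place.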
$ cat my.txt | format 40
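Use it as shown above. The code is lazy and doesn't even use the text package (it works on plain String), but since I don't expect to process large amounts of text with it, I'm leaving it as is.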
Format.hs
import System.Environment (getArgs)
import Text.Read (readMaybe)

-- For now, the right end of each line is padded with spaces up to the requested width.
-- count: letters buffered so far; buffer: words of the current line (in reverse);
-- strs: words still to be placed; n: line width.
pretty :: Int -> [String] -> [String] -> Int -> IO ()
pretty count buffer strs n
  | null strs && null buffer = return ()
  | null strs   = flushWithWS 1 >> putStrLn (ws (n - count - (wc - 1))) -- the last line is simply left-aligned
  | wc == 0     = pretty (length s) [s] ss n -- the first word of a line is always accepted, even if it is longer than n
  | expect > n  = flushWithWS p >> putStrLn (ws (n - count - (wc - 1) * p)) >> pretty 0 [] strs n
  | expect == n = flushWithWS 1 >> putStrLn (ws 1 ++ s) >> pretty 0 [] ss n
  | otherwise   = pretty (count + length s) (s : buffer) ss n
  where
    (s : ss) = strs                -- the word being considered and the remaining words
    wc = length buffer             -- number of words in the buffer
    expect = count + wc + length s -- minimum width needed if the next word joins the current line
    ws k = replicate k ' '         -- k spaces
    p = (n - count) `div` wc       -- gap width when the spare space is shared out evenly
    flushWithWS k = putStr $ drop k $ concatMap (ws k ++) $ reverse buffer -- print the buffer with k spaces between words

main :: IO ()
main = do
  args <- getArgs
  case args of
    (a : _) | Just n <- readMaybe a, n > 0 -> do
      dat <- words <$> getContents
      pretty 0 [] dat n
    _ -> putStrLn "Usage: cat file | format (number of letters per line)"
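Because pretty mixes the line-breaking decision with the actual printing, it is hard to check in isolation. As a rough sketch (this is not the code in Format.hs), the same greedy rule could be written as pure functions. The names fill, pad, render and layout are my own, and I am assuming that a word longer than the width simply goes on a line of its own.

```haskell
import Data.List (intercalate)

-- Hypothetical names (fill, pad, render, layout); none of them appear in Format.hs.

-- | Greedily group words into lines whose single-spaced width is at most n.
-- Assumption: a word longer than n is placed alone on an overlong line.
fill :: Int -> [String] -> [[String]]
fill n = go [] 0
  where
    -- acc is the current line in reverse order; len is its single-spaced width
    go acc _ [] = [reverse acc | not (null acc)]
    go acc len (w : rest)
      | null acc                = go [w] (length w) rest
      | len + 1 + length w <= n = go (w : acc) (len + 1 + length w) rest
      | otherwise               = reverse acc : go [] 0 (w : rest)

-- | Pad a string on the right with spaces up to width n.
pad :: Int -> String -> String
pad n s = s ++ replicate (n - length s) ' '

-- | Join one line with gaps of p spaces, where p is the spare width divided
-- by the number of words (never less than 1), then pad to width n.
render :: Int -> [String] -> String
render n ws = pad n (intercalate (replicate p ' ') ws)
  where
    letters = sum (map length ws)
    p = max 1 ((n - letters) `div` length ws)

-- | Lay out all words; the last line is simply left-aligned.
layout :: Int -> [String] -> [String]
layout n ws = case fill n ws of
  [] -> []
  ls -> map (render n) (init ls) ++ [pad n (unwords (last ls))]
```

With this split, the body of main could be something like getContents >>= mapM_ putStrLn . layout 66 . words, and fill can be tried out in GHCi without touching standard input.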
As a test, I formatted the Gettysburg Address to a width of 66 characters.
Four score and seven years ago our fathers brought forth on this
continent, a new nation, conceived in Liberty, and dedicated to
the proposition that all men are created equal. Now we are engaged
in a great civil war, testing whether that nation, or any nation
so conceived and so dedicated, can long endure. We are met on a
great battle-field of that war. We have come to dedicate a portion
of that field, as a final resting place for those who here gave
their lives that that nation might live. It is altogether fitting
and proper that we should do this. But, in a larger sense, we can
not dedicate - we can not consecrate - we can not hallow - this
ground. The brave men, living and dead, who struggled here, have
consecrated it, far above our poor power to add or detract. The
world will little note, nor long remember what we say here, but it
can never forget what they did here. It is for us the living,
rather, to be dedicated here to the unfinished work which they who
fought here have thus far so nobly advanced. It is rather for us
to be here dedicated to the great task remaining before us - that
from these honored dead we take increased devotion to that cause
for which they gave the last full measure of devotion - that we
here highly resolve that these dead shall not have died in vain -
that this nation, under God, shall have a new birth of freedom -
and that government of the people, by the people, for the people,
shall not perish from the earth.
Well, it's passable.
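The result looks left-aligned because the gap width p is an integer division by the number of words, so it almost always comes out as 1. The first line, for example, has 53 letters in 12 words, which gives p = (66 - 53) div 12 = 1. If true justification were wanted, one possible variation (not what Format.hs does) would be to divide the spare columns by the number of gaps and hand the remainder to the leftmost gaps. A minimal sketch, with justify as a hypothetical helper name:

```haskell
-- Hypothetical helper, not part of Format.hs.
-- Spread the spare columns over the gaps so that the line is exactly n wide
-- whenever the words fit; the leftmost gaps get one extra space when the
-- division is not exact. A single word is returned unpadded.
justify :: Int -> [String] -> String
justify _ []  = ""
justify _ [w] = w
justify n ws  = concat (zipWith (++) ws (gaps ++ [""]))
  where
    letters = sum (map length ws)
    slots   = length ws - 1                  -- number of gaps between words
    spare   = max slots (n - letters)        -- keep at least one space per gap
    (q, r)  = spare `divMod` slots
    gaps    = [replicate (if i < r then q + 1 else q) ' ' | i <- [0 .. slots - 1]]
```

For the first line of the speech this would mean 13 spare columns over 11 gaps, so the first two gaps would get two spaces and the rest one.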