More than 1 year has passed since last update.

GFM(GitHub Flavored Markdown)をPDFとして出力する

Last updated at 2023-02-06Posted at 2023-02-05

はじめに

メモを書いたりするとき、素のMarkdownよりemojiのshortcodeなどがあってイケているGFMで書きたくなりますよね。
自分でプレビューする分にはこれのようなVSCodeの拡張機能で十分ですが、PDFに変換したいときにうまくできる方法が意外と見つからなかったので、PDFに変換するスクリプトを作りました。この記事ではそちらを紹介します

手順

準備

必要なパッケージをインストールします（下記はcondaの場合）

conda install -c conda-forge selenium webdriver-manager grip

実行

python gfm2pdf.py input.md
# -> input.mdと同じフォルダにPDFファイルが出力されます（中間生成物のHTMLファイルも出力されます）

gfm2pdf.py

import argparse
import base64
import os
from pathlib import Path

import grip
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service as ChromeService
from webdriver_manager.chrome import ChromeDriverManager


def to_html(input_path: str) -> str:
    basename = os.path.splitext(input_path)[0]
    html = f"{basename}.html"
    grip.export(path=input_path, out_filename=html, title=basename)
    return html


def fix_style(html: str) -> str:
    with open(html, mode="r+", encoding="utf-8") as f:
        # 囲いを消して余白を調整
        soup = BeautifulSoup(f.read(), "html.parser")
        soup.find("div", {"class": "Box-header"}).extract()
        soup.find("div", {"class": "Box-body"}).unwrap()
        soup.find("div", {"id": "readme"}).unwrap()
        soup.find("div", {"class": "Layout-main"}).unwrap()
        soup.find("div", {"class": "Layout"}).unwrap()
        soup.find("div", {"class": "clearfix"}).unwrap()
        soup.find("div", {"class": "repository-content"}).unwrap()
        soup.find("div", {"class": "clearfix"}).unwrap()
        soup.find("main").unwrap()
        soup.find("div", {"id": "preview-page"}).unwrap()
        soup.find("div", {"class": "page"}).wrap(
            soup.new_tag("div", attrs={"id": "preview-page"})
        )
        soup.find("div", {"class": "page"}).unwrap()
        soup.find("article").wrap(soup.new_tag("div", attrs={"id": "tmp"}))
        soup.find("article").unwrap()
        soup.find("div", {"id": "tmp"}).wrap(
            soup.new_tag(
                "article",
                attrs={
                    "class": "markdown-body entry-content p-4",
                    "id": "grip-content",
                },
            )
        )
        soup.find("div", {"id": "tmp"}).unwrap()
        # 上書き保存
        f.truncate(0)
        f.seek(0)
        f.write(str(soup))
    return html


def to_pdf(html: str) -> str:
    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    with webdriver.Chrome(
        service=ChromeService(ChromeDriverManager().install()), options=options
    ) as driver:
        driver.get(f"file://{html}")
        parameters = {
            "printBackground": True,
            # A4
            "paperWidth": 8.26,
            "paperHeight": 11.68,
            "marginTop": 0,
            "marginBottom": 0,
            "marginLeft": 0,
            "marginRight": 0,
        }
        pdf = base64.b64decode(
            driver.execute_cdp_cmd("Page.printToPDF", parameters)["data"]
        )
        output_path = f"{os.path.splitext(html)[0]}.pdf"
        with open(output_path, "wb") as f:
            f.write(pdf)
    return output_path


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument("input", type=Path, help="変換対象のmarkdownファイルのパス")
    args = parser.parse_known_args()[0]
    input_path = Path(os.path.expanduser(args.input)).resolve()
    # convert to html
    html = to_html(input_path)
    html = fix_style(html)
    # print to pdf using chrome
    to_pdf(html)


if __name__ == "__main__":
    main()

出力例

入力

sample.md

# ToDo リスト

1. :sushi:を食べ:beer:を飲む
1. 下記を実行する
   \`\`\`shell <!-- Qiita上で表示が壊れないようにバックスラッシュを付けています -->
   echo 'ていうかもう寝よう' > /dev/null # a line in #1
   \`\`\`
1. GitHub のロゴを眺める
   ![logo](https://github.githubassets.com/images/modules/logos_page/GitHub-Mark.png)

出力

解説

gripでGitHubのMarkdown APIを叩いてHTMLに変換
gripが付けてくれる枠が邪魔なのでBeautiful Soupで整形
seleniumを介してHTMLをchromeで開き、Chrome DevTools Protocolを利用してPDFとして保存する

という流れのスクリプトになっています

注意事項

この方法はGitHubのAPIを用いて変換するものです。機密情報などが含まれた文書には使わないでください。また、たくさんのファイルを変換したい場合はAPIのrate limitを拡張するために認証が必要になるかもしれません。詳しくはgripのドキュメントを参照してください

感想

こんなことせずにobsidian使えばいいだけという説はあります。
オフラインで同じことができたらよりよかったですが、調べた限りemojiとシンタックスハイライトを両立しているものがなかったので残念です。使用したライブラリのgripではオフラインレンダリング機能を開発していたようですが、開発が止まってしまっているようです

参考

SeleniumとHeadless ChromeでページをPDFに保存する

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up