
Predicting the Next Era Name by Brute Force

Python

Posted at 2019-02-28, last updated at 2019-02-28

What is this?

I came across this tweet:

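"There are 2,136 jōyō kanji, so there are about 4.56 million two-character combinations. Leak two per second and it takes about 27 days, which is still in time for the era name announcement. Do it right now."
- twinrail (@twinrail), February 26, 2019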

So I tried it.
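The tweet's arithmetic can be checked in a few lines of Python. This is only a rough sketch: the variable names are mine, and the rate of two names per second is taken from the tweet. It also computes the count when the two kanji must differ, since that is a slightly smaller number than the square.

```python
# Check the tweet's numbers: 2,136 joyo kanji, two guesses per second.
n_kanji = 2136
per_second = 2  # rate quoted in the tweet

ordered_pairs = n_kanji ** 2               # doubled kanji such as "亜亜" allowed
distinct_pairs = n_kanji * (n_kanji - 1)   # the two kanji must differ

seconds = ordered_pairs / per_second
days = seconds / (60 * 60 * 24)

print(ordered_pairs)    # 4562496
print(distinct_pairs)   # 4560360
print(round(days, 1))   # 26.4
```

Both counts round to about 4.56 million, and 26.4 days is consistent with the tweet's "about 27 days".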

Getting the jōyō kanji

Fetch them from the Agency for Cultural Affairs (文化庁)

wget http://www.bunka.go.jp/kokugo_nihongo/sisaku/joho/joho/kijun/naikaku/kanji/joyokanjisakuin/index.html

As expected, the page is Shift_JIS, so convert it to UTF-8

nkf -w index.html > index.utf8.html
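For readers without wget or nkf, the same two steps can be approximated with Python's standard library. This is a sketch, not what the article ran: the function names `sjis_to_utf8` and `fetch_as_utf8` are hypothetical, the page is assumed to be Shift_JIS (decoded here with the `cp932` codec, whereas `nkf -w` detects the input encoding itself), and the 2019 URL may no longer resolve.

```python
import urllib.request

URL = ("http://www.bunka.go.jp/kokugo_nihongo/sisaku/joho/joho/"
       "kijun/naikaku/kanji/joyokanjisakuin/index.html")


def sjis_to_utf8(raw: bytes) -> bytes:
    """Re-encode Shift_JIS bytes as UTF-8 (assumes cp932 input)."""
    # errors="replace" keeps going on bytes that are not valid cp932.
    return raw.decode("cp932", errors="replace").encode("utf-8")


def fetch_as_utf8(url: str = URL, out_path: str = "index.utf8.html") -> None:
    """Download the page and save a UTF-8 copy (needs network access)."""
    with urllib.request.urlopen(url) as resp:
        raw = resp.read()
    with open(out_path, "wb") as f:
        f.write(sjis_to_utf8(raw))
```

Calling `fetch_as_utf8()` would write `index.utf8.html`, the file that a.py reads.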

Extract them

And, reliably enough, the page is built with font tags.

Here is the XPath:

//*[@id="urlist"]/tbody/tr/td[1]/font[1]

a.py (first half)

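import lxml.html

# Read the UTF-8 copy produced by nkf
with open("index.utf8.html", encoding="utf-8") as f:
    contents = f.read()

# First font tag in the first cell of each row of the "urlist" table
xpath = '//*[@id="urlist"]/tbody/tr/td[1]/font[1]'
doc = lxml.html.fromstring(contents)

elems = doc.xpath(xpath)
kanji = [x.text_content() for x in elems]

print(len(kanji))   # number of kanji extracted
print(kanji[:10])   # first ten, as a sanity check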

$ python a.py
2136
['亜', '哀', '挨', '愛', '曖', '悪', '握', '圧', '扱', '宛']
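To see what the XPath selects without downloading the page, here is a small sketch on a made-up table. The sample HTML is my own guess at the layout (the real page is not reproduced here), and it uses the standard library's `xml.etree.ElementTree` in place of lxml so that it runs without extra dependencies. ElementTree supports only a subset of XPath and needs well-formed markup, so the path is written relative to the root element.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed sample imitating the assumed table layout.
SAMPLE = """
<html><body>
<table id="urlist"><tbody>
  <tr><td><font size="7">亜</font><font>（亞）</font></td><td>ア</td></tr>
  <tr><td><font size="7">哀</font></td><td>アイ</td></tr>
</tbody></table>
</body></html>
"""

root = ET.fromstring(SAMPLE.strip())

# Same steps as the article's XPath: first font tag in the first cell of each row.
path = ".//*[@id='urlist']/tbody/tr/td[1]/font[1]"
sample_kanji = ["".join(el.itertext()) for el in root.findall(path)]
print(sample_kanji)  # ['亜', '哀']
```

The `[1]` predicates are what skip the second font tag and the second cell in each row.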

Two-character permutations

a.py (second half)

import itertools

# Every ordered pair of two different kanji
for a, b in itertools.permutations(kanji, 2):
    # Each line reads: "The next era name is <ab>, maybe?"
    print(f"次の年号は\t{a}{b}\tかな？")

Result

$ python a.py > out

$ wc -l out
4560360 out
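The line count is n(n-1) rather than n², because `itertools.permutations` never pairs an element with itself. A minimal sketch on a three-kanji stand-in list (the list and variable names are mine) shows the difference from `itertools.product`:

```python
import itertools

sample = ["亜", "哀", "挨"]  # tiny stand-in for the 2,136-kanji list

# permutations: ordered pairs of two different elements
perms = ["".join(p) for p in itertools.permutations(sample, 2)]
# product: ordered pairs, doubled elements included
prods = ["".join(p) for p in itertools.product(sample, repeat=2)]

print(perms)                   # ['亜哀', '亜挨', '哀亜', '哀挨', '挨亜', '挨哀']
print(len(perms), len(prods))  # 6 9

n = 2136
print(n * (n - 1))             # 4560360, the wc -l count
```

Swapping in `itertools.product(kanji, repeat=2)` would add the 2,136 doubled names such as 亜亜.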

$ head out
次の年号は	亜哀	かな？
次の年号は	亜挨	かな？
次の年号は	亜愛	かな？
次の年号は	亜曖	かな？
次の年号は	亜悪	かな？
次の年号は	亜握	かな？
次の年号は	亜圧	かな？
次の年号は	亜扱	かな？
次の年号は	亜宛	かな？
次の年号は	亜嵐	かな？
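Since `itertools.permutations` returns a lazy iterator, the same preview can be had without writing all 4.5 million lines first. The sketch below uses `itertools.islice` on the first eleven kanji. The first ten come from the script's output, and 嵐 as the eleventh is inferred from the last line of the `head` output, so treat that list as an assumption.

```python
import itertools

# First 11 kanji, read off the outputs shown in the article (assumption).
first_kanji = ["亜", "哀", "挨", "愛", "曖", "悪", "握", "圧", "扱", "宛", "嵐"]

pairs = itertools.permutations(first_kanji, 2)  # nothing is generated yet
preview = [f"{a}{b}" for a, b in itertools.islice(pairs, 10)]
print(preview)
```

The ten names printed are the ones in the `head` output, from 亜哀 through 亜嵐.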

The end.
