Predicting the next era name by brute force

Posted at 2019-02-28

What is this?

I came across something like this,

so I tried it myself.

Getting the jōyō kanji

Fetch them from the Agency for Cultural Affairs (文化庁)

wget http://www.bunka.go.jp/kokugo_nihongo/sisaku/joho/joho/kijun/naikaku/kanji/joyokanjisakuin/index.html

As expected, the page is Shift_JIS, so convert it to UTF-8

nkf -w index.html > index.utf8.html
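If nkf is not installed, roughly the same conversion can be sketched in Python. This is only an illustration under assumptions: the helper name convert_to_utf8 is hypothetical, and it assumes the downloaded page decodes cleanly with Python's shift_jis codec (a page that uses vendor extensions might need a different codec such as cp932).

```python
from pathlib import Path


def convert_to_utf8(src, dst, src_encoding="shift_jis"):
    """Re-encode a text file to UTF-8 (hypothetical helper, not part of nkf)."""
    # Decode the raw bytes using the assumed source encoding.
    text = Path(src).read_bytes().decode(src_encoding)
    # Write the same text back out as UTF-8 bytes.
    Path(dst).write_bytes(text.encode("utf-8"))
    return text


# Example call, mirroring the nkf command above:
# convert_to_utf8("index.html", "index.utf8.html")
```

Unlike nkf, this sketch does not guess the input encoding; it fails with a UnicodeDecodeError if the assumption is wrong.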

Extract them

And, as reliable as ever, font tags.

The XPath is this:
//*[@id="urlist"]/tbody/tr/td[1]/font[1]
a.py (first half)
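import lxml.html

# Read the UTF-8 copy produced by nkf; set the encoding explicitly
# so the result does not depend on the platform default.
with open("index.utf8.html", encoding="utf-8") as f:
    contents = f.read()

# First <font> in the first cell of every row of the table
xpath = '//*[@id="urlist"]/tbody/tr/td[1]/font[1]'
doc = lxml.html.fromstring(contents)

# Collect the text of each matched element
elems = doc.xpath(xpath)
kanji = [x.text_content() for x in elems]

print(len(kanji))   # number of kanji found
print(kanji[:10])   # first ten, as a sanity check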
$ python a.py
2136
['亜', '哀', '挨', '愛', '曖', '悪', '握', '圧', '扱', '宛']

Permutations of two characters

a.py (second half)
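import itertools

# Every ordered pair of two different kanji.
# Each line reads: "Will the next era name be <a><b>, I wonder?"
for a, b in itertools.permutations(kanji, 2):
    print(f"次の年号は\t{a}{b}\tかな?")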

Result

$ python a.py > out

$ wc -l out
4560360 out

$ head out
次の年号は	亜哀	かな?
次の年号は	亜挨	かな?
次の年号は	亜愛	かな?
次の年号は	亜曖	かな?
次の年号は	亜悪	かな?
次の年号は	亜握	かな?
次の年号は	亜圧	かな?
次の年号は	亜扱	かな?
次の年号は	亜宛	かな?
次の年号は	亜嵐	かな?
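As a quick sanity check on the numbers above, here is a small sketch (the three-kanji sample list is just for illustration). It shows the order in which itertools.permutations emits ordered pairs, and that 2136 kanji give n * (n - 1) candidate lines, which matches the wc -l count.

```python
import itertools

# Tiny sample: ordered pairs of distinct items, in input order.
sample = ["亜", "哀", "挨"]
pairs = ["".join(p) for p in itertools.permutations(sample, 2)]
print(pairs)  # ['亜哀', '亜挨', '哀亜', '哀挨', '挨亜', '挨哀']

# With n kanji there are n * (n - 1) ordered pairs of two different kanji.
n = 2136
expected_lines = n * (n - 1)
print(expected_lines)  # 4560360
```

Because permutations never pairs an item with itself, candidates that repeat the same kanji twice are not generated.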

That's all.
