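Measuring the display width of a string in Python #Python

I wanted to line up columns vertically in a monospaced font, so I wrote this.

First attempt: decide whether a character is full-width from its code point. (The code in this post is Python 2, so width receives a UTF-8 byte string and decodes it.)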

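# -*- coding: utf-8 -*-

def width(s):
    # s is a UTF-8 byte string (Python 2 str); decode it so we count characters, not bytes.
    return sum([_width(c) for c in s.decode('utf-8')])

def _width(c):
    # Full-width characters take two columns, everything else takes one.
    return 2 if _isMultiByte(c) else 1

def _isMultiByte(c):
    # Naive test: any code point beyond Latin-1 (> 255) is treated as full-width.
    return ord(c) > 255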

Checking the behavior

assert 0 == width('')

assert 1 == width('a')
assert 1 == width('A')
assert 1 == width('ｱ')
assert 1 == width('-')
assert 1 == width('+')
assert 1 == width(' ')
assert 1 == width('\\')
assert 1 == width('\n')
assert 1 == width('\t')

assert 2 == width('aa')
assert 2 == width('AA')
assert 2 == width('が')
assert 2 == width('っ')
assert 2 == width('ぽ')
assert 2 == width('い')
assert 2 == width('！')
assert 2 == width('Ｇ')
assert 2 == width('ア')
assert 2 == width('○')
assert 2 == width('■')
assert 2 == width('、')
assert 2 == width('。')
assert 2 == width('‐')
assert 2 == width('ー')
assert 2 == width('　')
assert 2 == width('\\\\')
assert 2 == width('\r\n')

assert 4 == width('ＹＹ')
assert 4 == width('0ｗ0')
assert 4 == width('OK！')

Only ｱ fails... The half-width katakana ｱ is U+FF71, far above 255, so it gets judged as full-width.
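To see why that single assertion fails, it can help to print the code points involved. The following is a small Python 3 sketch (so no decode step is needed); is_wide_naive is a name made up here that mirrors the ord(c) > 255 test above.

```python
# Python 3 sketch: look at the code points behind the failing assertion.
def is_wide_naive(c):
    """Mirror of the first attempt: any code point above 255 counts as full-width."""
    return ord(c) > 255

samples = ['a', 'ｱ', 'ア']
report = [(ch, 'U+%04X' % ord(ch), is_wide_naive(ch)) for ch in samples]
for row in report:
    print(row)
```

Half-width and full-width katakana both sit far above 255, so a simple threshold on the code point cannot tell them apart.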

It turns out Unicode defines a property called East Asian Width for exactly this.
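As a quick illustration, the standard library exposes this property through unicodedata.east_asian_width. The Python 3 sketch below prints the class of a few characters taken from the assertions above; the sample list is only my selection.

```python
import unicodedata

# A few characters from the assertions, one per East Asian Width class of interest.
samples = ['a', 'ｱ', 'ア', 'Ｇ', '○']
classes = {ch: unicodedata.east_asian_width(ch) for ch in samples}
for ch in samples:
    print(ch, classes[ch])
```

The classes that matter here are F (Fullwidth), W (Wide), A (Ambiguous), H (Halfwidth) and Na (Narrow).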

Only _isMultiByte needs to change:

# -*- coding: utf-8 -*-

def width(s):
    return sum([_width(c) for c in s.decode('utf-8')])

def _width(c):
    return 2 if _isMultiByte(c) else 1

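def _isMultiByte(c):
    import unicodedata
    # F = Fullwidth, W = Wide, A = Ambiguous (drawn wide in East Asian fonts).
    # Half-width katakana such as ｱ are H, so they now count as one column.
    return unicodedata.east_asian_width(c) in ['F', 'W', 'A']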

This time every assertion passes!
This looks good enough, I think?
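For readers on Python 3, where str is already Unicode, the same idea might look roughly like the sketch below. The ambiguous_is_wide parameter and the ljust_width padding helper are my own additions for illustration, not part of the original code.

```python
import unicodedata

def width(s, ambiguous_is_wide=True):
    """Display width of s: 2 columns for wide characters, 1 for everything else."""
    # Assumption: Ambiguous (A) characters are drawn wide, as in the article's version.
    wide = {'F', 'W', 'A'} if ambiguous_is_wide else {'F', 'W'}
    return sum(2 if unicodedata.east_asian_width(c) in wide else 1 for c in s)

def ljust_width(s, columns):
    """Hypothetical helper: pad s with spaces up to `columns` display columns."""
    return s + ' ' * max(0, columns - width(s))

# Align a small table on display width rather than on len().
for name, value in [('apple', 1), ('りんご', 2), ('ｱｯﾌﾟﾙ', 3)]:
    print(ljust_width(name, 8) + '|', value)
```

Padding by display width instead of len() is what keeps the | column aligned in a monospaced East Asian font.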

Notes

The input I have in mind is standard input, for example:

$ echo ほわ〜ぁ | python width.py
8

width.py

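# -*- coding: utf-8 -*-

import sys

# width() is the function defined above, in this same file.
# Read the first line of standard input and drop surrounding whitespace.
s = sys.stdin.readline().strip()

print width(s)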

or a command-line argument:

$ python width.py ほわ〜ぁ
8

width.py

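# -*- coding: utf-8 -*-

import sys

# width() is the function defined above, in this same file.
# In Python 2 the argument arrives as bytes; this assumes a UTF-8 terminal.
s = sys.argv[1]

print width(s)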
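The two input styles could also be handled by one script. The following is a Python 3 sketch under that assumption; read_text is a hypothetical helper, and io.StringIO stands in for sys.stdin so the example runs without a pipe.

```python
import io
import unicodedata

def width(s):
    """Display width, counting F, W and A characters as two columns."""
    return sum(2 if unicodedata.east_asian_width(c) in ('F', 'W', 'A') else 1
               for c in s)

def read_text(argv, stdin):
    """Hypothetical helper: use argv[1] if given, else the first line of stdin."""
    if len(argv) > 1:
        return argv[1]
    return stdin.readline().strip()

# Simulate both invocations from the article.
from_arg = width(read_text(['width.py', 'ほわ〜ぁ'], io.StringIO('')))
from_stdin = width(read_text(['width.py'], io.StringIO('ほわ〜ぁ\n')))
print(from_arg, from_stdin)
```

In a real script you would pass sys.argv and sys.stdin instead of the simulated values.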

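Also, Greek letters, Arabic script, and the like are not guaranteed to work. Greek letters, for instance, fall in the Ambiguous class, so this code counts them as two columns even where a font draws them narrow.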
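A rough way to see where these scripts get tricky is to inspect a few characters directly. This Python 3 sketch looks at a Greek letter, an Arabic letter, and an Arabic combining mark (U+064E); the choice of samples is mine.

```python
import unicodedata

# alpha, Arabic letter beh, Arabic fatha (a combining mark)
samples = ['α', 'ب', '\u064e']
info = {
    'U+%04X' % ord(ch): (unicodedata.east_asian_width(ch), unicodedata.combining(ch))
    for ch in samples
}
for code, (eaw, combining_class) in info.items():
    print(code, eaw, combining_class)
```

A nonzero combining class marks a character that attaches to the previous one, so the code above would count it as one column although it normally takes up no column of its own.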