UnicodeDecodeError: 'ascii' codec can't decode byte 0xa4 in position 0: ordinal not in range(128)

Posted at 2013-12-23

Conclusion

Encoding   Code for "あ"      len('あ')
unicode    \u3042            1
euc-jp     \xa4\xa2          2
utf-8      \xe3\x81\x82      3
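The lengths in this table can be cross-checked with a minimal sketch. It uses Python 3 rather than the Python 2 used in this article: text is `str` and encoded data is `bytes`. The variable names are illustrative only.

```python
# Python 3 sketch: the same character "あ" (U+3042) as text and as encoded bytes.
char = "\u3042"  # "あ" as a Unicode string (Python 2: u'\u3042')

euc_bytes = char.encode("euc-jp")   # "あ" encoded as EUC-JP bytes
utf8_bytes = char.encode("utf-8")   # "あ" encoded as UTF-8 bytes

print(len(char), euc_bytes, len(euc_bytes), utf8_bytes, len(utf8_bytes))
# → 1 b'\xa4\xa2' 2 b'\xe3\x81\x82' 3
```

One code point becomes 2 bytes in EUC-JP and 3 bytes in UTF-8, which is why `len()` on a Python 2 byte string depends on the encoding.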
Conversion                  Code                                                 Return value
euc-jp → unicode            unicode('\xa4\xa2','euc-jp')                         u'\u3042'
euc-jp → unicode            '\xa4\xa2'.decode('euc-jp')                          u'\u3042'
utf-8 → unicode             unicode('\xe3\x81\x82','utf-8')                      u'\u3042'
utf-8 → unicode             '\xe3\x81\x82'.decode('utf-8')                       u'\u3042'
unicode → euc-jp            u'\u3042'.encode('euc-jp')                           '\xa4\xa2'
unicode → utf-8             u'\u3042'.encode('utf-8')                            '\xe3\x81\x82'
utf-8 → unicode → euc-jp    unicode('\xe3\x81\x82','utf-8').encode('euc-jp')     '\xa4\xa2'
utf-8 → unicode → euc-jp    '\xe3\x81\x82'.decode('utf-8').encode('euc-jp')      '\xa4\xa2'
euc-jp → unicode → utf-8    unicode('\xa4\xa2','euc-jp').encode('utf-8')         '\xe3\x81\x82'
euc-jp → unicode → utf-8    '\xa4\xa2'.decode('euc-jp').encode('utf-8')          '\xe3\x81\x82'
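For readers on Python 3, here is a minimal sketch of the two round trips in the conversion table. It assumes Python 3, where Python 2's `unicode(b, enc)` / `b.decode(enc)` become `bytes.decode(enc)` and `unicode.encode(enc)` becomes `str.encode(enc)`.

```python
# Python 3 sketch of the round-trip conversions (Unicode is the pivot).
utf8_bytes = b"\xe3\x81\x82"   # "あ" in UTF-8
euc_bytes = b"\xa4\xa2"        # "あ" in EUC-JP

# utf-8 → unicode → euc-jp
utf8_to_euc = utf8_bytes.decode("utf-8").encode("euc-jp")

# euc-jp → unicode → utf-8
euc_to_utf8 = euc_bytes.decode("euc-jp").encode("utf-8")

print(utf8_to_euc)  # → b'\xa4\xa2'
print(euc_to_utf8)  # → b'\xe3\x81\x82'
```

There is no direct EUC-JP ↔ UTF-8 step: the bytes are first decoded to a Unicode string and then encoded into the target encoding, exactly as in the table.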

Unicode

For a Unicode string:
>>> string=u'あ'
>>> string
u'\u3042'

EUC-JP -> Unicode

In an EUC-JP environment:
>>> string='あ'
>>> string
'\xa4\xa2'
>>> len(string)
2
>>> unicode(string)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa4 in position 0: ordinal not in range(128)
>>> unicode(string,'euc-jp')
u'\u3042'
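The error in the title comes from `unicode(string)` falling back to Python 2's default `ascii` codec, which only covers bytes 0x00 to 0x7f. A minimal Python 3 sketch (not the article's Python 2) reproduces the same message by naming the `ascii` codec explicitly, then decodes correctly with `euc-jp`:

```python
# Python 3 sketch: reproduce the title's UnicodeDecodeError with EUC-JP bytes.
euc_bytes = b"\xa4\xa2"  # "あ" in EUC-JP

try:
    euc_bytes.decode("ascii")  # 0xa4 is outside ascii's range 0x00-0x7f
except UnicodeDecodeError as exc:
    message = str(exc)
    print(message)
    # → 'ascii' codec can't decode byte 0xa4 in position 0: ordinal not in range(128)

text = euc_bytes.decode("euc-jp")  # name the real encoding instead
print(text == "\u3042")  # → True
```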

UTF-8 -> Unicode

In a UTF-8 environment:
>>> string='あ'
>>> string
'\xe3\x81\x82'
>>> len(string)
3
>>> unicode(string)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe3 in position 0: ordinal not in range(128)
>>> unicode(string,'utf-8')
u'\u3042'
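The error message already says which codec failed, on which byte, and at which position. In Python 3, the exception exposes these pieces as the attributes `encoding`, `object`, `start`, `end`, and `reason`. The minimal sketch below uses the UTF-8 bytes from the session above; the helper name `describe_decode_error` is hypothetical and only for illustration.

```python
# Python 3 sketch: read the parts of a UnicodeDecodeError for UTF-8 input.
def describe_decode_error(data: bytes, encoding: str):
    """Hypothetical helper: return (codec, offending byte, position, reason), or None if decoding succeeds."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        # exc.start is the index of the first byte the codec could not decode
        return exc.encoding, hex(data[exc.start]), exc.start, exc.reason
    return None


utf8_bytes = b"\xe3\x81\x82"  # "あ" in UTF-8
print(describe_decode_error(utf8_bytes, "ascii"))
# → ('ascii', '0xe3', 0, 'ordinal not in range(128)')
print(describe_decode_error(utf8_bytes, "utf-8"))
# → None
```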

Encoding conversion functions & methods

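Conversion                            Function or method
non-unicode string → unicode string   unicode([non-unicode string], [encoding], [errors='strict'])
non-unicode string → unicode string   non-unicode string.decode([encoding], [errors='strict'])
unicode string → non-unicode string   unicode string.encode([encoding], [errors='strict'])

Here a "non-unicode string" means a Python 2 byte string (type str).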
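A minimal sketch of how the three rows of this table look in Python 3 (not the Python 2 used in the article). There, `unicode(raw, enc)` corresponds to `str(raw, enc)`, and the `decode`/`encode` methods live on `bytes` and `str`, respectively.

```python
# Python 3 sketch of the three conversion forms, each with errors="strict".
raw = b"\xe3\x81\x82"  # a byte string ("non-unicode string") holding UTF-8

# Python 2: unicode(raw, 'utf-8')  →  Python 3: str(raw, 'utf-8')
via_constructor = str(raw, "utf-8", "strict")

# Python 2: raw.decode('utf-8')  →  Python 3: bytes.decode
via_decode = raw.decode("utf-8", errors="strict")

# Python 2: text.encode('utf-8')  →  Python 3: str.encode
round_trip = via_decode.encode("utf-8", errors="strict")

print(via_constructor == via_decode == "\u3042", round_trip == raw)  # → True True
```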

errors

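unicode()  encode  decode  errors             Behavior
○          ○       ○       strict             Raises UnicodeDecodeError (UnicodeEncodeError when encoding)
○          ○       ○       replace            Inserts U+FFFD 'REPLACEMENT CHARACTER' when decoding ('?' when encoding)
○          ○       ○       ignore             Drops the offending characters from the result
×          ○       ×       xmlcharrefreplace  Uses an XML character reference (encoding only)

○ = supported, × = not supported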
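To see each `errors` value in action, here is a minimal Python 3 sketch (the article itself uses Python 2). It uses the `ascii` codec, chosen only because it cannot represent "あ", so every handler is triggered. `xmlcharrefreplace` appears only on the encoding side, matching the table.

```python
# Python 3 sketch: the errors= handlers with the ascii codec.
euc_bytes = b"\xa4\xa2"  # "あ" in EUC-JP; neither byte is valid ascii
text = "\u3042"          # "あ"

# Decoding (bytes → str)
decoded_replace = euc_bytes.decode("ascii", errors="replace")  # each bad byte → U+FFFD
decoded_ignore = euc_bytes.decode("ascii", errors="ignore")    # bad bytes dropped

# Encoding (str → bytes)
encoded_replace = text.encode("ascii", errors="replace")           # unencodable char → b'?'
encoded_ignore = text.encode("ascii", errors="ignore")             # unencodable char dropped
encoded_xmlref = text.encode("ascii", errors="xmlcharrefreplace")  # → &#12354; (0x3042)

print(ascii(decoded_replace), ascii(decoded_ignore))    # → '\ufffd\ufffd' ''
print(encoded_replace, encoded_ignore, encoded_xmlref)  # → b'?' b'' b'&#12354;'
```

With the default `errors="strict"`, both the decode and the encode calls above would raise instead.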