1

More than 3 years have passed since last update.

@munaita_(鈴木省吾)

Pythonの文字コードの文字列のリンク集とメモ

Python

Last updated at 2020-07-07Posted at 2020-07-07

文字コードについて

新人さんに知ってほしい「文字コードのお話」

符号化文字集合と文字符号化方式の2種類ある

符号化文字集合

文字とコードポイントのマッピング

例: unicode, ascii
python3ではデフォルトunicode


>>> hex(ord("あ"))
'0x3042' # unicodeの"あ"のコードポイント

文字符号化方式

文字の運用方式、実装方式

例: utf-8, shift-jis, euc-jp
"あ" は utf-8 と utf-16 で符号化すると異なるバイト列になる
- utf-8: 0xE3 0x81 0x82
- utf-16: 0x30, 0x42

Python3での文字の扱い

Python3で文字列を処理する際の心掛け

'あああ' のリテラルはunicodeの文字列
bytesに変換するときは 'あああ'.encode('utf-8')
bytesをunicodeに戻すときは bytes_moji.decode('utf-8')
unicodeでファイルに書き込むときと、byteで書き込ときと読むときは関数が異なる
Pythonでの文字コードの取り扱い

一応python2での扱いも

Python2で文字列を処理する際の心掛け

python2ではbyte化はデフォルトasciiで行われる
- 日本語文字列をファイルに変換しようとすると、デフォルトasciiでbyteにencodeしようとしてエラーになる
- utf-8 でencodeするように指定する必要がある

1

Register as a new user and use Qiita more conveniently

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up

1