Handling UnicodeDecodeError in Python

Posted at 2021-12-23

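Introduction

When I searched for a fix, I found each approach described on its own,
but nothing that combined them, so I am writing this down as a note to myself.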

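Problem

Reading a zip file created on Windows can raise a UnicodeDecodeError.
The cause is an encoding mismatch: the file names are stored in Shift JIS, whereas Python works with UTF-8.
More precisely, the zipfile module decodes a member name as UTF-8 only when the entry's UTF-8 flag (bit 11, 0x800) is set, and as cp437 otherwise, which is why the fixes below first re-encode the name with cp437.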
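
The mismatch can be reproduced without any zip file. The following is a minimal sketch; the sample name is an assumption chosen for illustration. It encodes a Japanese file name as cp932 (Python's codec for Windows Shift JIS) and then tries to read those bytes as UTF-8.

```python
# A Japanese file name as a Windows tool might store it (assumed sample).
name = "日本語.txt"
raw = name.encode("cp932")

try:
    raw.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError as exc:
    # Shift JIS lead bytes are generally not valid UTF-8 start bytes.
    decoded_ok = False
    print(f"decoding failed: {exc}")
```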

Solutions

Approach 1

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8e

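When this error occurs, re-encode the file name as described in the article "Python 3 で日本語ファイル名が入った zip ファイルを扱う" (Handling zip files that contain Japanese file names in Python 3):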

file_name.encode("cp437").decode("cp932")

This restores the original name.
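
To see why this round trip works, here is a minimal sketch with an assumed sample name. It simulates what zipfile hands back for an entry whose name bytes are cp932 and whose UTF-8 flag is not set, then undoes the cp437 decoding.

```python
original = "テスト.txt"  # assumed sample name

# Bytes as stored in the archive by a Shift JIS based tool.
stored = original.encode("cp932")

# What zipfile returns when the UTF-8 flag is not set: the bytes read as cp437.
as_read = stored.decode("cp437")

# cp437 maps every byte value, so encoding back yields the stored bytes
# unchanged, and they can then be decoded with the real encoding.
recovered = as_read.encode("cp437").decode("cp932")
print(recovered)
```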

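Approach 2

Approach 1 does not cover every case, however.
With some archives, the following error occurs instead:

UnicodeDecodeError: 'cp932' codec can't decode byte 0x83

In this case, as described in the article "【Python】【Django】「UnicodeDecodeError: ‘cp932’ codec can’t decode byte 0x83 in position」と表示される場合の対処方法_100162", decode the name as UTF-8 instead: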

file_name.encode("cp437").decode("utf-8")

This restores the original name.
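
The same round trip applies when the stored bytes are UTF-8 but the UTF-8 flag is missing. The sketch below uses an assumed sample name and also shows that applying the cp932 decoding from Approach 1 to such bytes does not give the original name back (it either raises or produces different text).

```python
original = "テスト.txt"  # assumed sample name

# UTF-8 bytes that zipfile read as cp437 because the UTF-8 flag was not set.
as_read = original.encode("utf-8").decode("cp437")
raw = as_read.encode("cp437")

# Decoding with the encoding that was actually used recovers the name.
recovered = raw.decode("utf-8")

# Decoding with the wrong codec either fails or yields garbage.
try:
    wrong = raw.decode("cp932")
except UnicodeDecodeError as exc:
    wrong = None
    print(f"cp932 decoding failed: {exc}")
```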

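Approach 3

Approaches 1 and 2 each handle only one of the two patterns.
Combining them, with the entry's flag bits deciding which decoding to apply, covers both: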

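# info is a zipfile.ZipInfo instance
if not (info.flag_bits & 0x800):
    # UTF-8 flag (bit 11) is not set, so zipfile decoded the name as cp437
    if info.flag_bits & 0x008:
        # bit 3 (data descriptor) is set: treated here as a sign of a UTF-8 name
        print(info.filename.encode("cp437").decode("utf-8"))
    else:
        print(info.filename.encode("cp437").decode("cp932"))
else:
    # UTF-8 flag is set, so the name is already decoded correctly
    print(info.filename)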
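
The branching above can be wrapped in a small helper and applied to every entry of an archive. This is a minimal sketch: the function name `decode_member_name`, the constant names, and the in-memory archive are assumptions made for illustration, and treating bit 3 (0x008) as a sign of a UTF-8 name is this article's heuristic, not something the ZIP format guarantees.

```python
import io
import zipfile

UTF8_FLAG = 0x800        # general purpose bit 11: name is stored as UTF-8
DATA_DESCRIPTOR = 0x008  # general purpose bit 3: used here only as a heuristic


def decode_member_name(info: zipfile.ZipInfo) -> str:
    """Return a readable name for a ZipInfo entry (hypothetical helper)."""
    if info.flag_bits & UTF8_FLAG:
        # zipfile already decoded the name as UTF-8.
        return info.filename
    # Without the UTF-8 flag, zipfile decoded the raw bytes as cp437.
    raw = info.filename.encode("cp437")
    if info.flag_bits & DATA_DESCRIPTOR:
        return raw.decode("utf-8")
    return raw.decode("cp932")


# Usage: list the names in an archive built in memory.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("日本語.txt", "hello")

with zipfile.ZipFile(buffer) as zf:
    names = [decode_member_name(info) for info in zf.infolist()]
print(names)
```

Archives written by Python's own zipfile set the UTF-8 flag for non-ASCII names, so the demo above only exercises the first branch; the other two branches matter for archives produced by other tools.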

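Conclusion

Approach 3 is the best option of the three.

(That said, it feels somewhat forced, so further investigation and a proper write-up seem necessary.)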
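
One possible direction for that investigation, which is not the method used in this article, is to stop relying on bit 3 and instead try UTF-8 first, falling back to cp932 when the bytes are not valid UTF-8. The sketch below assumes that the names are either UTF-8 or cp932 and that cp932 names rarely happen to be valid UTF-8; `recover_name` is a hypothetical helper name.

```python
def recover_name(filename: str, flag_bits: int) -> str:
    """Guess the real member name from ZipInfo.filename and ZipInfo.flag_bits."""
    if flag_bits & 0x800:
        # UTF-8 flag is set: zipfile already decoded the name correctly.
        return filename
    # Undo the cp437 decoding that zipfile applied to the raw bytes.
    raw = filename.encode("cp437")
    try:
        # Strict UTF-8 decoding rejects most Shift JIS byte sequences.
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 is cp932.
        return raw.decode("cp932")


# Usage with simulated values of info.filename (flag bits all clear).
sjis_name = recover_name("テスト.txt".encode("cp932").decode("cp437"), 0)
utf8_name = recover_name("テスト.txt".encode("utf-8").decode("cp437"), 0)
print(sjis_name, utf8_name)
```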
