
A stopgap fix for UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc3 in position 124: invalid continuation byte

Posted at 2021-10-07, last updated at 2021-10-12

My Python skills are limited, so I am leaving this as a note.

      process = subprocess.Popen(
          cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
      with utils.timing(
          f'Jackhmmer ({os.path.basename(database_path)}) query'):
        _, stderr = process.communicate()
        retcode = process.wait()

      if retcode:
        raise RuntimeError(
            'Jackhmmer failed\nstderr:\n%s\n' % stderr.decode('utf-8'))

An error occurs inside the code above, but what I actually saw was:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc3 in position 124: invalid continuation byte

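That is, a second error was raised while the first one was being handled, so the error that actually mattered never became visible. I searched for the message, but the top hits were all about passing the right encoding to open, which did not help much here, so I am leaving this note.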

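According to the official reference, decode takes a second argument, errors, that decides what happens when decoding fails. Passing ignore there, as in

decode('utf-8', errors='ignore')

works around the problem for the time being. Incidentally, the underlying error turned out to be: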

Error: Parse failed (sequence file /mnt/mgnify_database_path/mgy_clusters_2018_12.fa):
Line 29325242: non-ASCII character  in sequence

So something seems to be wrong somewhere in the mgnify database. Did the download fail?
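As a rough illustration of the errors argument described above, here is a minimal sketch comparing a few of the standard handlers. The byte string is a made-up stand-in modelled on the message in this article, not the real stderr, and the variable names are only for illustration.

```python
# Hypothetical stderr bytes modelled on the message in this article:
# a lone 0xc3 byte followed by a space is not valid UTF-8.
raw = b"Line 29325242: non-ASCII character \xc3 in sequence"

try:
    raw.decode("utf-8")  # errors="strict" is the default
except UnicodeDecodeError as exc:
    strict_reason = exc.reason  # the text after the last colon in the message

ignored = raw.decode("utf-8", errors="ignore")            # drops the bad byte
replaced = raw.decode("utf-8", errors="replace")          # inserts U+FFFD
escaped = raw.decode("utf-8", errors="backslashreplace")  # keeps it as \xc3

print(strict_reason)
print(ignored)
print(replaced)
print(escaped)
```

Note that ignore silently discards the offending byte, which is presumably why the message quoted in this article shows two consecutive spaces after "character". If you want to see which byte was there, backslashreplace may be the more informative choice for error output.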

Addendum: it appears that a memory problem had damaged the downloaded file.
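If you want to check a sequence file for this kind of damage yourself, something like the following sketch might help. It is not what Jackhmmer does internally. The function name find_non_ascii_lines is hypothetical, and it simply assumes that a healthy FASTA file contains only ASCII bytes.

```python
from typing import Iterator, Tuple


def find_non_ascii_lines(path: str) -> Iterator[Tuple[int, bytes]]:
    """Yield (1-based line number, raw line) for lines with any byte >= 0x80.

    The file is read in binary mode, so no decoding happens and no
    UnicodeDecodeError can be raised while scanning.
    """
    with open(path, "rb") as handle:
        for line_number, line in enumerate(handle, start=1):
            if not line.isascii():  # bytes.isascii() needs Python 3.7+
                yield line_number, line.rstrip(b"\r\n")


# Example call (the path is the one from the error message in this article):
# for number, line in find_non_ascii_lines(
#         "/mnt/mgnify_database_path/mgy_clusters_2018_12.fa"):
#     print(number, line)
```

Because the file is streamed line by line, this should work on very large databases without loading them into memory, although a full scan will still take a while.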

That is all for this article.
