How to avoid UnicodeDecodeError

Posted at 2022-08-15

[mui's notes]
I am currently building a web application with Django.
I ran into a UnicodeDecodeError while processing uploaded data and worked out a way to avoid it, so I am sharing it here.

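1. When decoding, detect the file's character encoding first so that the decode does not fail.
2. When encoding the output, specify the same character encoding the input had.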
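The two steps above can be sketched without Django or chardet. This is a minimal illustration, not the code from this article: the sample CSV text and the `roundtrip` helper are hypothetical, and the encoding is passed in by hand where the real view would use the detected one.

```python
# Sample data (hypothetical): a small CSV saved as cp932, as Excel on Japanese Windows often does.
raw = "名前,値\nテスト,1\n".encode("cp932")

# Decoding with the wrong encoding is what triggers the error.
try:
    raw.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True


def roundtrip(data: bytes, enc: str) -> tuple[str, bytes]:
    """Step 1: decode with the known encoding. Step 2: encode again with the same one."""
    text = data.decode(enc)
    return text, text.encode(enc)


text, encoded_again = roundtrip(raw, "cp932")
print(utf8_failed)           # whether the UTF-8 decode failed
print(encoded_again == raw)  # whether the bytes survived the round trip
```

Keeping one `enc` value for both directions is the point: the downloaded file opens in the same tools that produced the upload.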

###forms.py


from django import forms

class UploadForm(forms.Form):
    # Holds the file selected in the form
    testfile = forms.FileField()

###views.py


import io

import chardet
import pandas as pd
from django.http import HttpResponse
from django.shortcuts import render
from application.forms import UploadForm


def url(request):
    if request.method == 'POST':
        upload = UploadForm(request.POST, request.FILES)
        if upload.is_valid():
            form_data = upload.cleaned_data
            file = form_data['testfile']

            read_file = file.read()  # read the uploaded file as bytes
            result = chardet.detect(read_file)  # detect the character encoding
            enc = result['encoding']  # result is a dict, so take the encoding out of it

            # decode with the detected encoding, then parse the text as CSV
            file_data = pd.read_csv(io.StringIO(read_file.decode(enc)), delimiter=',')

            # processing

            after_file = file_data  # placeholder: replace with the DataFrame produced by the processing
            type_data = 'text/csv; charset=' + enc
            response = HttpResponse(content_type=type_data)
            response['Content-Disposition'] = 'attachment; filename="result.csv"'

            # encode the output with the same encoding as the input
            after_file.to_csv(path_or_buf=response, encoding=enc, index=False)

            return response

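When the data is short, chardet's detection becomes unreliable and `enc` can come back as `None`, so that case needs to be handled separately.
Trying Shift-JIS and cp932 as candidate encodings is also an option.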
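One way to handle the `None` case is to fall back to a short list of candidate encodings. The sketch below is an assumption, not part of the original view: `decode_with_fallback` and its default candidates are hypothetical names, and the detected value is passed in as an argument so the helper needs only the standard library. UTF-8 is tried before cp932 because it rejects most non-UTF-8 byte sequences, while cp932 (Microsoft's extension of Shift-JIS) accepts far more input.

```python
from typing import Optional, Sequence, Tuple


def decode_with_fallback(
    raw: bytes,
    detected: Optional[str],
    candidates: Sequence[str] = ("utf-8", "cp932"),
) -> Tuple[str, str]:
    """Decode raw bytes, trying the detected encoding first and then the candidates.

    Returns the decoded text together with the encoding that worked, so the
    same encoding can be reused when writing the response.
    """
    tried = []
    for enc in ([detected] if detected else []) + list(candidates):
        if enc in tried:
            continue  # do not retry an encoding that already failed
        tried.append(enc)
        try:
            return raw.decode(enc), enc
        except (UnicodeDecodeError, LookupError):
            # UnicodeDecodeError: bytes are invalid for this encoding
            # LookupError: Python does not know this encoding name
            continue
    raise ValueError(f"could not decode data with any of: {tried}")


# Detection returned None, so only the candidates are tried.
text, used = decode_with_fallback("①,テスト\n".encode("cp932"), None)
print(used)
```

In the view, this would take `read_file` and `result['encoding']` as arguments, and the returned encoding would replace `enc` for both the `charset` and the `to_csv` call.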
