More than 5 years have passed since last update.

"Input is not proper UTF-8" when outputting an ad feed


Symptom

After generating the ad feed file, opening it in Chrome, Firefox, or another browser produces the following error.

This page contains the following errors:
error on line 752 at column 910:Input is not proper UTF-8,indicate encoding!
Bytes: 0xE7 0x9C 0x8B 0xE8

Cause

As the message says, the file contains bytes that are invalid as UTF-8 (they cannot be decoded).
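The browser reports only the first failing position, so it can help to list every line of the feed that contains an invalid byte sequence. Below is a minimal sketch, assuming the feed is meant to be UTF-8; the method names `invalid_utf8_line_numbers` and `invalid_utf8_lines_in_file` are hypothetical and not part of the original code.

```ruby
# Return the 1-based numbers of the lines whose bytes are not valid UTF-8.
# `text` is assumed to be tagged as UTF-8 already.
def invalid_utf8_line_numbers(text)
  text.each_line
      .with_index(1)
      .reject { |line, _number| line.valid_encoding? }
      .map { |_line, number| number }
end

# Hypothetical convenience wrapper: read a file as UTF-8 and check it.
def invalid_utf8_lines_in_file(file_path)
  invalid_utf8_line_numbers(File.read(file_path, encoding: Encoding::UTF_8))
end
```

For example, `invalid_utf8_lines_in_file('feed.xml')` returns an empty array when every line decodes cleanly.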

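Solution

Removing the invalid bytes is enough, so I used this as a reference:
http://stackoverflow.com/questions/8635578/how-to-check-whether-the-character-is-utf-8

Ruby's String#scrub method does this in one step.
The method below takes a file path, strips the invalid bytes, and writes the result back to the same file.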

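# Remove bytes that are invalid as UTF-8 and overwrite the file in place.
def delete_invalid_char(file_path)
  # Read as UTF-8, then replace each invalid byte sequence with an empty string
  valid_string = File.read(file_path, encoding: Encoding::UTF_8).scrub('')
  File.write(file_path, valid_string)
end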
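Deleting the bytes is only one of the things `scrub` can do. The sketch below compares three forms on a hypothetical sample built from the bytes in the error message: 0xE7 0x9C 0x8B is one valid character and 0xE8 is a lead byte with nothing after it. The constant `BROKEN_SAMPLE` and the method `scrub_variants` are made up for illustration.

```ruby
# Hypothetical sample: one valid character followed by a dangling lead byte 0xE8.
BROKEN_SAMPLE = "\xE7\x9C\x8B\xE8".freeze

# Show what each form of String#scrub does with the invalid byte.
def scrub_variants(text)
  {
    # Drop the invalid bytes entirely (the approach used in this article)
    removed: text.scrub(''),
    # No argument: for UTF-8 the replacement is U+FFFD
    marked: text.scrub,
    # Block form: receives the invalid bytes, here rendered as hex for logging
    hex: text.scrub { |bytes| "<#{bytes.unpack('H*').first}>" }
  }
end
```

Keeping a visible marker or a hex dump can make it easier to trace where the bad bytes came from, while an empty replacement gives the cleanest feed.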

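Verification

  • Check that the file contains no invalidly encoded bytes

    File.read('feed.xml', encoding: Encoding::UTF_8).valid_encoding?
    
  • Validate that the feed format itself is correct
    https://github.com/alexdunae/w3c_validators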

    require 'w3c_validators'
    
    include W3CValidators
    
    @validator = FeedValidator.new
    
    results = @validator.validate_file('feed.xml')
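The encoding check can also be exercised offline before touching a real feed. The following is a rough sketch that writes a deliberately broken temporary file, applies the same scrub approach, and checks the result. `delete_invalid_char` is repeated so the block is self-contained; `utf8_file_valid?` and `round_trip_check` are hypothetical helpers, and the sample content is invented.

```ruby
require 'tempfile'

# Same approach as delete_invalid_char in this article.
def delete_invalid_char(file_path)
  valid_string = File.read(file_path, encoding: Encoding::UTF_8).scrub('')
  File.write(file_path, valid_string)
end

# True when the whole file decodes as UTF-8.
def utf8_file_valid?(file_path)
  File.read(file_path, encoding: Encoding::UTF_8).valid_encoding?
end

# Write a broken sample, clean it, and report [valid before, valid after, content].
def round_trip_check
  Tempfile.create(['feed', '.xml']) do |file|
    file.binmode
    # Invented sample: a valid character followed by a dangling lead byte 0xE8
    file.write("<title>\xE7\x9C\x8B\xE8</title>".b)
    file.close
    before = utf8_file_valid?(file.path)
    delete_invalid_char(file.path)
    after = utf8_file_valid?(file.path)
    [before, after, File.read(file.path, encoding: Encoding::UTF_8)]
  end
end
```

Calling `round_trip_check` should report an invalid file before cleaning and a valid one afterwards, with only the dangling byte removed.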
    