
A workaround for hyphen and wave dash corruption with JavaScript's encodeURIComponent

JavaScript

Last updated at 2021-03-10, posted at 2018-01-15

The incident happens at the scene (the browser side)

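While Base64-encoding data in Chrome for Android and sending it to a CGI, I found that the stored strings contained "? (3F)".
I was investigating whether the problem was on the PHP side or the MariaDB side when it turned out to be a JavaScript problem.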

What is the hyphen and wave dash problem?

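A troublesome problem in which some characters are not defined in the conversion table between Shift_JIS and Unicode, so converting them does not round-trip.
Unicode Consortium: Shift-JIS to Unicode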
ftp://ftp.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/JIS/SHIFTJIS.TXT

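Character	UTF-16	UTF-8	Platform	SJIS (unicode.org)
−	2212	E28892	BSD/Linux	817C
－	FF0D	EFBC8D	Windows	undefined
〜	301C	E3809C	BSD/Linux	8160
～	FF5E	EFBD9E	Windows	undefined

I had assumed the "hyphen and wave dash problem" was purely a CGI-side issue, affecting SJIS-based PHP, Perl and the like, but it appears to occur in UTF-16-based JavaScript as well.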

Does this case occur only on Android?

Character	UTF-16	UTF-8	encodeURIComponent()
−	2212	E28892	%3F
－	FF0D	EFBC8D	%EF%BC%8D
〜	301C	E3809C	%3F
～	FF5E	EFBD9E	%EF%BD%9E
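
One way to check whether a particular browser reproduces the table above is a small console diagnostic. The following is a minimal sketch; `probeEncoding` is a hypothetical name and is not part of the original article. Per the ECMAScript specification, encodeURIComponent percent-encodes the UTF-8 bytes of each character, so a `%3F` result for any of these four characters would point to the environment-specific behaviour described here.

```javascript
// Hypothetical diagnostic helper (an assumption for illustration only).
// For each character, report its code point and what encodeURIComponent returns.
function probeEncoding(chars) {
    return chars.map(function (ch) {
        var encoded = encodeURIComponent(ch);
        return {
            codePoint: 'U+' + ch.charCodeAt(0).toString(16).toUpperCase().padStart(4, '0'),
            encoded: encoded,
            // '%3F' is the percent-encoding of '?', the symptom seen in the table.
            lost: encoded === '%3F'
        };
    });
}

// MINUS SIGN, FULLWIDTH HYPHEN-MINUS, WAVE DASH, FULLWIDTH TILDE
var results = probeEncoding(['\u2212', '\uff0d', '\u301c', '\uff5e']);
results.forEach(function (r) {
    console.log(r.codePoint, r.encoded, r.lost ? 'LOST' : 'ok');
});
```

Any row that prints `LOST` means the character was replaced by "?" before Base64 encoding in that environment.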

The culprit and the fix

With Japanese input on Android, the "～" and "－" shown on the "Symbols" panel may be entered using the problematic code points (this may differ from IME to IME).
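
Because the two variants of each symbol look almost identical on screen, it can help to inspect the actual code points in the input. The following is a rough sketch under that assumption; `PROBLEM_CHARS` and `findProblemChars` are hypothetical names introduced only for illustration.

```javascript
// The two code points this article treats as problematic,
// labelled with their Unicode character names.
var PROBLEM_CHARS = {
    '\u2212': 'MINUS SIGN',
    '\u301c': 'WAVE DASH'
};

// Hypothetical helper: list every position in str that holds a problem character.
function findProblemChars(str) {
    var hits = [];
    for (var i = 0; i < str.length; i++) {
        var ch = str.charAt(i);
        if (Object.prototype.hasOwnProperty.call(PROBLEM_CHARS, ch)) {
            hits.push({
                index: i,
                codePoint: ch.charCodeAt(0).toString(16),
                name: PROBLEM_CHARS[ch]
            });
        }
    }
    return hits;
}

// Example: a range written with WAVE DASH (U+301C).
console.log(findProblemChars('10\u301c20'));
```

An empty array means the string contains neither U+2212 nor U+301C.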

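MDN: Base64 encoding and decoding
With the Base64 encoder from this page, encodeURIComponent() turns those characters into "? (3F)" before anything else happens, so the string is Base64-encoded in its already corrupted state. Replacing the problematic characters beforehand avoids this.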

Before the fix

function b64EncodeUnicode(str) {
    // first we use encodeURIComponent to get percent-encoded UTF-8,
    // then we convert the percent encodings into raw bytes which
    // can be fed into btoa.
    return btoa(encodeURIComponent(str).replace(/%([0-9A-F]{2})/g,
        function toSolidBytes(match, p1) {
            return String.fromCharCode('0x' + p1);
    }));
}

After the fix

function b64EncodeUnicode(str) {
    // first we use encodeURIComponent to get percent-encoded UTF-8,
    // then we convert the percent encodings into raw bytes which
    // can be fed into btoa.
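    // Replace WAVE DASH (U+301C) and MINUS SIGN (U+2212) with their
    // fullwidth counterparts (U+FF5E, U+FF0D) before encoding.
    str = str.replace(/\u301c/g, '\uff5e')
             .replace(/\u2212/g, '\uff0d');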
    return btoa(encodeURIComponent(str).replace(/%([0-9A-F]{2})/g,
        function toSolidBytes(match, p1) {
            return String.fromCharCode('0x' + p1);
    }));
}
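
If more characters turn out to need the same treatment, the chained replace calls can be expressed as a lookup table. This is only a sketch of one possible arrangement; `REPLACEMENTS` and `normalizeProblemChars` are hypothetical names, and the table holds just the two pairs used in the fix above.

```javascript
// Mapping from problem character to the replacement used in the fix.
var REPLACEMENTS = {
    '\u301c': '\uff5e', // WAVE DASH -> FULLWIDTH TILDE
    '\u2212': '\uff0d'  // MINUS SIGN -> FULLWIDTH HYPHEN-MINUS
};

// Build one character class from the table keys so the regex and the table
// stay in sync. This assumes no key is a regex metacharacter.
var PROBLEM_PATTERN = new RegExp('[' + Object.keys(REPLACEMENTS).join('') + ']', 'g');

// Hypothetical helper: apply every replacement in a single pass.
function normalizeProblemChars(str) {
    return str.replace(PROBLEM_PATTERN, function (ch) {
        return REPLACEMENTS[ch];
    });
}

console.log(normalizeProblemChars('10\u301c20') === '10\uff5e20');
```

Note that this substitution changes the text itself: after decoding, the string contains the fullwidth forms rather than the characters that were originally typed.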

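Other sites I referred to

Summary of the wave dash and fullwidth tilde problem
 ASH: Unicode-compatible character code table
 fudist: Problem characters in Shift_JIS
 UNIRITA: Why does the wave dash get garbled in Oracle databases?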
