Ruby's SJIS Is Not Shift_JIS

Code to list the encodings

Encoding.list.each {|e| p e }

List of encodings

#<Encoding:ASCII-8BIT>
#<Encoding:UTF-8>
#<Encoding:US-ASCII>
#<Encoding:UTF-16BE (autoload)>
#<Encoding:UTF-16LE (autoload)>
#<Encoding:UTF-32BE (autoload)>
#<Encoding:UTF-32LE (autoload)>
#<Encoding:UTF-16 (dummy) (autoload)>
#<Encoding:UTF-32 (dummy) (autoload)>
#<Encoding:UTF8-MAC>
#<Encoding:EUC-JP (autoload)>
#<Encoding:Windows-31J (autoload)>
#<Encoding:Big5 (autoload)>
#<Encoding:Big5-HKSCS (autoload)>
#<Encoding:Big5-UAO (autoload)>
#<Encoding:CP949 (autoload)>
#<Encoding:Emacs-Mule (autoload)>
#<Encoding:EUC-KR (autoload)>
#<Encoding:EUC-TW (autoload)>
#<Encoding:GB2312 (autoload)>
#<Encoding:GB18030 (autoload)>
#<Encoding:GBK (autoload)>
#<Encoding:ISO-8859-1 (autoload)>
#<Encoding:ISO-8859-2 (autoload)>
#<Encoding:ISO-8859-3 (autoload)>
#<Encoding:ISO-8859-4 (autoload)>
#<Encoding:ISO-8859-5 (autoload)>
#<Encoding:ISO-8859-6 (autoload)>
#<Encoding:ISO-8859-7 (autoload)>
#<Encoding:ISO-8859-8 (autoload)>
#<Encoding:ISO-8859-9 (autoload)>
#<Encoding:ISO-8859-10 (autoload)>
#<Encoding:ISO-8859-11 (autoload)>
#<Encoding:ISO-8859-13 (autoload)>
#<Encoding:ISO-8859-14 (autoload)>
#<Encoding:ISO-8859-15 (autoload)>
#<Encoding:ISO-8859-16 (autoload)>
#<Encoding:KOI8-R (autoload)>
#<Encoding:KOI8-U (autoload)>
#<Encoding:Shift_JIS (autoload)>
#<Encoding:Windows-1251 (autoload)>
#<Encoding:IBM437>
#<Encoding:IBM737>
#<Encoding:IBM775>
#<Encoding:CP850>
#<Encoding:IBM852>
#<Encoding:CP852>
#<Encoding:IBM855>
#<Encoding:CP855>
#<Encoding:IBM857>
#<Encoding:IBM860>
#<Encoding:IBM861>
#<Encoding:IBM862>
#<Encoding:IBM863>
#<Encoding:IBM864>
#<Encoding:IBM865>
#<Encoding:IBM866>
#<Encoding:IBM869>
#<Encoding:Windows-1258>
#<Encoding:GB1988>
#<Encoding:macCentEuro>
#<Encoding:macCroatian>
#<Encoding:macCyrillic>
#<Encoding:macGreek>
#<Encoding:macIceland>
#<Encoding:macRoman>
#<Encoding:macRomania>
#<Encoding:macThai>
#<Encoding:macTurkish>
#<Encoding:macUkraine>
#<Encoding:CP950 (autoload)>
#<Encoding:CP951 (autoload)>
#<Encoding:stateless-ISO-2022-JP (autoload)>
#<Encoding:eucJP-ms (autoload)>
#<Encoding:CP51932 (autoload)>
#<Encoding:EUC-JIS-2004 (autoload)>
#<Encoding:GB12345 (autoload)>
#<Encoding:ISO-2022-JP (dummy)>
#<Encoding:ISO-2022-JP-2 (dummy)>
#<Encoding:CP50220 (dummy)>
#<Encoding:CP50221 (dummy)>
#<Encoding:Windows-1252 (autoload)>
#<Encoding:Windows-1250 (autoload)>
#<Encoding:Windows-1256 (autoload)>
#<Encoding:Windows-1253 (autoload)>
#<Encoding:Windows-1255 (autoload)>
#<Encoding:Windows-1254 (autoload)>
#<Encoding:TIS-620 (autoload)>
#<Encoding:Windows-874 (autoload)>
#<Encoding:Windows-1257 (autoload)>
#<Encoding:MacJapanese (autoload)>
#<Encoding:UTF-7 (dummy)>
#<Encoding:UTF8-DoCoMo>
#<Encoding:SJIS-DoCoMo (autoload)>
#<Encoding:UTF8-KDDI>
#<Encoding:SJIS-KDDI (autoload)>
#<Encoding:ISO-2022-JP-KDDI (dummy)>
#<Encoding:stateless-ISO-2022-JP-KDDI (autoload)>
#<Encoding:UTF8-SoftBank>
#<Encoding:SJIS-SoftBank (autoload)>
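The list is long, so here is a rough sketch that narrows it down to the Shift_JIS family. The regular expression is only an illustrative choice on my part, and the exact set of names may differ between Ruby versions. Note that Shift_JIS and Windows-31J both appear as separate entries, while SJIS itself does not, because it is only an alias.

```ruby
# Collect the canonical names of all encodings known to this Ruby.
names = Encoding.list.map(&:name)

# Keep only names that look like Shift_JIS-family encodings.
# The pattern below is an illustrative assumption, not an official grouping.
sjis_family = names.grep(/\A(Shift_JIS|Windows-31J|SJIS-)/)

puts sjis_family
```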

Code to list the aliases

Encoding.aliases.each {|e| p e }

List of aliases

["BINARY", "ASCII-8BIT"]
["CP437", "IBM437"]
["CP737", "IBM737"]
["CP775", "IBM775"]
["IBM850", "CP850"]
["CP857", "IBM857"]
["CP860", "IBM860"]
["CP861", "IBM861"]
["CP862", "IBM862"]
["CP863", "IBM863"]
["CP864", "IBM864"]
["CP865", "IBM865"]
["CP866", "IBM866"]
["CP869", "IBM869"]
["CP1258", "Windows-1258"]
["Big5-HKSCS:2008", "Big5-HKSCS"]
["eucJP", "EUC-JP"]
["euc-jp-ms", "eucJP-ms"]
["EUC-JISX0213", "EUC-JIS-2004"]
["eucKR", "EUC-KR"]
["eucTW", "EUC-TW"]
["EUC-CN", "GB2312"]
["eucCN", "GB2312"]
["CP936", "GBK"]
["ISO2022-JP", "ISO-2022-JP"]
["ISO2022-JP2", "ISO-2022-JP-2"]
["ISO8859-1", "ISO-8859-1"]
["CP1252", "Windows-1252"]
["ISO8859-2", "ISO-8859-2"]
["CP1250", "Windows-1250"]
["ISO8859-3", "ISO-8859-3"]
["ISO8859-4", "ISO-8859-4"]
["ISO8859-5", "ISO-8859-5"]
["ISO8859-6", "ISO-8859-6"]
["CP1256", "Windows-1256"]
["ISO8859-7", "ISO-8859-7"]
["CP1253", "Windows-1253"]
["ISO8859-8", "ISO-8859-8"]
["CP1255", "Windows-1255"]
["ISO8859-9", "ISO-8859-9"]
["CP1254", "Windows-1254"]
["ISO8859-10", "ISO-8859-10"]
["ISO8859-11", "ISO-8859-11"]
["CP874", "Windows-874"]
["ISO8859-13", "ISO-8859-13"]
["CP1257", "Windows-1257"]
["ISO8859-14", "ISO-8859-14"]
["ISO8859-15", "ISO-8859-15"]
["ISO8859-16", "ISO-8859-16"]
["CP878", "KOI8-R"]
["MacJapan", "MacJapanese"]
["ASCII", "US-ASCII"]
["ANSI_X3.4-1968", "US-ASCII"]
["646", "US-ASCII"]
["CP65000", "UTF-7"]
["CP65001", "UTF-8"]
["UTF-8-MAC", "UTF8-MAC"]
["UTF-8-HFS", "UTF8-MAC"]
["UCS-2BE", "UTF-16BE"]
["UCS-4BE", "UTF-32BE"]
["UCS-4LE", "UTF-32LE"]
["CP932", "Windows-31J"]
["csWindows31J", "Windows-31J"]
["SJIS", "Windows-31J"]
["PCK", "Windows-31J"]
["CP1251", "Windows-1251"]
["locale", "UTF-8"]
["external", "UTF-8"]
["filesystem", "UTF-8"]
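Instead of scanning the whole table by eye, the alias can also be looked up directly. This is a minimal sketch, assuming a Ruby in which SJIS is registered as an alias of Windows-31J, as in the listing above.

```ruby
# Encoding.aliases is a Hash of alias name => canonical name.
aliases = Encoding.aliases

puts aliases["SJIS"]   # canonical name behind the SJIS alias
puts aliases["CP932"]  # canonical name behind the CP932 alias

# Encoding.find resolves a name or alias to the Encoding object itself.
sjis      = Encoding.find("SJIS")
shift_jis = Encoding.find("Shift_JIS")

puts sjis.name
puts sjis == shift_jis  # false when the two are distinct encodings

# Encoding#names returns the canonical name together with its aliases.
p Encoding::Windows_31J.names
```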

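Summary

  • The alias CP932 points to Windows-31J
  • The alias SJIS points to Windows-31J
  • Shift_JIS is a separate encoding from Windows-31J
  • In other words, SJIS and Shift_JIS are not the same thing!
  • As for how different they are: their Unicode mapping tables differ quite a bit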
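To make the last point concrete, here is a small sketch of two well-known differences between the conversion tables. The sample character ① and the byte pair 0x81 0x60 are my own picks for illustration, not taken from the listings above.

```ruby
# 1. "①" (U+2460) is a vendor extension character: it can be encoded
#    in Windows-31J, but Shift_JIS has no mapping for it.
circled_one = "\u2460"
p circled_one.encode("Windows-31J").bytes

begin
  circled_one.encode("Shift_JIS")
rescue Encoding::UndefinedConversionError => e
  puts "Shift_JIS: #{e.class}"
end

# 2. The same two bytes decode to different Unicode code points
#    depending on which encoding they are labeled with.
bytes = "\x81\x60".b
from_shift_jis = bytes.dup.force_encoding("Shift_JIS").encode("UTF-8")
from_win31j    = bytes.dup.force_encoding("Windows-31J").encode("UTF-8")

printf("Shift_JIS   -> U+%04X\n", from_shift_jis.ord)  # WAVE DASH
printf("Windows-31J -> U+%04X\n", from_win31j.ord)     # FULLWIDTH TILDE
```

Because of differences like these, text that round-trips cleanly under one name can raise an error or silently change characters under the other, so it seems safer to spell out Windows-31J or Shift_JIS explicitly rather than rely on the SJIS alias.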