
Converting a string to Unicode in Python: "u''" and ".decode('utf-8')"

Last updated at 2020-11-25. Posted at 2020-11-25.

[\u30d7\u30ed\u30c0\u30af\u30c8\u30c7\u30b6\u30a4\u30f3]

This notation is apparently called a Unicode escape sequence.
(Source: https://blog.xin9le.net/entry/2015/02/08/212947)
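
As a small illustration, the sketch below shows that the escaped form and the readable text are the same string. It assumes Python 3 (the article itself targets Python 2.x), and the variable names are made up for this example.

```python
# Python 3 sketch: a \uXXXX escape sequence denotes the same character
# as the readable text. Variable names are illustrative only.
escaped = '\u30d7\u30ed\u30c0\u30af\u30c8\u30c7\u30b6\u30a4\u30f3'
readable = 'プロダクトデザイン'

# Both literals produce the identical string object contents.
same_text = (escaped == readable)

# Going the other way: produce the ASCII-only escaped spelling of the text.
escaped_spelling = readable.encode('unicode_escape').decode('ascii')
print(same_text)
print(escaped_spelling)
```

The 'unicode_escape' codec is part of the standard library, so no extra packages should be needed for this.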

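In Python 2.x (the versions before Python 3), Japanese text has to be converted into this form.
To do that, put a "u" in front of the string literal. A plain '...' literal in Python 2 is a byte string; with the "u" prefix it becomes a Unicode string.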

u'プロダクトデザイン'

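However, if you want to pass a variable here, putting a "u" in front of it does not help: it simply becomes the name of a different variable.
That gets you nowhere.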

test.py

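# -*- coding: utf-8 -*-
test = 'プロダクトデザイン'

nanika(utest)  # does not work: utest is read as a different, undefined variable name
# nanika is just a placeholder function.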
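
The failure in the snippet above can be reproduced in isolation. This is a minimal sketch, assuming Python 3; the helper name `lookup_prefixed_name` is hypothetical and exists only to capture the error message.

```python
def lookup_prefixed_name():
    """Show that writing u in front of a variable name just names another variable."""
    test = 'プロダクトデザイン'
    try:
        # The u prefix is only part of string literal syntax ('...' -> u'...').
        # Here Python looks for a variable called utest, which does not exist.
        return utest
    except NameError as exc:
        return str(exc)

message = lookup_prefixed_name()
print(message)
```

The error message names `utest`, which confirms that the prefix was treated as part of the identifier rather than as a conversion.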

So use .decode('utf-8') instead. That's all!
(Source: https://qiita.com/yubessy/items/9e13af05a295bbb59c25)

test.py

# -*- coding: utf-8 -*-
test = 'プロダクトデザイン'

nanika(test.decode('utf-8'))
# nanika is just a placeholder function.
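
For readers on Python 3, the same idea can be sketched as follows. This is an illustration under an assumption: in Python 3 a plain '...' literal is already text, so the byte string is built explicitly with .encode('utf-8') to stand in for a Python 2 string. The variable names other than `test` are made up for this example.

```python
# Python 3 sketch. The bytes object below plays the role of a Python 2
# plain string holding UTF-8 encoded Japanese text.
test = 'プロダクトデザイン'.encode('utf-8')

# .decode('utf-8') turns the bytes into text (a unicode object in Python 2).
decoded = test.decode('utf-8')

# The decoded value equals what the u'' literal would have produced.
matches_u_literal = (decoded == u'プロダクトデザイン')

byte_count = len(test)      # 27: each of these characters takes 3 bytes in UTF-8
char_count = len(decoded)   # 9 characters
print(matches_u_literal, byte_count, char_count)
```

The difference between the two lengths is a quick way to tell whether a value is still encoded bytes or already decoded text.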

One more note: there are websites that convert between ordinary strings and Unicode escape sequences directly in the browser.
Here is one:
http://0xcc.net/jsescape/

Also, someone has written a function that makes it easy to convert Unicode escape sequences back into ordinary strings in JavaScript.
It's wonderful. I'm nothing but grateful.
https://shanabrian.com/web/javascript/unicode-unescape.php

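// Converts the \uXXXX escape sequences in str into the characters they represent.
// Only the escape sequences are kept; any other text in str is dropped.
var unicodeUnescape = function(str) {
    var result = '', strs = str.match(/\\u.{4}/ig);
    if (!strs) return '';
    for (var i = 0, len = strs.length; i < len; i++) {
        // '\u3042' becomes '0x3042', which fromCharCode coerces to a number
        result += String.fromCharCode(strs[i].replace('\\u', '0x'));
    }
    return result;
};

var result = unicodeUnescape('abc123\\u3042\\u3044\\u3046\\u3048\\u304a');

console.log(result); // あいうえお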
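
A comparable helper can be sketched in Python 3. This is not the linked author's code: the function name `unicode_unescape` is hypothetical, and unlike the JavaScript function it keeps the text around the escape sequences instead of dropping it. It also handles each \uXXXX on its own, so it does not combine surrogate pairs.

```python
import re

def unicode_unescape(text):
    r"""Replace each \uXXXX escape in text with its character; keep other text."""
    def to_char(match):
        # Group 1 holds the four hex digits; convert them to a code point.
        return chr(int(match.group(1), 16))
    return re.sub(r'\\u([0-9a-fA-F]{4})', to_char, text)

# The input contains literal backslashes, as it would when read from a file.
result = unicode_unescape('abc123\\u3042\\u3044\\u3046\\u3048\\u304a')
print(result)
```

Restricting the pattern to hex digits means a stray "\u" followed by other characters is left untouched rather than turned into a wrong character.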
