Converting the results of "string".split() in one go in Python

More than 3 years have passed since last update.

I mean code like this:

a, b, c = "hello 3.14 0".split()
b = float(b)
c = int(c)

Doing this every time is tedious, so I'd like a way to do it in one shot.
I suspect everyone ends up reinventing this, right?

On Twitter, @odashi_t wrote:

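When I read a line in Python, split it on spaces, and load the values into variables, I always do
a,b,c,d,e,... = line.split()
and then convert each value separately. Isn't there a simpler way to write this?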

I don't need anything as full-featured as regular expressions; I'd be happy if I could write something like
a,b,c,d,e = split_and_convert(line, (int, float, str, str, int))

Seeing that tweet made me want to learn everyone else's clever approaches too, so I'm posting the one I use.

def map_apply(proc, args):
    # Apply the i-th function in proc to the i-th value in args.
    # A list comprehension returns a list on both Python 2 and Python 3,
    # whereas map() returns a lazy iterator on Python 3.
    return [f(x) for f, x in zip(proc, args)]

class FieldConverter:
    def __init__(self, *args):
        self._converters = args
        def conv_proc(f):
            def _wrap(conv_proc_to_num):
                def proc(x):
                    try:
                        return conv_proc_to_num(x)
                    except ValueError:
                        return 0
                return proc
            if f is None:
                return lambda s:s  # identity
            elif isinstance(f, str):
                return lambda s:s.decode(f)  # -> unicode
            elif f in (int, float):
                return _wrap(f)
            else:
                return f
        if len(args) > 0:
            self.converters = [conv_proc(f) for f in self._converters]
        else:
            self.converters = None

    def convert(self, fields):
        if self.converters:
            return map_apply(self.converters, fields)
        else:
            return fields


def test_map_apply():
    assert [0, -1, 3.14, 5.0] == map_apply([int,int,float,float], ['0','-1','3.14','5'])
    assert [0,u'日本語','日本語'] == map_apply([int,lambda s:s.decode('utf-8'),lambda s:s], ['0','日本語','日本語'])

def test_field_converter():
    assert [0, -1, 3.14, 5.0] == FieldConverter(int,int,float,float).convert(['0','-1','3.14','5'])
    assert [0,u'日本語','日本語'] == FieldConverter(int,'utf-8',None).convert(['0','日本語','日本語'])

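int and float probably need no explanation (a field that fails to parse becomes 0).
Passing an encoding name such as 'utf-8' decodes the field as UTF-8 into unicode (the code here is written for Python 2).
None leaves the field unconverted.
Any one-argument function (a lambda included) can be passed as well.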
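
For readers on Python 3, the converter rules above could be sketched roughly as follows. This is only an illustration, not the article's code: `make_converter` and `convert_fields` are hypothetical names, and it assumes that an encoding name should decode `bytes` fields and leave `str` fields untouched.

```python
def make_converter(spec):
    """Turn a converter spec into a one-argument function (hypothetical helper)."""
    if spec is None:
        return lambda s: s  # identity: leave the field unchanged
    if isinstance(spec, str):
        # Assumption: a string is an encoding name; only bytes need decoding.
        return lambda s: s.decode(spec) if isinstance(s, bytes) else s
    if spec in (int, float):
        def numeric(s):
            try:
                return spec(s)
            except ValueError:
                return 0  # same fallback as the article's _wrap
        return numeric
    return spec  # any other one-argument callable is used as-is


def convert_fields(specs, fields):
    """Apply the i-th converter spec to the i-th field."""
    return [make_converter(spec)(field) for spec, field in zip(specs, fields)]


print(convert_fields([int, float, None, str.upper], ["7", "oops", "raw", "abc"]))
# prints [7, 0, 'raw', 'ABC']
```

Note that "oops" silently becomes 0; whether that fallback is desirable depends on how much you trust the input.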

@odashi_t's split_and_convert() could then be written as

def split_and_convert(line, types, delim=' '):
    field_converter = FieldConverter(*types)
    return field_converter.convert(line.split(delim))

def test_split_and_convert():
    assert [0, -1, 3.14, 5.0] == split_and_convert('0\t-1\t3.14\t5', (int,int,float,float), '\t')
    assert [0, u'日本語', '日本語'] == split_and_convert('0,日本語,日本語', (int,'utf-8',None), ',')

I think it can be implemented along these lines, though I would want to cache the field_converter.
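
One possible way to get that caching, sketched for Python 3 with `functools.lru_cache`. The helper `get_converters` is a hypothetical, simplified stand-in for FieldConverter (it only handles None and plain callables), and it assumes every entry in `types` is hashable, which holds for int, float, None, and ordinary functions.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def get_converters(types):
    """Build the converter tuple once per distinct `types` tuple (hypothetical helper)."""
    def wrap(f):
        if f is None:
            return lambda s: s  # identity for unconverted fields
        return f
    return tuple(wrap(f) for f in types)


def split_and_convert(line, types, delim=' '):
    # tuple(types) makes the cache key hashable even if a list was passed.
    converters = get_converters(tuple(types))
    return [f(x) for f, x in zip(converters, line.split(delim))]


for line in ["1 2.5 x", "3 4.5 y"]:
    print(split_and_convert(line, (int, float, None)))
```

The second call reuses the converters built by the first, which matters mostly when the setup step is more expensive than in this toy version.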

If the conversions are only things like int and float, then simply

def split_and_convert(line, types, delim=' '):
    return [f(x) for f,x in zip(types, line.split(delim))]

is enough.

With this, you can write something like

a, b, c, d = split_and_convert('0\t-1\t3.14\t5', (int,int,float,float), '\t')

which gives a=0, b=-1, c=3.14, d=5.0.
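
One caveat with the zip-based version is that zip() silently drops extra fields or extra converters. A stricter variant might look like the sketch below (Python 3). The name `split_and_convert_strict` and the choice of `delim=None`, which splits on any run of whitespace like a bare `.split()`, are assumptions made for illustration.

```python
def split_and_convert_strict(line, types, delim=None):
    """Split `line` and convert each field, failing loudly on a count mismatch."""
    fields = line.split(delim)  # delim=None splits on any run of whitespace
    if len(fields) != len(types):
        raise ValueError(
            "expected %d fields, got %d" % (len(types), len(fields))
        )
    return [f(x) for f, x in zip(types, fields)]


# The example from the top of the article, now in one line:
a, b, c = split_and_convert_strict("hello 3.14 0", (str, float, int))
print(a, b, c)  # prints: hello 3.14 0
```

When the counts do match, the unpacking works exactly as before; when they do not, the error message names both counts instead of a generic unpacking failure.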

For now, I've posted the code as a gist:
https://gist.github.com/naoyat/3db8cd96c8dcecb5caea

naoya_t
Natural language processing, machine learning, competitive programming, and the like
https://naoyat.hatenablog.jp/