

The fastest way to apply an arbitrary function to every element of a Python list

Posted at 2017-11-07

Acknowledgments

Following a suggestion from @termoshtt, I added the numpy.frompyfunc pattern.

Overview

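This article looks for the fastest way to apply an arbitrary function to every element of three list-like structures in Python:

  • list
  • numpy.ndarray
  • pandas.Series (backed by a numpy.ndarray)

using the following approaches:

  • the built-in map function
  • a list comprehension
  • numpy.vectorize
  • numpy.fromiter
  • numpy.frompyfunc
  • pandas.Series.map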
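
Before timing anything, it can help to confirm that the approaches compared in this article produce the same values. The following is a minimal sketch on a tiny input. The names cube, values, arr, ser and results are illustrative and do not come from the benchmark, and numpy.fromiter is given float64 here so that all results are directly comparable.

```python
import numpy as np
import pandas as pd


def cube(x):
    # Same operation as the benchmark's lambda: raise to the third power.
    return x ** 3


values = [1.0, 2.0, 3.0]      # tiny illustrative input
arr = np.array(values)
ser = pd.Series(arr)

# One entry per approach compared in the article.
results = {
    "map": list(map(cube, values)),
    "list comprehension": [cube(x) for x in values],
    "numpy.vectorize": np.vectorize(cube)(arr),
    "numpy.fromiter": np.fromiter((cube(x) for x in arr), np.float64, count=len(arr)),
    "numpy.frompyfunc": np.frompyfunc(cube, 1, 1)(arr),
    "pandas.Series.map": ser.map(cube),
}

for name, result in results.items():
    print(name, list(result))
```

The containers differ (list, ndarray, Series), but the element values should agree across all six approaches.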

Code

The input has 10 million elements, and the applied function cubes its argument.

import time

import numpy as np
import pandas as pd

N = 10000000

np_l = np.random.rand(N)
s = pd.Series(np_l)
l = list(np_l)

f = lambda x: x ** 3

def timer(func):
    """Run func 10 times and print the mean and median wall-clock time."""
    def wrapper(*args, **kwargs):
        time_list = []
        for i in range(10):
            start = time.time()
            func(*args, **kwargs)
            time_list.append(time.time() - start)
        # pandas_series_map only ever receives a Series, so its label omits the type.
        if func.__name__ != 'pandas_series_map':
            # Label the run with the container type of the first argument.
            if isinstance(args[0], list):
                list_type = 'list'
            elif isinstance(args[0], np.ndarray):
                list_type = 'np.ndarray'
            elif isinstance(args[0], pd.Series):
                list_type = 'pd.Series'
            print('{} over {}, avg: {:.3f} sec, median: {:.3f} sec'.format(
                func.__name__, list_type, np.mean(time_list), np.median(time_list)))
        else:
            print('{}, avg: {:.3f} sec, median: {:.3f} sec'.format(
                func.__name__, np.mean(time_list), np.median(time_list)))
    return wrapper

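# Built-in map, materialized into a list.
@timer
def map_function(l, f):
    list(map(f, l))

# List comprehension.
@timer
def list_comprehension(l, f):
    [f(x) for x in l]

# numpy.vectorize wrapper around f.
@timer
def numpy_vectorize(l, f):
    np.vectorize(f)(l)

# numpy.fromiter over a generator; note that the result is stored as float32.
@timer
def numpy_fromiter(l, f):
    np.fromiter((f(x) for x in l), np.float32, count=len(l))

# numpy.frompyfunc turns f into a ufunc with 1 input and 1 output.
@timer
def numpy_frompyfunc(l, f):
    np.frompyfunc(f, 1, 1)(l)

# pandas.Series.map (Series input only).
@timer
def pandas_series_map(s, f):
    s.map(f)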

map_function(l, f)
map_function(np_l, f)
map_function(s, f)
list_comprehension(l, f)
list_comprehension(np_l, f)
list_comprehension(s, f)
numpy_vectorize(l, f)
numpy_vectorize(np_l, f)
numpy_vectorize(s, f)
numpy_fromiter(l, f)
numpy_fromiter(np_l, f)
numpy_fromiter(s, f)
numpy_frompyfunc(l, f)
numpy_frompyfunc(np_l, f)
numpy_frompyfunc(s, f)
pandas_series_map(s, f)
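
As an alternative to the hand-written timer decorator, the standard library's timeit module can take the same kind of measurement. The sketch below is not the method used for the results in this article. It assumes a much smaller array (1,000 elements) so that it finishes quickly, and the names cube, arr, timings and per_call are illustrative.

```python
import timeit

import numpy as np


def cube(x):
    # Same operation as the benchmark's lambda.
    return x ** 3


arr = np.random.rand(1000)  # assumption: far smaller than the article's 10 million

# repeat=5 independent measurements, each timing number=10 calls.
timings = timeit.repeat(lambda: np.frompyfunc(cube, 1, 1)(arr), repeat=5, number=10)
per_call = [t / 10 for t in timings]

print('min: {:.6f} sec, median: {:.6f} sec'.format(min(per_call), np.median(per_call)))
```

timeit uses time.perf_counter by default, which is generally better suited to short intervals than time.time.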

Results

The output has been aligned by hand for readability.

map_function       over list,       avg: 4.621 sec, median: 4.521 sec
map_function       over np.ndarray, avg: 5.245 sec, median: 5.244 sec
map_function       over pd.Series,  avg: 5.299 sec, median: 5.301 sec
list_comprehension over list,       avg: 4.810 sec, median: 4.795 sec
list_comprehension over np.ndarray, avg: 5.453 sec, median: 5.327 sec
list_comprehension over pd.Series,  avg: 5.430 sec, median: 5.397 sec
numpy_vectorize    over list,       avg: 7.127 sec, median: 7.074 sec
numpy_vectorize    over np.ndarray, avg: 4.001 sec, median: 3.976 sec
numpy_vectorize    over pd.Series,  avg: 3.897 sec, median: 3.882 sec
numpy_fromiter     over list,       avg: 5.738 sec, median: 5.733 sec
numpy_fromiter     over np.ndarray, avg: 6.578 sec, median: 6.541 sec
numpy_fromiter     over pd.Series,  avg: 6.485 sec, median: 6.491 sec
numpy_frompyfunc   over list,       avg: 4.153 sec, median: 4.143 sec
numpy_frompyfunc   over np.ndarray, avg: 3.186 sec, median: 3.144 sec
numpy_frompyfunc   over pd.Series,  avg: 3.109 sec, median: 3.104 sec
pandas_series_map,                  avg: 4.704 sec, median: 4.650 sec

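Applying a function converted to a ufunc with numpy.frompyfunc to a numpy.ndarray or pandas.Series appears to be the fastest option (about 3.1 to 3.2 sec on average).
numpy.vectorize over the same inputs comes second (about 3.9 to 4.0 sec).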
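
One practical difference between the two fastest options is the dtype of the result: numpy.frompyfunc always returns an object array, while numpy.vectorize infers an output dtype from the function's return value. The following minimal sketch shows this on a tiny input. The names cube, arr, cube_ufunc, obj_result, float_result and vec_result are illustrative. The benchmark in this article does not include the conversion step, so its cost is not reflected in the timings.

```python
import numpy as np


def cube(x):
    # Same operation as the benchmark's lambda.
    return x ** 3


arr = np.array([1.0, 2.0, 3.0])  # tiny illustrative input

# frompyfunc(f, nin, nout): the resulting ufunc returns an object-dtype array.
cube_ufunc = np.frompyfunc(cube, 1, 1)
obj_result = cube_ufunc(arr)
print(obj_result.dtype)

# Convert explicitly if a numeric array is needed downstream.
float_result = obj_result.astype(np.float64)
print(float_result.dtype)

# vectorize picks a numeric dtype on its own.
vec_result = np.vectorize(cube)(arr)
print(vec_result.dtype)
```

If the result feeds into further NumPy computation, the astype call is an additional pass over the data that should be counted when comparing the two.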

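map and list comprehensions seem to be slightly faster over a list than over a numpy.ndarray (for example, 4.621 vs. 5.245 sec on average for map).
Conversely, numpy.vectorize and numpy.frompyfunc seem to be faster over a numpy.ndarray than over a list (4.001 vs. 7.127 sec and 3.186 vs. 4.153 sec on average, respectively).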
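
A possible explanation, which the measurements above do not verify: list(np_l) already holds one NumPy scalar object per element, whereas iterating over an ndarray has to create such an object on every step, and the NumPy-based helpers must first convert a list back into an array. The sketch below only shows the element types involved. The names arr, as_list, as_tolist and back_to_array are illustrative.

```python
import numpy as np

arr = np.array([0.5, 1.5])  # tiny illustrative input

# What the benchmark does: list(...) keeps each element as a NumPy scalar.
as_list = list(arr)
print(type(as_list[0]))

# Alternative not used in the benchmark: tolist() yields built-in floats.
as_tolist = arr.tolist()
print(type(as_tolist[0]))

# NumPy functions handed a list need an array first, which costs a conversion.
back_to_array = np.asarray(as_list)
print(type(back_to_array), back_to_array.dtype)
```

Because the benchmark's list holds numpy.float64 objects rather than built-in floats, timings for a list built with tolist() could differ from the ones reported here.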
