
The Fastest Way to Apply an Arbitrary Function to Every Element of a Python List

More than 1 year has passed since last update.

Acknowledgments

Following a suggestion from @termoshtt, I added the numpy.frompyfunc pattern.

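Overview

For three list-like structures in Python:

  • list
  • numpy.ndarray
  • pandas.Series (which is backed by a numpy.ndarray)

this article looks for the fastest way to apply an arbitrary function to every element, using each of the following:

  • the built-in map function
  • a list comprehension
  • numpy.vectorize
  • numpy.fromiter
  • numpy.frompyfunc
  • pandas.Series.map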
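As a quick illustration of what each approach looks like, here is a minimal sketch on a three-element input. The names cube, values, arr, ser and results are chosen for this example only and do not appear in the benchmark; the calls mirror the ones timed in the Code section, except that numpy.fromiter uses np.float64 here.

```python
import numpy as np
import pandas as pd


def cube(x):
    # Same operation as the benchmark: raise each element to the third power.
    return x ** 3


values = [1.0, 2.0, 3.0]
arr = np.array(values)
ser = pd.Series(values)

results = {
    "map": list(map(cube, values)),
    "list comprehension": [cube(x) for x in values],
    "numpy.vectorize": np.vectorize(cube)(arr),
    "numpy.fromiter": np.fromiter(
        (cube(x) for x in values), dtype=np.float64, count=len(values)),
    "numpy.frompyfunc": np.frompyfunc(cube, 1, 1)(arr),
    "pandas.Series.map": ser.map(cube),
}

# Every approach should produce the same three cubes.
for name, result in results.items():
    print(name, [float(v) for v in result])
```

All six produce the same values; they differ in the container they return and, as measured later in the article, in speed.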

Code

The input has 10 million elements, and the applied function cubes its argument.

import time

import numpy as np
import pandas as pd

N = 10000000

np_l = np.random.rand(N)
s = pd.Series(np_l)
l = list(np_l)

f = lambda x: x ** 3

def timer(func):
    """Run func 10 times and print the mean and median wall-clock time."""
    def wrapper(*args, **kwargs):
        time_list = []
        for _ in range(10):
            # perf_counter is a monotonic, high-resolution clock meant for timing.
            start = time.perf_counter()
            func(*args, **kwargs)
            time_list.append(time.perf_counter() - start)
        if func.__name__ != 'pandas_series_map':
            # Label the container type of the first argument.
            if isinstance(args[0], list):
                list_type = 'list'
            elif isinstance(args[0], np.ndarray):
                list_type = 'np.ndarray'
            elif isinstance(args[0], pd.Series):
                list_type = 'pd.Series'
            else:
                # Fall back to the class name so list_type is always defined.
                list_type = type(args[0]).__name__
            print('{} over {}, avg: {:.3f} sec, median: {:.3f} sec'.format(
                func.__name__, list_type, np.mean(time_list), np.median(time_list)))
        else:
            print('{}, avg: {:.3f} sec, median: {:.3f} sec'.format(
                func.__name__, np.mean(time_list), np.median(time_list)))
    return wrapper

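@timer
def map_function(l, f):
    # map is lazy in Python 3, so list() forces the evaluation.
    list(map(f, l))


@timer
def list_comprehension(l, f):
    [f(x) for x in l]


@timer
def numpy_vectorize(l, f):
    # Wrap f so that it accepts array-likes and returns an ndarray.
    np.vectorize(f)(l)


@timer
def numpy_fromiter(l, f):
    # Build a 1-D array from a generator; count preallocates the output.
    # Note: the input is float64, so np.float32 downcasts the results.
    np.fromiter((f(x) for x in l), np.float32, count=len(l))


@timer
def numpy_frompyfunc(l, f):
    # Turn f into a ufunc with 1 input and 1 output (result dtype is object).
    np.frompyfunc(f, 1, 1)(l)


@timer
def pandas_series_map(s, f):
    s.map(f)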

map_function(l, f)
map_function(np_l, f)
map_function(s, f)
list_comprehension(l, f)
list_comprehension(np_l, f)
list_comprehension(s, f)
numpy_vectorize(l, f)
numpy_vectorize(np_l, f)
numpy_vectorize(s, f)
numpy_fromiter(l, f)
numpy_fromiter(np_l, f)
numpy_fromiter(s, f)
numpy_frompyfunc(l, f)
numpy_frompyfunc(np_l, f)
numpy_frompyfunc(s, f)
pandas_series_map(s, f)
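For a quick check on a smaller input, the same mean/median measurement can also be sketched with the standard-library timeit module instead of a hand-written decorator. This is only an alternative sketch, not the code that produced the results below; bench, cube, small and cube_ufunc are hypothetical names, and the array size of 100,000 is an arbitrary choice to keep the run short.

```python
import statistics
import timeit

import numpy as np


def cube(x):
    return x ** 3


def bench(func, repeat=5):
    # timeit.repeat returns one total time per repetition;
    # number=1 means each repetition calls func exactly once.
    times = timeit.repeat(func, repeat=repeat, number=1)
    return statistics.mean(times), statistics.median(times)


small = np.random.rand(100_000)
cube_ufunc = np.frompyfunc(cube, 1, 1)

avg, med = bench(lambda: cube_ufunc(small))
print('numpy_frompyfunc over np.ndarray, avg: {:.3f} sec, median: {:.3f} sec'.format(
    avg, med))
```

Building the ufunc once outside the timed call is a design choice of this sketch; the benchmark above rebuilds it on every call.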

Results

The output has been aligned by hand for readability.

map_function       over list,       avg: 4.621 sec, median: 4.521 sec
map_function       over np.ndarray, avg: 5.245 sec, median: 5.244 sec
map_function       over pd.Series,  avg: 5.299 sec, median: 5.301 sec
list_comprehension over list,       avg: 4.810 sec, median: 4.795 sec
list_comprehension over np.ndarray, avg: 5.453 sec, median: 5.327 sec
list_comprehension over pd.Series,  avg: 5.430 sec, median: 5.397 sec
numpy_vectorize    over list,       avg: 7.127 sec, median: 7.074 sec
numpy_vectorize    over np.ndarray, avg: 4.001 sec, median: 3.976 sec
numpy_vectorize    over pd.Series,  avg: 3.897 sec, median: 3.882 sec
numpy_fromiter     over list,       avg: 5.738 sec, median: 5.733 sec
numpy_fromiter     over np.ndarray, avg: 6.578 sec, median: 6.541 sec
numpy_fromiter     over pd.Series,  avg: 6.485 sec, median: 6.491 sec
numpy_frompyfunc   over list,       avg: 4.153 sec, median: 4.143 sec
numpy_frompyfunc   over np.ndarray, avg: 3.186 sec, median: 3.144 sec
numpy_frompyfunc   over pd.Series,  avg: 3.109 sec, median: 3.104 sec
pandas_series_map,                  avg: 4.704 sec, median: 4.650 sec

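Applying a function that has been turned into a ufunc with numpy.frompyfunc to a numpy.ndarray or pandas.Series appears to be the fastest (about 3.1 to 3.2 sec on average).
numpy.vectorize comes next (about 3.9 to 4.0 sec).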
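One practical difference between the two fastest options is the dtype of the result: a ufunc created by numpy.frompyfunc returns an array of dtype object, while numpy.vectorize infers a numeric dtype from the first output. A small sketch (cube, arr and the via_* names are chosen for this example only):

```python
import numpy as np


def cube(x):
    return x ** 3


arr = np.array([1.0, 2.0, 3.0])

via_frompyfunc = np.frompyfunc(cube, 1, 1)(arr)
via_vectorize = np.vectorize(cube)(arr)

print(via_frompyfunc.dtype)  # object dtype: elements are Python objects
print(via_vectorize.dtype)   # numeric dtype inferred from the first result

# Convert back to a numeric array if later code expects floats.
as_float = via_frompyfunc.astype(np.float64)
print(as_float.dtype)
```

The timings in this article do not include such a conversion, so if a float array is needed afterwards, the extra astype call would have to be counted as well.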

map and list comprehensions seem to be slightly faster over a list than over a numpy.ndarray.
Conversely, numpy.vectorize and numpy.frompyfunc seem to be faster over a numpy.ndarray than over a list.
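One possible factor behind these tendencies (an assumption, not something measured in this article) is what each container actually holds. list(np_l) stores ready-made numpy.float64 scalar objects, whereas iterating over an ndarray has to create a scalar object for each element, and the NumPy functions have to convert a list into an array before they can loop over it. The sketch below only shows the element types; arr, as_list and as_pylist are names chosen for this example.

```python
import numpy as np

arr = np.random.rand(5)

as_list = list(arr)       # what the benchmark uses: a list of NumPy scalars
as_pylist = arr.tolist()  # for comparison: a list of plain Python floats

print(type(as_list[0]))    # a numpy.float64 scalar
print(type(as_pylist[0]))  # a built-in float
print(type(arr[0]))        # indexing the array also creates a numpy.float64 scalar
```

Because the benchmark builds its list with list(np_l), the "list" rows measure a list of NumPy scalars, not a list of built-in floats.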

ysk24ok
I will no longer be updating my articles on Qiita.