@ysk24ok

# The fastest way to apply an arbitrary function to every element of a Python list

More than 3 years have passed since last update.

# Acknowledgments

Following a suggestion from @termoshtt, I added the numpy.frompyfunc pattern.

# Overview

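For the following three list-like structures in Python:

• list
• numpy.ndarray
• pandas.Series (internally backed by a numpy.ndarray)

I look for the fastest way to apply an arbitrary function to every element, using:

• the built-in map function
• a list comprehension
• numpy.vectorize
• numpy.fromiter
• numpy.frompyfunc
• pandas.Series.map (pandas.Series only)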
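
As a quick sanity check (a small sketch, not part of the original benchmark), the six approaches can be run on a tiny input to confirm that they compute the same values. They return different container types and dtypes, which is worth keeping in mind when comparing their timings. Here numpy.fromiter uses float64 rather than the float32 used in the benchmark code.

```python
import numpy as np
import pandas as pd

f = lambda x: x ** 3
data = [1.0, 2.0, 3.0]
arr = np.array(data)
ser = pd.Series(arr)

results = {
    'map_function': list(map(f, data)),                 # list
    'list_comprehension': [f(x) for x in data],         # list
    'numpy_vectorize': np.vectorize(f)(arr),            # float64 ndarray
    'numpy_fromiter': np.fromiter((f(x) for x in arr), np.float64, count=len(arr)),
    'numpy_frompyfunc': np.frompyfunc(f, 1, 1)(arr),    # object-dtype ndarray
    'pandas_series_map': ser.map(f),                    # pandas.Series
}

for name, result in results.items():
    # Convert every result to float64 so the values can be compared directly
    values = np.asarray(result, dtype=np.float64)
    assert np.allclose(values, [1.0, 8.0, 27.0]), name
    print(f'{name:18} -> {type(result).__name__}, dtype={getattr(result, "dtype", "n/a")}')
```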

# Code

```python
import time

import numpy as np
import pandas as pd

N = 10000000

np_l = np.random.rand(N)  # numpy.ndarray of N random floats in [0, 1)
s = pd.Series(np_l)       # pandas.Series wrapping the same values
l = list(np_l)            # Python list whose elements are np.float64 scalars

f = lambda x: x ** 3      # the function applied to every element

def timer(func):
    """Run func 10 times, then print the mean and median elapsed time."""
    def wrapper(*args, **kwargs):
        time_list = []
        for _ in range(10):
            start = time.time()
            func(*args, **kwargs)
            time_list.append(time.time() - start)
        if func.__name__ != 'pandas_series_map':
            # Label the output with the container type of the first argument
            if isinstance(args[0], list):
                list_type = 'list'
            elif isinstance(args[0], np.ndarray):
                list_type = 'np.ndarray'
            elif isinstance(args[0], pd.Series):
                list_type = 'pd.Series'
            print('{} over {}, avg: {:.3f} sec, median: {:.3f} sec'.format(
                func.__name__, list_type, np.mean(time_list), np.median(time_list)))
        else:
            # pandas_series_map only takes a pd.Series, so no label is needed
            print('{}, avg: {:.3f} sec, median: {:.3f} sec'.format(
                func.__name__, np.mean(time_list), np.median(time_list)))
    return wrapper

@timer
def map_function(l, f):
    # map() is lazy, so list() forces every call to f
    list(map(f, l))

@timer
def list_comprehension(l, f):
    [f(x) for x in l]

@timer
def numpy_vectorize(l, f):
    # np.vectorize is a convenience loop: f is still called once per element
    np.vectorize(f)(l)

@timer
def numpy_fromiter(l, f):
    # Fill a preallocated array from a generator (output dtype is float32)
    np.fromiter((f(x) for x in l), np.float32, count=len(l))

@timer
def numpy_frompyfunc(l, f):
    # Wrap f as a ufunc with 1 input and 1 output; the result has object dtype
    np.frompyfunc(f, 1, 1)(l)

@timer
def pandas_series_map(s, f):
    s.map(f)

map_function(l, f)
map_function(np_l, f)
map_function(s, f)
list_comprehension(l, f)
list_comprehension(np_l, f)
list_comprehension(s, f)
numpy_vectorize(l, f)
numpy_vectorize(np_l, f)
numpy_vectorize(s, f)
numpy_fromiter(l, f)
numpy_fromiter(np_l, f)
numpy_fromiter(s, f)
numpy_frompyfunc(l, f)
numpy_frompyfunc(np_l, f)
numpy_frompyfunc(s, f)
pandas_series_map(s, f)
```
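
The hand-rolled timer decorator implements a repeat-and-summarize pattern that the standard library's timeit module also provides. The following is a minimal sketch of that alternative, not the harness that produced the results in this article. It assumes a much smaller N (100,000, chosen only to keep the run short), covers three of the approaches, and reports the median of 5 repeats.

```python
import statistics
import timeit

import numpy as np

N = 100_000  # hypothetical size, far smaller than the article's 10,000,000
np_l = np.random.rand(N)
l = list(np_l)
f = lambda x: x ** 3

candidates = {
    'map_function': lambda data: list(map(f, data)),
    'list_comprehension': lambda data: [f(x) for x in data],
    'numpy_frompyfunc': lambda data: np.frompyfunc(f, 1, 1)(data),
}

medians = {}
for name, func in candidates.items():
    for label, data in (('list', l), ('np.ndarray', np_l)):
        # 5 repeats of a single call each; timeit.repeat returns elapsed seconds
        times = timeit.repeat(lambda: func(data), repeat=5, number=1)
        medians[(name, label)] = statistics.median(times)
        print(f'{name:18} over {label:10}, median: {medians[(name, label)]:.4f} sec')
```

The lambda passed to timeit.repeat is called immediately inside the loop, so capturing func and data by closure does not cause late-binding surprises.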

# Results

Each line shows the mean and median of 10 runs with N = 10,000,000 elements.

```
map_function       over list,       avg: 4.621 sec, median: 4.521 sec
map_function       over np.ndarray, avg: 5.245 sec, median: 5.244 sec
map_function       over pd.Series,  avg: 5.299 sec, median: 5.301 sec
list_comprehension over list,       avg: 4.810 sec, median: 4.795 sec
list_comprehension over np.ndarray, avg: 5.453 sec, median: 5.327 sec
list_comprehension over pd.Series,  avg: 5.430 sec, median: 5.397 sec
numpy_vectorize    over list,       avg: 7.127 sec, median: 7.074 sec
numpy_vectorize    over np.ndarray, avg: 4.001 sec, median: 3.976 sec
numpy_vectorize    over pd.Series,  avg: 3.897 sec, median: 3.882 sec
numpy_fromiter     over list,       avg: 5.738 sec, median: 5.733 sec
numpy_fromiter     over np.ndarray, avg: 6.578 sec, median: 6.541 sec
numpy_fromiter     over pd.Series,  avg: 6.485 sec, median: 6.491 sec
numpy_frompyfunc   over list,       avg: 4.153 sec, median: 4.143 sec
numpy_frompyfunc   over np.ndarray, avg: 3.186 sec, median: 3.144 sec
numpy_frompyfunc   over pd.Series,  avg: 3.109 sec, median: 3.104 sec
pandas_series_map,                  avg: 4.704 sec, median: 4.650 sec
```
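
To put the table in relative terms, the medians can be normalized against plain map over a list as a baseline. This small sketch only copies the reported median figures; nothing is re-measured.

```python
# Median seconds copied from the results table. pandas_series_map appears there
# without a container label, but it only runs on a pd.Series.
medians = {
    ('map_function', 'list'): 4.521,
    ('map_function', 'np.ndarray'): 5.244,
    ('map_function', 'pd.Series'): 5.301,
    ('list_comprehension', 'list'): 4.795,
    ('list_comprehension', 'np.ndarray'): 5.327,
    ('list_comprehension', 'pd.Series'): 5.397,
    ('numpy_vectorize', 'list'): 7.074,
    ('numpy_vectorize', 'np.ndarray'): 3.976,
    ('numpy_vectorize', 'pd.Series'): 3.882,
    ('numpy_fromiter', 'list'): 5.733,
    ('numpy_fromiter', 'np.ndarray'): 6.541,
    ('numpy_fromiter', 'pd.Series'): 6.491,
    ('numpy_frompyfunc', 'list'): 4.143,
    ('numpy_frompyfunc', 'np.ndarray'): 3.144,
    ('numpy_frompyfunc', 'pd.Series'): 3.104,
    ('pandas_series_map', 'pd.Series'): 4.650,
}

baseline = medians[('map_function', 'list')]

# Print from fastest to slowest; values above 1.00x beat the baseline
for (name, container), sec in sorted(medians.items(), key=lambda kv: kv[1]):
    print(f'{name:18} over {container:10}: {baseline / sec:.2f}x vs map over list')
```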

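Applying a function converted to a ufunc with numpy.frompyfunc to a numpy.ndarray or pandas.Series appears to be the fastest option (a median of about 3.1 sec, versus about 4.5 sec for map over a list).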
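
One caveat: a function wrapped with numpy.frompyfunc returns an array of object dtype. If a numeric array is needed downstream, an extra astype conversion is required, and that step is not part of the timings in this article. A minimal sketch:

```python
import numpy as np

f = lambda x: x ** 3
arr = np.random.rand(5)

ufunc_f = np.frompyfunc(f, 1, 1)              # ufunc with 1 input and 1 output
obj_result = ufunc_f(arr)                     # elements are Python-level objects
float_result = obj_result.astype(np.float64)  # explicit conversion back to float64

print(obj_result.dtype, float_result.dtype)   # object float64
```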

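map and list comprehensions seem to run slightly faster over a list than over a numpy.ndarray. One possible reason is that iterating over an ndarray creates a new np.float64 scalar object for every element, whereas the list already holds those objects.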
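
To probe why map and list comprehensions behave differently on a list and on an ndarray, one can inspect what each container yields during iteration. This is a small check, not a benchmark. ndarray.tolist(), which produces built-in Python floats, is shown as a further variant that might be worth timing; that is only a suggestion and was not measured in this article.

```python
import numpy as np

np_l = np.random.rand(3)
l = list(np_l)          # list of np.float64 scalars, as in the benchmark
l_py = np_l.tolist()    # list of built-in Python floats

print(type(l[0]).__name__)              # float64
print(type(next(iter(np_l))).__name__)  # float64
print(type(l_py[0]).__name__)           # float

# Each iteration over the ndarray boxes the element into a new scalar object,
# while indexing the list returns the object that is already stored
a = next(iter(np_l))
b = next(iter(np_l))
print(a is b)        # False
print(l[0] is l[0])  # True
```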
