More than 3 years have passed since last update.

Pythonの高階関数と内包表記

Last updated at 2020-08-22Posted at 2020-08-02

質問にした方が良いのかもしれないが，必要に応じて追記していきたいのと，質問だと何かに負ける気がするので記事にしてみる（自己中）．

もともとScheme使いだが，諸般の理由で代替言語としてのPythonを使用している都合上，Schemeでよく表現していた高階関数の記述をPythonに置き換える方法を模索している．Schemeではとにかくmap＋lambda等を多用し，Pythonでもそのまま書けることは書けるが，やはり微妙に異なるところがあるのと，可読性・処理速度の観点でリスト内包表記等の方が推奨されているようなので，内包表記（comprehension）で表現できるものはしていきたいと考えている．

この記事では，無理矢理（？）内包表記で表現した記述を晒して，備忘録と情報交換用の資料とする．なお，Schemeの処理系はGauche，Pythonの処理系はPython3を想定している．

#記述例（１）（map→内包表記）
Schemeでの記述例：

(display (map + '(0 1 2 3 4) '(5 6 7 8 9)))
; => (5 7 9 11 13)

Pythonのmapやlambdaを用いた記述例：
※+を直接渡せないのでこうなった．

from operator import add
print(tuple(map(add, (0,1,2,3,4), (5,6,7,8,9))))
# => (5, 7, 9, 11, 13)

または

print(tuple(map((lambda x, y: x + y), (0,1,2,3,4), (5,6,7,8,9))))
# => (5, 7, 9, 11, 13)

Pythonの内包表記を用いた記述例：

print(tuple(x + y for x, y in zip((0,1,2,3,4), (5,6,7,8,9))))
# => (5, 7, 9, 11, 13)

処理速度は…あれ，map+addが一番速い？何か間違えたかな（弱気）．

from time import time
from operator import add

R1 = range(9999999,-1,-1)
R2 = range(0,10000000)

start = time()
a = tuple(map(add, R1, R2))
print("          map + add: ", time() - start)

start = time()
b = tuple(map((lambda x,y: x + y), R1, R2))
print("       map + lambda: ", time() - start)

start = time()
c = tuple(x + y for x,y in zip(R1, R2))
print("comprehension + zip: ", time() - start)

#           map + add:  2.8753578662872314
#        map + lambda:  4.222743034362793
# comprehension + zip:  3.9112401008605957

#記述例（２）（fold/reduce）
Schemeでの記述例：

; fold-right in SRFI 1
(display (fold-right * 1 '(1 2 3 4 5 6 7 8 9 10)))
; => 3628800

Pythonのreduceやlambdaを用いた記述例：
※*を直接渡せないので以下略．

from functools import reduce
from operator import mul
print(reduce(mul, (1,2,3,4,5,6,7,8,9,10)))
# => 3628800

または

print(reduce(lambda x, y: x * y, (1,2,3,4,5,6,7,8,9,10)))
# => 3628800

Pythonの内包表記を用いた記述例…って，あるのかな？そもそも，求める値がシーケンス型ではないので，集合内包も何もないという話が（早速記事の主旨から外れる）．足し算なら標準でsumが用意されているみたいだけど．これはもう，必要なら自分で関数作ればってことか（むしろLISP的発想）．

def product(l):
    r = 1
    for i in l: r *= i
    return r

print(product((1,2,3,4,5,6,7,8,9,10)))
# => 3628800

そして時間計測．あまり変わらないけど，operator.mulがいらない子されかけているような．

from functools import reduce
from time import time
from operator import mul

def product(l):
    r = 1
    for i in l: r *= i
    return r

R1 = range(1,100000)

start = time()
a = reduce(mul, R1)
print("   reduce + mul: ", time() - start)

start = time()
a = reduce(lambda x, y: x * y, R1)
print("reduce + lambda: ", time() - start)

start = time()
a = product(R1)
print("        product: ", time() - start)

#    reduce + mul:  23.096322059631348
# reduce + lambda:  18.508586406707764
#         product:  19.82227087020874

#記述例（３）（内包表記→高階関数）
ここでは逆に，Pythonのリスト内包表記で表現された次の記述を，高階関数で表現することを考える．

tuple((x, y) for x in (1,2,3) for y in (7,8,9) if x + y < 11)
# => ((1, 7), (1, 8), (1, 9), (2, 7), (2, 8), (3, 7))

Schemeについては，拡張提案仕様SRFI 42にリスト内包表記記法list-ceがあり，次のように書ける．

(use srfi-42)
(display (list-ec (: x '(1 2 3)) (: y '(7 8 9)) (if (< (+ x y) 11)) (list x y)))
; => ((1 7) (1 8) (1 9) (2 7) (2 8) (3 7))

これを，lambda式やmap等だけで行おうとすると，割と複雑になる．

(display
  (filter (lambda (p) (< (+ (car p) (cadr p)) 11))
          (fold append '()
                (reverse
                  (map (lambda (x)
                         (map (lambda (y) (list x y))
                              '(7 8 9)))
                       '(1 2 3))))))
; => ((1 7) (1 8) (1 9) (2 7) (2 8) (3 7))

主な理由は，条件式による選別が，一度条件なしリストを生成してから別途filterで行うため．加えて，mapをネストすると戻るリストもネストするため，appendをfold（reduce）して平坦化する必要がある（しかも，appendはリストの先頭に順次追加していくため，appendする前にreverseでリスト要素を逆順にする必要がある）．後者については，SRFI 1のappend-mapが使えれば，少し簡単になる．

(display
  (filter (lambda (p) (< (+ (car p) (cadr p)) 11))
          (append-map (lambda (x)
                        (map (lambda (y) (list x y))
                             '(7 8 9)))
                      '(1 2 3))))
; => ((1 7) (1 8) (1 9) (2 7) (2 8) (3 7))

そういう意味では，もしリスト内包表記が使えなければ，map＋fold（reduce）＋filterがセットで必要になることになり（appendとreverseは共にリスト操作の基本と考える），手続き型が主の言語で関数型プログラミングが取り入れられた際に，これらが代表的な高階関数とみなされていったのかもしれない．そして，このような記述なら集合内包の方がわかりやすいとなって以下略．

なお，Python3ではreduceが外部ライブラリfunctoolsのひとつとなったが，基本関数sumがリストにも対応しているため，reduceがなくても次のように記述することが可能．

print(
  tuple(filter(lambda p: p[0] + p[1] < 11,
               sum(tuple(map(lambda x:
                             tuple(map(lambda y: (x, y),
                                       (7,8,9))),
                             (1,2,3))),
                   ()))))
# => ((1, 7), (1, 8), (1, 9), (2, 7), (2, 8), (3, 7))

functoolsのreduceを使用した場合は次の通り．

from functools import reduce
print(
  tuple(filter(lambda p: p[0] + p[1] < 11,
               reduce(lambda x, y: x + y,
                      tuple(map(lambda x:
                                tuple(map(lambda y: (x, y),
                                          (7,8,9))),
                                (1,2,3))),
                      ()))))
# => ((1, 7), (1, 8), (1, 9), (2, 7), (2, 8), (3, 7))

処理速度は比べるまでもなく．mapの方は途中で何度もtuple生成してるしなあ．

from time import time
from functools import reduce

N = 2000
R1 = range(1,N)

start = time()
a = tuple((x, y) for x in R1 for y in R1 if x + y < N)
print("        comprehension: ", time() - start)

start = time()
a = tuple(filter(lambda p: p[0] + p[1] < N,
               sum(tuple(map(lambda x:
                             tuple(map(lambda y: (x, y),
                                       R1)),
                             R1)),
                   ())))
print("   map + sum + filter: ", time() - start)

start = time()
a = tuple(filter(lambda p: p[0] + p[1] < N,
                 reduce(lambda x, y: x + y,
                        tuple(map(lambda x:
                                  tuple(map(lambda y: (x, y),
                                            R1)),
                                  R1)),
                        ())))
print("map + reduce + filter: ", time() - start)

#         comprehension:  0.5814449787139893
#    map + sum + filter:  27.217236757278442
# map + reduce + filter:  28.032208919525146

#記述例（４）（クイックソート）
結論から言うと，記述的にはリスト内包表記でもfilter＋lambdaでもあまり差がないような．組込実装されているがゆえに処理が速いという意味では，やはりリスト内包表記の方がずっといい感じ．

from time import time

D = range(996, -1, -1)

start = time()
def qsort1(L):
    if not L: return []
    else:
        LL = qsort1([x for x in L[1:] if x <  L[0]])
        LR = qsort1([x for x in L[1:] if x >= L[0]])
        return LL + [L[0]] + LR
a = list(qsort1(D))
print("  comprehension: ", time() - start)

start = time()
def qsort2(L):
    if not L: return []
    else:
        LL = qsort2(list(filter(lambda x: x <  L[0], L[1:])))
        LR = qsort2(list(filter(lambda x: x >= L[0], L[1:])))
        return LL + [L[0]] + LR
a = list(qsort2(D))
print("filter + lambda: ", time() - start)

#   comprehension:  0.1841585636138916
# filter + lambda:  0.3261446952819824

#記述例（５）（fold/reduceその2）
Schemeの記述例：

; fold-right in SRFI 1
(display (fold-right append '() '((1 2) (3 4 5) (6))))
=> (1 2 3 4 5 6)

Pythonのreduceを用いた記述例：
※+を直接渡せないので以下略．

from functools import reduce
from operator import add
print(reduce(add, ((1,2),(3,4,5),(6,)), ()))
=> (1, 2, 3, 4, 5, 6)

Pythonの内包表記を用いた記述例…やっぱりないな．平坦化については，sumを用いるか，itertoolsモジュールのitertools.chain.from_iterableを用いるのが一般的の模様．

sum(((1,2),(3,4,5),(6,)), ())
=> (1, 2, 3, 4, 5, 6)

from itertools import chain as ch
tuple(ch.from_iterable(((1,2),(3,4,5),(6,))))
=> (1, 2, 3, 4, 5, 6)

#備考

##更新履歴

2020-08-22：記述例（５）（fold/reduceその2）追加
2020-08-11：記述例（４）（クイックソート）追加
2020-08-11：記述例（３）（内包表記→高階関数）追加
2020-08-09：記述例（２）追記
2020-08-03：記述例（２）（map→内包表記）追加
2020-08-03：記述例（１）の処理時間計測を追加
2020-08-03：初版公開（記述例は（１）のみ）

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up