
Trying out toolz, a functional programming library

Posted at 2015-02-02, last updated at 2015-02-02

I tried out toolz, a library that extends the standard library's itertools and functools.

Installation

It can be installed with pip.

$ pip install toolz

There is also a faster Cython version, cytoolz.

$ pip install cytoolz

Usage

The functions provided by toolz fall into the following three groups. Of these, Itertoolz and Functoolz provide extensions of itertools and functools, respectively.

Itertoolz
Functoolz
Dicttoolz

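toolz also provides functions that are available as builtins, such as map, reduce, and filter. Importing them from toolz replaces the builtins with versions that work lazily on iterables; on Python 2, for example, map becomes itertools.imap. They are used in almost the same way as the originals, so I will not cover them here.

Below, I introduce the functions that seem most likely to be useful.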

Itertoolz

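Provides functionality equivalent to itertools. It also includes functions like the ones listed in the itertools recipes.

Getting elements - get, pluck

get

get retrieves elements from a sequence or a dictionary.

Given an index, it retrieves the element at that position from a sequence.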

>>> from toolz import get
>>> get(1, range(5))
1

Given a key, it retrieves the value from a dictionary.

>>> get('a', {'a': 'A', 'b': 'B', 'c': 'C'})
'A'

You can specify a default value to return when the index is out of range or the key does not exist.

>>> get(10, range(5), 0)
0
>>> get('d', {'a': 'A', 'b': 'B', 'c': 'C'}, 'None')
'None'

Passing a list of indices or keys retrieves multiple values at once.

>>> get([1, 3, 5], range(5), 0)
(1, 3, 0)
>>> get(['b', 'd', 'a'], {'a': 'A', 'b': 'B', 'c': 'C'}, 'None')
('B', 'None', 'A')

pluck

pluck returns a result equivalent to mapping get over each element of a sequence.

>>> from toolz import pluck
>>> mat = [[(i, j) for i in range(5)] for j in range(5)]
>>> for r in mat:
...     print(r)
... 
[(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]
[(0, 1), (1, 1), (2, 1), (3, 1), (4, 1)]
[(0, 2), (1, 2), (2, 2), (3, 2), (4, 2)]
[(0, 3), (1, 3), (2, 3), (3, 3), (4, 3)]
[(0, 4), (1, 4), (2, 4), (3, 4), (4, 4)]
>>> for r in pluck([2, 4], mat):
...     print(r)
... 
((2, 0), (4, 0))
((2, 1), (4, 1))
((2, 2), (4, 2))
((2, 3), (4, 3))
((2, 4), (4, 4))
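
To make the "map of get" description concrete, here is a minimal standard-library sketch, assuming Python 3. It is not how toolz implements pluck, and unlike pluck it has no default value for missing indices.

```python
from operator import itemgetter

# Same 5x5 grid of (column, row) pairs as in the example above.
mat = [[(i, j) for i in range(5)] for j in range(5)]

# itemgetter(2, 4) picks indices 2 and 4 from one row;
# map applies it to every row lazily, so we wrap it in list().
plucked = list(map(itemgetter(2, 4), mat))

for row in plucked:
    print(row)
```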

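Cumulative computation - accumulate

accumulate is similar to reduce, but returns an iterator over the running results instead of only the final value.
From Python 3.2 onward, accumulate is also available in itertools (its optional function argument was added in 3.3).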

>>> from toolz import accumulate
>>> from operator import add
>>> list(accumulate(add, range(5)))
[0, 1, 3, 6, 10]
>>> from random import randint
>>> xs = [randint(1, 10) for n in range(10)]
>>> xs
[7, 3, 4, 2, 9, 4, 1, 10, 8, 1]
>>> list(accumulate(max, xs))
[7, 7, 7, 7, 9, 9, 9, 10, 10, 10]
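
For comparison, the same two computations can be sketched with itertools.accumulate, assuming Python 3.3 or later. Note that the argument order is reversed: the standard library takes the iterable first and the function second.

```python
from itertools import accumulate
from operator import add

# Running sum: itertools.accumulate(iterable, func)
running_sum = list(accumulate(range(5), add))

# Running maximum over the fixed sample values shown above.
xs = [7, 3, 4, 2, 9, 4, 1, 10, 8, 1]
running_max = list(accumulate(xs, max))

print(running_sum)
print(running_max)
```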

Grouping - groupby, countby, reduceby

groupby groups the elements of a sequence by the value of a key function.
countby counts the number of elements in each group.
reduceby returns a result equivalent to running reduce within each group.

>>> from toolz import groupby, countby, reduceby
>>> from operator import add
>>> xs = range(10)
>>> is_even = lambda n: n % 2 == 0
>>> groupby(is_even, xs)
{False: [1, 3, 5, 7, 9], True: [0, 2, 4, 6, 8]}
>>> countby(is_even, xs)
{False: 5, True: 5}
>>> reduceby(is_even, add, xs)
{False: 25, True: 20}

Whereas itertools.groupby groups only consecutive elements, toolz.groupby groups elements regardless of their order in the sequence.

>>> import toolz as tz
>>> import itertools as it
>>> xs = range(10)
>>> is_even = lambda n: n % 2 == 0
>>> tz.groupby(is_even, xs)
{False: [1, 3, 5, 7, 9], True: [0, 2, 4, 6, 8]}
>>> [(k, list(g)) for k, g in it.groupby(xs, is_even)]
[(True, [0]), (False, [1]), (True, [2]), (False, [3]), (True, [4]), (False, [5]), (True, [6]), (False, [7]), (True, [8]), (False, [9])]
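
The difference comes down to how groups are collected. The following is a rough sketch of order-independent grouping with a dictionary, assuming Python 3; groupby_sketch is a hypothetical name and this is not toolz's actual implementation.

```python
from collections import defaultdict

def groupby_sketch(key, seq):
    """Collect every element under its key, wherever it appears in seq."""
    groups = defaultdict(list)
    for item in seq:
        groups[key(item)].append(item)
    return dict(groups)

is_even = lambda n: n % 2 == 0
grouped = groupby_sketch(is_even, range(10))
print(grouped)
```

Because each element is appended to a dictionary entry, elements with the same key end up together even when they are not adjacent in the input.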

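Adding elements to and concatenating sequences - cons, concat, concatv

cons adds an element to the front of a sequence.
concat concatenates the sequences contained in a single iterable.
concatv is a version of concat that takes the sequences as variadic arguments.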

>>> from toolz import cons, concat, concatv
>>> xs = range(10)
>>> list(cons(-1, xs))
[-1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> list(concat([xs, xs]))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> list(concatv(xs, xs))
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
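
These three functions correspond roughly to itertools.chain. Here is a small sketch of that correspondence using only the standard library, assuming Python 3; the variable names are made up for illustration.

```python
from itertools import chain

xs = range(3)

# Like cons(-1, xs): put one element in front of the sequence.
consed = list(chain([-1], xs))

# Like concat([xs, xs]): flatten one iterable of sequences.
concatenated = list(chain.from_iterable([xs, xs]))

# Like concatv(xs, xs): the sequences are passed as separate arguments.
concatenated_v = list(chain(xs, xs))

print(consed, concatenated, concatenated_v)
```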

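Splitting sequences - partition, partition_all, partitionby

partition and partition_all split a sequence into tuples of the specified length. They differ in how they handle leftover elements.

partition without a padding value: leftover elements are dropped.
partition with a padding value: leftover elements are output too, with the missing positions filled with the specified value.
partition_all: the last tuple may be shorter than the others.

partitionby splits a sequence using the specified function, starting a new tuple each time the function's return value changes.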

>>> from toolz import partition, partition_all, partitionby
>>> xs = range(10)
>>> list(partition(3, xs))
[(0, 1, 2), (3, 4, 5), (6, 7, 8)]
>>> list(partition(3, xs, None))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, None, None)]
>>> list(partition_all(3, xs))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9,)]
>>> list(partitionby(lambda x: x < 5, xs))
[(0, 1, 2, 3, 4), (5, 6, 7, 8, 9)]
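
The three ways of handling leftovers can be sketched with the standard library as follows, assuming Python 3. The *_sketch names are hypothetical, and this is an illustration of the behavior rather than toolz's own code.

```python
from itertools import islice, zip_longest

def partition_sketch(n, seq):
    # n references to ONE iterator: zip pulls n consecutive items per tuple
    # and stops at the first incomplete tuple, dropping the leftovers.
    return zip(*[iter(seq)] * n)

def partition_pad_sketch(n, seq, pad):
    # zip_longest keeps the leftovers and fills the gaps with pad.
    return zip_longest(*[iter(seq)] * n, fillvalue=pad)

def partition_all_sketch(n, seq):
    # Take up to n items at a time; the last chunk may be shorter.
    it = iter(seq)
    while True:
        chunk = tuple(islice(it, n))
        if not chunk:
            return
        yield chunk

xs = range(10)
print(list(partition_sketch(3, xs)))
print(list(partition_pad_sketch(3, xs, None)))
print(list(partition_all_sketch(3, xs)))
```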

Sliding window - sliding_window

sliding_window outputs tuples of the specified length, shifting the start index by one each time.

>>> from toolz import sliding_window
>>> list(sliding_window(3, range(10)))
[(0, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 5), (4, 5, 6), (5, 6, 7), (6, 7, 8), (7, 8, 9)]
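
One way to picture how such a window moves is a bounded deque that drops its oldest element whenever a new one arrives. This is a sketch under that assumption, for Python 3; sliding_window_sketch is a hypothetical name, not the toolz implementation.

```python
from collections import deque
from itertools import islice

def sliding_window_sketch(n, seq):
    it = iter(seq)
    # Fill the first window; maxlen makes later appends evict the oldest item.
    window = deque(islice(it, n), maxlen=n)
    if len(window) == n:
        yield tuple(window)
    for item in it:
        window.append(item)
        yield tuple(window)

windows = list(sliding_window_sketch(3, range(5)))
print(windows)
```

If the input is shorter than the window length, this sketch yields nothing.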

Merging multiple sorted sequences - merge_sorted

merge_sorted takes multiple sorted sequences as arguments and outputs them merged into a single sorted sequence.

>>> from toolz import merge_sorted
>>> from random import randint
>>> xs = sorted([randint(1, 10) for _ in range(5)])
>>> ys = sorted([randint(1, 10) for _ in range(5)])
>>> zs = sorted([randint(1, 10) for _ in range(5)])
>>> xs
[3, 6, 6, 6, 9]
>>> ys
[3, 4, 5, 7, 8]
>>> zs
[1, 2, 4, 5, 8]
>>> list(merge_sorted(xs, ys, zs))
[1, 2, 3, 3, 4, 4, 5, 5, 6, 6, 6, 7, 8, 8, 9]

Note that if the input sequences are not sorted, the result will not come out in order.

>>> from toolz import merge_sorted
>>> from random import randint
>>> xs = [randint(1, 10) for _ in range(5)]
>>> ys = [randint(1, 10) for _ in range(5)]
>>> zs = [randint(1, 10) for _ in range(5)]
>>> xs
[4, 3, 10, 7, 3]
>>> ys
[7, 8, 1, 10, 2]
>>> zs
[2, 6, 2, 4, 10]
>>> list(merge_sorted(xs, ys, zs))
[2, 4, 3, 6, 2, 4, 7, 8, 1, 10, 7, 3, 10, 2, 10]

If the inputs are not sorted, concatenating them and then sorting gives a correctly ordered result.

>>> from toolz import concatv
>>> sorted(concatv(xs, ys, zs))
[1, 2, 2, 2, 3, 3, 4, 4, 6, 7, 7, 8, 10, 10, 10]
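
The standard library has a comparable function, heapq.merge, which also expects inputs that are already sorted. A short sketch with the sorted sample values from the first example, assuming Python 3:

```python
from heapq import merge

xs = [3, 6, 6, 6, 9]
ys = [3, 4, 5, 7, 8]
zs = [1, 2, 4, 5, 8]

# merge is lazy and only compares the current head of each input,
# which is why the inputs must be sorted beforehand.
merged = list(merge(xs, ys, zs))
print(merged)
```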

Joining sequences - join

join combines two sequences by matching the values of their key functions.

>>> from toolz import join
>>> from toolz.curried import get  # import the curried version of get
>>> carts = [('Taro', 'Apple'), ('Taro', 'Banana'), ('Jiro', 'Apple'), ('Jiro', 'Orange'), ('Sabu', 'Banana'), ('Sabu', 'Banana')]
>>> prices = [('Apple', 100), ('Banana', 80), ('Orange', 150)]
>>> for x in join(get(1), carts, get(0), prices):
...     print(x)
... 
(('Taro', 'Apple'), ('Apple', 100))
(('Jiro', 'Apple'), ('Apple', 100))
(('Taro', 'Banana'), ('Banana', 80))
(('Sabu', 'Banana'), ('Banana', 80))
(('Sabu', 'Banana'), ('Banana', 80))
(('Jiro', 'Orange'), ('Orange', 150))
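
Conceptually this is a hash join. The sketch below indexes the left sequence by key and then streams through the right sequence; it assumes Python 3, join_sketch is a hypothetical name, and toolz's real join has more options than shown here.

```python
from collections import defaultdict
from operator import itemgetter

def join_sketch(leftkey, leftseq, rightkey, rightseq):
    # Build an index: key value -> all left items with that key.
    index = defaultdict(list)
    for left in leftseq:
        index[leftkey(left)].append(left)
    # Stream the right side and emit every matching pair.
    for right in rightseq:
        for left in index.get(rightkey(right), []):
            yield (left, right)

carts = [('Taro', 'Apple'), ('Taro', 'Banana'), ('Jiro', 'Apple'),
         ('Jiro', 'Orange'), ('Sabu', 'Banana'), ('Sabu', 'Banana')]
prices = [('Apple', 100), ('Banana', 80), ('Orange', 150)]

pairs = list(join_sketch(itemgetter(1), carts, itemgetter(0), prices))
for pair in pairs:
    print(pair)
```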

Functoolz

Currying - curry

curry

The curry function turns a function into a curried one.

>>> from toolz import curry
>>> from operator import add
>>> curried_add = curry(add)
>>> curried_add(3)(4)
7

curry can also be used as a decorator.

>>> from toolz import curry
>>> @curry
... def add(a, b):
...     return a+b
... 
>>> add(3)(4)
7
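
To show what currying does mechanically, here is a deliberately simplified sketch, assuming Python 3 and functions with only positional parameters. curry_sketch and add3 are hypothetical names; toolz's curry is more general (it also handles keyword arguments, for example).

```python
from inspect import signature

def curry_sketch(func):
    # Number of parameters the function needs before it can run.
    arity = len(signature(func).parameters)

    def curried(*args):
        if len(args) >= arity:
            return func(*args)
        # Not enough arguments yet: remember them and wait for more.
        return lambda *more: curried(*args, *more)

    return curried

@curry_sketch
def add3(a, b, c):
    return a + b + c

print(add3(1)(2)(3), add3(1, 2)(3), add3(1)(2, 3))
```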

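Currying the functions provided by toolz

For the functions that toolz itself provides, importing them from toolz.curried gives you curried versions.

As an example, let's look at the map function.
If you import map from toolz and pass it only a function, you get an error saying there are not enough arguments.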

>>> from toolz import map as not_curried_map
>>> list(not_curried_map(lambda x: x + 1)([1, 2, 3]))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: imap() must have at least two arguments.

Currying it with curry makes the following call work.

>>> list(curry(not_curried_map)(lambda x: x + 1)([1, 2, 3]))
[2, 3, 4]

When imported from toolz.curried, the function is already curried, so it works as is.

>>> from toolz.curried import map as curried_map
>>> list(curried_map(lambda x: x + 1)([1, 2, 3]))
[2, 3, 4]

Function composition - compose, pipe, thread_first, thread_last

compose

compose combines multiple functions into one.

compose(f, g, h)(x) is the same as f(g(h(x))).

>>> from toolz import compose, curry
>>> from operator import add, mul
>>> compose(curry(mul)(2), curry(add)(1))(3)
8

pipe, thread_first, thread_last

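Like compose, pipe applies multiple functions to a piece of data,
but the order of the arguments is the reverse of compose.

As with a shell pipeline, evaluation proceeds from left to right, following the flow of the data.
pipe(x, f, g, h) is equivalent to h(g(f(x))).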

>>> from toolz import pipe
>>> from toolz.curried import get
>>> pipe('hello world', str.split, get(0), str.upper)
'HELLO'
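
The relationship between compose and pipe can be sketched with functools.reduce, assuming Python 3 and single-argument functions. The *_sketch names are hypothetical and this is not the toolz source.

```python
from functools import reduce

def pipe_sketch(data, *funcs):
    # Apply the functions left to right: h(g(f(data))).
    return reduce(lambda acc, f: f(acc), funcs, data)

def compose_sketch(*funcs):
    # compose(f, g, h)(x) == f(g(h(x))), so apply in reverse order.
    return lambda x: pipe_sketch(x, *reversed(funcs))

double = lambda n: n * 2
increment = lambda n: n + 1

print(compose_sketch(double, increment)(3))
print(pipe_sketch('hello world', str.split, lambda parts: parts[0], str.upper))
```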

When given single-argument functions, thread_first and thread_last behave the same way as pipe.

>>> from toolz import thread_first, thread_last
>>> from toolz.curried import get
>>> thread_first('hello world', str.split, get(0), str.upper)
'HELLO'
>>> thread_last('hello world', str.split, get(0), str.upper)
'HELLO'

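Functions that take two or more arguments can be passed as tuples, and this is where the two functions differ.
With thread_first, the result from the previous function becomes the first argument;
with thread_last, it becomes the last argument.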

>>> thread_first('hello world', str.split, get(0), str.upper, (add, 'WORLD'))
'HELLOWORLD'
>>> thread_last('hello world', str.split, get(0), str.upper, (add, 'WORLD'))
'WORLDHELLO'
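
The only difference between the two is where the threaded value is inserted when a step is given as a tuple. A minimal sketch of that rule, assuming Python 3; the *_sketch names are hypothetical.

```python
from operator import add

def thread_first_sketch(val, *forms):
    for form in forms:
        if callable(form):
            val = form(val)
        else:
            func, *args = form
            val = func(val, *args)   # previous result goes FIRST
    return val

def thread_last_sketch(val, *forms):
    for form in forms:
        if callable(form):
            val = form(val)
        else:
            func, *args = form
            val = func(*args, val)   # previous result goes LAST
    return val

print(thread_first_sketch('hello', str.upper, (add, 'WORLD')))
print(thread_last_sketch('hello', str.upper, (add, 'WORLD')))
```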

Memoization - memoize

memoize adds memoization to a function.
memoize can also be used as a decorator.

>>> from toolz import memoize
>>> def tarai(x, y, z):
...     if x <= y:
...         return y
...     return tarai(tarai(x-1, y, z), tarai(y-1, z, x), tarai(z-1, x, y))
... 
>>> tarai(12, 6, 0)
12
>>> t = memoize(tarai)
>>> t(12, 6, 0)
12
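
One thing to keep in mind with a recursive function: wrapping it afterwards, as in t = memoize(tarai), caches only the calls made through t, because the recursive calls inside the body still refer to the original tarai. Used as a decorator, the name tarai itself is rebound to the cached version, so the recursive calls are cached as well. The sketch below illustrates this with a hand-written dictionary cache, assuming Python 3; memoize_sketch is a hypothetical stand-in, not toolz's memoize.

```python
from functools import wraps

def memoize_sketch(func):
    cache = {}

    @wraps(func)
    def wrapper(*args):
        # Compute each distinct argument tuple only once.
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]

    wrapper.cache = cache  # exposed only so the cache can be inspected
    return wrapper

@memoize_sketch
def tarai(x, y, z):
    if x <= y:
        return y
    # These calls go through the decorated name, so they hit the cache too.
    return tarai(tarai(x - 1, y, z), tarai(y - 1, z, x), tarai(z - 1, x, y))

result = tarai(12, 6, 0)
print(result)
```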

Applying multiple functions to the same arguments - juxt

>>> from toolz import juxt
>>> from operator import add, mul
>>> juxt(add, mul)(3, 4)
(7, 12)
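
In other words, juxt builds a function that calls each given function with the same arguments and collects the results in a tuple. A minimal sketch of that idea, assuming Python 3; juxt_sketch is a hypothetical name.

```python
from operator import add, mul

def juxt_sketch(*funcs):
    # Return a function that feeds the same arguments to every func.
    def juxtaposed(*args, **kwargs):
        return tuple(f(*args, **kwargs) for f in funcs)
    return juxtaposed

sum_and_product = juxt_sketch(add, mul)
print(sum_and_product(3, 4))
```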

Identity function - identity

identity returns its argument unchanged.

>>> from toolz import identity
>>> identity(3)
3

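Side-effecting operations - do

do calls a function and then returns the original argument.
The function's return value is discarded, so do is meant for side effects such as writing to a log.

The following example appends each argument to log.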

>>> from toolz import compose
>>> from toolz.curried import do
>>> log = []
>>> map(compose(str, do(log.append)), range(5))
['0', '1', '2', '3', '4']
>>> log
[0, 1, 2, 3, 4]
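
The behavior of do fits in two lines. The following sketch assumes Python 3, where map is lazy, so a list comprehension is used to force evaluation; do_sketch is a hypothetical name.

```python
def do_sketch(func, x):
    func(x)      # run for its side effect; the return value is ignored
    return x     # pass the original value through unchanged

log = []
strings = [str(do_sketch(log.append, n)) for n in range(5)]

print(strings)
print(log)
```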

Dicttoolz

Reading and updating nested dictionaries - get_in, update_in

With get_in, you can look up a value in a nested dictionary simply by passing a list of keys. A default value can also be specified.

>>> from toolz import get_in
>>> d = {"a": {"b": {"c": 1}}}
>>> d
{'a': {'b': {'c': 1}}}
>>> get_in(["a", "b", "c"], d)
1
>>> get_in(["a", "b", "e"], d, 'None')
'None'

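With update_in, you can update a nested dictionary simply by passing a list of keys.

The update is performed by passing an update function.
The original dictionary is not modified; update_in returns a new dictionary with the function applied.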

>>> from toolz import update_in
>>> d = {"a": {"b": {"c": 1}}}
>>> update_in(d, ["a", "b", "c"], lambda x: x+1)
{'a': {'b': {'c': 2}}}
>>> d
{'a': {'b': {'c': 1}}}

If the key does not exist, it is created, and the update function is applied to the default value.

>>> update_in(d, ["a", "b", "e"], lambda x: x+1, 0)
{'a': {'b': {'c': 1, 'e': 1}}}
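
To illustrate how both functions walk the key path, here is a simplified sketch for plain dictionaries, assuming Python 3. The *_sketch names are hypothetical and the real toolz functions handle more cases.

```python
from functools import reduce

def get_in_sketch(keys, coll, default=None):
    # Follow the keys one level at a time; fall back to default on a miss.
    try:
        return reduce(lambda c, k: c[k], keys, coll)
    except (KeyError, IndexError, TypeError):
        return default

def update_in_sketch(d, keys, func, default=None):
    # Copy only the dictionaries along the path, leaving d untouched.
    key, rest = keys[0], keys[1:]
    copy = dict(d)
    if rest:
        copy[key] = update_in_sketch(d.get(key, {}), rest, func, default)
    else:
        copy[key] = func(d.get(key, default))
    return copy

d = {"a": {"b": {"c": 1}}}
print(get_in_sketch(["a", "b", "c"], d))
print(update_in_sketch(d, ["a", "b", "c"], lambda x: x + 1))
print(update_in_sketch(d, ["a", "b", "e"], lambda x: x + 1, 0))
print(d)
```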

Closing thoughts

Much of the functionality is implemented on top of itertools and functools, so I think toolz is also a useful reference for how to use those modules.
