Julia で「省略記法による無限リスト」を実装してみた話 #Julia

TL;DR

Julia で (1,..) とか (1,3,..) とか書いたら、1,2,3,4,5,… とか 1,3,5,7,9,… とかっていう無限リストが定義できるようにしてみた。

前置き

つい最近、EllipsisNotation.jl というパッケージが公開されました¹。

これを見て、Haskell で [1..] とか [1,3..] とかって書いたら簡単に無限列（等差数列）が定義できるから、似たようなこと Julia でできないかな。と思いまして。
工夫したらなんとかそれっぽいものができたのでコード晒します。

開発・確認環境

Julia v0.4.x / v0.5.0
- Windows: Julia v0.4.6 / v0.5.0-dev+5306
- Mac: Julia v0.4.6 / v0.5.0-dev+5349
- glot.io: Julia v0.4.5

具体的にやりたいこと

というか主な仕様を示します。

(1,..) と書くと²、1,2,3,4,5,… といった「1から始まって1ずつ増えていく無限数列（正整数列）」を定義したことになる。
(1,3,..) と書くと²、1,3,5,7,9,… といった「1から始まって2ずつ増えていく無限数列（正奇数列）」を定義したことになる。
定義した数列は、take(○, 10) とすることで最初の10件をリスト（1次元配列）として取り出せる（実際には collect(take(○, 10))）。
filter() 関数に渡したり、generator式（(〜 for 〜 in ○)）のイテレータ項（○の部分）に渡すことで、加工やフィルタリングができる。
もちろん for 〜 in ○ のイテレータ項（○）にも普通に渡せる。
途中で break したりしなければ、ずっと列挙し続ける。
あと整数だけでなく、他の数値型（例：(1.0,1.1,..)）や文字型（例：('A','C',..)）でも同様に動作する。

テストコードを示してみます。

まずは、v0.4.x/v0.5.0 共通。

dot_dot_list_test.jl

# dot_dot_list_test.jl

# include("./dot_dot_list.jl")
# using DotDotList
# using Base.Test

nats = (1,..)
# # io = IOBuffer()
# # print(io, nats)
# # @test "(1,..)" == takebuf_string(io)
# v- shorthand of -^
@test "(1,..)" == "$nats"
@test Int == eltype(nats)
@test [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] == collect(take(nats, 10))

odds = (1,3,..)
@test "(1,3,..)" == "$odds"
@test Int == eltype(odds)
@test [1, 3, 5, 7, 9, 11, 13, 15, 17 ,19] == collect(take(odds, 10))

issq(n::Int) = isqrt(n)^2 == n
@test [1, 9, 25, 49, 81, 121, 169, 225, 289, 361] == collect(Int, take(filter(issq, odds), 10))

floats = (1.5,..)
@test "(1.5,..)" == "$floats"
@test Float64 == eltype(floats)
@test Float64[1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5, 9.5, 10.5] == collect(take(floats, 10))
@test 60.0 == sum(take(floats, 10))

floats2 = (1.0,1.1,..)
@test "(1.0,1.1,..)" == "$floats2"
@test Float64 == eltype(floats2)
@test Float64[1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2.0] == collect(take(floats2, 11))
@test 16.5 == sum(take(floats2, 11))

floats3 = (1.0,4.0/3.0,..)
# @test "(1.0,1.3333333333333333,..)" == "$floats3"
@test ismatch(r"\(1\.0,1\.33333+,\.\.\)", "$floats3")
@test Float64 == eltype(floats3)
@test Float64[1.0, 4.0/3.0, 5.0/3.0, 2.0, 7.0/3.0, 8.0/3.0, 3.0, 10.0/3.0, 11.0/3.0, 4.0] == collect(take(floats3, 10))
@test 25.0 == sum(take(floats3, 10))

rationals = (1//1,9//10,..)
@test "(1//1,9//10,..)" == "$rationals"
@test Rational{Int} == eltype(rationals)
@test Rational{Int}[1//1, 9//10, 4//5, 7//10, 3//5, 1//2, 2//5, 3//10, 1//5, 1//10] == collect(take(rationals, 10))

bools1 = (false,..)
@test "(false,..)" == "$bools1"
@test Bool == eltype(bools1)
@test Bool[false, true, false, true, false] == collect(take(bools1, 5))

bools2 = (true,false,..)
@test "(true,false,..)" == "$bools2"
@test Bool == eltype(bools2)
@test Bool[true, false, true, false, true] == collect(take(bools2, 5))

trues = (true,true,..)
@test "(true,true,..)" == "$trues"
@test Bool == eltype(trues)
@test Bool[true, true, true, true, true] == collect(take(trues, 5))

chars = ('A',..)
@test "('A',..)" == "$chars"
@test Char == eltype(chars)
@test Char['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J'] == collect(take(chars, 10))

odd_chars = ('A','C',..)
@test "('A','C',..)" == "$odd_chars"
@test Char == eltype(odd_chars)
@test Char['A', 'C', 'E', 'G', 'I', 'K', 'M', 'O', 'Q', 'S'] == collect(take(odd_chars, 10))

続いて、v0.5.0-dev で追加された generator記法に対応したテスト。

dot_dot_list_v5_test.jl

# dot_dot_list_v5_test.jl

# include("./dot_dot_list.jl")
# using DotDotList
# using Base.Test

evens = (2n for n=(1,..))
@test [2, 4, 6, 8, 10, 12, 14, 16, 18, 20] == collect(Int, take(evens, 10))

odd_squares = (n^2 for n=(1,3,..))
@test [1, 9, 25, 49, 81, 121, 169, 225, 289, 361] == collect(Int, take(odd_squares, 10))

powers = (2^n for n=(1,..))
@test [2, 4, 8, 16, 32, 64, 128, 256, 512, 1024] == collect(Int, take(powers, 10))

あとついでに、まとめてテストするための runtests.jl 。

runtests.jl

# runtests.jl

include("./dot_dot_list.jl")
using DotDotList
using Base.Test

tests = ["dot_dot_list_test"]
if VERSION >= v"0.5.0-dev+5239"
    push!(tests, "dot_dot_list_v5_test")
end

for t in tests
    println("testing $t ...")
    @time include("$t.jl")
end

実装

まずは成果物を示します。

dot_dot_list.jl

# dot_dot_list.jl
# (1,..) / (1,3,..) Notations for Infinite List
# Inspired by: EllipsisNotation.jl (https://github.com/ChrisRackauckas/EllipsisNotation.jl)

module DotDotList


import Base.eltype, Base.length, Base.start, Base.done, Base.next, Base.show

typealias DotDot Val{:..}
const .. = DotDot()

typealias DotDotInf1{T} Tuple{T,DotDot}
typealias DotDotInf2{T} Tuple{T,T,DotDot}
typealias DotDotInf{T} Union{DotDotInf1{T}, DotDotInf2{T}}

@inline eltype{T}(::DotDotInf{T}) = T
@inline eltype{T}(::Type{DotDotInf1{T}}) = T
@inline eltype{T}(::Type{DotDotInf2{T}}) = T

@inline length{T}(it::DotDotInf{T}) = typemax(Int)
if VERSION >= v"0.5.0-dev+5239"
    @inline Base.iteratorsize{T}(::DotDotInf{T}) = Base.IsInfinite()
end

@inline start{T<:Number}(it::DotDotInf1{T}) = (it[1], one(T))
@inline start(it::DotDotInf1{Char}) = (it[1], 1)
@inline start(it::DotDotInf1{Bool}) = (it[1], 1)
@inline start{T}(it::DotDotInf2{T}) = (it[1], it[2]-it[1])

@inline done(::DotDotInf{Int}, ::NTuple{2,Int}) = false
@inline done{T<:Number}(::DotDotInf{T}, ::NTuple{2,T}) = false
@inline done{T}(::DotDotInf{T}, ::Tuple{T,Int}) = false

@inline next{T<:Number}(::DotDotInf{T}, t::NTuple{2,T}) = ((n,s)=t;(n,(n+s,s)))
@inline next(::DotDotInf{Char}, t::Tuple{Char,Int}) = ((n,s)=t;(n,(n+s,s)))
@inline next(::DotDotInf{Bool}, t::Tuple{Bool,Int}) = ((n,s)=t;(n,((n+s)%Bool,s)))

@inline function show{T}(io::IO, it::DotDotInf1{T})
    print(io, '(')
    show(io, it[1])
    print(io, ",..)")
end
@inline function show{T}(io::IO, it::DotDotInf2{T})
    print(io, '(')
    show(io, it[1])
    print(io, ',')
    show(io, it[2])
    print(io, ",..)")
end

# Special for DotDotInf2{T<:AbstractFloat}
function start{T<:AbstractFloat}(it::DotDotInf2{T})
    if _seemsrational(it[1]) && _seemsrational(it[2])
        s = rationalize(it[1])
        t = rationalize(it[2]) - s
        d = convert(T, lcm(den(s), den(t)))
        (s*d, t*d, d)
    else
        (it[1], it[2] - it[1])
    end
end
@inline done{T<:AbstractFloat}(::DotDotInf{T}, ::NTuple{3,T}) = false
@inline next{T<:AbstractFloat}(::DotDotInf{T}, t::NTuple{3,T}) = ((n,s,d)=t;(n/d,(n+s,s,d)))
@inline _seemsrational{T<:AbstractFloat}(v::T) = convert(T, rationalize(v)) == v

export .., eltype, length, start, done, next, show


end # module

解説

eltype / start / done / next 等は、イテレーションの定義。私の過去記事等を参照してください。

一番のポイントは、(x,..) というのが Julia では文法的に正しく、普通に Tuple として認識されるので、それをそのまま利用していると言うこと。.. というのも識別子として有効であり、それに特別な型と値を与えることで、解釈時も実行時も有効³というわけです。

今回は const .. = DotDot() としています（さらに DotDot は Val{:..} という Value Type の型エイリアスです）。こうすることで typeof((x,..)) == Tuple{T,DotDot}、isa(typeof((1,..)), Type{Tuple{T,DotDot}}) == true という具合に Julia の型システムにもキレイにマッチします⁴⁵。

その他、ハマった箇所を少しだけ詳しく解説。

21-24行目。無限リストであることを表すために length 関数と（Julia v0.5 用に）Base.iteratorsize() 関数を多重定義。
- v0.5 では色々とパフォーマンス改善の工夫が入れられているのですが、その1つとして Base.iteratorsize() という関数（export されてないので　Base. 必須）が適切に多重定義されていなかったら有限列（特に length() 関数が多重定義済）と見做して全要素を先行取得しようとする（＝遅延リストでなくなる）ようになっている模様です。そこでエラーになったり無限ループみたいになったり大変でした。個人的に新しい無限リスト（遅延リスト）を定義しようとする人って少ないのかなぁ。
- あと length() の実装の方は、取り敢えず Int の最大値を返しておく、というやっつけ仮実装です。再定義しないと Tuple の length を返してしまう（＝2または3が返ってしまう）ので。あと v0.5 ではそのせいで意図しないエラーが起きたりした（時には segmentation fault で落ちたりもした）ので（めんどくさい）。
38-49行目。show() の多重定義。
- ハマりポイントその2。やはり v0.5 で、これをしないと REPL で確認しようとしたときとかに要素を全取得しに走って無限ループっぽくなって焦った（Tupleに対するshow()がおそらくそのように基本実装されている模様）。
52-64行目。AbstractFloat（Float64, Float32 とかの共通 supertype）に対する特別な列挙の定義。これのおかげで (1.0,1.1,..)とかしても途中の結果が 2.000000000000001みたいにおかしなことにならずにまともに列挙できます！
- Haskell だと [1.0,1.1..] は浮動小数点誤差の関係で上述のようにおかしなことになります。「浮動小数点には使わないことをオススメします」と某すごいH本（の原本？）には書いてあります確か。

最後に

取り敢えず普通に使えるレベルにはなったと思います。
パッケージとして公開するかどうかは別問題。名前とか見直して、あと需要がありそうなら。

A[n, ..] のように記述することで、多次元配列で指定の必要のないインデックスを省略できる（A[n, :, :, :] 等と同じ意味になる）というもの。 ↩
ちなみにこの記法にした理由は、1. 求める形に近くてかつJuliaの文法上認められていること、2. Juliaの型システムにうまくマッチし、他の拡張が不要なこと、の2点からです。あえて加えるなら、3.Python 由来で [～for～in～] がリスト（配列、特に有限）、(～for～in～) が generator（遅延リスト、無限対応可能）なので、[x,y,..] よりも視覚的にもマッチするかな、と。 ↩
EllipsisNotation.jl もそれを利用しています。ただし .. の型が異なるので、今回作った DotDotList とは互換性がありません。 ↩
EllipsisNotation.jl では const .. = Val{:..} となっていますが、それだと typeof((x,..)) == Tuple{T,DataType}、isa((x,..), Tuple{T,Type{Val{:..}}}) == false となってしまいうまく行きません。 ↩
実はもう1つ、const .. = ValType{:..}() としたことで、isbist(..) == true かつ sizeof(..) == 0 になり、将来的にいろいろ有効にもなったりしました。例えば reinterpret() 関数に噛ますことができるとか。 ↩