More than 5 years have passed since last update.

型関数、Type families を調べ始める

Haskell

Last updated at 2015-04-19Posted at 2015-04-18

Yesod のガイドを読んでいたら、type family なるものに出くわした。

The basic idea of a type family is to state some association between two different types. Suppose we want to write a function that will safely take the first element of a list. But we don’t want it to work just on lists; we’d like it to treat a ByteString like a list of Word8s. To do so, we need to introduce some associated type to specify what the contents of a certain type are.

a type familiy の基本的な考え方は、２つの異なる型間の関連性を示すことである。問題なくリストの先頭の要素を取り出すような関数を定義したい。ただし、この関数では [] だけではなく ByteString のように Word8 の列となっているリストも扱いたい。そうするために、何がその型のコンテンツとなるのか（その文脈では何が列を示す型の要素となるのか )を指定する associated type を紹介する必要がある。

ちょっと意味がわからなかったので調べ始めたらまたしても深みにはまる。中途半端な状況だけど、ここまで調べた内容をメモしておく。

参考資料
https://wiki.haskell.org/GHC/Type_families
https://wiki.haskell.org/Simonpj/Talk:FunWithTypeFuns
http://research.microsoft.com/en-us/um/people/simonpj/papers/assoc-types/fun-with-type-funs/typefun.pdf

Fun with type functions

基本的に、次の論文を読みながら、引用・意訳とコメントを書くことにしています。
http://research.microsoft.com/en-us/um/people/simonpj/papers/assoc-types/fun-with-type-funs/typefun.pdf

Type Families とは

Indexed type families, or type families for short, are a Haskell extension supporting ad-hoc overloading of data types. Type families are parametric types that can be assigned specialized representations based on the type parameters they are instantiated with. They are the data type analogue of type classes: families are used to define overloaded data in the same way that classes are used to define overloaded functions. Type families are useful for generic programming, for creating highly parameterised library interfaces, and for creating interfaces with enhanced static information, much like dependent types.

type families は indexed type families の略で、臨時のデータ型オーバーロードをサポートするHaskell 拡張である。インスタンスとして生成された型変数に基づいて、特別な表現として割り当てられた変数の種類をtype families という。これは型クラスと類似している: クラスが関数をオーバーロードするように、type families はデータ型をオーバーロードする。( 後略 )

従来の実装における課題

複数パラメータを取る型クラスを使用すると、次の２点の欠点があるということだ。

強すぎるポリモーフィズム( too polymorphic )のため、意味をなさない組み合わせのインスタンス宣言ができてしまう

抽象的側面と実装の、流動的要素の組み合わせはプログラマの自由であるため、意味をなさない組み合わせを作ってしまうこともできる。

class Mutation m r where
    newRef      :: a -> m (r a)
    readRef     :: r a -> m a
    writeRef    :: r a -> a -> m ()

instance Mutation IO IORef where
    newRef = newIORef
    ...

instance Mutation (ST s) (STRef s) where
    newRef = newSTRef
    ...

ここで、もしOOPの世界なら、m と r はそれぞれ下図のMutation クラス、Ref クラスのようなイメージであると理解した。

抽象的な側面と実装とを分離して、必要に応じて組み合わせることは、OOPなら割と普通な設計方針だと思うが、Haskell における設計では、プログラマが引き起こしてしまう意味のない設計の可能性も排除したい、という意味で課題扱いしているのだろう。

静的型チェックを行う際の曖昧さを除去できない( ambiguity )

ある型クラスのインスタンスが複数あった場合、どのインスタンスの関数を使用するべきかコンパイラがわからないということか。

次のような型クラスと、そのインスタンスを考えてみる。

class Add a where
    add :: a -> a -> a

instance Add Int where
    add = (+)

instance Add Bool where
    add = \x y -> or [x, y]

addメソッドの実行結果

*Main> foldr add 0 [1, 2]

<interactive>:32:1:
    No instance for (Add a0) arising from a use of ‘it’
    The type variable ‘a0’ is ambiguous
    Note: there are several potential instances:
      instance Add Bool -- Defined at typefunction.hs:10:10
      instance Add Int -- Defined at typefunction.hs:7:10
    In the first argument of ‘print’, namely ‘it’
    In a stmt of an interactive GHCi command: print it

どのインスタンスの操作なのかを明示しないといけない。

型注釈が必要

*Main> foldr add 0 [1, 2] :: Int
3

ちょっと脱線。なぜか、Bool インスタンスだと実行できてしまう。この場合は型チェックを通るということか。

こっちはうまくいくのだが・・

*Main> foldr add False [True, False]
True

型を調べてみると、前者は具体型が定まっていないのでそれが原因だろうか。

*Main> :t foldr add 0 [1, 2]
foldr add 0 [1, 2] :: (Add b, Num b) => b
*Main> :t foldr add False [True, False]
foldr add False [True, False] :: Bool

Associated types

複数の型引数を取る型クラスがあってその型引数同士に関数関係があるならば、実質的にはその型クラスは型引数を２つ取る必要はない。一つ型が決まれば、他方の型も暗黙的に決定するからだ。Type Family はこのような時、次のように表現することができる。

論文内容をそのまま活用、型クラスの意味は割愛。type宣言に注目

class Mutation m where
    type Ref m :: * -> *
    newRef      :: m (Ref m a)
    readRef     :: Ref m a -> m a
    writeRef    :: Ref m a -> a -> m ()

instance Mutation IO where
    type Ref IO = IORef
    ...

instance Mutation (ST s) where
    type Ref (St s) = STRef s
    ...

このとき Ref を a type family または an associated type of the class Mutation と呼ぶ。（値レベルではなく）型レベルの関数のようなので、Ref は a type function だとも呼べる。

ここで、Ref m a は、型コンストラクタ適用と同じ構文で、次のような意味になる。

m に Ref を適用し、型コンストラクタ*->* を得る
この型コンストラクタを a に適用し、具体型を得る

上の例では m はIO や ST s といったモナドを指している。それぞれのインスタンス宣言ごとに、区別された Ref 型コンストラクタが作られるイメージだろうか。これにより、型引数が一つの型クラスで、暗黙的に二つの型を指定することができた 。

muliti-parameter-type-class

class Mutation m r where
    newRef      :: a -> m (r a)
    readRef     :: r a -> m a
    writeRef    :: r a -> a -> m ()

instance Mutation IO IORef where
    newRef = newIORef
    ...

instance Mutation (ST s) (STRef s) where
    newRef = newSTRef
    ...

上の定義でm と r には多対多の関係があり、（やろうと思えば）instance Mutation IO (STRef s) where ... といった意味をなさない組み合わせのインスタンスを作れてしまう。

Associated data types

associated types と比べて associated data types は、宣言した型が単射性を満たしていることを明示したことになる。つまり、C a ~ C b ならば a ~ b であることを暗に示すことになる。

プログラムの意味は省くが、次のような例がある。

class Graph g where
     type Vertex g
     data Edge g
     src, tgt :: Edge g -> Vertex g
     outEdges :: g -> Vertex g -> [Edge g]
     
newtype G1 = G1 [Edge G1]
instance Graph G1 where
   type Vertex G1 = Int
   data Edge   G1 = MkEdge1 (Vertex G1) (Vertex G1)
   -- ...definitions for methods...
newtype G2 = G2 (Map (Vertex G2) [Vertex G2])
instance Graph G2 where
   type Vertex G2 = String
   data Edge   G2 = MkEdge2 Int (Vertex G2) (Vertex G2)
   -- ...definitions for methods...

neighbours g v = Prelude.map tgt (outEdges g v)

省くと言っているものの実のところ、なぜそうなるのかわからない点があった。ご存知の方がいらしたらご教示くださいm(_ _)m
こちらの論文の2.4で、次のような説明があります。

We expect GHC to infer the following type:
neighbours :: Graph g => g -> Vertex g -> [Vertex g]
Certainly, outEdges returns a [Edge g1] (for some type g1), and tgt requires its argument to be of type Edge g2 (for some type g2). So, GHC’s type checker requires that Edge g1 ∼ Edge g2, where “∼” means type equality.

なぜ g1 と g2 という形で、インスタンスの型が異なるかもしれないことを表す必要があるのかわかりませんでした。g でいいのではないか、と。私が型チェックの内部の仕組みを学んでいないことが、疑問の根本的な原因なのでしょうか。

(2015.4.19 追記)
flexibleInstances 拡張の意味を追えば、解決の糸口をつかめそう

(2015.4.20 追記) クラスのインスタンスには、宣言のルールが存在している。これに何らかの影響を受けて、型変数は distinct type variables になるのだろうか。

7.6.3.2. Relaxed rules for the instance head

In Haskell 98 the head of an instance declaration must be of the form C (T a1 ... an), where C is the class, T is a data type constructor, and the a1 ... an are distinct type variables. In the case of multi-parameter type classes, this rule applies to each parameter of the instance head. (Arguably it should be OK if just one has this form and the others are type variables, but that's the rules at the moment.)

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up