After arrays last time, we finally get to collections.

The big picture of collections

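The overall picture of collections is roughly as follows. Besides the interfaces listed here, there are read-only variants such as IReadOnlyCollection, generic and non-generic versions of each, and thread-safe collections. A rough description of the whole looks like this.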

IEnumerable

A simple interface that can return an Enumerator for iteration.

GetEnumerator()

is its only method.

ICollection

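It has a size. The difference from IEnumerable is that it has Count. Parallel-execution extension methods such as AsParallel also apply to it (they are defined on IEnumerable). The non-generic version also has IsSynchronized and SyncRoot, which tell you whether access is synchronized (thread-safe) and which object to lock on.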

IDictionary

A collection of key/value pairs.

IList

An ordered collection whose elements can be accessed by index.
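To make the differences concrete, here is a minimal sketch that touches the same objects through each of the four non-generic interfaces. InterfaceTour and its methods are names I made up for illustration; only the interfaces themselves come from the framework.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class InterfaceTour
{
    // IEnumerable: all we can do is iterate.
    public static int CountByIterating(IEnumerable items)
    {
        int n = 0;
        foreach (object item in items) n++;
        return n;
    }

    // ICollection: the size is available without iterating.
    public static int CountDirectly(ICollection items) => items.Count;

    // IList: ordered, so an element can be fetched by position.
    public static object ElementAt(IList items, int index) => items[index];

    // IDictionary: an element is fetched by key.
    public static object Lookup(IDictionary items, object key) => items[key];

    public static void InterfaceTourTest()
    {
        var list = new List<string> { "a", "b", "c" };
        var dict = new Dictionary<string, int> { ["one"] = 1, ["two"] = 2 };

        Console.WriteLine(CountByIterating(list)); // 3
        Console.WriteLine(CountDirectly(dict));    // 2
        Console.WriteLine(ElementAt(list, 1));     // b
        Console.WriteLine(Lookup(dict, "two"));    // 2
    }
}
```

List<T> and Dictionary<TKey, TValue> both implement the non-generic interfaces as well, which is why they can be passed in directly.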

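Beyond that, various implementation classes are provided for these interfaces, but this much is enough to understand the principles. The concrete implementation classes are summarized well here.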

コレクション概要 ++C++; // 未確認飛行 C

Now, on to deciphering the design guidelines for collections.

Collection design guidelines

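By the way, I am reading the book .NET のクラスライブラリ設計 (the Japanese edition of Framework Design Guidelines), and the current version seems to be available on the web. It is not as thorough as the book, though.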

Framework Design Guidelines Guidelines for Collections

DO NOT use weakly typed collections in public APIs

I get the point that we have generics, so we should specify the element type. But the following example was given and I did not understand it. Why is Collection not an interface?

public IList Foo { get ... }                       // Bad
public Collection<SomeClass> Bar { get...}         // Good

For now, let's read it as "specify the element type with generics".
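As a rough sketch of what "weakly typed" costs the caller, compare the two properties below. Catalog, WeakNames, and Names are hypothetical names of my own, not from the guidelines.

```csharp
using System;
using System.Collections;
using System.Collections.ObjectModel;

public class Catalog
{
    // Weakly typed: accepts any object, and readers have to cast.
    public IList WeakNames { get; } = new ArrayList();

    // Strongly typed: only strings get in, and no cast is needed to read.
    public Collection<string> Names { get; } = new Collection<string>();
}

public static class WeakTypingDemo
{
    public static void WeakTypingTest()
    {
        var catalog = new Catalog();

        catalog.WeakNames.Add("apple");
        catalog.WeakNames.Add(42);              // compiles, although it is not a name
        string first = (string)catalog.WeakNames[0];

        catalog.Names.Add("apple");
        // catalog.Names.Add(42);               // would be a compile-time error
        string typedFirst = catalog.Names[0];

        Console.WriteLine($"{first} {typedFirst}");
    }
}
```

With the weakly typed property the mistake only shows up at run time, when someone casts the 42 to string.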

DO NOT use ArrayList or List<T> in public APIs

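These types are meant for internal implementation, not for public APIs. If you return one, you cannot tell when its contents are tampered with, and you also expose a large number of members. One thing I found while looking into this: the non-generic ICollection interface does not expose methods such as Add. It has little more than Count and GetEnumerator. IList, on the other hand, also exposes methods for modification. Note that the generic ICollection<T> is different from the non-generic one: it does declare Add, Remove, and Clear.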

DO NOT use Hashtable or Dictionary<TKey, TValue> in public APIs

These are also for internal implementation. Use IDictionary or IDictionary<TKey, TValue> instead. Of course you would use the interface! This one makes sense.
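A minimal sketch of what that looks like in practice, with a made-up Settings class: the concrete Dictionary stays private and only the interface is visible.

```csharp
using System;
using System.Collections.Generic;

public class Settings
{
    // The concrete type stays private, so it could later become
    // a SortedDictionary without breaking callers.
    private readonly Dictionary<string, string> values =
        new Dictionary<string, string> { ["theme"] = "dark" };

    public IDictionary<string, string> Values => values;
}

public static class SettingsDemo
{
    public static void SettingsTest()
    {
        var settings = new Settings();
        settings.Values["lang"] = "ja";
        Console.WriteLine(settings.Values["theme"]);
        Console.WriteLine(settings.Values.Count);
    }
}
```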

DO NOT use IEnumerator<T>, IEnumerator, or any other type that implements either of these interfaces, except as the return type of a `GetEnumerator` method

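In other words, return IEnumerable<T> or ICollection<T> instead. Since LINQ arrived, more people have been returning IEnumerator, but that is said to be wrong. IEnumerator is the implementation of the iteration itself, so it is odd to hand it out as a return value. More importantly, foreach cannot be used on a type that has no GetEnumerator() method. By the way, the LINQ methods are defined as extension methods, and both their parameter and their return value are IEnumerable<T>.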

public static IEnumerable<TResult> Select<TSource, TResult>(
    this IEnumerable<TSource> source,
    Func<TSource, TResult> selector
)
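To see the difference for the caller, here is a small sketch. Numbers, Evens, and EvensEnumerator are hypothetical names; the second method exists only to show the discouraged shape.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Numbers
{
    // Preferred: return the sequence itself.
    public static IEnumerable<int> Evens(int max)
    {
        for (int i = 0; i <= max; i += 2)
        {
            yield return i;
        }
    }

    // Discouraged: hands out the iteration state instead of the sequence.
    public static IEnumerator<int> EvensEnumerator(int max) => Evens(max).GetEnumerator();

    public static void EnumerableTest()
    {
        // foreach and LINQ both work directly on IEnumerable<int>.
        foreach (int n in Evens(6))
        {
            Console.WriteLine(n);
        }
        Console.WriteLine(Evens(6).Select(n => n * n).Sum());

        // foreach (int n in EvensEnumerator(6)) { }  // does not compile: no GetEnumerator()
        using (IEnumerator<int> e = EvensEnumerator(6))
        {
            while (e.MoveNext())
            {
                Console.WriteLine(e.Current);
            }
        }
    }
}
```

With the enumerator version the caller has to write the MoveNext loop by hand and cannot chain Select or Where onto it.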

DO NOT implement both `IEnumerator<T>` and `IEnumerable<T>` on the same type. The same applies to the nongeneric interfaces `IEnumerator` and `IEnumerable`

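Well, of course. IEnumerator is for iteration, with MoveNext and Current as its interface, and its responsibility differs from that of the thing that holds the collection. The Single Responsibility Principle is one of the OO principles, so this feels entirely natural.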

Collection parameters

DO use the least-specialized type possible as a parameter type. Most members taking collections as parameters use the `IEnumerable<T>` interface

When a parameter is a collection, make its type as abstract as possible. In most cases that means IEnumerable<T>.

public void DumpIt<T>(IEnumerable<T> context)

or something like that.
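The payoff shows up on the calling side: one method accepts arrays, lists, queues, and LINQ queries alike. A minimal sketch, where Dumper and Describe are names I made up:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Dumper
{
    // Needs nothing beyond iteration, so IEnumerable<T> is enough.
    public static string Describe<T>(IEnumerable<T> items) => string.Join(", ", items);

    public static void DescribeTest()
    {
        Console.WriteLine(Describe(new[] { 1, 2, 3 }));                          // array
        Console.WriteLine(Describe(new List<string> { "a", "b" }));              // List<T>
        Console.WriteLine(Describe(new Queue<int>(new[] { 7, 8 })));             // Queue<T>
        Console.WriteLine(Describe(Enumerable.Range(1, 3).Select(n => n * 10))); // LINQ query
    }
}
```

Had the parameter been List<T>, three of these four calls would need a ToList() first.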

AVOID using ICollection<T> or ICollection as a parameter just to access the `Count` property

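Do not take ICollection just to use the Count property. ICollection has a size, so you can certainly get the count from it. What to do instead? Accept IEnumerable<T> and check at run time whether the argument happens to be an ICollection<T>, like this.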

        public void DumpCountAndContents<T>(IEnumerable<T> items)
        {
            ICollection<T> col = items as ICollection<T>;
            if (col != null)
            {
                // The size is already known here, so read it without enumerating.
                Console.WriteLine($"Count: {col.Count}");
            }
            foreach(var item in items)
            {
                Console.WriteLine(item);
            }
        }

Collection properties and return values

DO NOT provide settable collection properties

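This says a collection property should not be assignable through a setter. A setter is unnecessary because users can already replace the contents by clearing the collection and adding the new items. If replacing all the values is a common scenario, consider providing an AddRange method on the collection. The sample in the book looks roughly like the one below. Basically, the property itself is read-only, so the ICollection object cannot be swapped for another one. Even so, the contents can still be modified, so if that is undesirable it seems better to prevent it as well. Note that AddRange only appends to the end of the collection.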

public ICollection<Item> Items { get; private set;}

I could not picture what the AddRange method does, so let's try it.

            var list = new List<int> { 1, 2, 3 };
            list.AddRange(new List<int> { 4, 5 });
            foreach(var item in list)
            {
                Console.WriteLine(item); // 1,2,3,4,5
            }
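Putting the two pieces together, a get-only property plus a "replace everything" method might look like the sketch below. Order and ReplaceItems are hypothetical; the guidelines only suggest AddRange, and the replace helper is my own assumption about how one would use it.

```csharp
using System;
using System.Collections.Generic;

public class Order
{
    private readonly List<string> items = new List<string>();

    // Getter only: callers can change the contents but cannot swap the collection object.
    public ICollection<string> Items => items;

    // Hypothetical helper for the "replace all values" scenario.
    public void ReplaceItems(IEnumerable<string> newItems)
    {
        // Copy first, in case newItems is a view over this same list.
        var copy = new List<string>(newItems);
        items.Clear();
        items.AddRange(copy);
    }
}

public static class OrderDemo
{
    public static void OrderTest()
    {
        var order = new Order();
        order.Items.Add("pen");
        order.ReplaceItems(new[] { "ink", "paper" });
        Console.WriteLine(string.Join(",", order.Items));
    }
}
```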

DO use `Collection<T>` or a subclass of `Collection<T>` for properties or return values representing read/write collections.

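Use Collection<T> or a subclass of it for properties and return values that represent read/write collections. If Collection<T> does not meet the requirements, implement a custom collection with IEnumerable<T>, ICollection<T>, or IList<T>. Perhaps the reasoning is that, when updates are allowed, Collection<T> is the most abstract choice.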

DO use `ReadOnlyCollection<T>`, a subclass of `ReadOnlyCollection<T>`, or in rare cases `IEnumerable<T>` for properties or return values representing read-only collections

For read-only collections, use ReadOnlyCollection<T>. When that does not fit, use a custom collection that implements IEnumerable<T>, ICollection<T>, or IList<T>. If you implement a custom read-only collection, have ICollection<T>.IsReadOnly return true.
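Here is a minimal sketch of the ReadOnlyCollection<T> case. Team, Members, and AddMember are made-up names; the point is that the owner keeps a private List<T> and callers only see the wrapper.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

public class Team
{
    private readonly List<string> members = new List<string>();

    public Team()
    {
        // Wraps the list without copying it.
        Members = new ReadOnlyCollection<string>(members);
    }

    public ReadOnlyCollection<string> Members { get; }

    public void AddMember(string name) => members.Add(name);
}

public static class ReadOnlyDemo
{
    public static void ReadOnlyTest()
    {
        var team = new Team();
        team.AddMember("Alice");
        Console.WriteLine(team.Members.Count);

        ICollection<string> asCollection = team.Members;
        Console.WriteLine(asCollection.IsReadOnly);
        try
        {
            asCollection.Add("Mallory");
        }
        catch (NotSupportedException)
        {
            Console.WriteLine("Add is rejected");
        }
    }
}
```

ReadOnlyCollection<T> has no public Add at all; going through the ICollection<T> interface compiles, but the call throws NotSupportedException.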

CONSIDER using subclasses of generic base collections instead of using the collections directly

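Instead of using the collections directly, use a subclass of one of the base collection classes. Building on a base class certainly beats implementing the interfaces from scratch.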

public class LogCollection : Collection<Log>  {
    :
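What a subclass buys you is the set of virtual hooks on Collection<T>. The sketch below is my own example, not the book's: TagCollection is a hypothetical class that normalizes and validates every item on the way in by overriding InsertItem and SetItem.

```csharp
using System;
using System.Collections.ObjectModel;

public class TagCollection : Collection<string>
{
    // Add and Insert both end up here.
    protected override void InsertItem(int index, string item)
    {
        base.InsertItem(index, Normalize(item));
    }

    // The indexer setter ends up here.
    protected override void SetItem(int index, string item)
    {
        base.SetItem(index, Normalize(item));
    }

    private static string Normalize(string tag)
    {
        if (string.IsNullOrWhiteSpace(tag))
        {
            throw new ArgumentException("A tag must not be empty.", nameof(tag));
        }
        return tag.Trim().ToLowerInvariant();
    }
}

public static class TagCollectionDemo
{
    public static void TagCollectionTest()
    {
        var tags = new TagCollection();
        tags.Add("  CSharp ");
        tags[0] = "DotNet";
        Console.WriteLine(tags[0]);
    }
}
```

A plain List<T> offers no such extension points, which is one practical reason the guidelines steer public APIs toward Collection<T>.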

CONSIDER returning a subclass of `Collection<T>` or `ReadOnlyCollection<T>` from very commonly used methods and properties

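For the most commonly used methods and properties, return a subclass of Collection<T> or ReadOnlyCollection<T>. Since these are base classes, the subclass gives you a place to add helper methods.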

CONSIDER using a keyed collection if the items stored in the collection have unique keys (names, IDs, etc.). Keyed collections are collections that can be indexed by both an integer and a key and are usually implemented by inheriting from `KeyedCollection<TKey, TItem>`

So when the items have keys, you apparently use something called KeyedCollection. Not Dictionary, then. Let's compare.

        public class Item
        {
            public int Key { get; set; }
            public string Value { get; set; }
        }

        public class MyKeyedCollection : KeyedCollection<int, Item>
        {
            protected override int GetKeyForItem(Item item)
            {
                return item.Key;
            }
        }

        public void KeyedCollectionTest()
        {
            var kcol = new MyKeyedCollection();
            kcol.Add(new Item { Key = 1, Value = "hello" });
            // TKey is int here, so this indexer looks up by key, not by position.
            Console.WriteLine(kcol[1].Value);
        }

Hmm. You have to define a class yourself, which is a hassle. How is this different from Dictionary?

Dictionary or KeyedCollection

According to this,

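KeyedCollection preserves the order in which items were added; Dictionary does not guarantee it. Apparently KeyedCollection also has a lower performance cost, although it keeps both a list and a lookup dictionary internally, so its memory footprint is usually larger than that of a plain Dictionary.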
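To see both access paths side by side without the int-key ambiguity, here is a small sketch with a string key. Employee and EmployeeCollection are hypothetical names of my own.

```csharp
using System;
using System.Collections.ObjectModel;

public class Employee
{
    public Employee(string id, string name)
    {
        Id = id;
        Name = name;
    }

    public string Id { get; }
    public string Name { get; }
}

public class EmployeeCollection : KeyedCollection<string, Employee>
{
    // The key is taken from the item itself, so Add needs only the item.
    protected override string GetKeyForItem(Employee item) => item.Id;
}

public static class KeyedDemo
{
    public static void KeyedVersusDictionaryTest()
    {
        var staff = new EmployeeCollection();
        staff.Add(new Employee("e7", "Sato"));
        staff.Add(new Employee("e3", "Suzuki"));

        Console.WriteLine(staff[0].Name);       // by position: the first item added
        Console.WriteLine(staff["e3"].Name);    // by key
        Console.WriteLine(staff.Contains("e7"));

        foreach (Employee e in staff)           // enumerates in insertion order
        {
            Console.WriteLine($"{e.Id}: {e.Name}");
        }
    }
}
```

Compared with Dictionary<string, Employee>, the key is never passed separately, so it cannot drift out of sync with the item, and the positional indexer comes for free.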

DO NOT return null values from collection properties or from methods returning collections. Return an empty collection or an empty array instead.

Properties and methods that return a collection should return an empty collection rather than null. Well, of course.
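A minimal sketch of the idea, using Array.Empty<T>() for the "nothing found" case. TagParser and FindHashtags are names I invented for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TagParser
{
    public static IReadOnlyList<string> FindHashtags(string text)
    {
        if (string.IsNullOrWhiteSpace(text))
        {
            return Array.Empty<string>();   // never null
        }
        return text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
                   .Where(word => word.StartsWith("#", StringComparison.Ordinal))
                   .ToArray();
    }

    public static void FindHashtagsTest()
    {
        // No null check needed before the loop, even when nothing matches.
        foreach (string tag in FindHashtags("no tags here"))
        {
            Console.WriteLine(tag);
        }
        Console.WriteLine(FindHashtags("learn #csharp and #dotnet").Count);
    }
}
```

Because the result is never null, callers can go straight to foreach or Count.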

Snapshot or live collection

A snapshot collection represents the state at one particular point in time. A live collection represents the current state.

DO NOT return snapshot collections from properties. Properties should return live collections

Properties should return live collections. A snapshot by its nature requires a copy, which is costly, and copying is O(n). A property getter should be a very lightweight operation.

DO use either a snapshot collection or a live `IEnumerable<T>` (or its subtype) to represent collections that are volatile (i.e., that can change without explicitly modifying the collection)

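This is about things whose state changes easily. The files in a directory, for example, cannot be represented as a live collection, except through a forward-only enumerator. So the basic options are a snapshot collection, or IEnumerable<T>, which limits the role to handing out an IEnumerator.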
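The difference between the two is easiest to see when the underlying data changes after you obtained the collection. A minimal sketch, with a hypothetical Inbox class: the property is a live wrapper, and the O(n) copy is a method rather than a property.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;

public class Inbox
{
    private readonly List<string> messages = new List<string>();

    public Inbox()
    {
        Messages = new ReadOnlyCollection<string>(messages);
    }

    // Live: a cheap wrapper that always reflects the current state.
    public ReadOnlyCollection<string> Messages { get; }

    // Snapshot: an O(n) copy frozen at the time of the call, so it is a method.
    public string[] GetMessagesSnapshot() => messages.ToArray();

    public void Receive(string message) => messages.Add(message);
}

public static class SnapshotDemo
{
    public static void SnapshotVersusLiveTest()
    {
        var inbox = new Inbox();
        inbox.Receive("first");

        ReadOnlyCollection<string> live = inbox.Messages;
        string[] snapshot = inbox.GetMessagesSnapshot();

        inbox.Receive("second");

        Console.WriteLine(live.Count);      // sees the new message
        Console.WriteLine(snapshot.Length); // still the old state
    }
}
```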

Collection naming

DO use the "Dictionary" suffix in names of abstractions implementing `IDictionary` or `IDictionary<TKey, TValue>`

Give classes that implement IDictionary the Dictionary suffix.

DO use the "Collection" suffix in names of types implementing `IEnumerable` (or any of its descendants) and representing a list of items.

Give types that implement IEnumerable the Collection suffix.

public class OrderCollection: IEnumerable<Order> {

DO use the appropriate data structure name for custom data structures

Give a custom data structure the name of the data structure it actually is.

public class LinkedList<T> : IEnumerable<T>, ...
public class Stack<T> : ICollection<T> { ...

AVOID using any suffixes implying particular implementation, such as "LinkedList" or "Hashtable" in names of collection abstractions

Do not give abstract types names that suggest a particular implementation.

CONSIDER prefixing collection names with the name of the item type.

Prefix the collection name with the type name of its items.

For example, a collection of Address items, an IEnumerable<Address>, would be AddressCollection. For an interface item type the leading I is dropped: a collection of IDisposable items would be DisposableCollection.

CONSIDER using the "ReadOnly" prefix in names of read-only collections if a corresponding writable collection might be added or already exists in the framework.

Read-only collections get the ReadOnly prefix, for example ReadOnlyStringCollection.
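Applied together, the naming rules would give something like the following. All the type names here are hypothetical examples built around an Address item type; none of them exist in the framework.

```csharp
using System.Collections.Generic;
using System.Collections.ObjectModel;

public class Address
{
    public string City { get; set; } = "";
}

// Item type name + "Collection" suffix.
public class AddressCollection : Collection<Address>
{
}

// "ReadOnly" prefix for the read-only counterpart of a writable collection.
public class ReadOnlyAddressCollection : ReadOnlyCollection<Address>
{
    public ReadOnlyAddressCollection(IList<Address> list) : base(list)
    {
    }
}

// "Dictionary" suffix for a type that implements IDictionary<TKey, TValue>.
public class AddressDictionary : Dictionary<string, Address>
{
}
```

A caller would wrap a writable collection with new ReadOnlyAddressCollection(addresses) when handing it out as read-only.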

Conclusion

With this, I feel like I am starting to understand how the methods around C# collections are defined. Nice.

Guidelines for using collections in C#, part 2: collections