
Creating sample CSV data with hypothesis

Last updated at 2020-03-17, Posted at 2020-03-16

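hypothesis is a library that lets you write unit tests more effectively than with hand-picked test cases, because it generates the test inputs for you.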

Apparently it can try out a wide range of values in your tests. This time, though, I want to see whether hypothesis's data generation features can be used to quickly create sample data.

The problem

I want to generate data in a file format (CSV), with every value staying within fixed ranges and constraints.

Let's try it!

First, we need to define the data. The data you want hypothesis to generate is defined with something called a strategy.

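This time I'm using Python's standard csv module. Since csv.DictWriter can write a dict straight out as a row, all I need is a strategy that builds dicts. hypothesis has one called fixed_dictionaries, which can be used like this.

Inside it, you list the keys of the dict you want and choose, for each key, the strategy that generates its value: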

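from hypothesis import strategies as st

# One CSV row: each key is a column name, each value is the strategy for that column
DictRowDataModel = st.fixed_dictionaries({
    'k_id': st.none(),  # always None
    'w_id': st.none(),  # always None
    '項目１': st.integers(min_value=1, max_value=7),   # integer in 1..7 (inclusive)
    '項目2': st.integers(min_value=1, max_value=5),    # integer in 1..5 (inclusive)
    '項目３': st.integers(min_value=1, max_value=16)   # integer in 1..16 (inclusive)
})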

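The next thing I found confusing was how to actually generate data with it.
Strategies seem to be designed for use inside unit tests, so I couldn't find any examples of using them this way.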

Example of use in a test case:

from hypothesis import given
import hypothesis.strategies as st

@given(st.integers(), st.integers())
def test_ints_are_commutative(x, y):
    assert x + y == y + x
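The same @given pattern also works with the dict strategy from earlier. Used that way, it can confirm that every generated row stays within the intended limits. The sketch below is not part of the original article, and the test name test_rows_respect_constraints is made up for illustration.

```python
from hypothesis import given
from hypothesis import strategies as st

# Same row strategy as above: ID columns are always None,
# the three 項目 columns are bounded integers.
DictRowDataModel = st.fixed_dictionaries({
    'k_id': st.none(),
    'w_id': st.none(),
    '項目１': st.integers(min_value=1, max_value=7),
    '項目2': st.integers(min_value=1, max_value=5),
    '項目３': st.integers(min_value=1, max_value=16),
})


@given(DictRowDataModel)
def test_rows_respect_constraints(row):
    # fixed_dictionaries always produces exactly the declared keys
    assert set(row) == {'k_id', 'w_id', '項目１', '項目2', '項目３'}
    assert row['k_id'] is None and row['w_id'] is None
    # min_value and max_value are both inclusive
    assert 1 <= row['項目１'] <= 7
    assert 1 <= row['項目2'] <= 5
    assert 1 <= row['項目３'] <= 16
```

Under pytest this runs like any other test. Calling the decorated function directly with no arguments also runs hypothesis's generated examples.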

But after some digging, I found that strategies have an example() method, and it seems to do the job:


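import csv
from hypothesis import strategies as st

d = {
    'k_id': st.none(),
    'w_id': st.none(),
    '項目１': st.integers(min_value=1, max_value=7),
    '項目2': st.integers(min_value=1, max_value=5),
    '項目３': st.integers(min_value=1, max_value=16)
}

DictRowDataModel = st.fixed_dictionaries(d)

samples = 3
# The csv module expects newline=''; without it, extra blank lines can appear on Windows
with open('sample.csv', 'w', encoding='utf8', newline='') as out:
    writer = csv.DictWriter(out, fieldnames=tuple(d.keys()))
    writer.writeheader()  # write the column names as the first row
    for _ in range(samples):
        sample = DictRowDataModel.example()  # draw one random row from the strategy
        writer.writerow(sample)  # None values are written as empty fields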
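You can also check the result without touching the disk. Point the same generation at an in-memory buffer and read it back with csv.DictReader. This is a sketch: make_sample_csv is a hypothetical helper name, and io.StringIO stands in for the real file. The csv module writes None as an empty field, so k_id and w_id come back as empty strings, and every value comes back as a str.

```python
import csv
import io

from hypothesis import strategies as st

fields = {
    'k_id': st.none(),
    'w_id': st.none(),
    '項目１': st.integers(min_value=1, max_value=7),
    '項目2': st.integers(min_value=1, max_value=5),
    '項目３': st.integers(min_value=1, max_value=16),
}
DictRowDataModel = st.fixed_dictionaries(fields)


def make_sample_csv(samples: int) -> str:
    """Return CSV text with a header row followed by `samples` generated rows."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(fields))
    writer.writeheader()
    # Each .example() call draws one row (and emits NonInteractiveExampleWarning)
    writer.writerows(DictRowDataModel.example() for _ in range(samples))
    return buf.getvalue()


text = make_sample_csv(3)
# Read the generated text back; DictReader yields one dict of strings per row
rows = list(csv.DictReader(io.StringIO(text)))
```

Because io.StringIO behaves like a file, this version writes the same CSV as the article's code.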

That saved me from writing any range-generation code myself. Nice.

Conclusion

With a strategy's .example() method, creating CSV data was easy!

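The following warning is printed. It is only advice about using .example() in tests (speed, replaying failures, and so on), so the data is still generated. For now I'm ignoring it: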

NonInteractiveExampleWarning: The `.example()` method is good for exploring strategies, but should only be used interactively.  We recommend using `@given` for tests - it performs better, saves and replays failures to avoid flakiness, and reports minimal examples. (strategy: fixed_dictionaries(...),
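If the warning gets noisy, one option is to silence just that warning class with the standard warnings module rather than all warnings. This is a minimal sketch. It assumes a hypothesis version that exposes NonInteractiveExampleWarning in hypothesis.errors (the class named in the message above). row_strategy is an illustrative name and has only a single column.

```python
import warnings

from hypothesis import strategies as st
from hypothesis.errors import NonInteractiveExampleWarning

# Illustrative one-column strategy (assumption, not the article's full row)
row_strategy = st.fixed_dictionaries({
    '項目１': st.integers(min_value=1, max_value=7),
})

with warnings.catch_warnings():
    # Ignore only NonInteractiveExampleWarning, and only inside this block;
    # the previous warning filters are restored on exit
    warnings.simplefilter('ignore', NonInteractiveExampleWarning)
    rows = [row_strategy.example() for _ in range(3)]
```

Scoping the filter with catch_warnings keeps the warning visible elsewhere, which matters if .example() ever ends up inside real tests.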
