Creating Beautiful Test Data
My company sells systems for HR departments,
but our test data looks exactly like test data, which is not pretty.
For example, people are named things like "AAA" or "テスト太郎" (Test Taro).
Ordinary names are surprisingly hard
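For now, I have settled on randomly combining the most common surnames across Japan's prefectures with given names taken from the newborn name ranking published by Meiji Yasuda Life Insurance, and printing the resulting full names in random order.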
Source code
# -*- coding: utf-8 -*-
import numpy as np
import random

# Common surnames (a duplicate entry and a stray leading space have been removed).
arr1 = ["佐藤","鈴木","田中","高橋","伊藤","渡辺","山本","中村","小林","加藤","吉田","山田","佐々木","山口","松本","根岸","小池","関口","飯山","飯田"]
# Given names from the newborn name ranking.
arr2 = ["健斗","誠","怜","琉生","花","蓮","陽翔","樹","悠真","咲良","悠斗","大翔","凛","陽葵","さくら","美月","結愛","大和","颯真","芽依"]


def cartesian(arrays, out=None):
    """
    Generate a cartesian product of input arrays.

    Parameters
    ----------
    arrays : list of array-like
        1-D arrays to form the cartesian product of.
    out : ndarray
        Array to place the cartesian product in.

    Returns
    -------
    out : ndarray
        2-D array of shape (M, len(arrays)) containing cartesian products
        formed of input arrays.

    Examples
    --------
    >>> cartesian(([1, 2, 3], [4, 5], [6, 7]))
    array([[1, 4, 6],
           [1, 4, 7],
           [1, 5, 6],
           [1, 5, 7],
           [2, 4, 6],
           [2, 4, 7],
           [2, 5, 6],
           [2, 5, 7],
           [3, 4, 6],
           [3, 4, 7],
           [3, 5, 6],
           [3, 5, 7]])
    """
    arrays = [np.asarray(x) for x in arrays]
    # The output takes the dtype of the first array; for strings, longer
    # entries in later arrays would be truncated to that width.
    dtype = arrays[0].dtype

    n = np.prod([x.size for x in arrays])
    if out is None:
        out = np.zeros([n, len(arrays)], dtype=dtype)

    # Integer division: m is used as a repeat count and a slice bound.
    m = n // arrays[0].size
    out[:, 0] = np.repeat(arrays[0], m)
    if arrays[1:]:
        # Fill the first block recursively, then copy it to the other blocks.
        cartesian(arrays[1:], out=out[0:m, 1:])
        for j in range(1, arrays[0].size):
            out[j*m:(j+1)*m, 1:] = out[0:m, 1:]
    return out


if __name__ == '__main__':
    # Build every surname/given-name combination.
    res = cartesian((arr1, arr2))
    result = list()
    for row in res:
        fullname = ' '.join(row)
        result.append(fullname)
    # Shuffle so the names come out in random order.
    random.shuffle(result)
    for ln in result:
        print(ln)
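As a side note, the same list of full names can probably be built with the standard library alone. The following is only a minimal sketch, not the approach used in this article: the function name `generate_full_names`, the `seed` parameter, and the shortened sample lists are assumptions made for illustration. It swaps the NumPy-based `cartesian` function for `itertools.product`.

```python
import itertools
import random

# Hypothetical, shortened sample lists; in practice, pass in arr1 and arr2.
surnames = ["佐藤", "鈴木", "高橋"]
given_names = ["蓮", "陽葵", "大和"]


def generate_full_names(surnames, given_names, seed=None):
    """Return every 'surname given-name' pair, in shuffled order."""
    # A dedicated Random instance keeps the shuffle reproducible when a seed is given.
    rng = random.Random(seed)
    # itertools.product yields each (surname, given name) combination once.
    names = [f"{s} {g}" for s, g in itertools.product(surnames, given_names)]
    rng.shuffle(names)
    return names


if __name__ == "__main__":
    for name in generate_full_names(surnames, given_names, seed=0):
        print(name)
```

Passing a fixed `seed` makes the order repeatable, which may be handy when the same test data has to be regenerated; leaving it as `None` gives a different order on each run.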
Thoughts
It looks like distinctly masculine or distinctly feminine names are being avoided these days.
Revision history
- 2020/7/22 Initial version