TL;DR
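- Discretize data values by snapping each one to a class value (a bin edge).
- Use numpy.digitize to get each value's bin index, then index the bins array itself with those indices.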
sample
threshold (that is, the bins) is assumed to be already sorted in ascending or descending order.
import numpy as np
if __name__ == '__main__':
    data = np.array([[51, 71, 81, 92], [43, 101, 20, 151]])
    threshold = np.array([0, 50, 70, 90, 130, 150, 300])
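np.digitize raises a ValueError when the bins are not monotonic, so it can be worth checking the ordering before binning. A minimal sketch of such a check follows; `is_monotonic` is a hypothetical helper introduced here for illustration, not a NumPy function.

```python
import numpy as np


def is_monotonic(bins):
    """Return True if bins is sorted in ascending or descending order.

    Hypothetical helper for illustration; not part of NumPy.
    """
    diffs = np.diff(np.asarray(bins))
    return bool(np.all(diffs >= 0) or np.all(diffs <= 0))


ascending = [0, 50, 70, 90, 130, 150, 300]
descending = ascending[::-1]
unsorted_bins = [0, 70, 50]

print(is_monotonic(ascending))      # ascending order is accepted
print(is_monotonic(descending))     # descending order is accepted
print(is_monotonic(unsorted_bins))  # mixed order is rejected
```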
Rounding up
    # Round up: with right=True, bins[i-1] < x <= bins[i] yields index i
    data_right_indice = np.digitize(data, threshold, right=True)
    print(data_right_indice)
    print(threshold[data_right_indice])
Output
[[2 3 3 4]
 [1 4 1 6]]
[[ 70  90  90 130]
 [ 50 130  50 300]]
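One edge case in the rounding-up approach: a value larger than the last bin edge gets the index len(threshold), so indexing threshold with it raises an IndexError. The sketch below clamps such indices to the last edge. Clamping is an assumption made here for illustration (the original sample does not cover out-of-range values), and `round_up_to_bin` is a hypothetical helper name.

```python
import numpy as np


def round_up_to_bin(values, bins):
    """Round each value up to the nearest bin edge (bins ascending).

    Values above the last edge are clamped to it. That policy is an
    assumption of this sketch, not something the original sample defines.
    """
    bins = np.asarray(bins)
    idx = np.digitize(values, bins, right=True)
    # np.digitize returns len(bins) for values beyond the last edge
    idx = np.clip(idx, 0, len(bins) - 1)
    return bins[idx]


threshold = np.array([0, 50, 70, 90, 130, 150, 300])
samples = np.array([51, 300, 301, -5])
rounded_up = round_up_to_bin(samples, threshold)
print(rounded_up)
```

Depending on the use case, raising an error or masking out-of-range values may be more appropriate than clamping.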
Rounding down
    # Round down: with descending bins and right=False, bins[i-1] > x >= bins[i] yields index i
    decreasing_threshold = np.sort(threshold)[::-1]
    data_left_indice = np.digitize(data, decreasing_threshold, right=False)
    print(data_left_indice)
    print(decreasing_threshold[data_left_indice])
Output
[[5 4 4 3]
 [6 3 6 1]]
[[ 50  70  70  90]
 [  0  90   0 150]]
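As an alternative to reversing the bins, rounding down can also be sketched with the ascending threshold directly: with right=False, np.digitize returns i such that bins[i-1] <= x < bins[i], so subtracting 1 gives the index of the lower edge. This is a variation on the approach above, not the original method, and the variable names are chosen here for illustration.

```python
import numpy as np

data = np.array([[51, 71, 81, 92], [43, 101, 20, 151]])
threshold = np.array([0, 50, 70, 90, 130, 150, 300])

# right=False: bins[i-1] <= x < bins[i] yields i, so i - 1 is the lower edge
lower_idx = np.digitize(data, threshold, right=False) - 1
rounded_down = threshold[lower_idx]

print(lower_idx)
print(rounded_down)
```

Note that a value below threshold[0] would give an index of -1 here, which silently wraps around to the last edge, so inputs are assumed to be at least threshold[0].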
References