How to split training data into three sets: training, validation, and test

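A note on how to split machine-learning data into three sets: training, validation, and test. sklearn.model_selection.train_test_split only splits into two parts, so the three-way split has to be done by hand, and I keep forgetting how.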

Split 10 elements into training/validation/test at 80%, 10%, and 10% respectively.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.utils import shuffle

X = np.arange(10)  # sample data: 10 elements, 0..9

・Using slice operations

X=shuffle(X, random_state=1)
train=np.delete(X,np.s_[::5],0) # drop every 5th element, keeping 80%
test_valid=X[::5]  # take every 5th element: 20%
test=test_valid[::2] # even-indexed elements of that 20%: 10%
valid=test_valid[1::2] # odd-indexed elements of that 20%: 10%
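
As a quick sanity check, the slicing steps above can be wrapped in a function and tested for overlap. This is only a sketch: `split_by_stride` is a hypothetical name, not something from scikit-learn or NumPy.

```python
import numpy as np
from sklearn.utils import shuffle

def split_by_stride(X, random_state=1):
    """Hypothetical helper: roughly 80/10/10 split using stride slicing."""
    X = shuffle(X, random_state=random_state)
    train = np.delete(X, np.s_[::5], 0)  # drop every 5th element, keep ~80%
    test_valid = X[::5]                  # every 5th element, ~20%
    test = test_valid[::2]               # even positions of that 20%
    valid = test_valid[1::2]             # odd positions of that 20%
    return train, test, valid

train, test, valid = split_by_stride(np.arange(10))
print(len(train), len(test), len(valid))  # 8 1 1

# The three parts should be disjoint and together cover every element.
combined = np.sort(np.concatenate([train, test, valid]))
assert combined.tolist() == list(range(10))
```

For lengths that are not a multiple of 10, the proportions are only approximate, because the strides pick whole elements.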

・Using train_test_split

train, test_valid = train_test_split(X, test_size=0.2, random_state=1) # split into 80% and 20%
test, valid = train_test_split(test_valid, test_size=0.5, random_state=1) # split the 20% in half
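
In practice there is usually a label array as well. train_test_split accepts several arrays at once and splits them consistently, so the same two-step approach should carry over. A minimal sketch, assuming a made-up label array `y` that exists only for illustration:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(10)
y = X % 2  # hypothetical labels, not part of the original note

# Step 1: 80% train, 20% held out. X and y are split with the same indices.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.2, random_state=1)

# Step 2: split the held-out 20% in half, giving 10% test and 10% validation.
X_test, X_valid, y_test, y_valid = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=1)

print(len(X_train), len(X_test), len(X_valid))  # 8 1 1

# Features and labels stay aligned after both splits.
assert np.array_equal(y_train, X_train % 2)
```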

・As a one-liner

train, test, valid= np.split(shuffle(X, random_state=1), [int(.8*len(X)), int(.9*len(X))])

[int(.8*len(X)), int(.9*len(X))] gives the split positions.
The shuffled array is cut at the 80% and 90% marks, which yields an 8:1:1 split.
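
To see how the cut positions behave when the length is not a multiple of 10, the one-liner can be wrapped in a small function. This is a sketch under assumptions: `three_way_split` and its parameter names are hypothetical, and the return order (train, test, valid) follows the one-liner.

```python
import numpy as np
from sklearn.utils import shuffle

def three_way_split(X, first=0.8, second=0.9, random_state=1):
    """Hypothetical wrapper around the one-liner; cut points are fractions of len(X)."""
    cuts = [int(first * len(X)), int(second * len(X))]  # int() truncates toward zero
    return np.split(shuffle(X, random_state=random_state), cuts)

# With 25 elements the cuts fall at int(20.0) = 20 and int(22.5) = 22,
# so the parts have 20, 2, and 3 elements rather than an exact 8:1:1.
train, test, valid = three_way_split(np.arange(25))
print(len(train), len(test), len(valid))  # 20 2 3
```

Because int() truncates, any leftover elements end up in the last part.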
