I did the same thing as in the page below, this time in R. The score, however, is quite different.
How to use scikit-learn (Part 4)
[1]
library('randomForest')
[2]
train_path <- '../input/house-prices-advanced-regression-techniques/train.csv'
test_path <- '../input/house-prices-advanced-regression-techniques/test.csv'
submission_path <- '../input/house-prices-advanced-regression-techniques/sample_submission.csv'
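train <- read.csv(train_path, stringsAsFactors = FALSE)
test <- read.csv(test_path, stringsAsFactors = FALSE)
submission <- read.csv(submission_path, stringsAsFactors = FALSE)
# Keep four predictor columns (selected by position) plus SalePrice (column 81) in train
train <- train[, c(5, 18, 20, 55, 81)]
# Keep the same four predictor columns in test, which has no SalePrice column
test <- test[, c(5, 18, 20, 55)]
# Inspect the result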
names(train)
head(train)
head(submission)
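Selecting columns by position depends on the exact column order of the CSV files. A minimal sketch of selecting by name instead is shown here on a small made-up data frame; the feature names in `features`, the object names `toy_train` and `toy_selected`, and all values are assumptions for illustration, not output from this notebook.

```r
# Toy data frame standing in for train; the values are made up.
toy_train <- data.frame(
  Id = 1:3,
  LotArea = c(8000, 9500, 11000),
  OverallQual = c(7, 6, 7),
  SalePrice = c(200000, 180000, 220000),
  stringsAsFactors = FALSE
)

# Hypothetical feature list; adjust it to the columns you actually want.
features <- c("LotArea", "OverallQual")

# Stop early if any requested column is absent from the data frame.
missing_cols <- setdiff(c(features, "SalePrice"), names(toy_train))
stopifnot(length(missing_cols) == 0)

# Select by name: the result does not depend on column order.
toy_selected <- toy_train[, c(features, "SalePrice")]
names(toy_selected)
```

The same pattern without `"SalePrice"` would apply to the test data, which makes it harder for train and test to end up with different predictors.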
[3]
set.seed(754)
rf_model <- randomForest(SalePrice ~ ., data = train)
print(rf_model)
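The competition scores submissions by the root mean squared error between the logarithm of the predicted and the logarithm of the observed sale price, so the mean of squared residuals that `print(rf_model)` reports is on a different scale from the Kaggle score. A small helper for computing a log-scale RMSE locally might look like the sketch below; the function name `log_rmse` and the sample numbers are assumptions for illustration.

```r
# Root mean squared error on the log scale.
# Both vectors must be positive and of equal length.
log_rmse <- function(actual, predicted) {
  stopifnot(length(actual) == length(predicted))
  sqrt(mean((log(predicted) - log(actual))^2))
}

# Made-up prices, only to show the call.
actual <- c(200000, 150000, 300000)
predicted <- c(210000, 140000, 300000)

log_rmse(actual, predicted)
log_rmse(actual, actual)  # identical vectors give 0
```

With the model above, something like `log_rmse(train$SalePrice, rf_model$predicted)` should give an out-of-bag estimate on this scale, since `rf_model$predicted` holds the out-of-bag predictions for the training rows. That number may be easier to compare with a validation score from the Python version.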
[4]
prediction <- predict(rf_model, test)
[5]
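# Store the predictions in the SalePrice column of the submission data frame
# (a dotted name such as submission.SalePrice would be a separate variable)
submission$SalePrice <- prediction
# Write the submission file without row names
write.csv(submission, file = 'house_price_r_sep0601.csv', row.names = FALSE)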
head(prediction)
head(submission)
nrow(submission)
length(submission)
dim(submission)
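In R a dot is an ordinary character in a variable name, so data frame columns are addressed with `$` or `[[ ]]` rather than with a dot as in pandas. The toy example below shows the difference, followed by two checks that could be run before writing a submission file. The names `toy_sub`, `toy_pred`, and `before`, and all values, are made up for illustration.

```r
# Toy stand-ins for the submission data frame and the prediction vector.
toy_sub <- data.frame(Id = 1:3, SalePrice = c(0, 0, 0))
toy_pred <- c(120000, 155000, 180000)

# This creates a new variable named toy_sub.SalePrice; toy_sub is untouched.
toy_sub.SalePrice <- toy_pred
before <- toy_sub$SalePrice
before  # still 0 0 0

# This assigns into the SalePrice column of the data frame.
toy_sub$SalePrice <- toy_pred

# Checks before writing: one prediction per row, and no missing values.
stopifnot(nrow(toy_sub) == length(toy_pred))
stopifnot(!anyNA(toy_sub$SalePrice))
```

If `head(submission)` looks identical to the sample submission after the assignment step, that would suggest the predictions never reached the data frame.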
Related pages
R: Kaggle Titanic