Recently I got a machine learning assignment at school on a classification problem. Today I'm gonna review some methods we can use for classification and show how I implemented them in Python.
Basically I used the sklearn library to implement all the methods below.
This time I got 2000 samples of features x1 ~ x16 with a label y, which should be classified into 3 groups. I chose hyperparameters using grid search.
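By the way, here's a rough sketch of what the grid search looks like with sklearn's GridSearchCV. The candidate values below are just an illustration, not my exact grid, and trainDataImport is my own data-loading helper (defined elsewhere).

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = trainDataImport()  # my own data-loading helper, defined elsewhere

# Candidate hyperparameters (illustrative values only, not my exact grid)
param_grid = {
    'C': [0.1, 1, 10, 100],
    'kernel': ['linear', 'rbf'],
    'gamma': [0.001, 0.01, 0.1],
}

# 10-fold cross-validated grid search over all combinations
search = GridSearchCV(SVC(), param_grid, cv=10)
search.fit(X, y)
print(search.best_params_, search.best_score_)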

Support Vector Machine

This is a supervised learning model for both regression and classification analysis. Suppose the data points belong to one of two groups; the goal is to predict which group a new data point belongs to. If we view the points as p-dimensional vectors, the question is how to separate them with a (p-1)-dimensional hyperplane. There are many such hyperplanes, but the best one is the one with the largest margin between the two classes.

from sklearn.svm import SVC

def computeSVM(C, kernel, gamma, train_x, train_y, test_x):
    # Train an SVM with the given hyperparameters and predict test labels
    clf = SVC(C=C, kernel=kernel, gamma=gamma)
    model = clf.fit(train_x, train_y)
    predict = model.predict(test_x)
    return predict
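The per-fold accuracies below come from 10-fold cross validation. Here's a sketch of the driver loop, assuming the same trainDataImport and evaluation helpers as in the kNN section further down (the hyperparameter values passed to computeSVM are placeholders for whatever grid search picked):

from sklearn.model_selection import KFold

X, y = trainDataImport()
kf = KFold(n_splits=10)
for train_index, test_index in kf.split(X):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

    # Placeholder hyperparameters; substitute the grid-search winners
    y_pred = computeSVM(1.0, 'rbf', 0.1, X_train, y_train, X_test)
    print(evaluation(y_test, y_pred))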

Accuracies for each fold are below (mean ≈ 0.874).

0.885
0.865
0.885
0.87
0.875
0.89
0.855
0.86
0.885
0.865

Neural Network

Given data, a neural network learns to identify features in unseen data without any a priori knowledge.


If you want to identify a "dog" in a picture, the network might learn internal features that act like an "ear detector", a "leg detector", a "nose detector", or something like that.
The hidden layers' job is to convert the raw input data into something the output layer can use.

from sklearn.neural_network import MLPClassifier

def computeNN(x_train, y_train, x_test):
    # Two hidden layers of 5 units each, trained with the L-BFGS solver
    clf = MLPClassifier(solver='lbfgs', alpha=1e-4, hidden_layer_sizes=(5, 5), random_state=1)
    learn = clf.fit(x_train, y_train)
    predict = learn.predict(x_test)
    return predict

This accuracy was incredibly bad (mean ≈ 0.336). I don't know why; one guess follows the scores below.

0.31
0.35
0.305
0.365
0.35
0.3
0.33
0.35
0.335
0.36
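One guess, though I haven't verified it on this dataset: MLPs are quite sensitive to feature scaling, and if the 3 classes are roughly balanced then ~0.33 accuracy is about chance level, which matches these scores. A sketch of a scaled variant using sklearn's StandardScaler in a Pipeline (computeNN_scaled is a hypothetical name, not from my original code):

from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def computeNN_scaled(x_train, y_train, x_test):
    # Hypothetical variant of computeNN: standardize each feature to
    # zero mean / unit variance before feeding it to the MLP
    clf = make_pipeline(
        StandardScaler(),
        MLPClassifier(solver='lbfgs', alpha=1e-4,
                      hidden_layer_sizes=(5, 5), random_state=1),
    )
    clf.fit(x_train, y_train)
    return clf.predict(x_test)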

K-nearest neighbours

This is a non-parametric method used for both classification and regression. An object is classified by a majority vote of its neighbours, with the object being assigned to the class most common among its k nearest neighbours.

from sklearn.model_selection import KFold

def method3_kNN():
    # 10-fold cross-validation: train on 9 folds, evaluate on the held-out fold
    X, y = trainDataImport()
    kf = KFold(n_splits=10)
    for train_index, test_index in kf.split(X):
        X_train, X_test = X[train_index], X[test_index]
        y_train, y_test = y[train_index], y[test_index]

        y_pred = computekNN(X_train, y_train, X_test)
        print(evaluation(y_test, y_pred))
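The computekNN helper isn't shown above; here's a minimal sketch of what it might look like, assuming sklearn's KNeighborsClassifier (the value of k is a placeholder for whatever grid search chose):

from sklearn.neighbors import KNeighborsClassifier

def computekNN(x_train, y_train, x_test):
    # Classify each test point by majority vote of its k nearest neighbours;
    # n_neighbors=5 is a placeholder, not the grid-search result
    clf = KNeighborsClassifier(n_neighbors=5)
    clf.fit(x_train, y_train)
    return clf.predict(x_test)

Accuracies for each fold are below (mean ≈ 0.810).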

0.825
0.84
0.79
0.815
0.84
0.81
0.775
0.815
0.76
0.83

2nd best method so far!

I'll update if I do/learn something new.

https://en.wikipedia.org/wiki/K-nearest_neighbors_algorithm
