scikit-learn usage summary (machine learning)

Posted at 2020-05-12

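Clustering analysis (k-means)

・Pass a DataFrame as df and the number of clusters as num
・random_state takes an integer random seed; it is fixed to 0 in the function below so the labels are reproducible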

def clustering_analytics(df, num):
    df_temp = df.copy()
    from sklearn.preprocessing import StandardScaler
    from sklearn.cluster import KMeans
    sc = StandardScaler()
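    # Standardize each column to zero mean and unit variance
    df_std = sc.fit_transform(df_temp)

    # Fit k-means; random_state fixes the seed for reproducible labels
    kmeans = KMeans(n_clusters=num, random_state=0)
    clusters = kmeans.fit(df_std)
    # Attach the cluster label of each row to the copied DataFrame
    df_temp["cluster"] = clusters.labels_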
    return df_temp
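
The function can be tried on a small synthetic dataset. The following is a minimal sketch, not part of the original article: the toy data, the column names `x` and `y`, and the variable names `sample_df` and `result` are assumptions made for illustration, and the function is repeated only so the block runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def clustering_analytics(df, num):
    # Same logic as the function above, repeated so this block is self-contained.
    df_temp = df.copy()
    df_std = StandardScaler().fit_transform(df_temp)
    kmeans = KMeans(n_clusters=num, random_state=0)
    df_temp["cluster"] = kmeans.fit(df_std).labels_
    return df_temp


# Hypothetical toy data: two well-separated groups of 20 points each.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=0.5, size=(20, 2))
group_b = rng.normal(loc=10.0, scale=0.5, size=(20, 2))
sample_df = pd.DataFrame(np.vstack([group_a, group_b]), columns=["x", "y"])

result = clustering_analytics(sample_df, 2)

# Number of rows assigned to each cluster label.
print(result["cluster"].value_counts())

# The input DataFrame is left untouched because the function works on a copy.
print("cluster" in sample_df.columns)
```

Because every column is passed to StandardScaler, the input DataFrame should contain only numeric columns.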

Principal component analysis (PCA)

・Pass a DataFrame as df and the number of principal components as num

def PCA_analytics(df, num):
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA
    import numpy as np
    import pandas as pd
    sc = StandardScaler()
    df_temp = df.copy()
    # Standardize each column to zero mean and unit variance
    df_std = sc.fit_transform(df_temp)
    pca = PCA(n_components = num)
    pca.fit(df_std)
    df_temp__pca = pca.transform(df_std)
    pca_df = pd.DataFrame(df_temp__pca)

    print('components (principal axes)')
    print(pca.components_)
    print('mean')
    print(pca.mean_)
    print('covariance matrix')
    print(pca.get_covariance())
    # np.linalg.eig returns the eigenvalues first, then the eigenvectors as columns
    W, v = np.linalg.eig(pca.get_covariance())
    print('eigenvectors')
    print(v)
    print('eigenvalues')
    print(W)
    return pca_df
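
The printed eigenvalues and eigenvectors relate directly to the fitted PCA attributes, which the sketch below tries to make visible. It is not from the original article: the toy data, the column names, and the use of `np.linalg.eigh` (a NumPy routine for symmetric matrices, swapped in here for `np.linalg.eig` because it returns sorted real eigenvalues) are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: three columns, two of them strongly correlated.
rng = np.random.default_rng(0)
base = rng.normal(size=100)
sample_df = pd.DataFrame({
    "a": base,
    "b": 2.0 * base + rng.normal(scale=0.1, size=100),
    "c": rng.normal(size=100),
})

df_std = StandardScaler().fit_transform(sample_df)
pca = PCA(n_components=2)
# Keep the original index and give the score columns readable names.
scores = pd.DataFrame(pca.fit_transform(df_std), index=sample_df.index,
                      columns=["PC1", "PC2"])

# Share of the total variance captured by each retained component.
print(pca.explained_variance_ratio_)

# eigh returns eigenvalues in ascending order, so reorder to descending.
eigenvalues, eigenvectors = np.linalg.eigh(pca.get_covariance())
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# The largest eigenvalues match explained_variance_, and the eigenvectors
# (stored as columns) match the rows of components_ up to sign.
print(np.allclose(eigenvalues[:2], pca.explained_variance_))
print(np.allclose(np.abs(eigenvectors[:, :2].T), np.abs(pca.components_)))
```

Note that `np.linalg.eig` does not sort its output, so the order printed by the function above need not follow the order of `pca.components_`.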