
scikit-learn usage summary (machine learning)

Last updated at 2020-05-18, posted at 2020-05-12

Cluster analysis (k-means)

・Pass a DataFrame of numeric columns as df and the number of clusters as num
・random_state takes an integer random seed, so the clustering is reproducible

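def clustering_analytics(df, num):
    from sklearn.preprocessing import StandardScaler
    from sklearn.cluster import KMeans
    # Work on a copy so the caller's DataFrame is not modified
    df_temp = df.copy()
    # Standardize each column to zero mean and unit variance
    sc = StandardScaler()
    df_std = sc.fit_transform(df_temp)
    # Fit k-means; random_state fixes the seed
    kmeans = KMeans(n_clusters=num, random_state=0)
    clusters = kmeans.fit(df_std)
    # Append each row's cluster label as a new column
    df_temp["cluster"] = clusters.labels_
    return df_temp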
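
The same standardize-then-cluster steps can be tried on a small DataFrame. The following is a minimal sketch, not the author's code: the `height`/`weight` data is made up for illustration, `fit_labels` is a hypothetical helper that repeats the steps of the function, and `n_init=10` is set explicitly (the original does not set it) because its default differs between scikit-learn versions.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Toy data (made up for illustration): two well-separated groups of rows.
df = pd.DataFrame({
    "height": [150.0, 152.0, 151.0, 180.0, 182.0, 181.0],
    "weight": [50.0, 52.0, 51.0, 80.0, 82.0, 81.0],
})


def fit_labels(frame, num, seed):
    """Hypothetical helper: standardize the columns, then return k-means labels."""
    scaled = StandardScaler().fit_transform(frame)
    # n_init is an assumption added here; the seed plays the role of random_state.
    model = KMeans(n_clusters=num, random_state=seed, n_init=10)
    return model.fit(scaled).labels_


# Two runs with the same seed give identical labels.
labels_a = fit_labels(df, 2, seed=0)
labels_b = fit_labels(df, 2, seed=0)

# Attach the labels as a new column, as the function does.
result = df.copy()
result["cluster"] = labels_a
print(result)
```

The label numbers themselves (0 or 1) are arbitrary; only the grouping of rows is meaningful.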

Principal component analysis (PCA)

・Pass a DataFrame of numeric columns as df and the number of principal components as num

def PCA_analytics(df, num):
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA
    import numpy as np
    import pandas as pd  # required for pd.DataFrame below
    sc = StandardScaler()
    df_temp = df.copy()
    # Standardize each column to zero mean and unit variance
    df_std = sc.fit_transform(df_temp)
    pca = PCA(n_components = num)
    pca.fit(df_std)
    # Project the standardized data onto the principal components
    df_temp__pca = pca.transform(df_std)
    pca_df = pd.DataFrame(df_temp__pca)

    print('components (principal axes)')
    print(pca.components_)
    print('mean')
    print(pca.mean_)
    print('covariance matrix')
    print(pca.get_covariance())
    # np.linalg.eig returns (eigenvalues, eigenvectors); the eigenvectors are the columns of v
    W, v = np.linalg.eig(pca.get_covariance())
    print('eigenvectors')
    print(v)
    print('eigenvalues')
    print(W)
    return pca_df
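
The eigenvalues printed by the function relate directly to attributes PCA already exposes. The following is a minimal sketch under stated assumptions: the data is randomly generated for illustration, all components are kept (`n_components` equals the number of columns), and `np.linalg.eigh` is swapped in for `np.linalg.eig` because the covariance matrix is symmetric. Neither `eig` nor `eigh` returns eigenvalues in PCA's descending order, so they are sorted explicitly.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Toy data (made up for illustration): "b" is strongly correlated with "a".
rng = np.random.default_rng(0)
x = rng.normal(size=50)
df = pd.DataFrame({
    "a": x,
    "b": 2.0 * x + rng.normal(scale=0.5, size=50),
    "c": rng.normal(size=50),
})

df_std = StandardScaler().fit_transform(df)
# Keep every component so the model covariance has no noise term.
pca = PCA(n_components=df.shape[1]).fit(df_std)

# eigh is intended for symmetric matrices and returns ascending eigenvalues.
eigenvalues, eigenvectors = np.linalg.eigh(pca.get_covariance())
order = np.argsort(eigenvalues)[::-1]  # reorder to descending, like PCA
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

print(eigenvalues)                    # eigenvalues of the covariance matrix
print(pca.explained_variance_)        # the same values, as reported by PCA
print(pca.explained_variance_ratio_)  # share of total variance per component
```

With all components kept, the sorted eigenvalues agree with `pca.explained_variance_`, and the eigenvector columns agree with the rows of `pca.components_` up to sign. `explained_variance_ratio_` is often the more convenient quantity when deciding how many components to keep.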
