More than 5 years have passed since last update.

A script to compute the cosine similarity between two probability distributions stored as pandas DataFrames

utils.py
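def calc_cos_mat(mat_df1, mat_df2):
    """Compute the cosine similarity between every row of mat_df1 and every row of mat_df2.

    Args:
        pd.DataFrame: mat_df1, mat_df2
        mat_df1 and mat_df2 must have the same number of columns!
    Returns:
        pd.DataFrame: cosine_similarity_matrix_df
        (rows follow mat_df1.index, columns follow mat_df2.index)

    """
    import pandas as pd
    from sklearn.preprocessing import normalize

    # Check both inputs (the second check previously re-tested mat_df1).
    assert isinstance(mat_df1, pd.DataFrame)
    assert isinstance(mat_df2, pd.DataFrame)
    assert mat_df1.shape[1] == mat_df2.shape[1]

    # normalize() scales each row to unit L2 norm by default,
    # so the dot product of two rows is their cosine similarity.
    normalized_mat_df1 = pd.DataFrame(normalize(mat_df1), index=mat_df1.index)
    normalized_mat_df2 = pd.DataFrame(normalize(mat_df2), index=mat_df2.index)

    return normalized_mat_df1.dot(normalized_mat_df2.T)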


def get_sorted_mats(mat_df):
    """Sort the columns of each row in descending order of value.

    Args:
        pd.DataFrame: mat_df
    Returns:
        pd.DataFrame: sorted_column_mat, probability_mat
        (for each row, the column labels and the values, largest first)

    """
    # pd was previously only imported inside calc_cos_mat.
    import pandas as pd

    jan_rows = []
    prob_rows = []

    for idx in mat_df.index:
        # Sort the columns once per row, using the values of row idx.
        sorted_df = mat_df.sort_values(idx, axis=1, ascending=False)
        jan_rows.append(pd.DataFrame([list(sorted_df.columns)], index=[idx]))
        prob_rows.append(pd.DataFrame([list(sorted_df.loc[idx])], index=[idx]))

    # pd.concat raises on an empty list, so handle an empty input explicitly.
    if not jan_rows:
        return pd.DataFrame(), pd.DataFrame()

    return pd.concat(jan_rows), pd.concat(prob_rows)
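
The following is a minimal, self-contained sketch of what the two functions above compute, written with NumPy only so it can be run without `utils.py`. The names `cosine_matrix` and `rank_columns` and the toy data are hypothetical and are not part of the original script. It assumes no row is all zeros (the division would fail there, whereas scikit-learn's `normalize` leaves such rows as zeros), and it assumes the intended workflow is to build the similarity matrix first and then rank it row by row.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: each row is a probability distribution over 3 outcomes.
df1 = pd.DataFrame([[1.0, 0.0, 0.0], [0.2, 0.8, 0.0]], index=["a", "b"])
df2 = pd.DataFrame(
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
    index=["x", "y", "z"],
)


def cosine_matrix(left, right):
    """Row-wise cosine similarity: L2-normalize each row, then take dot products."""
    l = left.to_numpy(dtype=float)
    r = right.to_numpy(dtype=float)
    # Assumes no all-zero rows; otherwise this divides by zero.
    l_unit = l / np.linalg.norm(l, axis=1, keepdims=True)
    r_unit = r / np.linalg.norm(r, axis=1, keepdims=True)
    return pd.DataFrame(l_unit @ r_unit.T, index=left.index, columns=right.index)


def rank_columns(mat):
    """For each row, return column labels and values ordered from largest to smallest."""
    values = mat.to_numpy()
    # Negate so that an ascending stable sort yields descending order.
    order = np.argsort(-values, axis=1, kind="stable")
    labels = pd.DataFrame(mat.columns.to_numpy()[order], index=mat.index)
    ranked = pd.DataFrame(np.take_along_axis(values, order, axis=1), index=mat.index)
    return labels, ranked


sim = cosine_matrix(df1, df2)
labels, ranked = rank_columns(sim)
print(sim)
print(labels)
print(ranked)
```

Here `labels` lists, for each row of `df1`, the rows of `df2` from most to least similar, and `ranked` holds the matching similarity values. Sorting all rows at once with `argsort` avoids the per-row loop, though the order of tied values may differ from `DataFrame.sort_values`.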