
If a matrix product of pandas DataFrames fails even though the sizes match, look at the contents of the index and columns

Last updated at 2021-10-14 Posted at 2018-03-02

The conclusion first

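When taking a matrix product of pandas DataFrames (A.dot(B)),
make sure that A.columns and B.index contain the same labels.

len(A.columns) == len(B.index)

being True is not enough; the labels themselves must match as well, for example with

list(A.columns) == list(B.index)

also being True. (dot aligns the two operands by label, so the same labels in a different order work too.)

Setting

A.columns = B.index

or

B.index = A.columns

lets you take the product as long as the sizes match (or at least it should). Both assignments relabel by position, so they assume the rows of B are already in the same order as the columns of A.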
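The label requirement can be checked before multiplying. The following is a minimal sketch, not part of the original post: `same_labels` is a hypothetical helper name, the small frames are made-up data, and the check assumes the labels are unique.

```python
import pandas as pd

def same_labels(A, B):
    """Hypothetical helper: do A.columns and B.index hold the same labels?"""
    return len(A.columns) == len(B.index) and set(A.columns) == set(B.index)

A = pd.DataFrame([[1, 2], [3, 4]], columns=["x", "y"])
B_mismatch = pd.DataFrame([[1], [10]], index=[0, 1], columns=["w"])
B_shuffled = pd.DataFrame([[10], [1]], index=["y", "x"], columns=["w"])

print(same_labels(A, B_mismatch))  # False: A.dot(B_mismatch) would raise
print(same_labels(A, B_shuffled))  # True: same labels, different order

# dot lines up "x" with "x" and "y" with "y" before multiplying.
result = A.dot(B_shuffled)
print(result["w"].tolist())  # [21, 43]
```

A purely positional product would have given 12 and 34 here, so the output shows that the labels, not the row order, decide which values get multiplied.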

Versions

python: 3.6.4
pandas: 0.20.3

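What is a matrix product in the first place?

Definition: "行列の積の定義とその理由" (the definition of the matrix product and the reasoning behind it) | 高校数学の美しい物語
Concrete calculation: "行列の積の計算方法と例題" (how to compute a matrix product, with worked examples) - 具体例で学ぶ数学

Mathematically, AB is defined whenever the number of columns of A equals the number of rows of B.
For DataFrames, however, that alone was not enough to take a matrix product.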
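The difference between the mathematical condition and the pandas condition can be seen in a small sketch. The frames below are made-up data, not the data used later in this post; the variable names `shape` and `message` are only there for illustration.

```python
import numpy as np
import pandas as pd

A = pd.DataFrame(np.ones((2, 3)), columns=["a", "b", "c"])
B = pd.DataFrame(np.ones((3, 2)))  # default integer index: 0, 1, 2

# NumPy looks only at shapes: (2, 3) times (3, 2) is fine.
shape = np.dot(A.values, B.values).shape
print(shape)

# pandas also compares A.columns with B.index, and these differ.
message = None
try:
    A.dot(B)
except ValueError as e:
    message = str(e)
print(message)
```

The NumPy call succeeds because only the shapes matter there, while DataFrame.dot raises a ValueError because the column labels of A and the index labels of B have nothing in common.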

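Matrix products of DataFrames

Suppose we collect each person's actions as data, combine them with data that maps actions to attributes, and estimate each person's light-attribute and dark-attribute values.

For now, assume we have data like this.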

>>import pandas as pd

>># Each user's actions
>>user_action = pd.DataFrame(
    {'user_id' : [1, 2, 3],
     'action_1': [0, 0, 1],
     'action_2': [1, 0, 1], 
     'action_3': [0, 1, 0],
     'action_4': [0, 1, 0],
     'action_5': [1, 0, 1],
     'action_6': [1, 1, 0]},
     columns = ['user_id', 'action_1', 'action_2', 'action_3', 'action_4', 'action_5', 'action_6']
    ).set_index('user_id')

>># Mapping from actions to the light and dark attributes
>>action_attribution = pd.DataFrame(
    {'action_id': [1, 2, 3, 4, 5, 6],
     'light': [0, 1, 0, 0, 1, 1],
     'dark': [0, 0, 1, 1, 0, 1]
    }).set_index('action_id')

user_action (the header row holds the columns; the leftmost column holds the index)

	action_1	action_2	action_3	action_4	action_5	action_6
user_id
1	0	1	0	0	1	1
2	0	0	1	1	0	1
3	1	1	0	0	1	0

action_attribution (the header row holds the columns; the leftmost column holds the index)

	dark	light
action_id
1	0	0
2	0	1
3	1	0
4	1	0
5	0	1
6	1	1

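The number of columns of the first frame, user_action, equals the number of rows (index) of the second, action_attribution. So multiplying the first by the second (from the right) should give a matrix product, and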

	dark	light
user_id
1	*	*
2	*	*
3	*	*

we should get each person's dark-attribute and light-attribute values like this.

The sizes match, but we get an error

>>user_action.dot(action_attribution)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-32-6b8e3241e95e> in <module>()
----> 1 user_action.dot(action_attribution)

~\Anaconda3\lib\site-packages\pandas\core\frame.py in dot(self, other)
    786             if (len(common) > len(self.columns) or
    787                     len(common) > len(other.index)):
--> 788                 raise ValueError('matrices are not aligned')
    789 
    790             left = self.reindex(columns=common, copy=False)

ValueError: matrices are not aligned

Is the multiplication order wrong?

>>action_attribution.dot(user_action)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-33-65f902a08f21> in <module>()
----> 1 action_attribution.dot(user_action)

~\Anaconda3\lib\site-packages\pandas\core\frame.py in dot(self, other)
    786             if (len(common) > len(self.columns) or
    787                     len(common) > len(other.index)):
--> 788                 raise ValueError('matrices are not aligned')
    789 
    790             left = self.reindex(columns=common, copy=False)

ValueError: matrices are not aligned

Note: the order used in the first attempt, user_action.dot(action_attribution), is the correct one.

The sizes should be fine, right...

>>user_action.shape

(3, 6)


>>action_attribution.shape

(6, 2)

The sizes are fine, and yet...

Googling the error message did not tell me much.
So I focused on this part of the traceback:

~\Anaconda3\lib\site-packages\pandas\core\frame.py in dot(self, other)
    786             if (len(common) > len(self.columns) or
    787                     len(common) > len(other.index)):
--> 788                 raise ValueError('matrices are not aligned')
    789 
    790             left = self.reindex(columns=common, copy=False)

and went to read the source code.
It says the following (excerpt).

pandas\core\frame.py

    def dot(self, other):
        """
        Matrix multiplication with DataFrame or Series objects

        Parameters
        ----------
        other : DataFrame or Series

        Returns
        -------
        dot_product : DataFrame or Series
        """
        if isinstance(other, (Series, DataFrame)):
            common = self.columns.union(other.index)
            if (len(common) > len(self.columns) or
                    len(common) > len(other.index)):
                raise ValueError('matrices are not aligned')

                # (rest omitted)

This part,

if (len(common) > len(self.columns) or
                    len(common) > len(other.index)):

is apparently the check that tripped. So what is common? A little further up we find

common = self.columns.union(other.index)

Roughly speaking, when we write

A.dot(B)

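self can be read as A and other as B.
In other words, common is the set of A's column labels combined with the set of B's row labels. The important point is that it is a union: any label that appears on only one side makes it larger.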

if (len(common) > len(self.columns) or
                    len(common) > len(other.index)):

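To avoid tripping this check, len(common) must equal both len(self.columns) and len(other.index). (Because common is a union, it can never be smaller than either of them, and len(self.columns) and len(other.index) are equal whenever the product is mathematically defined.)

Let's build common for the DataFrames we created earlier.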

>>common = user_action.columns.union(action_attribution.index)
>>common

Index(['action_1', 'action_2', 'action_3', 'action_4', 'action_5', 'action_6',
                1,          2,          3,          4,          5,          6],
      dtype='object')


>>len(common)

12


>>len(user_action.columns)

6


>>len(action_attribution.index)

6

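So this was the cause: user_action.columns and action_attribution.index carried different labels ('action_1' and so on versus 1 and so on), so the union has 12 entries instead of 6.
Updating either the row labels or the column labels should fix it.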

>>action_attribution.index = user_action.columns
>>action_attribution

	dark	light
action_id
action_1	0	0
action_2	0	1
action_3	1	0
action_4	1	0
action_5	0	1
action_6	1	1

>>common = user_action.columns.union(action_attribution.index)
>>common

Index(['action_1', 'action_2', 'action_3', 'action_4', 'action_5', 'action_6'], dtype='object')

Looks promising!

>>user_action.dot(action_attribution)

	dark	light
user_id
1	1	3
2	3	1
3	0	2

It worked!

Summary
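The fix above assigns the new labels by position. As a hedged alternative that is not in the original post, the index can be renamed by value instead, so each action_id is mapped to its own column label regardless of row order. The sketch below rebuilds the same data in a self-contained way; the variable name `renamed` and the "action_{}" naming rule are assumptions based on the column names used here.

```python
import pandas as pd

user_action = pd.DataFrame(
    {"action_1": [0, 0, 1],
     "action_2": [1, 0, 1],
     "action_3": [0, 1, 0],
     "action_4": [0, 1, 0],
     "action_5": [1, 0, 1],
     "action_6": [1, 1, 0]},
    index=pd.Index([1, 2, 3], name="user_id"),
)
action_attribution = pd.DataFrame(
    {"light": [0, 1, 0, 0, 1, 1],
     "dark": [0, 0, 1, 1, 0, 1]},
    index=pd.Index([1, 2, 3, 4, 5, 6], name="action_id"),
)

# Map each action_id to the matching column label by value, not by position.
renamed = action_attribution.rename(index=lambda i: "action_{}".format(i))

result = user_action.dot(renamed)
print(result)
```

rename returns a new frame, so action_attribution itself keeps its integer index.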

A.dot(B)

To compute this, set

A.columns = B.index

or

B.index = A.columns

and the product can be taken as long as the sizes match (or at least it should).
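If the labels are not needed for the multiplication itself, another option is to multiply the underlying NumPy arrays, which skips label alignment altogether. This is a sketch under that assumption, using made-up frames rather than the data above; it is purely positional, so it is only correct when the rows of B are already in the same order as the columns of A.

```python
import pandas as pd

A = pd.DataFrame([[0, 1, 1], [1, 0, 1]],
                 index=["u1", "u2"], columns=["a1", "a2", "a3"])
B = pd.DataFrame([[1, 0], [0, 1], [1, 1]],
                 index=[1, 2, 3], columns=["light", "dark"])

# A.dot(B) would raise ValueError here because the labels differ.
# Plain arrays carry no labels, so only the shapes are checked.
product = pd.DataFrame(A.values.dot(B.values),
                       index=A.index, columns=B.columns)
print(product)
```

The trade-off is that pandas can no longer catch a genuine mismatch between the two frames, so relabeling is the safer choice whenever the labels are meaningful.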
