Correlation heatmap with Seaborn (ordering rows and columns as you like)

Purpose

When drawing a heatmap with seaborn, you may want to put the rows and columns in an order of your own choosing. Here's how to do that.

Data preparation

First, prepare a dataset to visualize. Suppose you visited three different places and counted how many times you saw each of several animal species.

import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd

# Rows are places, columns are species, values are sighting counts.
df = pd.DataFrame(data={"cat": [5, 2, 1],
                        "crab": [0, 1, 3],
                        "crow": [4, 3, 3],
                        "penguin": [0, 0, 2],
                        "tiger": [0, 3, 0]},
                  index=["city", "forest", "beach"])
        cat  crab  crow  penguin  tiger
city      5     0     4        0      0
forest    2     1     3        0      3
beach     1     3     3        2      0

Generate a correlation matrix and draw a heatmap

Next, calculate the correlation matrix. You just have to call the corr() method of the pandas data frame.

corr = df.corr()
              cat      crab      crow   penguin     tiger
cat      1.000000 -0.891042  0.970725 -0.693375 -0.277350
crab    -0.891042  1.000000 -0.755929  0.944911 -0.188982
crow     0.970725 -0.755929  1.000000 -0.500000 -0.500000
penguin -0.693375  0.944911 -0.500000  1.000000 -0.500000
tiger   -0.277350 -0.188982 -0.500000 -0.500000  1.000000

Here's the correlation matrix, but its entries have not been put in a meaningful order yet. If you draw a heatmap of it as is, this is how it looks:

sns.heatmap(corr, square=True, annot=True)


It looks like a checkerboard. The ordering is the problem, right? Let's put the entries in a neater order.

Order entries

It's easy to order columns with pandas. You just need a list of column labels, which you use to select the columns from the data frame in that order. Let's put crow and cat, cat and tiger, and crab and penguin next to each other.

order = ["crow", "cat", "tiger", "crab", "penguin"]
corr[order]
             crow       cat     tiger      crab   penguin
cat      0.970725  1.000000 -0.277350 -0.891042 -0.693375
crab    -0.755929 -0.891042 -0.188982  1.000000  0.944911
crow     1.000000  0.970725 -0.500000 -0.755929 -0.500000
penguin -0.500000 -0.693375 -0.500000  0.944911  1.000000
tiger   -0.500000 -0.277350  1.000000 -0.188982 -0.500000

Now the columns are ordered, but the rows aren't. The diagonal is scrambled, which is not what you wanted, so you have to order the rows as well. Perhaps the easiest way is to transpose the result and select with the same list again. Since the correlation matrix is symmetric, you don't have to transpose it back at the end.

corr_ordered = corr[order].T[order]
             crow       cat     tiger      crab   penguin
crow     1.000000  0.970725 -0.500000 -0.755929 -0.500000
cat      0.970725  1.000000 -0.277350 -0.891042 -0.693375
tiger   -0.500000 -0.277350  1.000000 -0.188982 -0.500000
crab    -0.755929 -0.891042 -0.188982  1.000000  0.944911
penguin -0.500000 -0.693375 -0.500000  0.944911  1.000000
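
As an alternative sketch (not the approach used above), label-based indexing with .loc or reindex() can reorder rows and columns in a single step, without relying on the symmetry of the matrix. The data frame below just repeats the example data so that the snippet runs on its own, and the variable names corr_loc and corr_reindexed are made up for this illustration.

```python
import numpy as np
import pandas as pd

# Same example data as above, repeated so the snippet is self-contained.
df = pd.DataFrame(data={"cat": [5, 2, 1],
                        "crab": [0, 1, 3],
                        "crow": [4, 3, 3],
                        "penguin": [0, 0, 2],
                        "tiger": [0, 3, 0]},
                  index=["city", "forest", "beach"])
corr = df.corr()
order = ["crow", "cat", "tiger", "crab", "penguin"]

# Select rows and columns by label in one step.
corr_loc = corr.loc[order, order]

# reindex() gives the same result and reads a little more explicitly.
corr_reindexed = corr.reindex(index=order, columns=order)

# Compare both with the transpose-based version from the text.
corr_transposed = corr[order].T[order]
print(np.allclose(corr_loc.to_numpy(), corr_transposed.to_numpy()))
print(np.allclose(corr_reindexed.to_numpy(), corr_transposed.to_numpy()))
```

Because these select rows and columns independently, they should also work for tables that are not symmetric, where the transpose trick would need one more transpose at the end.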

Congrats! The rows are now ordered as well. Let's draw the heatmap again.

sns.heatmap(corr_ordered, square=True, annot=True)


You see it's nicely ordered. Amazing!

Change the color palette

Let's also change the color palette so that the plot looks even better. For correlations, use a "diverging color palette" rather than a "sequential color palette", because both extremes (+1 and -1) are important.
cf. https://seaborn.pydata.org/tutorial/color_palettes.html#diverging-color-palettes

cmap = sns.color_palette("coolwarm", 200)
sns.heatmap(corr_ordered, square=True, annot=True, cmap=cmap)


Now you have successfully changed the color palette. This may not be the best choice though... You can find a better one by yourself!
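
One possible refinement, offered as a sketch rather than as part of the recipe above: with a diverging palette, fixing the color scale to the full correlation range through the vmin and vmax parameters of sns.heatmap puts the neutral color at zero, so the same color means the same correlation in every plot. The data frame repeats the example data so that the snippet runs on its own, and the Agg backend is an assumption chosen so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen, no display needed
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Same example data as above, repeated so the snippet is self-contained.
df = pd.DataFrame(data={"cat": [5, 2, 1],
                        "crab": [0, 1, 3],
                        "crow": [4, 3, 3],
                        "penguin": [0, 0, 2],
                        "tiger": [0, 3, 0]},
                  index=["city", "forest", "beach"])
order = ["crow", "cat", "tiger", "crab", "penguin"]
corr_ordered = df.corr()[order].T[order]

fig, ax = plt.subplots()
# vmin/vmax pin the color scale to [-1, 1], so zero sits at the
# midpoint of the diverging "coolwarm" colormap.
sns.heatmap(corr_ordered, square=True, annot=True,
            cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
```

Without vmin and vmax, the scale is taken from the data, so a matrix with no strongly negative values would show mild correlations in saturated colors.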

Use clustermap instead

Now I'll tell you the truth. Seaborn has a clustermap function, so you don't have to order the entries by hand.

sns.clustermap(corr, cmap=cmap)


It automatically computes the distances between entries and orders them by hierarchical clustering! It has many more options, so I recommend reading the documentation.
cf. https://seaborn.pydata.org/generated/seaborn.clustermap.html
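
If you want to know which order clustermap picked, for example to reuse it in a plain heatmap, the sketch below reads it back from the returned grid object. It assumes that SciPy is installed (clustermap needs it for the clustering) and that the grid exposes the row order as dendrogram_row.reordered_ind; please check this against the documentation of your seaborn version. The names grid, clustered_order and corr_clustered are made up for this illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen, no display needed
import pandas as pd
import seaborn as sns

# Same example data as above, repeated so the snippet is self-contained.
df = pd.DataFrame(data={"cat": [5, 2, 1],
                        "crab": [0, 1, 3],
                        "crow": [4, 3, 3],
                        "penguin": [0, 0, 2],
                        "tiger": [0, 3, 0]},
                  index=["city", "forest", "beach"])
corr = df.corr()

grid = sns.clustermap(corr, cmap="coolwarm", vmin=-1, vmax=1)

# Integer positions of the rows in the order the clustering arranged them.
row_positions = grid.dendrogram_row.reordered_ind
clustered_order = [corr.index[i] for i in row_positions]
print(clustered_order)

# Apply that row order to both rows and columns with the transpose trick.
corr_clustered = corr[clustered_order].T[clustered_order]
```

The resulting list can then be edited by hand and passed to sns.heatmap as in the earlier steps.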

Nonetheless, you may sometimes want to choose the order yourself, and the steps above show how to do it.


