
Features you only get in Jupyter (taking the pain out of debugging Pandas data)

Posted at 2023-04-17

The painful part of working with Pandas in plain Python
When using Pandas from Python, you will often write code like the following.

import pandas as pd
df = pd.DataFrame([[4, 6], [1, 9], [3, 4], [5, 5], [9,6]],
               columns=["Mike", "Jim"],
               index=["Mon", "Tue", "Wed", "Thurs", "Fri"])
print(df)

Output

       Mike  Jim
Mon       4    6
Tue       1    9
Wed       3    4
Thurs     5    5
Fri       9    6

This example has very little data, so nothing goes wrong. But what about the following case?

import numpy as np  # needed for np.random.rand
arr = np.random.rand(100, 2)
df = pd.DataFrame(arr, columns=["Mike", "Jim"])
print(df)

Output: the middle of the data is collapsed into "... ... ...", which is a little unhelpful.

        Mike       Jim
0   0.518550  0.800410
1   0.049528  0.173049
2   0.501506  0.037998
3   0.114218  0.450367
4   0.650279  0.786903
..       ...       ...
95  0.568600  0.482608
96  0.565669  0.066380
97  0.798853  0.639985
98  0.513716  0.916416
99  0.911314  0.686543

[100 rows x 2 columns]
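For reference, the truncation above comes from pandas display options rather than from Python itself. The following is a minimal sketch, assuming the default `display.max_rows` setting, of two ways to print every row in plain Python. The seeded generator `rng` is an assumption added here so that the example is reproducible; the article itself uses `np.random.rand`.

```python
import numpy as np
import pandas as pd

# Reproducible stand-in for the article's random 100 x 2 table (seed is arbitrary).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((100, 2)), columns=["Mike", "Jim"])

# With default options, the repr of a 100-row frame is shortened in the middle.
truncated = repr(df)

# to_string() renders every row regardless of the display options.
full = df.to_string()
print(full)

# Alternatively, lift the row limit only inside this block.
with pd.option_context("display.max_rows", None):
    print(df)
```

Both approaches print all rows, but the result is still plain text with no search or sorting.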

So what changes when you bring in Jupyter? (Unfortunately, the cell output is truncated just as in plain Python.)

Mike	Jim
0	0.755812	0.277540
1	0.043467	0.343764
2	0.582709	0.669783
3	0.166682	0.054200
4	0.365496	0.971671
...	...	...
95	0.798789	0.356647
96	0.471416	0.009733
97	0.138570	0.619475
98	0.382083	0.554818
99	0.157959	0.462000
100 rows × 2 columns

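However, there is a Jupyter-only data viewer for output, and you can use it here.

Displaying the DataFrame in the viewer shows every row, and you can also search it and sort it in ascending or descending order.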
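The viewer does this through its UI, but the same operations can be sketched with ordinary pandas calls when you need them in code. This is only a rough equivalent, not how the viewer is implemented; the threshold `0.9` and the variable names are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Reproducible stand-in for the article's random 100 x 2 table (seed is arbitrary).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((100, 2)), columns=["Mike", "Jim"])

# Sort by one column, ascending and then descending.
ascending = df.sort_values("Mike")
descending = df.sort_values("Mike", ascending=False)

# "Search": keep only the rows where Jim exceeds a hypothetical threshold.
matches = df[df["Jim"] > 0.9]

print(ascending.head())
print(descending.head())
print(matches)
```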

You can also do things like the following.

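# Return a CSS color for cells equal to the column maximum (defined here but not used below)
def highlight_max(x, color):
    return np.where(x == np.nanmax(x.to_numpy()), f"color: {color};", None)
# Return a CSS color for cells equal to the column minimum
def highlight_min(x, color):
    return np.where(x == np.nanmin(x.to_numpy()), f"color: {color};", None)
# Two rows (labels and numbers), transposed into a table with five rows and two columns
li = [['English','Japanese','Chinese','Koreans','Taiwanese'],[10,20,30,40,50]]
df1 = pd.DataFrame(li,columns=['a','b','c','d','e']).T
# Color the minimum of each column red
df1.style.apply(highlight_min, color='red')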

Output (highlighting)
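To see what the styling function hands back, it can be called directly on a single column. By default `Styler.apply` passes each column to the function as a Series and expects CSS strings of the same shape in return. The sketch below uses a hypothetical numeric Series named `scores`, not the article's `df1`.

```python
import numpy as np
import pandas as pd

def highlight_min(x, color):
    # Same function as in the article: a CSS string where x equals its minimum, else None.
    return np.where(x == np.nanmin(x.to_numpy()), f"color: {color};", None)

# Hypothetical column with the same numbers as the article's second row.
scores = pd.Series([10, 20, 30, 40, 50], index=list("abcde"))

css = highlight_min(scores, color="red")
print(css)
```

Only the position of the minimum receives a style string; every other cell gets `None`, which the Styler treats as no styling.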

So make good use of Jupyter, and enjoy your Python life.
