Python
IPython
pandas
Jupyter

Trimming pandas DataFrame output by using head and tail at the same time

When you type df to look at your data in pandas, you get a long, sprawling dump that is not very pleasant to look at.

df = pd.DataFrame(np.random.randn(100,4)); df
0 1 2 3
0 0.374859 0.327898 2.215511 1.165490
1 -0.939833 -0.531873 -1.717368 -0.584834
2 0.759525 1.992222 -0.352082 -1.500736
3 -0.279484 -0.278289 0.625053 0.362855
4 1.151177 -1.398746 0.391291 0.673220
5 -0.392235 -0.973586 0.243700 -2.899188
6 0.837239 0.670279 -0.692629 -1.126292
7 -0.921781 -2.438753 -0.519993 0.482150
8 2.459798 1.219577 0.770672 -1.390487
9 -1.093845 0.343168 -0.229751 1.172888
10 -0.437252 -0.824611 -0.346145 0.785992
11 1.193672 -0.193474 0.676684 -1.468454
12 1.039551 0.234592 -0.192957 -1.210177
13 1.081615 -0.988146 -0.021931 1.137428
14 0.470213 1.239319 -0.346861 -0.288200
15 -0.339914 -1.580660 -0.432387 -0.202277
16 1.141389 0.236465 -1.477666 -0.264886
17 -0.686339 0.971620 -0.733747 -0.110410
18 0.266442 -0.168084 -2.021432 -1.337447
19 0.698942 1.409780 -0.506928 0.999617
20 0.432697 -0.629534 -0.605271 -2.336144
21 1.377673 0.761185 1.023692 -1.472238
22 0.152084 -0.725003 1.553365 -0.544019
23 0.156944 0.505415 -1.222674 0.423808
24 0.479288 0.201019 -0.091332 0.254680
25 -1.184456 -0.095066 -0.885104 0.421549
26 1.040014 -1.381022 1.869261 1.437337
27 0.478984 -0.944046 0.352453 2.569114
28 -0.439603 -1.298592 0.913691 -0.622890
29 -0.545850 -0.872281 0.213367 -0.681539
... ... ... ... ...
70 -1.469679 -0.456337 -0.329848 0.286484
71 -1.219977 -2.282746 0.506492 -0.200502
72 0.365171 0.926229 0.935084 -1.001133
73 -0.842211 -0.040298 -0.728098 2.851352
74 -0.897560 0.861064 1.990610 0.267552
75 -0.703071 0.784476 -1.002520 -0.450417
76 0.033787 0.073530 0.214343 1.787105
77 -1.258251 -0.030197 1.320570 0.393222
78 -1.766407 -0.996086 -0.192385 -0.513102
79 -0.629811 -0.487538 0.923048 -1.497247
80 0.083737 0.317975 0.325503 0.372319
81 -0.391032 -1.192947 0.312277 -0.249235
82 -1.525711 -0.994144 -1.411683 -0.297697
83 -0.794180 0.776143 0.057774 1.659901
84 -0.270637 -1.165053 0.508089 -0.445596
85 -1.961543 1.973141 0.533462 -1.327931
86 -0.100805 -0.162729 1.448156 -0.224008
87 -0.514309 0.323078 -0.233127 1.384196
88 -2.516856 0.374363 1.129207 1.069754
89 0.577997 -0.767833 0.923292 -0.311372
90 0.758016 -0.920520 0.109853 0.021920
91 0.406649 -0.239311 1.024492 1.009525
92 -1.666999 1.912280 -1.959626 -1.008634
93 0.222210 -1.378929 -0.609868 0.749869
94 -1.622319 0.035508 -1.547729 -0.480135
95 0.895873 -0.045676 0.180615 -1.252418
96 -0.630758 -0.285296 0.160133 1.106705
97 -1.909217 0.841634 -0.011388 0.348177
98 -1.271435 1.725388 1.075685 -0.164461
99 0.379877 -2.547350 0.899402 -1.615333

100 rows × 4 columns

Changing pandas options

The maximum number of rows displayed can be changed with the set_option function.

Reference: pandas 0.21.1 documentation

pd.set_option("display.max_rows", 10)
df
0 1 2 3
0 0.044887 1.109229 -1.404712 0.014551
1 -2.032691 -0.435130 -0.428953 -0.537191
2 -0.777178 -0.435460 0.848413 -1.635667
3 -0.213586 -1.509976 -0.635302 -0.138209
4 0.639734 0.446097 -1.312515 0.783796
... ... ... ... ...
95 -0.288322 0.132045 -0.405144 -0.542612
96 -0.631888 -0.487683 0.684156 -0.114015
97 -0.405025 1.719596 0.451788 0.930674
98 0.146326 1.032291 -0.474848 0.847260
99 0.473070 0.408938 -0.504452 -0.214760

100 rows × 4 columns
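If you only want the shorter display for a single cell, pandas also provides pd.option_context, which applies an option inside a with block and puts the previous value back when the block exits. A minimal sketch, assuming the same kind of 100×4 random frame as above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(100, 4))
before = pd.get_option("display.max_rows")

# display.max_rows is 10 only inside the with-block.
with pd.option_context("display.max_rows", 10):
    truncated = repr(df)

after = pd.get_option("display.max_rows")
print(truncated)
print(before == after)  # True: the old value is restored on exit
```

This avoids having to remember to reset the option afterwards.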

To go back, just change set to reset. Only the option name is needed; you do not have to pass the value 10 again.

pd.reset_option("display.max_rows")
df
0 1 2 3
0 0.044887 1.109229 -1.404712 0.014551
1 -2.032691 -0.435130 -0.428953 -0.537191
2 -0.777178 -0.435460 0.848413 -1.635667
3 -0.213586 -1.509976 -0.635302 -0.138209
4 0.639734 0.446097 -1.312515 0.783796
5 -0.156756 0.521311 0.060626 0.206347
6 0.591887 1.441567 0.587750 -0.240194
7 -0.098514 1.053005 0.072088 -0.891726
8 1.484554 -0.360987 -1.724210 -1.516901
9 -0.918722 0.344975 -0.439208 -1.284894
10 -0.223029 -0.107058 1.234283 -1.055316
11 -0.806544 0.744367 0.594333 -0.993136
12 -0.680134 1.570801 1.204924 -0.859910
13 -0.639150 -0.004267 -0.691408 -0.214076
14 -0.219878 -0.514751 -1.332166 0.570380
15 1.990532 -1.174292 -0.118421 -0.113356
16 0.653598 0.100153 1.636529 0.052311
17 -1.045861 -0.064809 -0.433254 0.964098
18 0.231979 0.067611 -1.253458 -1.037114
19 1.344306 -0.936234 0.594781 0.105511
20 -0.413086 -0.486708 -0.911816 1.050004
21 0.973888 -1.365909 -3.741730 -1.507470
22 -0.821778 0.355387 -1.467101 0.362862
23 0.005682 -0.254259 1.408601 0.772690
24 0.761357 -0.431552 1.230341 1.244104
25 0.433293 -0.185350 0.937934 -0.913643
26 -0.202213 0.528685 -0.745797 -2.023442
27 1.285016 0.756849 0.636789 0.035517
28 -2.249611 1.001626 -0.071847 -0.490456
29 -0.827190 -0.157449 -1.775256 0.680590
... ... ... ... ...
70 -0.283343 1.396622 -1.375399 -0.297667
71 -1.792001 1.488617 -0.047619 0.584341
72 -1.472638 -3.259683 -0.706456 0.508512
73 0.743772 -0.313420 1.423694 -1.095836
74 -0.923600 -0.320489 -0.920354 -0.676194
75 -0.382733 0.748782 0.510318 0.190481
76 -0.834656 -0.927456 1.718550 0.244518
77 -0.537161 0.315323 0.243676 -1.853278
78 -0.549080 0.659434 -0.627163 1.142092
79 1.497259 -0.183383 -0.931365 -0.712263
80 0.809390 0.696450 -0.949674 0.333511
81 0.107922 0.323430 1.218619 -0.486692
82 -0.969837 0.585856 1.138128 0.399262
83 -0.423241 0.855566 -1.322747 -0.313059
84 -0.708709 -1.031457 -0.361363 1.389282
85 -1.155997 -0.054445 -1.037225 -2.020944
86 -0.509943 1.279200 -1.473619 1.070197
87 0.593176 0.660035 0.809127 -1.455174
88 1.867072 -0.697688 -1.144857 -1.740410
89 0.500170 -0.266405 0.226681 -0.800579
90 -1.746962 1.414762 0.789651 -1.362200
91 -0.289363 1.300986 0.210491 1.529958
92 -0.852068 -0.048329 -0.269035 0.250980
93 -0.388686 -1.312654 -1.036473 -1.297159
94 -1.035741 0.097650 0.454851 2.067922
95 -0.288322 0.132045 -0.405144 -0.542612
96 -0.631888 -0.487683 0.684156 -0.114015
97 -0.405025 1.719596 0.451788 0.930674
98 0.146326 1.032291 -0.474848 0.847260
99 0.473070 0.408938 -0.504452 -0.214760

100 rows × 4 columns
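The set/reset round trip can be checked with pd.get_option, which reads the current value of an option. A small sketch:

```python
import pandas as pd

default_rows = pd.get_option("display.max_rows")

pd.set_option("display.max_rows", 10)
changed_rows = pd.get_option("display.max_rows")

# reset_option only needs the option name; it restores the default value.
pd.reset_option("display.max_rows")
restored_rows = pd.get_option("display.max_rows")

print(default_rows, changed_rows, restored_rows)
```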

The head/tail methods

Calling df.head() or df.tail() shows only a few rows.

df.head()
0 1 2 3
0 0.044887 1.109229 -1.404712 0.014551
1 -2.032691 -0.435130 -0.428953 -0.537191
2 -0.777178 -0.435460 0.848413 -1.635667
3 -0.213586 -1.509976 -0.635302 -0.138209
4 0.639734 0.446097 -1.312515 0.783796
df.tail()
0 1 2 3
95 -0.288322 0.132045 -0.405144 -0.542612
96 -0.631888 -0.487683 0.684156 -0.114015
97 -0.405025 1.719596 0.451788 0.930674
98 0.146326 1.032291 -0.474848 0.847260
99 0.473070 0.408938 -0.504452 -0.214760
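Both methods take the number of rows as an argument (5 by default), and a negative n for head means "everything except the last n rows". A quick sketch on a fresh 100-row frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(100, 4))

first = df.head()               # default n=5: rows 0-4
last3 = df.tail(3)              # last 3 rows: rows 97-99
all_but_last_95 = df.head(-95)  # drops the last 95 rows, leaving rows 0-4

print(first.index.tolist(), last3.index.tolist(), all_but_last_95.index.tolist())
```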

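Joining head and tail

That alone is not enough, though. You often want to check the beginning and the end of the data at the same time. It would be so handy if head and tail could be printed together...

Someone has already done it: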

Stack Overflow - python pandas select both head and tail

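The answer there is df.head().append(df.tail()), which joins head and tail into one frame. DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so on current versions write the same thing with pd.concat:

pd.concat([df.head(), df.tail()])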
0 1 2 3
0 0.044887 1.109229 -1.404712 0.014551
1 -2.032691 -0.435130 -0.428953 -0.537191
2 -0.777178 -0.435460 0.848413 -1.635667
3 -0.213586 -1.509976 -0.635302 -0.138209
4 0.639734 0.446097 -1.312515 0.783796
95 -0.288322 0.132045 -0.405144 -0.542612
96 -0.631888 -0.487683 0.684156 -0.114015
97 -0.405025 1.719596 0.451788 0.930674
98 0.146326 1.032291 -0.474848 0.847260
99 0.473070 0.408938 -0.504452 -0.214760
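A different technique (not from the Stack Overflow answer) is to pass both sets of positions to iloc in one go. Here np.r_ is just a convenient way to build the position list, and iloc accepts negative positions counted from the end. A sketch:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(100, 4))

# Positions 0-4 followed by the last five positions (-5 to -1).
positions = np.r_[0:5, -5:0]
both_ends = df.iloc[positions]

print(both_ends.index.tolist())  # [0, 1, 2, 3, 4, 95, 96, 97, 98, 99]
```

The result is the same frame as concatenating head() and tail().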

Making the head/tail combo shorter to type

Register it as a method with a lambda.
I named the method less, but give it any name you like and save the script under ~/.ipython/profile_default/startup so it is always available.

~/.ipython/profile_default/startup/less.ipy
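import pandas as pd
# Attach less() to every DataFrame: the first n//2 rows plus the last n//2 rows.
pd.DataFrame.less = lambda self, n=10: pd.concat([self.head(n // 2), self.tail(n // 2)])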
df.less()
0 1 2 3
0 0.044887 1.109229 -1.404712 0.014551
1 -2.032691 -0.435130 -0.428953 -0.537191
2 -0.777178 -0.435460 0.848413 -1.635667
3 -0.213586 -1.509976 -0.635302 -0.138209
4 0.639734 0.446097 -1.312515 0.783796
95 -0.288322 0.132045 -0.405144 -0.542612
96 -0.631888 -0.487683 0.684156 -0.114015
97 -0.405025 1.719596 0.451788 0.930674
98 0.146326 1.032291 -0.474848 0.847260
99 0.473070 0.408938 -0.504452 -0.214760

Pass an argument to change how many rows are shown (n // 2 from each end).

df.less(2)
0 1 2 3
0 0.044887 1.109229 -1.404712 0.014551
99 0.473070 0.408938 -0.504452 -0.214760
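The one-line lambda has two small quirks: with an odd n it shows only n - 1 rows, and on a frame shorter than n the head and tail overlap, so some rows appear twice. Below is a sketch of a variant that handles both. The name less and the rule that the extra row goes to the head are my own choices, not anything pandas provides.

```python
import numpy as np
import pandas as pd


def less(self: pd.DataFrame, n: int = 10) -> pd.DataFrame:
    """Hypothetical helper: first and last rows of a frame, n rows in total."""
    if len(self) <= n:
        # Head and tail would overlap; the frame is already short enough.
        return self
    head_n = (n + 1) // 2  # with an odd n, the extra row goes to the head
    tail_n = n - head_n
    parts = [self.head(head_n)]
    if tail_n > 0:
        # Skip tail(0) so that n=1 simply returns the first row.
        parts.append(self.tail(tail_n))
    return pd.concat(parts)


pd.DataFrame.less = less

df = pd.DataFrame(np.random.randn(100, 4))
print(df.less(3).index.tolist())         # [0, 1, 99]
print(df.head(6).less().index.tolist())  # [0, 1, 2, 3, 4, 5], no duplicates
```

Like the lambda, this can go into the IPython startup directory.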

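When writing a Jupyter notebook article for other people, it is better to change the output with set_option, which is well documented and easy to look up, or to narrow the output with the iloc and loc indexers (the older ix indexer was removed in pandas 1.0). For drafts, or when you just want to trim the output temporarily in IPython, the head/tail combination may be the handier option.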
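For reference, narrowing output with iloc and loc looks like this. iloc slices by position and excludes the end, while loc slices by label and includes it, so the two calls below select the same rows on a default integer index:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(100, 4))

by_position = df.iloc[10:15]  # positions 10-14 (end excluded)
by_label = df.loc[10:14]      # labels 10-14 (end included)

print(by_position.index.tolist(), by_label.index.tolist())
```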