More than 5 years have passed since last update.

集計結果をExcelへ出力する方法を考える

Python

Last updated at 2017-12-17Posted at 2017-12-16

はじめに

この記事ではPandasを用いて集計した結果をExcelファイルに出力する方法を検討します。
特に体裁面がある程度求められる人のお役に立てればと思います。

実行環境はwindowsでpython3を用いています。

動機

皆様はExcelをどのように使用していますか？
私の周囲で主だった使用シーンは以下のようなものです。

表計算ソフトとして簡易な計算を行う
VBAを用いて簡易システムを作る
共有・プレゼンテーション用に表を装飾する

＃まったく使っていない人　←　裏山です

私は3の目的で使用することが多いです。
データの前処理や分析はpythonやRでおこなっても、
職務上、最終的にはPowerPointかExcelにせざるを得ない人が多いのではないでしょうか？

データ分析は前処理に掛かる工数が殆どだとよく言いますが、
実際のところ、単純な集計の場合には、ドキュメンテーションにかかる工数は無視できません。
このドキュメンテーションをどうすれば改善できるかを考えてみます。

テストデータの作成

何でもよいのですが、企業によくありそうな日別のデータをランダムウォークで用意してみます。

import numpy as np
import pandas as pd
import random

# ランダムウォーク生成
def random_walk(length=100, seed=1234):
    random.seed(a=seed)
    x = [0]
    for j in range(length - 1):
        step_x = random.randint(0, 1)
        if step_x == 1:
            x.append(x[j] + 1 + 0.05*np.random.normal())
        else:
            x.append(x[j] - 1 + 0.05*np.random.normal())
    return x

dtidx = pd.date_range(start='2017-10-1', end='2017-12-31', freq='1d')

length = len(dtidx)
X = pd.DataFrame([random_walk(length, seed) for seed in [1, 10, 100]], 
                 columns=dtidx, index=['A', 'B', 'C']).T

3か月分のある３つの指標の動きとします。

これを月次で平均して出力することを考えます。

result = X.groupby(pd.Grouper(level=0, freq='1m')).mean() #月次で平均
result.index = result.index.to_series().apply(lambda t : t.strftime('%Y-%m')) #indexを文字列に変換

	A	B	C
2017-10	-0.02	2.46	0.53
2017-11	0.03	5.35	-2.27
2017-12	3.81	3.50	-1.90
この表をどう装飾してExcelで出力するかを考えます。
（これだけシンプルだと装飾も何もあったもんではありませんが、例ということで）

案 : Excelのテンプレートを事前に作成しておく

先にデザインを作っておいて、集計表を貼り付けるパターン。
アウトプットの体裁が決まっていて、あとはデータを入れるだけという場合には有効です。
ここではopenpyxlを用いて、狙ったセルにデータを貼り付ける方法を紹介します。

下記のように、色や罫線を引いたExcelファイルを
template.xlsx
として用意しておきます。

あとはopenpyxlを使うだけなのですが、
より手軽に扱えるよう、私はopenpyxlを簡単にラップしたモジュールを分析フォルダに置いています。

excelpaster.py

import pandas as pd
import re
import openpyxl as px
import itertools
from openpyxl.utils.dataframe import dataframe_to_rows

class ExcelPaster:
    def __init__(self, book_name):
        self.wb = px.load_workbook(book_name) #テンプレートの読み出し
        
        # セル番号と列名の対応を作成
        self.r_num = re.compile("([0-9]+)")
        self.r_letters = re.compile("([A-Z]+)")

        letters = [chr(i) for i in range(65, 65+26)]
        col_master = []
        for i in letters:
            col_master.append(i)
        for i, j in itertools.product(letters, letters):
            col_master.append(i + j)
        for i, j, k in itertools.product(letters, letters, letters):
            col_master.append(i + j + k)
        
        self.num2letters = pd.Series(col_master).to_dict()
        self.letters2num = pd.Series(col_master)\
        .rename('letters').reset_index().set_index('letters').to_dict()['index']
        
    def _end_cell_getter(self, df, start_cell, index, header):
        ncols = len(df.columns)
        if index == True:
            ncols  = ncols + df.index.nlevels
        nrows = len(df.index)
        if header == True:
            nrows = nrows + df.columns.nlevels

        start_col = self.r_letters.search(start_cell).group(0)
        end_col = self.num2letters[self.letters2num[start_col] + ncols - 1]
        start_row = int(self.r_num.search(start_cell).group(0))
        end_row = str(start_row + nrows - 1)
        return end_col + end_row
    
    def df_to_cells(self, data, sheet_name, start_cell, index=False, header=False):
        ws = self.wb.get_sheet_by_name(sheet_name)
        df = data.copy()
        if isinstance(df, pd.Series):
            df = pd.DataFrame(df)
        end_cell = self._end_cell_getter(df, start_cell, index, header)
        for cells, r in zip(ws[start_cell:end_cell], 
                            dataframe_to_rows(df, index=index, header=header)):
            for cnt, v in enumerate(r):
                cells[cnt].value = v
    
    def savefile(self, new_book_name):
        self.wb.save(new_book_name)
        
    def sheet_copy(self, original_sheet_name, new_sheet_name):
        ws = self.wb.get_sheet_by_name(original_sheet_name)
        ws = self.wb.copy_worksheet(ws)
        ws.title = new_sheet_name

その上で下記を実行します。

import excelpaster
ep = excelpaster.ExcelPaster('template.xlsx')
ep.df_to_cells(result, 'Sheet1', 'A1', header=True, index=True)
ep.savefile('result.xlsx')

このようにopenpyxlを使えば、体裁が決まっている場合には、Excelを操作する手間が省けます。
pandasでto_clipboard()を使用してもよいのですが、
大量の表を貼り付ける場合に、ミスを発見することが難しくなります。
分析スクリプトが汚くなるのもいやですよね。

おわりに

ここではExcelでテンプレートを用意する方法を提案しました。
pandasのstyleを使用する方法も紹介したかったのですが、
indexを装飾する方法が見当たらず、今回は断念しました。
（分析過程でstyleはとても便利なので、outputできると良いのですが）

マークアップ言語でレポートが通じる世の中になれば、こういった手間もなくなるのですけれども。
（Excelも要はXMLファイルですが）

誰かの助けになれば幸いです。

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up