Python Tool #1: A Simple Source Code Counter

Last updated at 2024-07-24, posted at 2024-04-28

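I wrote a Python script that reads multiple source code files and writes the line count of each one to an Excel file.
There are already many source code counters out there, but I wanted full control over the output format (for example, writing to an Excel file), so I built my own.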

Existing source code counters

They look quite useful.

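Why I built my own

I wanted a framework that lets me analyze source code the way I like with only a small amount of Python script code, and I wanted to build a source code counter as the first tool on top of that framework.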

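Handle projects in which UTF-8 and Shift-JIS source code files are mixed. (Note 1)
Write the results to an Excel file.
Leave room for special-purpose processing later. For example, I want to grep by keyword while ignoring text inside comments and string constants, so some degree of tokenization should be implemented. (Note 2)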

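Note 1) I once worked on a project in which UTF-8 and Shift-JIS source code files were mixed. Ideally, encodings should not be mixed like that.

Note 2) I once developed a tool that grepped source code files for specific keywords and wrote the matches to an Excel file, as a reference for refactoring. Because it did not tokenize, the filtering was too loose and it extracted too much.

Tokenization: splitting a string into words (keywords).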

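Execution overview

Input source code files
Various input source code files are prepared for testing.

Output Excel file
The script reads the source code files, counts the lines and steps of each, and writes the results to an Excel file.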

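Lines: the number of all lines, including comment-only lines and blank lines.
Steps: the number of lines, excluding comment-only lines and blank lines.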
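
To make the two definitions concrete, here is a minimal sketch. It is not the script's actual logic: it assumes a single-line comment prefix (`#` by default) and ignores multi-line strings and block comments, which the real tokenizer has to deal with. The function name `count_lines_and_steps` is made up for this illustration.

```python
from typing import Tuple


def count_lines_and_steps(text: str, comment_prefix: str = "#") -> Tuple[int, int]:
    """Return (lines, steps) for the given source text.

    Simplifying assumption: a line is a step unless it is blank or
    starts with the comment prefix after stripping whitespace.
    """
    num_lines = 0
    num_steps = 0
    for line in text.splitlines():
        num_lines += 1
        stripped = line.strip()
        if stripped and not stripped.startswith(comment_prefix):
            num_steps += 1
    return num_lines, num_steps


sample = "# comment only\n\nx = 1  # trailing comment\nprint(x)\n"
print(count_lines_and_steps(sample))
```

A line with code followed by a trailing comment still counts as a step, because only comment-only lines are excluded.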

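Current implementation level

Seek all source code files in a folder and count the lines and steps of each.
Make it possible to process encodings other than UTF-8 as far as practical. This version is built to handle utf-8, shift-jis, and gb2312.
Support four kinds of source code: Java and C-family languages (.java, .c, .cpp), Python (.py), SQL (.sql), and others (.txt and so on).
Tokenize the source code files so that text inside single or double quotation marks is recognized as a string constant.
Tokenization also lays the groundwork for the “keyword-level grep” mentioned above.
However, if a string constant is not properly enclosed in single or double quotation marks, tokenization is not reliable. It is crude, but I am not aiming for a parser or a linter... (that is my excuse).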

Parser: a component that performs syntax analysis.
Linter: a component that performs syntax checking.
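
The idea of recognizing quoted text as a string constant can be sketched as follows. This is an illustrative single-line tokenizer written for this article, not the omitted code of the script: the function name `tokenize_line` and the `#` comment prefix are assumptions, and multi-line strings are not handled.

```python
from typing import List


def tokenize_line(line: str, comment_prefix: str = "#") -> List[str]:
    """Split one line into tokens: words, string constants, a comment, symbols."""
    tokens: List[str] = []
    pos = 0
    end = len(line)
    while pos < end:
        ch = line[pos]
        if ch.isspace():
            pos += 1
        elif line.startswith(comment_prefix, pos):
            # The rest of the line is one comment token
            tokens.append(line[pos:].rstrip())
            break
        elif ch in ("'", '"'):
            # String constant: scan to the matching quote, skipping escaped characters
            start = pos
            pos += 1
            while pos < end and line[pos] != ch:
                if line[pos] == "\\":
                    pos += 1
                pos += 1
            pos += 1  # consume the closing quote (runs past the end if unterminated)
            tokens.append(line[start:pos])
        elif ch.isalnum() or ch == "_":
            # Word: identifier, keyword, or number
            start = pos
            while pos < end and (line[pos].isalnum() or line[pos] == "_"):
                pos += 1
            tokens.append(line[start:pos])
        else:
            # Any other character becomes a one-character symbol token
            tokens.append(ch)
            pos += 1
    return tokens


print(tokenize_line('name = "a # b"  # note'))
```

Because the `#` inside the double quotes is consumed as part of the string constant, only the final `# note` is treated as a comment. An unterminated quote swallows the rest of the line, which is the kind of unreliability described above.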

Future plans (tentative)

Add support for more source code file types, for example .sh, .xml, .js, and .html.
Implement the “keyword-level grep” mentioned above.
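
As a rough picture of what a keyword-level grep could look like, the sketch below uses Python's standard `tokenize` module instead of a hand-written tokenizer. That is a substitute technique chosen for brevity: it works only for Python source that tokenizes cleanly, and the function name `grep_keyword` is hypothetical.

```python
import io
import tokenize
from typing import List, Tuple


def grep_keyword(source: str, keyword: str) -> List[Tuple[int, str]]:
    """Return (line number, line text) for each NAME token equal to keyword.

    Occurrences inside comments and string constants are skipped because
    they arrive as COMMENT and STRING tokens, not NAME tokens.
    """
    hits: List[Tuple[int, str]] = []
    reader = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(reader):
        if tok.type == tokenize.NAME and tok.string == keyword:
            hits.append((tok.start[0], tok.line.rstrip("\n")))
    return hits


sample = (
    "total = 0  # total is a counter\n"
    "label = 'total'\n"
    "print(total)\n"
)
for line_no, text in grep_keyword(sample, "total"):
    print(line_no, text)
```

The second line of the sample is not reported, since `total` appears there only inside a string constant. Source that cannot be tokenized makes `tokenize` raise an exception, so a real tool would need to catch that.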

Brief explanation of the source code

Header.

source_code_counter.py

#!/usr/bin/env python3

・・・

# Import Libraries
import os
・・・

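Constants. Modify them as needed. For example, the source code path “IN_SRC_ROOT” can be an absolute path starting from the D: drive, or a path relative to “source_code_counter.py” (assuming the script is run from its own directory).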

source_code_counter.py

# Input, Output
IN_DIR = '.\\input'
OUT_DIR = '.\\output'
# IN_SRC_ROOT = 'D:\\Developments\\PyCharmProjects\\source_code_counter\\input'  # noqa
IN_SRC_ROOT = '.\\input'
IN_SRC_RELATIVE = '\\src'
IN_EXCEL = IN_DIR + '\\source_code_counter_list_template.xlsx'
OUT_EXCEL = OUT_DIR + '\\source_code_counter_list.xlsx'
OUT_SHEET = 'Source Code Counter List'
ENCODINGS = ['utf-8', 'shift-jis', 'gb2312']
IGNORE_EXTENDS = ['.dat', '.ini']
OUT_DEBUG = OUT_DIR + '\\debug.txt'
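
Relative paths such as '.\\input' are resolved against the current working directory, so they point next to the script only when it is run from its own folder. One possible way to remove that dependency is sketched below. The helper `resolve_from` and the `SCRIPT_DIR` name are assumptions for illustration and are not part of the script.

```python
import os


def resolve_from(base_dir: str, path: str) -> str:
    """Resolve path against base_dir; an absolute path is kept (only normalized)."""
    return os.path.normpath(os.path.join(base_dir, path))


# Hypothetical usage inside the script: anchor the settings to the script's folder.
# SCRIPT_DIR = os.path.dirname(os.path.abspath(__file__))
# IN_SRC_ROOT = resolve_from(SCRIPT_DIR, 'input')
base = os.path.abspath("tool")
print(resolve_from(base, "input"))
```

`os.path.join` discards `base_dir` when the second argument is already absolute, so the same helper accepts both styles of setting.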

Class for writing to Excel.

source_code_counter.py

class WriteExcel:

    def __init__(self, in_excel: str, out_excel: str, out_sheet: str) -> None:
        shutil.copy(in_excel, out_excel)
        self._wb = openpyxl.load_workbook(out_excel)
        self._sheet = self._wb[out_sheet]
        self._row_offset = CELL_ROW_OFFSET
        self._row = 0
        self._out_excel = out_excel
        return

    def next_row(self) -> None:
        self._row += 1
        return

    def get_count(self) -> int:
        return self._row + 1

    def write_cell(self, i_col: int, i_value: Union[int, str],
                   i_align: int = None, i_font: Font = None, i_format: str = None) -> None:
        # Look up the target cell once, then apply the border, value, and styles
        cell = self._sheet.cell(row=self._row_offset + self._row, column=i_col)
        cell.border = BORDER_ALL
        if i_value is not None:
            cell.value = i_value
        if i_align is not None:
            cell.alignment = i_align
        # Fall back to the default font when none is given
        cell.font = i_font if i_font is not None else FONT_MEIRYO
        if i_format is not None:
            cell.number_format = i_format
        return

    def close(self) -> None:
        self._wb.save(self._out_excel)
        self._wb.close()
        return

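Analysis of the four kinds of source code files. Each line of a file is tokenized, and the lines and steps are counted. The processing is quite long, so parts of it are omitted with “・・・”.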

source_code_counter.py

# Scan Python File
def scan_python_file(full_path_file: str, fp) -> (int, int, str):

    for enc in ENCODINGS:

        num_lines = 0
        num_steps = 0
        ・・・

        file = open(full_path_file, 'r', encoding=enc)

        while True:

            try:
                str_line = file.readline()
            except Exception:       # noqa
                file.close()
                break

            # End of Data
            if not str_line:
                file.close()
                return num_lines, num_steps, MSG_NORMAL

            num_lines += 1

            ・・・

            pos_current = 0
            pos_end = len(str_comp)

            tokens = []
            ・・・
            is_ope = False

            while pos_current < pos_end:

                ch = str_comp[pos_current]

                ・・・

                pos_current += 1

            # End of One Line
            ・・・

            if fp is not None:
                strs = ''
                spc = ''
                for cnt, val in enumerate(tokens):
                    strs += spc + '[' + val + ']'
                    spc = ' '
                fp.write('%s %5d: %s\n' % ('|' if is_ope else ' ', num_lines, strs))

            if is_ope:
                num_steps += 1

        # End of All lines

    print('file encoding error in %s' % full_path_file, file=sys.stderr)
    return 0, 0, MSG_ERROR


# Scan Java File
def scan_java_file(full_path_file: str, fp) -> (int, int, str):

    for enc in ENCODINGS:

        num_lines = 0
        num_steps = 0
        ・・・

        file = open(full_path_file, 'r', encoding=enc)

        while True:

            try:
                str_line = file.readline()
            except Exception:       # noqa
                file.close()
                break

            # End of Data
            if not str_line:
                file.close()
                return num_lines, num_steps, MSG_NORMAL

            num_lines += 1

            ・・・

            pos_current = 0
            pos_end = len(str_comp)

            tokens = []
            ・・・
            is_ope = False

            while pos_current < pos_end:

                ch = str_comp[pos_current]

                ・・・

                pos_current += 1

            # End of One Line
            ・・・

            if fp is not None:
                strs = ''
                spc = ''
                for cnt, val in enumerate(tokens):
                    strs += spc + '[' + val + ']'
                    spc = ' '
                fp.write('%s %5d: %s\n' % ('|' if is_ope else ' ', num_lines, strs))

            if is_ope:
                num_steps += 1

        # End of All lines

    print('file encoding error in %s' % full_path_file, file=sys.stderr)
    return 0, 0, MSG_ERROR


# Scan SQL File
def scan_sql_file(full_path_file: str, fp) -> (int, int, str):

    for enc in ENCODINGS:

        num_lines = 0
        num_steps = 0
        ・・・

        file = open(full_path_file, 'r', encoding=enc)

        while True:

            try:
                str_line = file.readline()
            except Exception:       # noqa
                file.close()
                break

            # End of Data
            if not str_line:
                file.close()
                return num_lines, num_steps, MSG_NORMAL

            num_lines += 1

            ・・・

            pos_current = 0
            pos_end = len(str_comp)

            tokens = []
            ・・・
            is_ope = False

            while pos_current < pos_end:

                ch = str_comp[pos_current]

                ・・・

                pos_current += 1

            # End of One Line
            ・・・

            if fp is not None:
                strs = ''
                spc = ''
                for cnt, val in enumerate(tokens):
                    strs += spc + '[' + val + ']'
                    spc = ' '
                fp.write('%s %5d: %s\n' % ('|' if is_ope else ' ', num_lines, strs))

            if is_ope:
                num_steps += 1

        # End of All lines

    print('file encoding error in %s' % full_path_file, file=sys.stderr)
    return 0, 0, MSG_ERROR


# Scan Text File
def scan_text_file(full_path_file: str) -> (int, int, str):

    for enc in ENCODINGS:

        num_lines = 0

        file = open(full_path_file, 'r', encoding=enc)

        while True:

            try:
                str_line = file.readline()
            except Exception:       # noqa
                file.close()
                break

            # End of Data
            if not str_line:
                file.close()
                return num_lines, None, MSG_NORMAL

            num_lines += 1

        # End of All lines

    print('file encoding error in %s' % full_path_file, file=sys.stderr)
    return None, None, MSG_ERROR
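
The encoding fallback shared by the four scan functions can be written more compactly. The following is a sketch under my own assumptions, not a drop-in replacement: it reads the whole file into memory as bytes and decodes it, whereas the functions above read line by line in text mode. The function name `read_text_with_fallback` is made up for this example.

```python
import os
import tempfile
from typing import Optional, Sequence, Tuple


def read_text_with_fallback(
        path: str,
        encodings: Sequence[str] = ("utf-8", "shift-jis", "gb2312"),
) -> Tuple[Optional[str], Optional[str]]:
    """Return (text, encoding), or (None, None) if no candidate decodes the file."""
    with open(path, "rb") as f:      # the with-block closes the file on every path
        raw = f.read()
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue                 # try the next candidate encoding
    return None, None


# Usage: a Shift-JIS file is rejected by utf-8 and accepted by shift-jis.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.txt")
    with open(sample_path, "wb") as f:
        f.write("日本語のコメント\n".encode("shift-jis"))
    text, used = read_text_with_fallback(sample_path)
    print(used, text.count("\n"))
```

The order of the candidates matters: the first encoding that decodes without an error wins, even if it is not the encoding the file was written in. Putting the strictest encoding (utf-8) first reduces that risk but does not remove it.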

Seeks folders recursively and has each source code file in the folder analyzed.

source_code_counter.py

# Seek Directories
def seek_directories(write_excel: WriteExcel, level: int, dir_root: str, dir_relative: str, fp) -> None:

    dirs = []
    files = []

    for path in os.listdir(dir_root):
        if os.path.isfile(os.path.join(dir_root, path)):
            files.append(path)
        else:
            dirs.append(path)

    files.sort(key=str.lower)
    for file in files:
        full_path_file = os.path.join(dir_root, file)
        if fp is not None:
            fp.write('%5d %s\n' % (write_excel.get_count(), full_path_file))
        base, ext = os.path.splitext(file)
        write_excel.write_cell(CELL_COL_NO, write_excel.get_count(), None, None, NUMBER_FORMAT)
        write_excel.write_cell(CELL_COL_PATH, dir_relative, ALIGN_LEFT_NO_WRAP, None, None)
        write_excel.write_cell(CELL_COL_FILE, file, ALIGN_LEFT_NO_WRAP, None, None)
        write_excel.write_cell(CELL_COL_EXT, ext, ALIGN_CENTER, None, None)
        # Ignore Files
        if (base.startswith('.') and ext == '') or ext in IGNORE_EXTENDS:
            lines = None
            steps = None
        elif ext == '.py':
            lines, steps, msg = scan_python_file(full_path_file, fp)
        elif ext in ('.java', '.c', '.cpp'):
            lines, steps, msg = scan_java_file(full_path_file, fp)
        elif ext == '.sql':
            lines, steps, msg = scan_sql_file(full_path_file, fp)
        elif ext == '.txt':
            lines, steps, msg = scan_text_file(full_path_file)
        # Other Files
        else:
            lines, steps, msg = scan_text_file(full_path_file)
        write_excel.write_cell(CELL_COL_LINES, lines, None, None, NUMBER_FORMAT)
        write_excel.write_cell(CELL_COL_STEPS, steps, None, None, NUMBER_FORMAT)
        print('%5d %s %s %s %s %s' %
              (write_excel.get_count(), dir_relative, file, ext,
               lines if lines is not None else '-', steps if steps is not None else '-'))
        write_excel.next_row()

    dirs.sort(key=str.lower)
    for dir_nest in dirs:
        seek_directories(write_excel, level + 1,
                         os.path.join(dir_root, dir_nest), os.path.join(dir_relative, dir_nest), fp)

    return
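
The same traversal order (the files of a folder first, then each subfolder, both sorted case-insensitively) can also be obtained with `os.walk` instead of explicit recursion. The sketch below shows only the traversal, without the Excel output, and the generator name `iter_source_files` is an assumption for illustration.

```python
import os
import tempfile
from typing import Iterator, Tuple


def iter_source_files(dir_root: str) -> Iterator[Tuple[str, str]]:
    """Yield (relative folder, file name) in the order described above."""
    for current, dirs, files in os.walk(dir_root):
        # Sorting dirs in place controls the order in which os.walk descends
        dirs.sort(key=str.lower)
        relative = os.path.relpath(current, dir_root)
        for name in sorted(files, key=str.lower):
            yield relative, name


# Usage with a small temporary tree
with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "sub"))
    for rel in ("b.py", "A.txt", os.path.join("sub", "c.sql")):
        with open(os.path.join(root, rel), "w", encoding="utf-8") as f:
            f.write("\n")
    found = list(iter_source_files(root))
    print(found)
```

One difference from the recursive version: `seek_directories` treats every entry that is not a regular file as a folder, while `os.walk` classifies entries itself, so unusual entries such as broken symbolic links may be handled differently.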

Main processing. If you do not want to output the debug log, set “fp = None” and comment out “fp = open( ... )”.

source_code_counter.py

# Get Current Time
def get_current_time() -> str:

    now = datetime.datetime.now()
    dt = now.strftime("%Y-%m-%d %H:%M:%S")
    return dt


# Main
def main() -> None:

    try:
        options, arguments = getopt.getopt(sys.argv[1:], shortopts="h", longopts=["help"])
    except getopt.error as message:
        print(message)
        print(__doc__)
        sys.exit(1)

    for option, argument in options:
        if option in ("-h", "--help"):
            print(__doc__)
            sys.exit(0)

    print('Source Code Counter - start [%s]' % get_current_time())

    # fp = None
    fp = open(OUT_DEBUG, 'w', encoding='utf-8')
    write_excel = WriteExcel(IN_EXCEL, OUT_EXCEL, OUT_SHEET)

    seek_directories(write_excel, 0, IN_SRC_ROOT + IN_SRC_RELATIVE, IN_SRC_RELATIVE, fp)

    write_excel.close()
    if fp is not None:
        fp.close()

    print('Source Code Counter - end [%s]' % get_current_time())

    sys.exit(0)


# Goto Main
if __name__ == '__main__':
    main()
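
Instead of editing the source to switch the debug log on and off, a command-line flag could make the choice at run time. The sketch below uses `argparse` from the standard library. The `--no-debug` flag is hypothetical and does not exist in the script, which uses `getopt` and accepts only `-h` and `--help`.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse the command line; --no-debug is a made-up flag for this sketch."""
    parser = argparse.ArgumentParser(description="Source Code Counter")
    parser.add_argument("--no-debug", action="store_true",
                        help="do not write the debug log")
    return parser.parse_args(argv)


args = parse_args(["--no-debug"])
# In main(), the flag would then decide whether the log file is opened:
# fp = None if args.no_debug else open(OUT_DEBUG, 'w', encoding='utf-8')
print(args.no_debug)
```

`argparse` generates the `-h` and `--help` output by itself, so the manual help handling in `main()` would no longer be needed with this approach.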

Location of the source code
