【C Code Analysis】Splitting Multi-Variable Declarations

Posted at 2024-10-23

Today I'll look at a script that splits multi-variable declarations into single-variable declarations.

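Finding multi-variable declarations

The following regular expression finds places where several variables are declared in a single statement.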

(?<=[\n;])[\t ]*(?:(?:auto|register|static|extern)\s+)?(?:(?:const|volatile|restrict|signed|unsigned|short|long)\s+)*\w+[\s*]+\w+(?:\s*=[^,;]+)?\s*(,[\s*]*\w+(?:\s*=[^,;]+)?)+;

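It's a long regular expression, so here is a rough breakdown of its parts.

(?<=[\n;])[\t ]*: the lookbehind requires the match to start right after a newline or a semicolon, and [\t ]* then consumes the indentation before the declaration
(?:(?:auto|register|static|extern)\s+)?: matches an optional storage class specifier
(?:(?:const|volatile|restrict|signed|unsigned|short|long)\s+)*: matches any number of qualifiers (const, volatile, restrict) and modifiers (signed, unsigned, short, long)
\w+[\s*]+\w+: matches the data type and the first variable name, plus the whitespace and/or * (for pointer declarations) between them
(?:\s*=[^,;]+)?\s*: matches the optional initializer of the first variable; because [^,;]+ stops at the first comma, an initializer that itself contains a comma (such as a call f(x, y)) is not supported
(,[\s*]*\w+(?:\s*=[^,;]+)?)+: matches each following comma together with the next variable name, its optional *, and its optional initializer
;: matches the semicolon that terminates the declaration statement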
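To check what the pattern actually picks up, here is a small sketch that runs it over a hypothetical C snippet. It assumes Python's standard re module instead of the third-party regex package used in this article; that works here because the pattern's only lookbehind, (?<=[\n;]), is fixed-width. The helper name find_multi_decls and the sample code are illustrative only.

```python
import re

# The article's pattern, split over several string literals for readability.
# Its lookbehind is fixed-width, so the standard re module accepts it.
PATTERN = (
    r"(?<=[\n;])[\t ]*(?:(?:auto|register|static|extern)\s+)?"
    r"(?:(?:const|volatile|restrict|signed|unsigned|short|long)\s+)*"
    r"\w+[\s*]+\w+(?:\s*=[^,;]+)?\s*(,[\s*]*\w+(?:\s*=[^,;]+)?)+;"
)

# Hypothetical C source used only for this demonstration.
SAMPLE = """int main(void)
{
    int a, b = 2;
    static const char *p, *q;
    int single = 0;
    foo(x, y);
    return 0;
}
"""


def find_multi_decls(code: str) -> list[str]:
    """Return every match of PATTERN with its indentation stripped."""
    return [m.group().strip() for m in re.finditer(PATTERN, code)]


for decl in find_multi_decls(SAMPLE):
    print(decl)
# int a, b = 2;
# static const char *p, *q;
```

Two properties of the pattern as written are worth keeping in mind: a declaration on the very first line of a file is not matched, because no newline or semicolon precedes it for the lookbehind, and a comma-operator statement such as "return a, b;" is matched too, because return fits the data type slot.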

Splitting multi-variable declarations into single-variable declarations

This is done with the following script.

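import regex  # third-party package: pip install regex

def repldef(match: regex.Match) -> str:
	matchtext: str = match.group()

	# Indentation + storage class + qualifiers + base type, up to the first declarator.
	# '*' is deliberately excluded: in "int* a, b;" only a is a pointer, so each '*'
	# must stay with its own variable instead of being copied to every one.
	# The lookahead (?=[\s*]+\w) keeps "unsigned a, b;" from yielding "unsigned a".
	pattern = r"[\t ]*(?:(?:auto|register|static|extern)\s+)?(?:(?:const|volatile|restrict|signed|unsigned|short|long)\s+)*\w+(?=[\s*]+\w)"
	datatype = regex.search(pattern, matchtext).group()

	# Each ',' ends one declaration and starts the next with the same type prefix.
	ret = matchtext.replace(",", f";\n{datatype}")
	return ret

pattern = r"(?<=[\n;])[\t ]*(?:(?:auto|register|static|extern)\s+)?(?:(?:const|volatile|restrict|signed|unsigned|short|long)\s+)*\w+[\s*]+\w+(?:\s*=[^,;]+)?\s*(,[\s*]*\w+(?:\s*=[^,;]+)?)+;"

# Read the C source, split every multi-variable declaration, and print the result.
with open("test.c", "r", encoding="utf-8", errors="ignore") as file_obj:
	codetext = file_obj.read()
	ret = regex.sub(pattern, repldef, codetext, flags=regex.DOTALL)

	print(ret)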

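The regular expression from the previous section matches each multi-variable declaration. regex.sub then replaces it with single-variable declarations; because the replacement is not a fixed string, we pass a function that builds the replacement text dynamically for each match.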
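The replacement-function mechanism works the same way in the standard library's re.sub: the callable receives each Match object, and whatever string it returns is substituted for the matched text. A minimal illustration, with a hypothetical function named bracket:

```python
import re


def bracket(match: re.Match) -> str:
    # Called once per match; the returned string replaces match.group().
    return f"[{match.group()}]"


print(re.sub(r"\w+", bracket, "int a, b;"))  # [int] [a], [b];
```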

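For each matched declaration string, repldef extracts the part in front of the first variable name: the storage class specifier, qualifiers, and data type. It then replaces every , with ;, a newline, and that extracted prefix, which turns the statement into a sequence of single-variable declarations.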
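The prefix copied after each comma has to stop at the base type. In C, "int* a, b;" declares only a as a pointer, so copying "int*" would silently turn b into a pointer as well. A related trap is "unsigned a, b;", where a greedy second search for the prefix can swallow the variable name. The sketch below is a variant of repldef, not the article's code: it captures the prefix as a named group (the name prefix is hypothetical) inside the main pattern, so the prefix is exactly what the main match assigned to the type. It assumes the standard re module, and split_decl and split_decls are illustrative names.

```python
import re

# The article's pattern, with the type part wrapped in a named group "prefix"
# and the trailing repetition group made non-capturing.
DECL = (
    r"(?<=[\n;])(?P<prefix>[\t ]*(?:(?:auto|register|static|extern)\s+)?"
    r"(?:(?:const|volatile|restrict|signed|unsigned|short|long)\s+)*\w+)"
    r"[\s*]+\w+(?:\s*=[^,;]+)?\s*(?:,[\s*]*\w+(?:\s*=[^,;]+)?)+;"
)


def split_decl(match: re.Match) -> str:
    text = match.group()
    # Indentation + storage class + qualifiers + base type, never any '*':
    # the '*' belongs to each declarator, not to the shared type.
    prefix = match.group("prefix")
    # Each ',' ends one declaration and starts the next with the same prefix.
    return text.replace(",", f";\n{prefix}")


def split_decls(code: str) -> str:
    return re.sub(DECL, split_decl, code)


result = split_decls("\n    int* a, b;\n")
print(repr(result))  # '\n    int* a;\n    int b;\n'
```

Because the main pattern must still find a type followed by a variable name, the regex engine backtracks so that "unsigned a, b;" gets the prefix "unsigned" rather than "unsigned a".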

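Summary

Honestly, I have rarely seen multi-variable declarations in real code, but when declarations are not one variable per statement, they get in the way of other C analysis scripts. For example, a script that removes unused variables is much easier to write when every declaration declares a single variable.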
