
Reading a custom script with pyparsing and running syntax-specific processing

Posted at 2018-08-25 (last updated 2018-08-25)


This article uses Python 2.7 and pyparsing 2.2.0 installed on Mac OS X High Sierra.

Overview

As an example, consider a custom language (call it a DSL, a mini-language, or a script) made of two commands, variable and dump, with the following syntax:

variable name style args

dump id group-id style timestep file args

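Here variable and dump are the command names, and each argument that follows obeys its own rule (for example, name is a string, style is one of a set of predefined keywords, id and timestep are numbers, and args is a list of one or more values).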

We will build a parser that reads code written in this language, checks that each line is syntactically correct, and, when it is, runs different processing for each of the commands variable and dump.

We use pyparsing, and we do it without if statements. The idea behind handling several commands without if branching is described at http://www.ptmcg.com/geo/python/confs/pyCon2006_pres2.html .
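For comparison, the same "no if" idea can be expressed in plain Python without pyparsing: a dictionary maps each command keyword to a handler, so adding a command means adding an entry rather than another elif. This is a sketch of the general dispatch-table pattern, not code from the linked presentation; the handler names and their tuple return values are hypothetical.

```python
# Dispatch-table sketch: each command keyword maps to a handler function,
# so the caller never branches on the command name with if/elif.

def handle_variable(name, style, *args):
    # hypothetical handler for: variable name style args
    return ("variable", name, style, list(args))

def handle_dump(id_, group_id, style, timestep, filename, *args):
    # hypothetical handler for: dump id group-id style timestep file args
    return ("dump", id_, group_id, style, timestep, filename, list(args))

HANDLERS = {
    "variable": handle_variable,
    "dump": handle_dump,
}

def run_line(line):
    words = line.split()
    # an unknown command raises KeyError; argument types are NOT checked
    handler = HANDLERS[words[0]]
    return handler(*words[1:])

print(run_line("variable ARY index 1 2 3"))
```

Unlike the pyparsing version below, this does not validate anything (for example, that id is a number); that checking is what the grammar objects add.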

Code and explanation

I will go through the code, its output, and an explanation, in that order.

Python code

pyparsing_action.py

import pyparsing as pp
import sys
from termcolor import cprint  # third-party package for colored terminal output

data = '''\
variable INT equal 100
variable DEC equal 10.0
variable EXP equal 100e5
variable ARY index 1 2 3
dump 1 group atom 1000 output arg1 arg2 arg3
variable ERR equal hoge
'''

class Command(object):
	# common parser elements
	number = pp.Regex(r"[+-]?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?")

	def __init__(self, *args):
		self._action(*args)

	def _action(self, *args):
		cprint ('Do actions in the superclass of {1} with args: {0}'.format(str(args), type(self).__name__), 'green', attrs=['bold'])

	@staticmethod
	def syntax():
		pass

class VariableCommand(Command):
	def __init__(self, quals):
		super(VariableCommand,self).__init__(quals[1], quals[2], quals[3:])

	def _action(self, name, style, args):
		super(VariableCommand,self)._action(name, style, args)
		cprint ('Do actions in {1} with args: {0}'.format(str(args), type(self).__name__), 'green')
		return

	@staticmethod
	def syntax():
		# parsers for command matching
		cmd = pp.Keyword("variable").setResultsName("cmd")
		name = pp.Word(pp.printables).setResultsName("name")
		style = pp.oneOf("equal index").setResultsName("style")
		args = pp.OneOrMore(VariableCommand.number).setResultsName("args")
		return (cmd + name + style + args)


class DumpCommand(Command):
	def __init__(self, quals):
		super(DumpCommand,self).__init__(quals[1], quals[2], quals[3], quals[4], quals[5], quals[6:])

	def _action(self, id, groupid, style, timestep, file, args):
		super(DumpCommand,self)._action(args)
		cprint ('Do actions in {1} with args: {0}'.format(str(args), type(self).__name__), 'green')
		return

	@staticmethod
	def syntax():
		# parsers for command matching
		cmd = pp.Keyword("dump").setResultsName("cmd")
		id = (DumpCommand.number).setResultsName("ID")
		groupid = pp.Word(pp.printables).setResultsName("group-ID")
		style = pp.oneOf("atom atom/vtk image local custom mesh/stl mesh/vtk").setResultsName("style")
		timestep = (DumpCommand.number).setResultsName("timestep")
		file = pp.Word(pp.printables).setResultsName("file")
		args = pp.OneOrMore(pp.Word(pp.printables)).setResultsName("args")
		return (cmd + id + groupid + style + timestep + file + args)

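def makeCommandParseAction(cls):
	# returns a parse action bound to the command class cls
	def cmdParseAction(original_string, location, tokens):
		cls(tokens)  # instantiating cls runs its _action()
		# alternative: return cls(tokens) to make the command object the parse result
		return tokens  # keep the matched tokens as the parse result
	return cmdParseAction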

def setCommandSyntax():
	variableCommand = VariableCommand.syntax()
	variableCommand.setParseAction(makeCommandParseAction(VariableCommand))
	dumpCommand = DumpCommand.syntax()
	dumpCommand.setParseAction(makeCommandParseAction(DumpCommand))
	return (dumpCommand | variableCommand)

# main

syntax = setCommandSyntax()

for i, line in enumerate(data.splitlines(True)):
	# drop everything after '#' (comments) and trailing whitespace
	line_no_comment = line.partition('#')[0].rstrip()
	print("Parsing line " + str(i) + ": " + repr(line_no_comment))
	try:
		# a successful match also fires the parse action of the matching command
		result = syntax.parseString(line_no_comment)
		cprint(list(result.items()), 'cyan')
	except pp.ParseException as e:
		cprint("matching failed", 'red')
		#cprint("no match with error: " + str(e), 'red')

sys.exit()

Output

Parsing line 0: 'variable INT equal 100'
Do actions in the superclass of VariableCommand with args: ('INT', 'equal', ['100'])
Do actions in VariableCommand with args: ['100']
[('style', 'equal'), ('cmd', 'variable'), ('name', 'INT'), ('args', (['100'], {}))]
Parsing line 1: 'variable DEC equal 10.0'
Do actions in the superclass of VariableCommand with args: ('DEC', 'equal', ['10.0'])
Do actions in VariableCommand with args: ['10.0']
[('style', 'equal'), ('cmd', 'variable'), ('name', 'DEC'), ('args', (['10.0'], {}))]
Parsing line 2: 'variable EXP equal 100e5'
Do actions in the superclass of VariableCommand with args: ('EXP', 'equal', ['100e5'])
Do actions in VariableCommand with args: ['100e5']
[('style', 'equal'), ('cmd', 'variable'), ('name', 'EXP'), ('args', (['100e5'], {}))]
Parsing line 3: 'variable ARY index 1 2 3'
Do actions in the superclass of VariableCommand with args: ('ARY', 'index', ['1', '2', '3'])
Do actions in VariableCommand with args: ['1', '2', '3']
[('style', 'index'), ('cmd', 'variable'), ('name', 'ARY'), ('args', (['1', '2', '3'], {}))]
Parsing line 4: 'dump 1 group atom 1000 output arg1 arg2 arg3'
Do actions in the superclass of DumpCommand with args: (['arg1', 'arg2', 'arg3'],)
Do actions in DumpCommand with args: ['arg1', 'arg2', 'arg3']
[('style', 'atom'), ('args', (['arg1', 'arg2', 'arg3'], {})), ('group-ID', 'group'), ('cmd', 'dump'), ('N', '1000'), ('file', 'output'), ('ID', '1')]
Parsing line 5: 'variable ERR equal hoge'
matching failed
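The last line is rejected because hoge does not match the number pattern that args requires. If you want to know where a line failed (the commented-out str(e) line in the main loop), the ParseException carries that information. A standalone sketch, reusing the article's variable grammar without its parse action; the check function is hypothetical:

```python
import pyparsing as pp

# The variable grammar from VariableCommand.syntax(), without its parse action
number = pp.Regex(r"[+-]?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?")
variable = (pp.Keyword("variable") + pp.Word(pp.printables)
            + pp.oneOf("equal index") + pp.OneOrMore(number))

def check(line):
    """Return None if line parses, else (offset, column, message)."""
    try:
        variable.parseString(line)
        return None
    except pp.ParseException as e:
        # e.loc: 0-based offset where matching failed; e.col: 1-based column
        return (e.loc, e.col, str(e))

print(check("variable INT equal 100"))
print(check("variable ERR equal hoge"))
```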

Explanation

First, create the classes VariableCommand and DumpCommand, both subclasses of Command. Each one defines a method syntax() that returns the grammar definition, and a method _action() that is run when a line matches.
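The article declares syntax() with @staticmethod and refers to the shared number element as VariableCommand.number. A possible variant (my suggestion, not the article's code) is @classmethod: the method then receives the class as cls, and cls.number is found on Command through normal inheritance, so the class name is not repeated inside the method. A minimal sketch with only the variable grammar:

```python
import pyparsing as pp

class Command(object):
    # shared parser element, inherited by every subclass
    number = pp.Regex(r"[+-]?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?")

class VariableCommand(Command):
    @classmethod
    def syntax(cls):
        # cls.number resolves to Command.number through inheritance
        return (pp.Keyword("variable") + pp.Word(pp.printables)
                + pp.oneOf("equal index") + pp.OneOrMore(cls.number))

grammar = VariableCommand.syntax()  # called on the class, no instance needed
print(grammar.parseString("variable ARY index 1 2 3").asList())
```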

In the function setCommandSyntax(), the line

variableCommand = VariableCommand.syntax()

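calls syntax() on the class (it is declared with @staticmethod, so no instance is needed) and builds the pyparsing parser object for that grammar. What should happen when this grammar matches is specified with pyparsing's setParseAction. In this source that is done with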

variableCommand.setParseAction(makeCommandParseAction(VariableCommand))

In other words: when a line matches the grammar variableCommand = VariableCommand.syntax(), call the function returned by makeCommandParseAction(VariableCommand). The docstring of setParseAction says:

Parse action fn is a callable method with 0-3 arguments, called as C{fn(s,loc,toks)},
        C{fn(loc,toks)}, C{fn(toks)}, or just C{fn()}, where:
         - s   = the original string being parsed (see note below)
         - loc = the location of the matching substring
         - toks = a list of the matched tokens, packaged as a C{L{ParseResults}} object

So pyparsing passes the parse action up to three arguments, depending on how many it accepts.
When VariableCommand is passed to makeCommandParseAction, it returns the nested function cmdParseAction. Because cmdParseAction takes three arguments, pyparsing passes it s, loc, and toks respectively. cmdParseAction then instantiates cls (here VariableCommand) with tokens, so the _action() called from __init__ runs.
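To see the arity handling described in the docstring, the sketch below attaches three parse actions with different signatures to one element (setParseAction accepts several callables and runs them in order). The function names and the calls list are hypothetical, for illustration only.

```python
import pyparsing as pp

calls = []

def three_args(s, loc, toks):
    # s: the whole input string, loc: offset where this match starts
    calls.append(("s, loc, toks", loc, toks[0]))

def one_arg(toks):
    calls.append(("toks", toks[0]))

def no_args():
    calls.append(("no args",))

first = pp.Word(pp.alphas)
# Returning None from an action keeps the matched tokens unchanged
second = pp.Word(pp.alphas).setParseAction(three_args, one_arg, no_args)
grammar = first + second

grammar.parseString("abc def")
print(calls)
```

loc is the offset of "def" after pyparsing has skipped the leading space, i.e. 4.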

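So anything you want to run when the VariableCommand.syntax() grammar matches simply goes into _action(). Finally, cmdParseAction returns tokens unchanged, so the parse result is the same as it would be without the action.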
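The comment in makeCommandParseAction mentions returning cls(...) instead of tokens. When a parse action returns a non-None value, pyparsing uses it as the new result, so parseString then hands back the command object itself. A minimal sketch of that variation; this VariableCommand is a simplified, hypothetical stand-in for the article's class (no cprint output, args converted to float):

```python
import pyparsing as pp

class VariableCommand(object):
    # simplified stand-in for the article's VariableCommand
    def __init__(self, tokens):
        self.name = tokens[1]
        self.style = tokens[2]
        self.args = [float(a) for a in tokens[3:]]

def makeCommandParseAction(cls):
    def cmdParseAction(original_string, location, tokens):
        # return the object instead of the tokens: it becomes the parse result
        return cls(tokens)
    return cmdParseAction

number = pp.Regex(r"[+-]?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?")
variable = (pp.Keyword("variable") + pp.Word(pp.printables)
            + pp.oneOf("equal index") + pp.OneOrMore(number))
variable.setParseAction(makeCommandParseAction(VariableCommand))

cmd = variable.parseString("variable ARY index 1 2 3")[0]
print(cmd.name, cmd.style, cmd.args)
```

With this design, the main loop receives ready-made objects and can decide later when to run them, instead of having the action fire during parsing.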

The same applies to the other command, dump.

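In this way, setCommandSyntax() collects each grammar definition together with the action to run on a match, and returns the grammars combined with | (an ordered OR: the first alternative that matches wins):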

return (dumpCommand | variableCommand)
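In pyparsing, | builds a MatchFirst, which tries the alternatives left to right and stops at the first match, while ^ builds an Or, which tries all of them and keeps the longest match. In the article's grammar the alternatives start with different Keywords, so the order does not matter, but it does when one alternative can match a prefix of another:

```python
import pyparsing as pp

# '|' builds a MatchFirst: first alternative that matches wins
first = pp.Literal("var") | pp.Literal("variable")
# '^' builds an Or: all alternatives are tried, the longest match wins
longest = pp.Literal("var") ^ pp.Literal("variable")

print(first.parseString("variable")[0])    # var
print(longest.parseString("variable")[0])  # variable
```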

Then, in the main part,

syntax = setCommandSyntax()

stores the combined grammar in the variable syntax.

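After that, each line of the script is read into the variable line_no_comment (with everything after # cut off), and calling pyparsing's parseString method on it parses the line; if it matches, the parse action registered with setParseAction runs as well.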

result = syntax.parseString(line_no_comment)
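One caveat: parseString must match from the start of the string but, by default, does not require the grammar to consume all of it. With the variable grammar, a line such as variable INT equal 100 hoge is therefore accepted and hoge is silently dropped. Passing parseAll=True makes that a ParseException. A sketch with the grammar reproduced without parse actions:

```python
import pyparsing as pp

number = pp.Regex(r"[+-]?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?")
variable = (pp.Keyword("variable") + pp.Word(pp.printables)
            + pp.oneOf("equal index") + pp.OneOrMore(number))

line = "variable INT equal 100 hoge"

# Default: matching stops after '100' and the trailing 'hoge' is ignored
print(variable.parseString(line).asList())

# parseAll=True requires the grammar to consume the whole line
try:
    variable.parseString(line, parseAll=True)
except pp.ParseException as e:
    print("rejected: " + str(e))
```

In the article's main loop, the equivalent change would be syntax.parseString(line_no_comment, parseAll=True).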

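If you later want to parse a new command as well, say compute, you only need to:

Create ComputeCommand as a subclass of Command.
Define its grammar in ComputeCommand.syntax().
Write what should happen on a match in ComputeCommand._action().
Add it to setCommandSyntax().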

and the main part does not need to change at all.
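The four steps can be sketched in a trimmed-down, self-contained version of the framework. The article does not define a compute syntax, so the form compute id group-id style with the styles temp and pressure is purely hypothetical, and a log list replaces cprint so the effect is easy to check:

```python
import pyparsing as pp

number = pp.Regex(r"[+-]?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?")
log = []  # records which _action ran, in place of cprint

class Command(object):
    def __init__(self, tokens):
        self._action(tokens)

    def _action(self, tokens):
        pass

class VariableCommand(Command):
    def _action(self, tokens):
        log.append(("variable", tokens[1]))

    @staticmethod
    def syntax():
        return (pp.Keyword("variable") + pp.Word(pp.printables)
                + pp.oneOf("equal index") + pp.OneOrMore(number))

# Hypothetical new command (assumed syntax: compute id group-id style)
class ComputeCommand(Command):
    def _action(self, tokens):
        log.append(("compute", tokens[3]))

    @staticmethod
    def syntax():
        return (pp.Keyword("compute") + number + pp.Word(pp.printables)
                + pp.oneOf("temp pressure"))

def makeCommandParseAction(cls):
    def cmdParseAction(original_string, location, tokens):
        cls(tokens)
        return tokens
    return cmdParseAction

def setCommandSyntax():
    variableCommand = VariableCommand.syntax()
    variableCommand.setParseAction(makeCommandParseAction(VariableCommand))
    computeCommand = ComputeCommand.syntax()  # the only added lines
    computeCommand.setParseAction(makeCommandParseAction(ComputeCommand))
    return (computeCommand | variableCommand)

# main: unchanged apart from the input
syntax = setCommandSyntax()
for line in ["variable INT equal 100", "compute 1 all temp"]:
    syntax.parseString(line)
print(log)
```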

Bonus

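In pyparsing, you build a grammar by combining tokens. There are three basic tokens: Literal, for a fixed string; Word, for a word that follows some character-set rule (several convenient sets are predefined, such as alphas for letters, nums for digits, and printables for all non-whitespace printable ASCII characters); and Regex, for text matching a regular expression. Others include: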

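Token	Usage
Literal	a fixed string
Empty	always matches, consuming no input
Keyword	a fixed string that must not be immediately followed by another identifier character (so "dump" does not match the start of "dumpfile")
CharsNotIn	a run of characters none of which is in the given set
CloseMatch	a Literal that tolerates up to n mismatched characters
NoMatch	never matches
White	pyparsing normally skips whitespace; this matches whitespace that appears at a specific position
Regex	a regular expression
Word	a word that follows some character-set rule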
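A short sketch contrasting Literal and Keyword, which is why the article's command names use Keyword, plus CharsNotIn used to grab the text before a comment marker:

```python
import pyparsing as pp

# Literal matches the characters even when they begin a longer word
print(pp.Literal("dump").parseString("dumpfile").asList())   # ['dump']

# Keyword refuses to match if the next character could continue an identifier
try:
    pp.Keyword("dump").parseString("dumpfile")
except pp.ParseException:
    print("Keyword('dump') does not match 'dumpfile'")

# CharsNotIn collects characters up to (not including) any character in the set
before_comment = pp.CharsNotIn("#").parseString("dump 1 all # comment")[0]
print(repr(before_comment.strip()))  # 'dump 1 all'
```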

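To combine these, pyparsing provides classes and helpers such as:

MatchFirst an OR, but ordered (the first alternative that matches wins)
oneOf an OR of Literals, given as one space-separated string
OneOrMore one or more repetitions
and so on.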
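oneOf is what the dump grammar uses for style. A detail worth knowing: when one choice is a prefix of another ("atom" of "atom/vtk"), oneOf reorders them so the longer one is tried first, which a hand-written | in that order would not do. A sketch, together with OneOrMore:

```python
import pyparsing as pp

# Same style list as DumpCommand.syntax()
style = pp.oneOf("atom atom/vtk image local custom mesh/stl mesh/vtk")
print(style.parseString("atom/vtk")[0])  # atom/vtk, not atom

# OneOrMore repeats an element and returns every match as a flat token list
words = pp.OneOrMore(pp.Word(pp.alphanums))
print(words.parseString("arg1 arg2 arg3").asList())  # ['arg1', 'arg2', 'arg3']
```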

pyparsing_example1.py

import pyparsing as pp

cmd = pp.Literal("def")
some_word = pp.Word(pp.printables)
# the parse action converts the matched string to a float
number = pp.Regex(r"[+-]?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?").setParseAction(lambda t: float(t[0]))
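A usage sketch: combining the three elements of pyparsing_example1.py with + (an And) gives a one-line grammar, and thanks to the parse action the number comes back as a float rather than a string. The input line is an illustrative example.

```python
import pyparsing as pp

# The three elements from pyparsing_example1.py
cmd = pp.Literal("def")
some_word = pp.Word(pp.printables)
number = pp.Regex(r"[+-]?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?").setParseAction(lambda t: float(t[0]))

# Combine them with '+' into a sequence
definition = cmd + some_word + number

result = definition.parseString("def EXP 100e5")
print(result.asList())  # ['def', 'EXP', 10000000.0]
```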

References

http://www.ptmcg.com/geo/python/confs/pyCon2006_pres2.html
http://infohost.nmt.edu/~shipman/soft/pyparsing/web/index.html
