More than 5 years have passed since last update.

書籍「コンパイラ作りながら学ぶ」を読みながら JavaScript でコンパイラを実装してみた

Last updated at 2020-05-04Posted at 2020-05-03

はじめに

この記事では、中田育男著「コンパイラ作りながら学ぶ」(オーム社) に出てくるプログラミング言語 PL/0' を JavaScript で実装します。

コンパイラ作りながら学ぶ

この書籍ではシンプルなプログラミング言語 PL/0' コンパイラをC言語で実装したソースコードが巻末に掲載されており、それをもとにコンパイラの理論が説明されています。書籍では完成系のコードが一度に示されていますが、この記事では最小限のサブセットを設定して、徐々に拡張していきながら処理系を完成させます。

対象読者

この記事は以下のような人をターゲットに書かれています。

「コンパイラ作りながら学ぶ」の本文を一通り読んだ上で、巻末のコードを理解しようとしたものの途中で挫折した人

「コンパイラ作りながら学ぶ」はもともと1995年に「コンパイラ」というタイトルで出版されていた本の改訂版です。私は改訂前の本を高校生の頃読んで、途中で意味が分からなくなり挫折した経験があります。上のターゲットは昔の自分自身です。

PL/0' 言語

書籍では PL/0' は PL/0 に引数付きの関数を追加したものであると説明されています。PL/0 は Niklaus Wirth の「アルゴリズム＋データ構造＝プログラム」に掲載されている言語とのことです。こちらの書籍、私は読めていないのですが、以下の Wikipedia の記事に解説を見つけました。PL/0 も PL/0' も名前はPL/1のサブセットに見えますが、Pascal のサブセットとして定義されているようです。

PL/0

PL/0' の文法は構文図式で以下のように定義されています。構文図の画像の生成には RR - Railroad Diagram Generator を利用しています。

program (プログラム)

プログラムの最後にはピリオドがつきます。

block (ブロック)

constDecl (定数宣言)

varDevl (変数宣言)

funcDecl (関数宣言)

statement (文)

代入が = ではなく := なのが JavaScript とは違っています。write は式の結果を表示する文、writeln は改行を出力する文です。

condition (条件)

odd は後続の式が奇数であれば真、偶数であれば偽になる演算子です。両辺の値が等しいことを判定する演算子が = 等しくないことを判定する演算子が <> です。

expression (式)

単項演算子がつく位置が JavaScript と異なっています。JavaScript では単項演算子の優先度が高いので -2 * -3 のような式が書けますが、PL/0' では -2 * (-3) のように括弧をつける必要があります。

term (項)

factor (因子)

書籍でも述べられていますが、ident を読んだ時点ではそれが変数か関数呼び出しか決定できないため文法が LL(1) 文法になっていません。しかし、構文解析時に名前表を調べて ident の種類を取得できるため、トークンを一つだけ先読みして構文解析できます。

PL/0' 言語は関数の中に関数を定義できます。関数の中で定義された関数は、その外側の関数の引数を参照できます。一見難しそうに見えますが PL/0' は JavaScript のように関数を値として扱うことはないため、クロージャを作る必要は無く比較的簡単にコンパイルできます。

仮想マシン

書籍で解説されている PL/0' のコンパイラは、X86やARMやRISC-Vのような実際のCPUの機械語を生成するのではなく、仮想マシンの命令を生成します。コンパイラを作り始める前にまずはこの仮想マシンの構造を一通り眺めます。

こちらの仮想マシン、書籍中では特に名前が付けられていないので、この記事でも単に仮想マシンと呼びます。仮想マシンはスタックマシンになっていて、スタックに値をpushしたりpopしたりしながら計算が進んでいきます。

記憶領域

仮想マシンには次の3種類の記憶領域があります。

code: プログラムの命令が格納されています。インデックス0からインデックス199まで、全部で200個の命令を格納できます。
stack: 計算中の値を保存するメモリです。アドレス0からアドレス1999まで、全部で2000個の値を格納できます。
display: 各ブロックの開始アドレスをを格納しているメモリです。レベル0からレベル4まで、全部で5個の値を格納できます。

また次の2種類のレジスタがあります。

top: スタック上で次に値を入れるアドレスを格納しているレジスタです。
pc (プログラムカウンタ): コード上で次に実行する命令のインデックスを格納しているレジスタです。

今回の実装ではコードやスタックの最大サイズの条件は省略します。

命令

仮想マシンには次の9種類の命令が定義されています。

lit (リテラル)

lit (値)

オペランドの値をスタックにpushします。

sto (ストア)

sto(レベル, 相対アドレス)

スタックから値をpopしてオペランドで指定されたアドレスに格納します。アドレスはレベルと相対アドレスで計算します。

lod (ロード)

lod(レベル, 相対アドレス)

オペランドで指定されたアドレスの値をスタックにpushします。アドレスはレベルと相対アドレスで計算します。

opr (演算)

opr(演算の種類)

オペランドで指定された演算を実行します。演算の種類によって動作が異なります。演算には以下のものがあります。

neg: スタックから値を1つpopしてその値の符号を反転させたものをpushします。
add: スタックから値を2つpopして後にpopした値と先にpopした値を足した結果をpushします。
sub: スタックから値を2つpopして後にpopした値から先にpopした値を引いた結果をpushします。
mul: スタックから値を2つpopして後にpopした値と先にpopした値を掛けた結果をpushします。
div: スタックから値を2つpopして後にpopした値から先にpopした値を割った結果をpushします。
odd: スタックから値を1つpopしてその値が奇数であれば1を偶数であれば0をpushします。
eq: スタックから値を2つpopして後にpopした値と先にpopした値が等しければ1をそうでなければ0をpushします。
ls: スタックから値を2つpopして後にpopした値より先にpopした値が小さければ1をそうでなければ0をpushします。
gr: スタックから値を2つpopして後にpopした値より先にpopした値が大きければ1をそうでなければ0をpushします。
neq: スタックから値を2つpopして後にpopした値と先にpopした値が等しくなければ1をそうでなければ0をpushします。
lseq: スタックから値を2つpopして後にpopした値より先にpopした値が小さいか等しければ1をそうでなければ0をpushします。
greq: スタックから値を2つpopして後にpopした値より先にpopした値が大きいか等しければ1をそうでなければ0をpushします。
wrt: スタックから値を1つpopしてその値をコンソールに表示します。
wrl: コンソール出力を改行します。

jmp (ジャンプ)

jmp(インデックス)

pcの値をオペランドで指定したインデックスに変更して無条件ジャンプを実現します。

jpc (条件ジャンプ)

jpc(インデックス)

スタックから値をpopして、その値が0であればpcの値をオペランドで指定したインデックスに変更してジャンプします。

cal (コール)

cal(レベル, インデックス)

関数呼び出しの処理を実行します。オペランドのレベルは呼び出し元のレベルで呼び出し先のレベルはそれに1を足したものになります。呼び出し先のレベルのアドレスと現在のpcをスタックに退避した上で、pcの値をオペランドのインデックスに変更してジャンプします。レベルとpcの退避は関数の再帰呼び出しをする際に必要になります。

ret (リターン)

ret(レベル, パラメータ数)

関数からリターンする処理を実行します。スタックから関数の結果となる値をpopした後に、call時にスタックに退避したレベルとpcを復帰させて、パラメータの数だけトップを減少させます。その後関数の結果となる値をスタックにpushして実行を継続します。

ict (トップを増加させる)

ict(増加する数)

引数で指定した数だけスタックのトップを増加させます。トップを減少する単独の命令はありませんが、ret命令実行時に2番目のオペランドで指定したパラメータ数だけ減少されます。

開発環境の構築

今回は JavaScript の実行環境に Node.js を利用します。開発に利用した環境は以下の通りです。

Windows 10 Pro 64bit. バージョン 1909
Node.js 12.16.3 (nodist経由でインストール)
npm 6.14.4

また、開発には WebStorm 2020.1.1 を利用しました。

WebStorm を起動したら Create New Project をクリックします。テンプレートから Empty Project を選択し、プロジェクト名は pl0-prime-js としました。

Create をクリックするとプロジェクトが作成されます。Project ペインのプロジェクト名を右クリックして New から package.json File を作ります。

プロジェクト名に赤枠がつくので Enter を2回押してデフォルト状態で確定します。画面下のボタンからターミナルを開きます。

ターミナルに以下を入力して Babel と Jest をインストールします。Babel は最新の JavaScript 文法を利用できるするためのトランスパイラ、Jest は JavaScript のテスティングフレームワークです。

npm install --save-dev @babel/core @babel/preset-env jest

実行すると package.json が以下のように変わります。

package.json

{
  "name": "pl0-prime-js",
  "version": "1.0.0",
  "dependencies": {},
  "devDependencies": {
    "@babel/core": "^7.9.0",
    "@babel/preset-env": "^7.9.5",
    "jest": "^25.5.1"
  }
}

Babel の設定と Jest の設定を以下のように package.json に追加します。

package.json

{
  "name": "pl0-prime-js",
  "version": "1.0.0",
  "dependencies": {},
  "devDependencies": {
    "@babel/core": "^7.9.0",
    "@babel/preset-env": "^7.9.5",
    "jest": "^25.5.1"
  },
  "babel": {
    "presets": [
      [
        "@babel/preset-env",
        {
          "targets": {
            "node": "current"
          }
        }
      ]
    ]
  },
  "jest": {
    "testEnvironment": "node"
  }
}

これで JavaScript を実行できるようになりました。この記事では JavaScript のコードは Jest 経由で実行します。プロジェクトに test.js という名前のファイルを作成して、以下を入力してください。

test.js

test("jestが動く", () => {
  expect(2 * 3).toBe(6);
});

入力すると test の左側に緑色の三角形が表示されます。クリックして Run 'jestが動く' をクリックします。

画面下に Tests Passed: 1 of 1 test と表示されれば成功です。

念のため toBe(6) を toBe(5) にして失敗することも確認します。

expect で Missiong import statement と表示されるので @types/jest を追加します。もし WebStorm のためのパッケージを package.json に追加するのが不適切と思われる場合は、グローバルにインストールして設定してください。

npm install --save-dev @types/jest

これで警告が消えてコード補完も動くようになります。

さらにコードを自動でフォーマットするために prettier をインストールします。

npm install --save-dev prettier

インストールしたら、FileメニューのSettingを開いて、Language & Frameworks 内の JavaScript 以下に Prettier の設定で Run on save for files にチェックを入れます。これで保存時に自動でフォーマットがかかるようになります。

第1段階

PL/0' のコンパイラを作成するにあたり最初に作る部分を考える必要があります。いくつかやり方がありますが、今回は空のプログラムだけが書ける以下のサブセットから始めます。

program

作成するプログラムは、まずは文字列で表されたソースコードがあって、それをコンパイルすると仮想マシンのコードが返ってくる、コードを仮想マシンに入れると実行されて結果が返ってくるとしたいと思います。JavaScript のプログラムで表現すると次のようになります。

test.js

test("空のプログラムをコンパイルして実行できる", () => {
  const source = ".";
  const code = compile(source);
  const vm = new VirtualMachine(code);
  vm.run();
  expect(vm.output).toBe("");
});

最初からこれをすべて作るのは大変なのでまずは VirtualMachine から作ることにします。まず何もしないプログラムのコードですが、書籍の実装では次のように表現しています。

ret 0, 0

ret の第一オペランドは現在のレベルを表しています0レベルはプログラム実行開始時のレベルです。最後に ret を実行できるようにスタックにはあらかじめ初期値として [0, 0] を積んだ状態で開始します。retが実行されると、ディスプレイのレベル0に0をpcに0が入ります。仮想マシンは次に実行するコードのインデックスが0になったときに停止します。

ret 命令で少し紛らわしい点はプログラムで最初に実行されるコードのインデックスも0である点です。プログラムを最初から実行しようとして例えば jmp 0 を実行しようとすると、意図に反してプログラムが終了するため、インデックス0に置かれた命令は仮想マシンの実行が開始した最初にしか呼ぶことができないという暗黙の制約があります。しかし、書籍で出てくる PL/0' コンパイラが生成するコードでは最初の命令は必ずメイン処理への jmp になるためこのことは特に問題になりません。

もう一点紛らわしい点として、ret がスタックから結果となる値をポップする部分です。通常であれば関数の戻り値がスタックに置かれているのですが、メインの処理では値が積まれていない状態で ret が呼ばれます。その結果、スタックに初期値として積まれているプログラムカウンタの値 0 が戻り値として扱われます。一貫性を求めるのであればメインの ret の前に lit 0 を置くことで解決できますがなくても動くためこうなっているのだと解釈しました。

この ret 命令は JavaScript のオブジェクトで次のように表現することとします。

{ kind: "ret", level: 0, numParams: 0 }

これをもとに VirtualMachine クラスを作ります。

VirtualMachine.js

export class VirtualMachine {
  constructor(code) {
    this.code = code;
    this.output = "";
  }

  run() {
    this.pc = 0;
    this.stack = [0, 0];
    this.top = this.stack.length;
    this.display = [0];

    while (true) {
      const inst = this.code[this.pc++];
      switch (inst.kind) {
        case "ret":
          this.runRet(inst);
          break;
        default:
          throw new Error("不明な命令の種類");
      }

      // PCが0の場合実行を停止する
      if (this.pc === 0) {
        break;
      }
    }
  }

  runRet(inst) {
    // 結果を退避する
    const result = this.stack[this.top - 1];
    // top, display, level を呼び出し前に戻す
    this.top = this.display[inst.level];
    this.display[inst.level] = this.stack[this.top];
    this.pc = this.stack[this.top + 1];
    // 積んだ引数分 top を戻す
    this.top -= inst.numParams;
    // 結果をスタックに積む
    this.stack[this.top++] = result;
  }
}

動作確認のため以下のテストを追加します。

VirtualMachine.test.js

  test("メインブロックで実行するretが動く", () => {
    const code = [{ kind: "ret", level: 0, numParams: 0 }];
    const vm = new VirtualMachine(code);
    vm.run();
    expect(vm.pc).toBe(0);
    expect(vm.stack[0]).toBe(0);
    expect(vm.display[0]).toBe(0);
  });

次に compile ですが、こちらは現時点ではソース文字列が . であれば ret を返してそうでなければ構文エラーとする簡単なコードにとどめておきます。

Compiler.js

export function compile(source) {
  if (source !== ".") {
    throw new Error("構文エラー");
  }
  return [{ kind: "ret", level: 0, numParams: 0 }];
}

この章の最初に書いたテストは import 文を以下のように追加することで成功するようになります。

test.js

import { compile } from "./Compiler";
import { VirtualMachine } from "./VirtualMachine";

test("jestが動く", () => {
  expect(2 * 3).toBe(6);
});

test("空のプログラムをコンパイルして実行できる", () => {
  const source = ".";
  const code = compile(source);
  const vm = new VirtualMachine(code);
  vm.run();
  expect(vm.output).toBe("");
});

すべてのテストを一度に実行できるように、package.json の scripts に test を追加します。

pacakge.json

{
  ...
  "scripts": {
    "test": "jest"
  },
  ...
}

WebStome の Project で package.json を右クリックして Show npm scripts をクリックすると npm scripts が簡単に実行できるペインを開くことができます。test をダブルクリックすると以下のようにすべてのテストが PASS します。

第2段階

第1段階の文法に要素を追加して拡張していきます。今回は write 文を追加します。write 文を追加することで計算結果を出力できるようになります。本格的な言語だと hello, world. と表示したいところですが、PL/0' には数値しかないので代わりに 123 を表示することを目標にします。

program

block

statement

以下はこの言語で書けるプログラムの例です。

write 123.

第1段階と同様に以下のテストを作成します。

test.js

test("write 123. をコンパイルして実行できる", () => {
  const source = "write 123.";
  const code = compile(source);
  const vm = new VirtualMachine(code);
  vm.run();
  expect(vm.output).toBe("123");
});

実際にトークンを読み取る必要があるので前章で省略した字句解析を作っていきます。トークンは次のようなオブジェクトで表現します。

write: { kind: "write" }
123: { kind: "number", value: 123 }
.: { kind: "." }

書籍では入力から順番にトークンを取り出す関数 nextToken を定義しています。この記事ではソースプログラムを表す Source クラスを作り、そこに nextToken メソッドを追加します。JavaScriptのコードで表すと次のようになります。

Source.test.js

  test("write 123. を字句解析できる", () => {
    const source = new Source("write 123.");
    expect(source.nextToken()).toEqual({ kind: "write" });
    expect(source.nextToken()).toEqual({ kind: "number", value: 123 });
    expect(source.nextToken()).toEqual({ kind: "." });
    expect(source.nextToken()).toEqual(null);
    expect(source.nextToken()).toEqual(null);
  });

Source クラスは以下のように定義しました。

Source.js

import {
  isDigit,
  isIdentifierChar,
  isIdentifierStartChar,
  isWhiteSpace,
  numberOfDigit,
} from "./Utils";

export class Source {
  constructor(str) {
    this.str = str;
    this.index = 0;
    this.nextChar();
  }

  /**
   * ストリームの次の文字を取り出す
   */
  nextChar() {
    if (this.index >= this.str.length) {
      this.ch = null;
    }
    this.ch = this.str[this.index++];
  }

  /**
   * 空白文字を読み飛ばす
   */
  skipSpaces() {
    while (isWhiteSpace(this.ch)) {
      this.nextChar();
    }
  }

  /**
   * ソース文字列の次のトークンを取り出す
   */
  nextToken() {
    // 空白文字を読み飛ばす
    this.skipSpaces();

    // 数字で始まるのは数値トークン
    if (isDigit(this.ch)) {
      return this.nextNumberToken();
    }

    // アルファベットで始めるのは識別子の場合とキーワードの場合がある
    if (isIdentifierStartChar(this.ch)) {
      return this.nextIdentifierOrKeywordToken();
    }
    if (this.ch === ".") {
      this.nextChar();
      return { kind: "." };
    }
    return null;
  }

  /**
   * 数値トークンを読んで返す
   */
  nextNumberToken() {
    let value = 0;
    while (isDigit(this.ch)) {
      value = value * 10 + numberOfDigit(this.ch);
      this.nextChar();
    }
    return { kind: "number", value };
  }

  /**
   * 識別子かキーワードのトークンを読んで返す
   */
  nextIdentifierOrKeywordToken() {
    let value = "";
    while (isIdentifierChar(this.ch)) {
      value += this.ch;
      this.nextChar();
    }
    if (value === "write") {
      return { kind: "write" };
    }
    return { kind: "identifier", value };
  }
}

コード中に出てくる isDigit や isWhiteSpace は Utils.js で次のように定義しました。

Utils.js

export function isWhiteSpace(ch) {
  return /^[ \n\r\t]$/.test(ch);
}

export function isDigit(ch) {
  return /^\d$/.test(ch);
}

export function isIdentifierStartChar(ch) {
  return /^[A-Za-z]$/.test(ch);
}

export function isIdentifierChar(ch) {
  return /^[A-Za-z0-9]$/.test(ch);
}

export function numberOfDigit(ch) {
  return ch.charCodeAt(0) - "0".charCodeAt(0);
}

字句解析の次は write 文に必要な仮想機械の命令を実装します。今回は lit と opr(wrt) を実装します。動いてほしい仮想機械のテストは次のようになります。

VirtualMachine.test.js

  test("opr(wrt)が動く", () => {
    const code = [
      { kind: "lit", value: 123 },
      { kind: "opr", operator: "wrt" },
      { kind: "ret", level: 0, numParams: 0 },
    ];
    const vm = new VirtualMachine(code);
    vm.run();
    expect(vm.pc).toBe(0);
    expect(vm.output).toBe("123");
  });

実装は VirtualMachine クラスの run メソッド内の case を増やすことで行います。具体的な命令の実装は次のようになります。

VirtualMachine.js

  runLit(inst) {
    this.stack[this.top++] = inst.value;
  }

  runOpr(inst) {
    switch (inst.operator) {
      case "wrt":
        this.runOprWrt();
        break;
      default:
        throw new Error(`不明な命令: ${inst.operator}`);
    }
  }

  runOprWrt() {
    const value = this.stack[--this.top];
    this.output += String(value);
  }

最低限の字句解析が動くようになったので次はコンパイル処理を拡張します。第1段階では単純な compile 関数を定義しましたが、ここでは新しく Compiler クラスを追加します。

Compiler.js

export class Compiler {
  constructor(source) {
    this.source = source;
    this.nextToken();
  }

  nextToken() {
    this.token = this.source.nextToken();
  }

  compile() {
    this.code = [];
    this.compileBlock();
    if (this.token.kind !== ".") {
      throw new Error("構文エラー: 最後の.が足りない");
    }
    return this.code;
  }

  compileBlock() {
    this.compileStatement();
    this.code.push({ kind: "ret", level: 0, numParams: 0 });
  }

  compileStatement() {
    switch (this.token.kind) {
      case "write":
        this.nextToken();
        this.compileWrite();
        break;
      default:
        // 空文
        break;
    }
  }

  compileWrite() {
    if (this.token.kind !== "number") {
      throw new Error("構文エラー: writeの後ろに数値が必要");
    }
    this.code.push({ kind: "lit", value: this.token.value });
    this.code.push({ kind: "opr", operator: "wrt" });
    this.nextToken();
  }
}

そのうえで、前章で作成した compile 関数を次のように変更します。

Compiler.js

export function compile(str) {
  const source = new Source(str);
  const compiler = new Compiler(source);
  return compiler.compile();
}

これで章の最初に示した次のプログラムをコンパイルして実行できるようになりました。

write 123.

ここで少しだけコードを整理します。仮想マシンの命令を生成するときにこれまで直接オブジェクトを構築していましたが、代わりに CodeBuilder クラスを作成してその役割を移したいと思います。これにより今後拡張する過程で命令のフォーマットを変更する場合に修正箇所が明確になります」。

CodeBuilder.js

export class CodeBuilder {
  constructor() {
    this.code = [];
    this.currentIndex = 0;
  }

  getCode() {
    return this.code;
  }

  emit(code) {
    const index = this.currentIndex++;
    this.code[index] = code;
    return index;
  }

  emitRet(level, numParams) {
    return this.emit({ kind: "ret", level, numParams });
  }

  emitLit(value) {
    return this.emit({ kind: "lit", value });
  }

  emitOprWrt() {
    return this.emit({ kind: "opr", operator: "wrt" });
  }
}

こちらの CodeBuilder を利用することで Compiler は以下のようになります。

Compiler.js

export class Compiler {
  constructor(source) {
    this.source = source;
    this.nextToken();
  }

  nextToken() {
    this.token = this.source.nextToken();
  }

  compile() {
    this.codeBuilder = new CodeBuilder();
    this.compileBlock();
    if (this.token.kind !== ".") {
      throw new Error("構文エラー: 最後の.が足りない");
    }
    return this.codeBuilder.getCode();
  }

  compileBlock() {
    this.compileStatement();
    this.codeBuilder.emitRet(0, 0);
  }

  compileStatement() {
    switch (this.token.kind) {
      case "write":
        this.nextToken();
        this.compileWrite();
        break;
      default:
        // 空文
        break;
    }
  }

  compileWrite() {
    if (this.token.kind !== "number") {
      throw new Error("構文エラー: writeの後ろに数値が必要");
    }
    this.codeBuilder.emitLit(this.token.value);
    this.codeBuilder.emitOprWrt();
    this.nextToken();
  }
}

第3段階

次の段階として数式をコンパイルできるようにします。ここで完成する文法は次のようになります。

program

block

statement

expression

term

factor

この文法を使って書けるプログラムの例としては次のようなものがあります。

write 1 + 2 * 3.

計算ができるようになると一気にプログラミング言語らしくなります。それぞれの機能を網羅できるように以下のテストを追加します。

test.js

test("加算と乗算を含む式をコンパイルして実行できる", () => {
  const source = "write 1 + 2 * 3.";
  const code = compile(source);
  const vm = new VirtualMachine(code);
  vm.run();
  expect(vm.output).toBe("7");
});

test("減算と括弧を含む式をコンパイルして実行できる", () => {
  const source = "write - (3 - 5).";
  const code = compile(source);
  const vm = new VirtualMachine(code);
  vm.run();
  expect(vm.output).toBe("2");
});

test("除算と単項プラスを含む式をコンパイルして実行できる", () => {
  const source = "write + 10 / 2.";
  const code = compile(source);
  const vm = new VirtualMachine(code);
  vm.run();
  expect(vm.output).toBe("5");
});

test("演算子は左結合になっている", () => {
  const source = "write 10 - 5 - 3.";
  const code = compile(source);
  const vm = new VirtualMachine(code);
  vm.run();
  expect(vm.output).toBe("2");
});

test("掛け算の右辺にマイナスを置ける", () => {
  const source = "write -2 * (-3).";
  const code = compile(source);
  const vm = new VirtualMachine(code);
  vm.run();
  expect(vm.output).toBe("6");
});

まずは追加で必要になった以下トークンに対応します。

文字からそれが単独でトークンになれるかどうかを判定する isSymbolToken 関数を作ります。

Source.js

const isSymbolToken = (() => {
  const symbolTable = {};
  [".", "+", "-", "*", "/", "(", ")"].forEach((str) => {
    symbolTable[str] = true;
  });
  return (str) => {
    return symbolTable[str] === true;
  };
})();

これを利用して nextToken を以下のように変更します。

Source.js

  nextToken() {
    // 空白文字を読み飛ばす
    this.skipSpaces();

    // 数字で始まるのは数値トークン
    if (isDigit(this.ch)) {
      return this.nextNumberToken();
    }

    // アルファベットで始めるのは識別子の場合とキーワードの場合がある
    if (isIdentifierStartChar(this.ch)) {
      return this.nextIdentifierOrKeywordToken();
    }

    // 単独でトークンになれる記号
    if (isSymbolToken(this.ch)) {
      const kind = this.ch;
      this.nextChar();
      return { kind };
    }

    return null;
  }

追加トークンに対するテストを以下のように追加します。

Source.tsst.js

  test("+ - * / ( ) . を字句解析できる", () => {
    const source = new Source("+ - * / ( ) .");
    expect(source.nextToken()).toEqual({ kind: "+" });
    expect(source.nextToken()).toEqual({ kind: "-" });
    expect(source.nextToken()).toEqual({ kind: "*" });
    expect(source.nextToken()).toEqual({ kind: "/" });
    expect(source.nextToken()).toEqual({ kind: "(" });
    expect(source.nextToken()).toEqual({ kind: ")" });
    expect(source.nextToken()).toEqual({ kind: "." });
    expect(source.nextToken()).toEqual(null);
  });

次に仮想マシンの命令を追加します。今回追加が必要になるのは、以下の4つです。

opr(neg)
opr(add)
opr(sub)
opr(mul)
opr(div)

基本的にどれも同じなのでコードは opr(add) だけ示します。

VirtualMachine.js

  runOprAdd() {
    const lhs = this.stack[--this.top];
    const rhs = this.stack[--this.top];
    this.stack[this.top++] = lhs + rhs;
  }

スタックの操作が書き間違えそうなので pushStack と popStack を追加して書き直します。

VirtualMachine.js

  pushStack(value) {
    this.stack[this.top++] = value;
  }

  popStack() {
    return this.stack[--this.top];
  }

  runOprAdd() {
    const rhs = this.popStack();
    const lhs = this.popStack();
    this.pushStack(lhs + rhs);
  }

以下のテストが動けば追加命令は想定通りに動いています。

VirtualMachine.js

  test("opr(neg)が動く", () => {
    const code = [
      { kind: "lit", value: 123 },
      { kind: "opr", operator: "neg" },
      { kind: "opr", operator: "wrt" },
      { kind: "ret", level: 0, numParams: 0 },
    ];
    const vm = new VirtualMachine(code);
    vm.run();
    expect(vm.pc).toBe(0);
    expect(vm.output).toBe("-123");
  });

  test("opr(pls),opr(mns),opr(mul),opr(div)が動く", () => {
    function createVmAndRun(op) {
      const code = [
        { kind: "lit", value: 10 },
        { kind: "lit", value: 5 },
        { kind: "opr", operator: op },
        { kind: "opr", operator: "wrt" },
        { kind: "ret", level: 0, numParams: 0 },
      ];
      const vm = new VirtualMachine(code);
      vm.run();
      return vm;
    }
    expect(createVmAndRun("pls").output).toBe("15");
    expect(createVmAndRun("mns").output).toBe("5");
    expect(createVmAndRun("mul").output).toBe("50");
    expect(createVmAndRun("div").output).toBe("2");
  });

字句解析と仮想マシンの準備ができたのでコンパイラを実装します。再帰的下向き構文解析を利用して文法規則をほぼそのままコードとして書き下しています。

Compiler.js

  compileWrite() {
    this.compileExpression();
    this.codeBuilder.emitOprWrt();
  }

  compileExpression() {
    let shouldEmitNeg = false;
    // 単項演算子の処理
    switch (this.token.kind) {
      case "+":
        this.nextToken();
        break;
      case "-":
        shouldEmitNeg = true;
        this.nextToken();
        break;
      default:
        break;
    }
    this.compileTerm();
    if (shouldEmitNeg) {
      this.codeBuilder.emitOprNeg();
    }
    // 中置演算子の処理
    while (true) {
      if (this.token.kind === "+") {
        this.nextToken();
        this.compileTerm();
        this.codeBuilder.emitOprAdd();
        continue;
      }
      if (this.token.kind === "-") {
        this.nextToken();
        this.compileTerm();
        this.codeBuilder.emitOprSub();
        continue;
      }
      break;
    }
  }

  compileTerm() {
    this.compileFactor();
    while (true) {
      if (this.token.kind === "*") {
        this.nextToken();
        this.compileFactor();
        this.codeBuilder.emitOprMul();
        continue;
      }
      if (this.token.kind === "/") {
        this.nextToken();
        this.compileFactor();
        this.codeBuilder.emitOprDiv();
        continue;
      }
      break;
    }
  }

  compileFactor() {
    // 数値
    if (this.token.kind === "number") {
      this.codeBuilder.emitLit(this.token.value);
      this.nextToken();
      return;
    }
    // 括弧で囲まれた式
    if (this.token.kind === "(") {
      this.nextToken();
      this.compileExpression();
      if (this.token.kind !== ")") {
        throw new Error("構文エラー: 括弧が閉じていない");
      }
      this.nextToken();
      return;
    }
    // それ以外はエラー
    throw new Error("構文エラー");
  }

数式の計算ができることになって、ようやくプログラミング言語らしくなってきました。

第4段階

第4段階では定数宣言を追加します。定数宣言の追加で文法は次のように拡張されます。

program

block

constDecl

statement

expression

term

factor

今回の拡張で以下のプログラムが動くようになります。

const a = 3, b = 4;
write a * b.

今回は以下のテストが通るようになることを目標にします。

test.js

test("constで定数を宣言できる", () => {
  const source = "const a = 3, b= 4; write a * b.";
  const code = compile(source);
  const vm = new VirtualMachine(code);
  vm.run();
  expect(vm.output).toBe("12");
});

test("constを複数書ける", () => {
  const source = "const a = 3; const b= 4; write a * b.";
  const code = compile(source);
  const vm = new VirtualMachine(code);
  vm.run();
  expect(vm.output).toBe("12");
});

今回追加されるトークンは以下の4つです。

const
=
,
;

JavaScriptで表現すると以下のようになります。

Source.test.js

  test("const = , ; を字句解析できる", () => {
    const source = new Source("const = , ;");
    expect(source.nextToken()).toEqual({ kind: "const" });
    expect(source.nextToken()).toEqual({ kind: "=" });
    expect(source.nextToken()).toEqual({ kind: "," });
    expect(source.nextToken()).toEqual({ kind: ";" });
    expect(source.nextToken()).toEqual(null);
  });

const については以下の isKeyword 関数を作って組み込みます。

Source.js

const isKeyword = (() => {
  const keywordTable = {};
  ["write", "const"].forEach((str) => {
    keywordTable[str] = true;
  });
  return (str) => {
    return keywordTable[str] === true;
  };
})();

= と , と ; は isSymbolTable を書き換えて対応します。

定数を追加したことで名前を管理する必要が出てきたので名前表を作ります。以下のプログラムが動くことを目標にします。

NameTable.test.js

  test("定数の登録と取得ができる", () => {
    const nameTable = new NameTable();
    nameTable.addConst("a", 10);
    nameTable.addConst("b", 20);
    expect(nameTable.search("a").kind).toBe("const");
    expect(nameTable.search("a").value).toBe(10);
    expect(nameTable.search("b").kind).toBe("const");
    expect(nameTable.search("b").value).toBe(20);
  });

実装は次のようになります。

NameTable.js

export class NameTable {
  constructor() {
    this.table = [];
  }

  search(name) {
    for (let i = this.table.length - 1; i >= 0; i--) {
      const entry = this.table[i];
      if (entry.name === name) {
        return entry;
      }
    }
    return null;
  }

  addConst(name, value) {
    this.table.push({ kind: "const", name, value });
  }
}

これを利用してコンパイラを作成します。今回は仮想機械の命令追加はありません。NameTable クラスのインスタンスは compile メソッドの引数として外から受け取ることにします。

Compile.js

  compile(nameTable) {
    this.nameTable = nameTable;
    this.codeBuilder = new CodeBuilder();
    this.compileBlock();
    if (this.token.kind !== ".") {
      throw new Error("構文エラー: 最後の.が足りない");
    }
    return this.codeBuilder.getCode();
  }

compile 関数を次のように変更します。

Compile.js

export function compile(str) {
  const source = new Source(str);
  const compiler = new Compiler(source);
  const nameTable = new NameTable();
  return compiler.compile(nameTable);
}

NameTable のインスタンスを Compiler クラスの内部で生成するのではなく、外から受け取る理由は、将来的に言語を拡張する場合にその方が都合が良いからです。例えば標準ライブラリや外部のモジュールを呼び出す場合その情報は NameTable の初期値として入っていると考えると扱いやすくなります。

compileBlock を拡張して constDelc ルールをコンパイルできるようにします。

Compiler.js

 compileBlock() {
    while (true) {
      switch (this.token.kind) {
        case "const":
          this.nextToken();
          this.compileConstDecl();
          continue;
        default:
          break;
      }
      break;
    }
    this.compileStatement();
    this.codeBuilder.emitRet(0, 0);
  }

  compileConstDecl() {
    while (true) {
      if (this.token.kind !== "identifier") {
        throw new Error("構文エラー: 識別子が必要");
      }
      const name = this.token.value;
      this.nextToken();
      if (this.token.kind !== "=") {
        throw new Error("構文エラー: = が必要");
      }
      this.nextToken();
      if (this.token.kind !== "number") {
        throw new Error("構文エラー: 数値が必要");
      }
      const value = this.token.value;
      this.nameTable.addConst(name, value);
      this.nextToken();
      if (this.token.kind === ",") {
        this.nextToken();
        continue;
      }
      if (this.token.kind !== ";") {
        throw new Error("構文エラー: ; が必要");
      }
      this.nextToken();
      break;
    }
  }

さらに、compileFactor を拡張して変数を扱えるようにします。

Compile.js

    // 数値
    if (this.token.kind === "number") {
      this.codeBuilder.emitLit(this.token.value);
      this.nextToken();
      return;
    }
    // 識別子
    if (this.token.kind === "identifier") {
      const name = this.token.value;
      this.nextToken();
      this.compileIdentifier(name);
      return;
    }

ここで compileIdentifier は以下のように定義しています。

Compile.js

  compileIdentifier(name) {
    if (entry === null) {
      throw new Error(`名前 ${name} が未定義`);
    }
    const entry = this.nameTable.search(name);
    switch (entry.kind) {
      case "const":
        this.codeBuilder.emitLit(entry.value);
        break;
      default:
        throw new Error(`名前 ${name} が未定義`);
    }
  }

これで定数を宣言して使うプログラムをコンパイルして実行できるようになりました。

第5段階

ここでは begin end に対応します。begin end を導入すると複数の文を実行できるようになります。同時に writeln 文にも対応します。今回の文法は以下の通りです。

program

block

constDecl

statement

expression

term

factor

この文法で次のプログラムを書けるようになります。

const a = 2, b = 3;
begin
  write a + b;
  writeln;
  write a * b;
  writeln
end.

今回のプログラムの動作を確認するための以下のテストを追加します。

test.js

test("begin end で複数の文を実行できる", () => {
  const source = "begin write 12; writeln; write 34 end.";
  const code = compile(source);
  const vm = new VirtualMachine(code);
  vm.run();
  expect(vm.output).toBe("12\n34");
});

まずは新しく出てくるトークンを実装します。今回の文法では以下の3つのキーワードが新しく出てきます。

begin
end
writeln

JavaScriptで表現すると以下のようになります。

Source.test.js

  test("begin end writeln を字句解析できる", () => {
    const source = new Source("begin end writeln");
    expect(source.nextToken()).toEqual({ kind: "begin" });
    expect(source.nextToken()).toEqual({ kind: "end" });
    expect(source.nextToken()).toEqual({ kind: "writeln" });
    expect(source.nextToken()).toEqual(null);
  });

いずれもキーワードなので isKeyword を拡張して実装します。

仮想マシンの命令で追加する必要があるのは opr(wrl) です。

VirtualMachine.test.js

  test("opr(wrl)が動く", () => {
    const code = [
      { kind: "opr", operator: "wrl" },
      { kind: "ret", level: 0, numParams: 0 },
    ];
    const vm = new VirtualMachine(code);
    vm.run();
    expect(vm.pc).toBe(0);
    expect(vm.output).toBe("\n");
  });

VirtualMachine.js に処理を追加して実装します。

準備ができたのでコンパイル処理を実装します。今回追加した文法は compileStatement を拡張することで実装できます。

Compiler.js

 compileStatement() {
    switch (this.token.kind) {
      case "write":
        this.nextToken();
        this.compileWrite();
        break;
      case "writeln":
        this.nextToken();
        this.compileWriteLn();
        break;
      case "begin":
        this.nextToken();
        this.compileBeginEnd();
        break;
      default:
        // 空文
        break;
    }
  }

compileBeginEnd は以下のように定義しています。

Compiler.js

  compileBeginEnd() {
    while (true) {
      this.compileStatement();
      if (this.token.kind === ";") {
        this.nextToken();
        continue;
      }
      if (this.token.kind !== "end") {
        throw new Error("構文エラー: end がない");
      }
      this.nextToken();
      break;
    }
  }

これで begin end が動くようになり複数の文を実行できるようになりました。

ここで少しだけコードを整理します。上のコードの

Compiler.js

      if (this.token.kind !== "end") {
        throw new Error("構文エラー: end がない");
      }
      this.nextToken();

のようなパターンは繰り返し出てくるので ensureToken という名前でまとめてしまいます。

Compiler.js

  ensureToken(kind) {
    const token = this.token;
    if (token.kind !== kind) {
      throw new Error(`構文エラー: ${kind} が必要`);
    }
    this.nextToken();
    return token;
  }

ensureToken を利用すると compileConstDecl は以下のようにすっきり書けます。

Compiler.js

  compileConstDecl() {
    while (true) {
      const name = this.ensureToken("identifier").value;
      this.ensureToken("=");
      const value = this.ensureToken("number").value;
      this.nameTable.addConst(name, value);
      if (this.token.kind === ",") {
        this.nextToken();
        continue;
      }
      this.ensureToken(";");
      break;
    }
  }

第6段階

ここでは文法のサブセットに if 文と条件式を追加します。文法全体は以下のようになります。

program

block

constDecl

statement

condition

expression

term

factor

この文法では次のプログラムを書けるようになります。

const x = 10;
begin
  if x > 5 then begin
    write 123;
    writeln
  end;
  write 456
end.

この章では以下のテストが通るようになる予定です。

test.js

test("比較演算子をコンパイルして実行できる", () => {
  const compileAndRun = (operator) => {
    const source = `
      const a = 2, b = 3;
      begin
        if a ${operator} b then write 1;
        if b ${operator} a then write 2;
        if a ${operator} a then write 3
      end.`;
    const code = compile(source);
    const vm = new VirtualMachine(code);
    vm.run();
    return vm.output;
  };
  expect(compileAndRun("<")).toBe("1");
  expect(compileAndRun("<=")).toBe("13");
  expect(compileAndRun(">")).toBe("2");
  expect(compileAndRun(">=")).toBe("23");
  expect(compileAndRun("=")).toBe("3");
  expect(compileAndRun("<>")).toBe("12");
});

test("oddをコンパイルして実行できる", () => {
  const source = `
    const a = 2, b = 3;
    begin
      if odd a then write 1;
      if odd b then write 2
    end.`;
  const code = compile(source);
  const vm = new VirtualMachine(code);
  vm.run();
  expect(vm.output).toBe("2");
});

test("if のサンプルをコンパイルして実行できる", () => {
  const source = `
    const x = 10;
    begin
      if x > 5 then begin
        write 123;
        writeln
      end;
      write 456
    end.`;
  const code = compile(source);
  const vm = new VirtualMachine(code);
  vm.run();
  expect(vm.output).toBe("123\n456");
});

まずは追加になるトークンに対応します。今回追加するトークンは以下の通りです。

if
then
odd
=
<>
<
<=
>
>=

以下のテストが通れば正しく字句解析できています。

Source.test.js

  test("if then odd を字句解析できる", () => {
    const source = new Source("if then odd");
    expect(source.nextToken()).toEqual({ kind: "if" });
    expect(source.nextToken()).toEqual({ kind: "then" });
    expect(source.nextToken()).toEqual({ kind: "odd" });
    expect(source.nextToken()).toEqual(null);
  });

  test("< > <= >= <> = を字句解析できる", () => {
    const source = new Source("< > <= >= <> =");
    expect(source.nextToken()).toEqual({ kind: "<" });
    expect(source.nextToken()).toEqual({ kind: ">" });
    expect(source.nextToken()).toEqual({ kind: "<=" });
    expect(source.nextToken()).toEqual({ kind: ">=" });
    expect(source.nextToken()).toEqual({ kind: "<>" });
    expect(source.nextToken()).toEqual({ kind: "=" });
    expect(source.nextToken()).toEqual(null);
  });

初めて記号2つで表すトークンが登場しました。こちらは nextToken 内で次のように実装します。

Source.js

    // 複数の記号で一つのトークンになるもの
    if (this.ch === "<") {
      this.nextChar();
      if (this.ch === "=") {
        this.nextChar();
        return { kind: "<=" };
      }
      if (this.ch === ">") {
        this.nextChar();
        return { kind: "<>" };
      }
      return { kind: "<" };
    }
    if (this.ch === ">") {
      this.nextChar();
      if (this.ch === "=") {
        this.nextChar();
        return { kind: ">=" };
      }
      return { kind: ">" };
    }

次に追加が必要になる仮想マシンの命令は以下の通りです。

opr(eq)
opr(lt)
opr(gr)
opr(neq)
opr(lteq)
opr(greq)
opr(odd)
jpc

まず opr については以下のテストが通るように実装します。

Compiler.js

  test("opr(eq),opr(lt),opr(gr),opr(neq),opr(lteq),opr(greq)が動く", () => {
    function createVmAndRun(v1, op, v2) {
      const code = [
        { kind: "lit", value: v1 },
        { kind: "lit", value: v2 },
        { kind: "opr", operator: op },
        { kind: "opr", operator: "wrt" },
        { kind: "ret", level: 0, numParams: 0 },
      ];
      const vm = new VirtualMachine(code);
      vm.run();
      return vm;
    }
    expect(createVmAndRun(2, "eq", 2).output).toBe("1");
    expect(createVmAndRun(2, "eq", 3).output).toBe("0");
    expect(createVmAndRun(2, "lt", 3).output).toBe("1");
    expect(createVmAndRun(3, "lt", 2).output).toBe("0");
    expect(createVmAndRun(2, "lt", 2).output).toBe("0");
    expect(createVmAndRun(2, "gr", 3).output).toBe("0");
    expect(createVmAndRun(3, "gr", 2).output).toBe("1");
    expect(createVmAndRun(2, "gr", 2).output).toBe("0");
    expect(createVmAndRun(2, "neq", 2).output).toBe("0");
    expect(createVmAndRun(2, "neq", 3).output).toBe("1");
    expect(createVmAndRun(2, "lteq", 3).output).toBe("1");
    expect(createVmAndRun(3, "lteq", 2).output).toBe("0");
    expect(createVmAndRun(2, "lteq", 2).output).toBe("1");
    expect(createVmAndRun(2, "greq", 3).output).toBe("0");
    expect(createVmAndRun(3, "greq", 2).output).toBe("1");
    expect(createVmAndRun(2, "greq", 2).output).toBe("1");
  });

  test("opr(odd)が動く", () => {
    const code = [
      { kind: "lit", value: 2 },
      { kind: "opr", operator: "odd" },
      { kind: "opr", operator: "wrt" },
      { kind: "lit", value: 3 },
      { kind: "opr", operator: "odd" },
      { kind: "opr", operator: "wrt" },
      { kind: "ret", level: 0, numParams: 0 },
    ];
    const vm = new VirtualMachine(code);
    vm.run();
    expect(vm.output).toBe("01");
  });

jpc 命令のテストは次のようになります。

VirtualMachine.test.js

  test("jpcが動く", () => {
    const code = [
      { kind: "lit", value: 0 },
      { kind: "jpc", index: 4 },
      { kind: "lit", value: 123 },
      { kind: "opr", operator: "wrt" },
      { kind: "lit", value: 456 },
      { kind: "opr", operator: "wrt" },
      { kind: "ret", level: 0, numParams: 0 },
    ];
    const vm = new VirtualMachine(code);
    vm.run();
    expect(vm.output).toBe("456");
  });

jpc 命令のオペランドはジャンプ先のインデックスを表します。上の例では index: 4 は以下の行を表しています。

{ kind: "lit", value: 456 },

if 文のコンパイル処理を書きやすくするために CodeBuilder に backPatch メソッドを追加します。backPatch 命令は指定したインデックスの命令のオペランドのインデックスを後から書き換えるのに使います。

CodeBuilder.js

  backPatch(jmpInstIndex, newIndex) {
    this.code[jmpInstIndex].index = newIndex;
  }

準備ができたのでコンパイルの処理を書き始めます。まずは条件式を compileCondition という名前で実装します。compileStatement 内の switch 文に if を追加して、以下の compileIf を呼び出します。

Compiler.js

  compileIf() {
    this.compileCondition();
    this.ensureToken("then");
    const jpcInst = this.codeBuilder.emitJpc(0); // 0は仮のインデックス
    this.compileStatement();
    this.codeBuilder.backPatch(jpcInst, this.codeBuilder.currentIndex);
  }

これで if 文のコンパイルと実行が動くようになります。

第7段階

ここでは文法のサブセットに変数宣言と代入を追加します。文法全体は以下のようになります。

program

block

constDecl

varDecl

statement

condition

expression

term

factor

この文法では次のようなプログラムを書くことができます。

var x;
begin
  x := 2;
  x := x * x;
  x := x * x;
  write x;
  writeln
end.

この章では以下のテストが通るようになる予定です。

test.js

test("代入文をコンパイルして実行できる", () => {
  const source = "var x; begin x := 10; write x * 2 end.";
  const code = compile(source);
  const vm = new VirtualMachine(code);
  vm.run();
  expect(vm.output).toBe("20");
});

今回追加になるトークンは以下の2つです。

var
:=

Source.js を拡張して以下のテストが通るようにします。

Source.test.js

  test("var := を字句解析できる", () => {
    const source = new Source("var :=");
    expect(source.nextToken()).toEqual({ kind: "var" });
    expect(source.nextToken()).toEqual({ kind: ":=" });
    expect(source.nextToken()).toEqual(null);
  });

仮想マシンを拡張して以下の4つの命令を追加していきます。

ict
jmp
sto
lod

ict 命令はスタックのtopを増やす命令です。

VirtualMachine.js

  runIct(inst) {
    this.top += inst.value;
  }

jmp 命令はプログラムカウンタをオペランドで指定したインデックスに変更します。

VirtualMachine.js

  runJmp(inst) {
    this.pc = inst.index;
  }

lod 命令はレベルと相対アドレスで計算したアドレスの値を読み取ります。

VirtualMachine.js

  runLod(inst) {
    const address = this.display[inst.level] + inst.relAddress;
    const value = this.stack[address];
    this.pushStack(value);
  }

sto 命令はレベルと相対アドレスで計算したアドレスに値を書き込みます。

VirtualMachine.js

  runSto(inst) {
    const value = this.popStack();
    const address = this.display[inst.level] + inst.relAddress;
    this.stack[address] = value;
  }

これらの命令を使うと以下のテストが動きます。

VirtualMachine.test.js

  test("jmpが動く", () => {
    const code = [
      { kind: "jmp", index: 3 },
      { kind: "lit", value: 123 },
      { kind: "opr", operator: "wrt" },
      { kind: "lit", value: 456 },
      { kind: "opr", operator: "wrt" },
      { kind: "ret", level: 0, numParams: 0 },
    ];
    const vm = new VirtualMachine(code);
    vm.run();
    expect(vm.output).toBe("456");
  });

  test("ict, sto, lod が動く", () => {
    const code = [
      { kind: "ict", value: 2 },
      { kind: "lit", value: 10 },
      { kind: "sto", level: 0, relAddress: 2 }, // 2で始まるのはこの時点でスタックに2つ積んであるため
      { kind: "lit", value: 20 },
      { kind: "sto", level: 0, relAddress: 3 },
      { kind: "lod", level: 0, relAddress: 2 },
      { kind: "lod", level: 0, relAddress: 3 },
      { kind: "opr", operator: "add" },
      { kind: "opr", operator: "wrt" },
      { kind: "ret", level: 0, numParams: 0 },
    ];
    const vm = new VirtualMachine(code);
    vm.run();
    expect(vm.output).toBe("30");
  });

次に名前表で変数を扱えるように拡張します。

NameTable.js

export class NameTable {
  constructor() {
    this.table = [];
    this.localAddress = 2;
    this.level = 0;
  }

  search(name) {
    for (let i = this.table.length - 1; i >= 0; i--) {
      const entry = this.table[i];
      if (entry.name === name) {
        return entry;
      }
    }
    return null;
  }

  addConst(name, value) {
    this.table.push({ kind: "const", name, value });
  }

  addVar(name) {
    const relAddress = this.localAddress++;
    this.table.push({ kind: "var", name, level: this.level, relAddress });
    return relAddress;
  }
}

addVar を呼び出すたびに localAddress がインクリメントされて割り当てられたアドレスが戻り値として返ります。

準備ができたので変数のコンパイル処理を作っていきます。まずは compileVarDecl を書いていきます。

Compiler.js

  compileVarDecl() {
    while (true) {
      const name = this.ensureToken("identifier").value;
      this.nameTable.addVar(name);
      if (this.token.kind === ",") {
        this.nextToken();
        continue;
      }
      this.ensureToken(";");
      break;
    }
  }

次に compileIdentifier を拡張して変数に対応させます。

Compile.js

  compileIdentifier(name) {
    const entry = this.nameTable.search(name);
    if (entry === null) {
      throw new Error(`名前 ${name} が未定義`);
    }
    switch (entry.kind) {
      case "const":
        this.codeBuilder.emitLit(entry.value);
        break;
      case "var":
        this.codeBuilder.emitLod(entry.level, entry.relAddress);
        break;
      default:
        throw new Error(`名前 ${name} が未定義`);
    }
  }

さらに、代入文をコンパイルする compileAssign を次のように定義します。

Compiler.js

  compileAssign(name) {
    const entry = this.nameTable.search(name);
    if (entry === null) {
      throw new Error(`名前 ${name} が未定義`);
    }
    if (entry.kind !== "var") {
      throw new Error("意味エラー: 代入文の左辺が変数ではない");
    }
    this.ensureToken(":=");
    this.compileExpression();
    this.codeBuilder.emitSto(entry.level, entry.relAddress);
  }

変数を使うためにはスタック上に値を保存する場所を確保する必要があります。compileBlock を拡張してプログラム開始時に領域を確保するようにします。

Compiler.js

  compileBlock() {
    while (true) {
      switch (this.token.kind) {
        case "const":
          this.nextToken();
          this.compileConstDecl();
          continue;
        case "var":
          this.nextToken();
          this.compileVarDecl();
          continue;
        default:
          break;
      }
      break;
    }
    this.codeBuilder.emitIct(this.nameTable.localAddress);
    this.compileStatement();
    this.codeBuilder.emitRet(0, 0);
  }

さらに VirtualMachine の run メソッドで top の初期値を 0 に設定します。

VirtualMachine.js

  run() {
    this.pc = 0;
    this.stack = [0, 0];
    this.top = 0;
    this.display = [0];

ここの top を 0 にしたことで VirtualMachine.test.js の各テストの修正が必要になります。具体的には最初に ict 命令を追加する必要があります。例えば最初のころに作った「メインブロックで実行するretが動く」は次のようになります。

VirtualMachine.test.js

  test("メインブロックで実行するretが動く", () => {
    const code = [
      { kind: "ict", value: 2 },
      { kind: "ret", level: 0, numParams: 0 },
    ];
    const vm = new VirtualMachine(code);
    vm.run();
    expect(vm.pc).toBe(0);
    expect(vm.stack[0]).toBe(0);
    expect(vm.display[0]).toBe(0);
  });

この修正が終わると代入が正しく動作するようになり、テストもすべて通ります。

第8段階

ここではサブセットに while 文を追加します。文法は次のようになります。

program

block

constDecl

varDecl

statement

condition

expression

term

factor

この文法で作るれるプログラムとしては次のようのなものがあります。

var i;
begin
  i := 5;
  while i > 0 do begin
    write i;
    writeln;
    i := i - 1
  end
end.

今回はこれをテストケースとして利用します。

test.js

test("while文をコンパイルして実行できる", () => {
  const source = `
    var i;
    begin
      i := 5;
      while i > 0 do begin
        write i;
        writeln;
        i := i - 1
      end
    end.`;
  const code = compile(source);
  const vm = new VirtualMachine(code);
  vm.run();
  expect(vm.output).toBe("5\n4\n3\n2\n1\n");
});

今回追加になるトークン while と do を追加したら、コンパイルの処理を書きます。

Compiler.js

  compileWhile() {
    const startIndex = this.codeBuilder.currentIndex;
    this.compileCondition();
    this.ensureToken("do");
    const jpcInst = this.codeBuilder.emitJpc(0); // 0は仮のインデックス
    this.compileStatement();
    this.codeBuilder.emitJmp(startIndex);
    this.codeBuilder.backPatch(jpcInst, this.codeBuilder.currentIndex);
  }

これで while 文の実装は完了です。追加でフィボナッチ数列の n 項目を計算するプログラムを書いてみます。

const n = 10;
var x0, x1, i, tmp;
begin
  i := n;
  x0 := 0;
  x1 := 1;
  while i > 0 do begin
    tmp := x0 + x1;
    x0 := x1;
    x1 := tmp;
    i := i - 1
  end;
  write x1;
  writeln
end.

以下のテストを追加して実行すると成功します。

test.js

test("フィボナッチ数列の例をコンパイルして実行できる", () => {
  const source = `
    const n = 10;
    var x0, x1, i, tmp;
    begin
      i := n;
      x0 := 0;
      x1 := 1;
      while i > 0 do begin
        tmp := x0 + x1;
        x0 := x1;
        x1 := tmp;
        i := i - 1
      end;
      write x1;
      writeln
    end.`;
  const code = compile(source);
  const vm = new VirtualMachine(code);
  vm.run();
  expect(vm.output).toBe("89\n");
});

第9段階

ここでは最後に残った関数宣言と関数呼び出しの機能を実装します。この2つの実装が終わると PL/0' のすべての文法機能を実装したことになります。

program (プログラム)

block (ブロック)

constDecl (定数宣言)

varDevl (変数宣言)

funcDecl (関数宣言)

statement (文)

condition

expression

term

factor

この文法で書けるプログラム例としては以下のものがあります。

function f(x)
var y;
begin
  y := x * 2;
  return y
end;
write f(10).

このプログラムを以下のテストコードで表現します。

test.js

test("function宣言をコンパイルして実行できる", () => {
  const source = `
    function f(x)
    var y;
    begin
      y := x * 2;
      return y
    end;
    write f(10).`;
  const code = compile(source);
  const vm = new VirtualMachine(code);
  vm.run();
  expect(vm.output).toBe("20");
});

今回追加になるトークン function と return を追加したら、仮想マシンの命令を追加します。今回追加する命令は cal 命令です。cal 命令は次のように定義します。

VirtualMachine.js

  runCal(inst) {
    const level = inst.level + 1;
    // displayとpcの退避
    this.stack[this.top] = this.display[level];
    this.stack[this.top + 1] = this.pc;
    // displayとpcの更新
    this.display[level] = this.top;
    this.pc = inst.index;
  }

この動作のテストをを次のように追加します。

VirtualMachine.test.js

  test("calが動く", () => {
    const code = [
      { kind: "ict", value: 2 },
      { kind: "cal", level: 0, index: 2 },
      { kind: "jmp", index: 0 },
    ];
    const vm = new VirtualMachine(code);
    vm.run();
    expect(vm.stack).toEqual([0, 0, undefined, 2]);
  });

stackの3要素目はレベル1の古い値になるので今回の実装では undefined が入ります。4要素目の2が cal を呼び出した時の pc の値になります。

次に名前表を関数に対応させます。関数のコンパイルで名前表に登録すべき情報は、関数名のほかにパラメータ名があります。ここで追加するメソッドは以下の通りです。

addFunction: 関数の情報を追加する
addParameter: 関数のパラメータを追加する
endParameters: パラメータの最後であることを通知して関数の登録を更新する
endFunction: 関数の最後であることを通知してローカル変数のエントリを削除する

それぞれ以下のように実装します。

NameTable.js

  addFunction(name, index) {
    this.currentFunctionAddress = this.table.length;
    this.table.push({
      kind: "function",
      name,
      level: this.level++,
      index,
      numParams: 0, // 仮の値。 addParameter で更新する
    });
    // 現在のレベルの状態を保存する
    this.display.push({
      address: this.table.length,
      localAddress: this.localAddress,
    });
    this.localAddress = 2;
    return this.currentFunctionAddress;
  }

  addParameter(name) {
    const address = this.table.length;
    this.table.push({
      kind: "parameter",
      name,
      level: this.level,
      relAddress: 0, // 仮の値。 endParams で更新する
    });
    this.table[this.currentFunctionAddress].numParams++;
    return address;
  }

  endParameters() {
    const numParams = this.table[this.currentFunctionAddress].numParams;
    for (let i = 1; i <= numParams; i++) {
      this.table[this.currentFunctionAddress + 1].relAddress =
        i - 1 - numParams; // -1, -2, ... とつけていく
    }
  }

  endFunction() {
    const { address, localAddress } = this.display.pop();
    // 下位レベルで追加されたエントリを削除する
    this.table.length = address;
    this.localAddress = localAddress;
    this.level--;
  }

これらを利用して関数のコンパイル compileFunctionDecl を作ります。

Compiler.js

  compileFunctionDecl() {
    const functionName = this.ensureToken("identifier").value;
    this.nameTable.addFunction(functionName, this.codeBuilder.currentIndex);
    this.ensureToken("(");
    while (true) {
      const name = this.ensureToken("identifier").value;
      this.nameTable.addParameter(name);
      if (this.token.kind === ",") {
        continue;
      }
      this.ensureToken(")");
      break;
    }
    this.nameTable.endParameters();
    this.compileBlock();
    this.nameTable.endFunction();
    this.ensureToken(";");
  }

さらに compileBlock を修正して、ブロックの最初で関数のコードを飛び越すための jmp 文を配置します。

Compiler.js

  compileBlock() {
    const jmpInst = this.codeBuilder.emitJmp(0);
    while (true) {
      switch (this.token.kind) {
        case "const":
          this.nextToken();
          this.compileConstDecl();
          continue;
        case "var":
          this.nextToken();
          this.compileVarDecl();
          continue;
        case "function":
          this.nextToken();
          this.compileFunctionDecl();
          continue;
        default:
          break;
      }
      break;
    }
    this.codeBuilder.backPatch(jmpInst, this.codeBuilder.currentIndex);
    this.codeBuilder.emitIct(this.nameTable.localAddress);
    this.compileStatement();
    this.codeBuilder.emitRet(0, 0);
  }

次に compileIdentifier を修正して、パラメータの参照と関数呼び出しに対応します。

Compiler.js

  compileIdentifier(name) {
    const entry = this.nameTable.search(name);
    if (entry === null) {
      throw new Error(`名前 ${name} が未定義`);
    }
    switch (entry.kind) {
      case "const":
        this.codeBuilder.emitLit(entry.value);
        break;
      case "var":
        this.codeBuilder.emitLod(entry.level, entry.relAddress);
        break;
      case "parameter":
        this.codeBuilder.emitLod(entry.level, entry.relAddress);
        break;
      case "function":
        this.compileCall(entry);
        break;
      default:
        throw new Error(`名前 ${name} が未定義`);
    }
  }

ここで compileCall は次のように定義します。

Compiler.js

  compileCall(entry) {
    this.ensureToken("(");
    while (true) {
      if (this.token.kind === ")") {
        this.nextToken();
        break;
      }
      this.compileExpression();
      if (this.token.kind === ",") {
        this.nextToken();
        continue;
      }
      this.ensureToken(")");
      break;
    }
    this.codeBuilder.emitCal(entry.level, entry.index);
  }

さらに return 文のコンパイルを次のように追加します。

Compiler.js

  compileRet() {
    this.compileExpression();
    this.codeBuilder.emitRet(
      this.nameTable.level,
      this.nameTable.currentFunctionNumParams()
    );
  }

currentFunctionNumParams は NameTable に追加したメソッドです。

NameTable.js

  currentFunctionNumParams() {
    return this.table[this.currentFunctionAddress].numParams;
  }

これで関数の呼び出しが動くようになります。ここで念のためいくつかのパターンを追加で試します。まずは再帰呼び出しが動くかを確認します。

test.js

test("再帰呼び出しが正しく動く", () => {
  const source = `
    function fact(n)
    begin
      if n = 0 then return 1;
      return n * fact(n - 1)
    end;
    write fact(5).`;
  const code = compile(source);
  const vm = new VirtualMachine(code);
  vm.run();
  expect(vm.output).toBe("120");
});

内側の関数で同じ名前の別の変数を定義しても問題ないかを試します。

test.js

test("変数のシャドウイングが正しく動く", () => {
  const source = `
    var x;
    function f1(y)
    begin
      x := y + 1;
      return x
    end;
    function f2(y)
    var x;
    begin
      x := y + 1;
      return x
    end;
    begin
      write f1(10);
      writeln;
      write f2(20);
      writeln;
      write x;
      writeln
    end.`;
  const code = compile(source);
  const vm = new VirtualMachine(code);
  vm.run();
  expect(vm.output).toBe("11\n21\n11\n");
});

第10段階

最後に書籍の第7章で実現されている誤りの回復と解析結果の出力を実装します。先に解析結果を出力するために出力を保持する ParseReport クラスを作成します。

ParseReport.js

export class ParseReport {
  constructor() {
    this.report = "";
    this.errorCount = 0;
  }

  outputWhiteSpace(ch) {
    if (ch === "\n") {
      this.report += "<br>\n";
      return;
    }
    this.report += ch;
  }

  outputToken(token) {
    switch (token.kind) {
      case "identifier":
        this.report += token.value;
        break;
      case "number":
        this.report += token.value;
        break;
      default:
        this.report += token.kind;
        break;
    }
  }

  typeError(str) {
    this.report += `<span class="pl0-type">${str}</span>`;
    this.checkErrorCount();
  }

  insert(str) {
    this.report += `<span class="pl0-insert">${str}</span>`;
    this.checkErrorCount();
  }

  delete(token) {
    this.report += `<span class="pl0-delete">`;
    this.outputToken(token);
    this.report += `</span>`;
  }

  missionOp() {
    this.report += `<span class="pl0-missionOp">@</span>`;
  }

  checkErrorCount() {
    if (++this.errorCount > 30) {
      throw new Error("too many error");
    }
  }
}

書籍のコードを参考に上のメソッドを呼び出し、対応するテストをいくつか追加しました。

test.js

test("定数宣言でコンマがない場合に回復できる", () => {
  const source = new Source("const x = 2 y = 3; write x + y.");
  const compiler = new Compiler(source);
  const nameTable = new NameTable();
  compiler.compile(nameTable);
  expect(compiler.report.report).toMatch(/<span class="pl0-insert">,<\/span>/);
});

test("begin end でセミコロンが足りない場合に回復できる", () => {
  const source = new Source("begin write 10 write 20 end.");
  const compiler = new Compiler(source);
  const nameTable = new NameTable();
  compiler.compile(nameTable);
  expect(compiler.report.report).toMatch(/<span class="pl0-insert">;<\/span>/);
});

test("begin end で不要な記号を読み飛ばす", () => {
  const source = new Source("begin write 10 := end.");
  const compiler = new Compiler(source);
  const nameTable = new NameTable();
  compiler.compile(nameTable);
  expect(compiler.report.report).toMatch(/<span class="pl0-delete">:=<\/span>/);
});

test("因子で演算子がない場合に回復できる", () => {
  const source = new Source("const a = 3; write 2a.");
  const compiler = new Compiler(source);
  const nameTable = new NameTable();
  compiler.compile(nameTable);
  expect(compiler.report.report).toMatch(/pl0-missingOp/);
});

test("変数が未定義の時にに回復できる", () => {
  const source = new Source("write x.");
  const compiler = new Compiler(source);
  const nameTable = new NameTable();
  compiler.compile(nameTable);
  expect(compiler.report.report).toMatch(
    /<span class="pl0-type">undef<\/span>/
  );
});

実際にエラー処理を改善していく場合、もう少し気軽に動作を確認できる方が都合が良いので、今回作成したコンパイラをHTMLページに組み込みます。

npm コマンドで parcel と bootstrap を導入します。

npm install --save-dev parcel-bundler

npm install --save bootstrap jquery popper.js

プロジェクトに index.html という名前で以下の内容のファイルを追加します。

index.html

<!DOCTYPE html>
<html lang="ja">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">
    <title>PL/0'コンパイラ</title>
    <link rel="stylesheet" href="./style.css">
</head>
<body>
    <div class="container-fluid">
        <div class="row">
            <div class="col-lg mt-2">
                <p class="h6">ソースコード</p>
                <div class="form-group">
                    <textarea class="form-control text-monospace" id="sourceTextArea" rows="10"></textarea>
                </div>
                <button class="btn btn-primary" id="compileButton">コンパイル</button>
                <p class="h6 mt-2">実行結果</p>
                <pre class="border" id="outputArea"></pre>
            </div>
            <div class="col-lg mt-2">
                <p class="h6">コンパイル結果</p>
                <pre class="border h-100" id="parseReport"></pre>
            </div>
        </div>
        <script src="./index.js"></script>
    </div>
</body>
</html>

さらに以下の内容の index.js を追加します。

index.js

import "bootstrap";
import "bootstrap/dist/css/bootstrap.css";
import { Source } from "./Source";
import { Compiler } from "./Compiler";
import { NameTable } from "./NameTable";
import { VirtualMachine } from "./VirtualMachine";

const sourceTextArea = document.getElementById("sourceTextArea");
const compileButton = document.getElementById("compileButton");
const outputArea = document.getElementById("outputArea");
const parseReport = document.getElementById("parseReport");

compileButton.addEventListener("click", () => {
  parseReport.innerText = "";
  outputArea.innerText = "";
  try {
    const source = new Source(sourceTextArea.value);
    const compiler = new Compiler(source);
    const nameTable = new NameTable();
    const code = compiler.compile(nameTable);
    parseReport.innerHTML = compiler.report.report;
    if (compiler.report.errorCount === 0) {
      const vm = new VirtualMachine(code);
      vm.run();
      outputArea.innerText = vm.output;
    }
  } catch (err) {
    outputArea.innerText = err.toString();
  }
});

さらに package.json の scripts を次のようにします。

package.json

  "scripts": {
    "start": "parcel index.html",
    "test": "jest"
  },

start を実行するとサーバが立ち上がります。書籍のプログラム 7.3 を入力すると以下のように表示されます。

エラーがない場合はコードが実行されます。

まとめ

10のステップに分けて PL/0' を JavaScript で実装しました。最後に今回行った段階分けをまとめます。書籍とこの表があれば、これまでの長い説明を一切読まなくても、同じ内容を再現したり、ほかの言語でコンパイラを実装したりできると思います。

段階	概要
第1段階	空のプログラムををコンパイルして実行できるようにする
第2段階	write文ををコンパイルして実行できるようにする
第3段階	数式をコンパイルして実行できるようにする
第4段階	定数宣言をコンパイルして実行できるようにする
第5段階	begin/endとwriteln文をコンパイルして実行できるようにする
第6段階	if文と条件式をコンパイルして実行できるようにする
第7段階	変数宣言と代入文をコンパイルして実行できるようにする
第8段階	while文をコンパイルして実行できるようにする
第9段階	関数宣言と関数呼び出しをコンパイルして実行できるようにする
第10段階	誤りの回復と解析結果の出力を実装する

参考: 公式サンプルの動かし方

PL/0' の実装はオーム社の公式サイトで配布されています。公式サイトではC言語以外にJavaやRubyの実装も配布されています。

書籍の公式サイト

C言語版は解凍したディレクトリの PL0compiler 内にあります。こちらのサンプルそのままでは動かない可能性があるので私の試した直し方を説明します。

make しようとして以下のエラーが表示される場合。

make: *** No targets.  Stop.

Makefile 内のルールのコロンの前にタブ文字が入っています。これを取り除くと make できます。

変更前

.SUFFIXES	: .o .c

.c.o	: 
	$(CC) $(CFLAGS) -c $<

pl0d	: ${OBJS}
	$(CC) -o $@ ${OBJS}

clean	:
	\rm -rf *~ *.o

変更後

.SUFFIXES: .o .c

.c.o: 
	$(CC) $(CFLAGS) -c $<

pl0d: ${OBJS}
	$(CC) -o $@ ${OBJS}

clean:
	\rm -rf *~ *.o

make できた後で ex1.pl0 をコンパイルしても正しく動いているように見えない場合以下を行うと動きます。

改行コードが CRLF になっている場合は LF に統一する
ファイルの先頭に BOM が入っている場合は取り除く

ex1.pl0 が正常に実行された場合コンソール画面は次のようになります。

enter source file name
ex1.pl0
start compilation
1 errors
start execution
7 85
595

また fact.pl0 は次のように出力されます。

enter source file name
fact.pl0
start compilation
1 errors
start execution
1 1
2 2
3 6
4 24
5 120
6 720
7 5040
8 40320
9 362880

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up

書籍「コンパイラ 作りながら学ぶ」を読みながら JavaScript でコンパイラを実装してみた

はじめに

対象読者

PL/0' 言語

仮想マシン

記憶領域

命令

開発環境の構築

第1段階

第2段階

第3段階

第4段階

第5段階

第6段階

第7段階

第8段階

第9段階

第10段階

まとめ

参考: 公式サンプルの動かし方

書籍「コンパイラ作りながら学ぶ」を読みながら JavaScript でコンパイラを実装してみた