Building a click-to-assemble visual regex builder: inside the token model and live highlighting

Posted at 2026-05-28

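Most people, if they are honest, can read regular expressions but cannot write them. Seeing \d{2,4}-\d{1,4}-\d{4}, you can tell it looks like a phone number, but when writing one from scratch you end up searching for the syntax. So I wrote a visual builder in 500 lines of vanilla JS where clicking tokens into place assembles the regex. It has 21 token types, live match highlighting, and 26 unit tests.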

🌐 Demo: https://sen.ltd/portfolio/regex-builder/
📦 GitHub: https://github.com/sen-ltd/regex-builder

The token model is everything

The core of the design is the view that a regex is a concatenation of tokens. Each token takes one of two forms:

// No argument: emit the pattern as-is
{ id: "digit",        pattern: "\\d" }
{ id: "any",          pattern: "." }
{ id: "q-one-or-more", pattern: "+" }

// With an argument: the user input is stored in value and assembled by paramFn
{ id: "literal",     paramFn: (v) => escapeForRegex(v) }
{ id: "char-class",  paramFn: (v) => `[${v}]` }
{ id: "q-exact",     paramFn: (v) => `{${v}}` }

These two forms are enough to express all 21 tokens (character classes, quantifiers, anchors, groups). compile simply joins the output of each token:

export function compile(tokens) {
  return tokens.map((t) => {
    const def = getTokenDef(t.id);
    if (!def) return "";
    if (def.paramFn) return def.paramFn(t.value);
    return def.pattern ?? "";
  }).join("");
}

A test that builds an email-like regex:

test("anchored email-ish pattern", () => {
  const tokens = [
    { id: "start" }, { id: "word-char" }, { id: "q-one-or-more" },
    { id: "literal", value: "@" }, { id: "word-char" }, { id: "q-one-or-more" },
    { id: "literal", value: "." }, { id: "word-char" }, { id: "q-one-or-more" },
    { id: "end" },
  ];
  assert.equal(compile(tokens), "^\\w+@\\w+\\.\\w+$");
});

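Both @ and . go through the literal token, so any metacharacter among them (here the .) is escaped automatically and the user does not have to think about regex metacharacters.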
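
As a further illustration, the phone-number pattern from the introduction can be assembled the same way. This is a minimal sketch, not the repository's code: the catalog below contains only token shapes shown above, the Map-based getTokenDef is an assumed implementation, and passing "2,4" to q-exact to get a range is an assumption (the real tool may use a separate range token).

```javascript
// Minimal catalog with only the token shapes shown in the article.
const TOKEN_DEFS = new Map([
  ["digit", { pattern: "\\d" }],
  ["literal", { paramFn: (v) => String(v).replace(/[.*+?^${}()|[\]\\\/]/g, "\\$&") }],
  ["q-exact", { paramFn: (v) => `{${v}}` }],
]);

// Assumed lookup helper; the real getTokenDef may be implemented differently.
function getTokenDef(id) {
  return TOKEN_DEFS.get(id);
}

// Same logic as the compile function in the article.
function compile(tokens) {
  return tokens.map((t) => {
    const def = getTokenDef(t.id);
    if (!def) return "";
    if (def.paramFn) return def.paramFn(t.value);
    return def.pattern ?? "";
  }).join("");
}

// Assumption: q-exact accepts "min,max" as its value.
const phoneTokens = [
  { id: "digit" }, { id: "q-exact", value: "2,4" },
  { id: "literal", value: "-" },
  { id: "digit" }, { id: "q-exact", value: "1,4" },
  { id: "literal", value: "-" },
  { id: "digit" }, { id: "q-exact", value: "4" },
];

const phonePattern = compile(phoneTokens);
console.log(phonePattern);
```

The hyphen passes through the literal token unchanged because it is not a metacharacter outside a character class.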

Escaping literals: hit every metacharacter

const ESCAPE_RE = /[.*+?^${}()|[\]\\\/]/g;

export function escapeForRegex(s) {
  return String(s).replace(ESCAPE_RE, "\\$&");
}

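The characters . * + ? ^ $ { } ( ) | [ ] \ / are all escaped. The / does not need escaping when the pattern is passed to new RegExp() as a string, but users paste the result in /pattern/flags form, so it is escaped as well.

In the replacement string, $& stands for the whole match, so \\$& prefixes each matched metacharacter with a backslash: 3.14 becomes 3\.14, with only that one character rewritten. The backslash itself is escaped too (a\b → a\\b).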

test("escapes regex metacharacters", () => {
  assert.equal(escapeForRegex("3.14"), "3\\.14");
  assert.equal(escapeForRegex("[a]"), "\\[a\\]");
});

test("non-meta chars pass through unchanged", () => {
  assert.equal(escapeForRegex("hello world"), "hello world");
  assert.equal(escapeForRegex("日本語"), "日本語");
});

Japanese text needs no escaping (it has no special meaning in a regex), so it passes through unchanged.
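
One way to sanity-check an escape function like this is a round trip: a regex built from the escaped string should match the original string exactly. The sketch below reuses ESCAPE_RE and escapeForRegex as shown above; matchesLiterally and the sample strings are hypothetical names introduced only for illustration.

```javascript
const ESCAPE_RE = /[.*+?^${}()|[\]\\\/]/g;

function escapeForRegex(s) {
  return String(s).replace(ESCAPE_RE, "\\$&");
}

// Hypothetical helper: an escaped literal, anchored at both ends,
// should match the original input.
function matchesLiterally(input) {
  const re = new RegExp("^" + escapeForRegex(input) + "$");
  return re.test(input);
}

const samples = ["3.14", "a\\b", "(1+1)*2", "path/to/file", "日本語", "$100?"];
const allRoundTrip = samples.every(matchesLiterally);

// Without escaping, "." is a wildcard, so the pattern 3.14 also matches "3x14".
const unescapedIsLoose = new RegExp("^3.14$").test("3x14");
// With escaping, only a literal dot is accepted.
const escapedIsStrict = new RegExp("^" + escapeForRegex("3.14") + "$").test("3x14");

console.log({ allRoundTrip, unescapedIsLoose, escapedIsStrict });
```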

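Quantifiers "modify what comes before"

\d+ means "one or more digits", but in the token list it is two independent elements, [digit, q-one-or-more]. Because compile only concatenates in order, placing the tokens in the right positions automatically yields the right meaning.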

[digit, q-one-or-more]                 → \d+
[literal("#"), digit, q-one-or-more]   → #\d+
[group-open, literal("cat"), alternation, literal("dog"), group-close]
                                       → (cat|dog)

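The definitions do carry a modifiesPrevious: true flag, but the current compile works with ordered concatenation alone. The flag is there for a future drag-and-drop UX along the lines of "move the + together with the \w it is attached to".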
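
To illustrate one thing such a flag could support, here is a hedged sketch of a check for quantifiers that have nothing to modify. None of this is in the repository as far as the article says: DEFS is a hypothetical catalog excerpt, and NOT_QUANTIFIABLE and findDanglingQuantifiers are names invented for this example. Only the modifiesPrevious flag name comes from the text above.

```javascript
// Hypothetical catalog excerpt; only the modifiesPrevious flag name is from the article.
const DEFS = {
  "digit": { pattern: "\\d" },
  "word-char": { pattern: "\\w" },
  "q-one-or-more": { pattern: "+", modifiesPrevious: true },
  "group-open": { pattern: "(" },
  "group-close": { pattern: ")" },
  "alternation": { pattern: "|" },
};

// Assumption for this sketch: a quantifier directly after these tokens
// has nothing to repeat.
const NOT_QUANTIFIABLE = new Set(["group-open", "alternation"]);

// Returns the indexes of quantifier tokens with no valid target before them.
function findDanglingQuantifiers(tokens) {
  const dangling = [];
  tokens.forEach((t, i) => {
    if (!DEFS[t.id]?.modifiesPrevious) return;
    const prev = tokens[i - 1];
    if (!prev || NOT_QUANTIFIABLE.has(prev.id)) dangling.push(i);
  });
  return dangling;
}

const ok = findDanglingQuantifiers([{ id: "digit" }, { id: "q-one-or-more" }]);
const leading = findDanglingQuantifiers([{ id: "q-one-or-more" }, { id: "digit" }]);
console.log(ok, leading);
```

A UI could use the returned indexes to warn before new RegExp() throws, though the try/catch approach described later already covers correctness.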

Live match highlighting: the segmentText pattern

The input text is first split into matched and unmatched ranges, and the matched ones are then wrapped in <mark>:

export function segmentText(text, matches) {
  const out = [];
  let cursor = 0;
  for (let i = 0; i < matches.length; i++) {
    const m = matches[i];
    if (m.start > cursor) {
      out.push({ text: text.slice(cursor, m.start), matched: false });
    }
    out.push({ text: text.slice(m.start, m.end), matched: true, matchIndex: i });
    cursor = m.end;
  }
  if (cursor < text.length) {
    out.push({ text: text.slice(cursor), matched: false });
  }
  return out;
}

The UI side converts this to HTML:

$("highlighted").innerHTML = segs.map((s) =>
  s.matched
    ? `<mark>${escapeHtml(s.text)}</mark>`
    : escapeHtml(s.text)
).join("");
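
The snippet above relies on an escapeHtml helper whose body is not shown. A minimal sketch of what it might look like follows; the implementation is an assumption, and renderSegments is a hypothetical name for the same mapping without the DOM write, so that it can run under Node.

```javascript
// Assumed implementation; the article uses the name escapeHtml but does not show its body.
function escapeHtml(s) {
  return String(s)
    .replace(/&/g, "&amp;") // must run first so later entities are not double-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Hypothetical pure version of the UI mapping: segments in, HTML string out.
function renderSegments(segs) {
  return segs.map((s) =>
    s.matched ? `<mark>${escapeHtml(s.text)}</mark>` : escapeHtml(s.text)
  ).join("");
}

const html = renderSegments([
  { text: "<b>id ", matched: false },
  { text: "42", matched: true, matchIndex: 0 },
  { text: "</b>", matched: false },
]);
console.log(html);
```

Escaping every segment, matched or not, matters here because the test text is arbitrary user input that ends up in innerHTML.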

Writing the highlight logic as a pure function means Node tests can cover all the boundary cases:

test("multiple matches with gaps", () => {
  const segs = segmentText("a 1 b 2 c", [
    { start: 2, end: 3, text: "1" },
    { start: 6, end: 7, text: "2" },
  ]);
  // Expected: ["a ", "1", " b ", "2", " c"], alternating matched=false/true
  assert.equal(segs.length, 5);
  assert.equal(segs[1].matched, true);
  assert.equal(segs[3].matched, true);
});

The logic just cuts the text with text.slice at the start/end boundaries (five segments in this case), so passing in synthetic text and match objects is all the verification needs.
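
A few more boundary cases can be walked through the same way. The sketch below repeats segmentText as shown above so that it runs on its own; the inputs are synthetic examples chosen for illustration, not tests taken from the repository.

```javascript
function segmentText(text, matches) {
  const out = [];
  let cursor = 0;
  for (let i = 0; i < matches.length; i++) {
    const m = matches[i];
    if (m.start > cursor) {
      out.push({ text: text.slice(cursor, m.start), matched: false });
    }
    out.push({ text: text.slice(m.start, m.end), matched: true, matchIndex: i });
    cursor = m.end;
  }
  if (cursor < text.length) {
    out.push({ text: text.slice(cursor), matched: false });
  }
  return out;
}

// No matches: the whole text comes back as a single unmatched segment.
const noMatch = segmentText("abc", []);
// Match at the very start: no empty leading segment is emitted.
const atStart = segmentText("1ab", [{ start: 0, end: 1 }]);
// Match at the very end: no empty trailing segment is emitted.
const atEnd = segmentText("ab1", [{ start: 2, end: 3 }]);
// Adjacent matches: no empty gap segment between them.
const adjacent = segmentText("12", [{ start: 0, end: 1 }, { start: 1, end: 2 }]);
// Empty text: nothing to emit.
const emptyText = segmentText("", []);

console.log(noMatch.length, atStart.length, atEnd.length, adjacent.length, emptyText.length);
```

Note that the function assumes matches are sorted and non-overlapping, which holds for the output of matchAll.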

Matching: global vs. single

matchAll requires the g flag (it throws a TypeError for a non-global regex). In the non-global case, .match() returns only the first match:

export function tryMatch(tokens, flags, text) {
  const pattern = compile(tokens);
  if (!pattern) return { ok: true, regex: null, matches: [] };

  let regex;
  try { regex = new RegExp(pattern, flags); }
  catch (e) { return { ok: false, error: e.message }; }

  const matches = [];
  if (regex.global) {
    for (const m of text.matchAll(regex)) {
      matches.push({ start: m.index, end: m.index + m[0].length, text: m[0], groups: m.slice(1) });
    }
  } else {
    const m = text.match(regex);
    if (m) matches.push({ start: m.index, end: m.index + m[0].length, text: m[0], groups: m.slice(1) });
  }
  return { ok: true, regex, matches };
}

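Key points:

An invalid regex (an unclosed (, for example) is caught with try/catch and shown as an error in the UI
Capture groups are all collected with m.slice(1). Placing group-open + group-close tokens automatically creates a group
An empty token list returns ok = true with matches = [] (it is not an error)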
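
The three behaviors above can be seen in a small runnable sketch. collectMatches is a hypothetical, simplified stand-in for tryMatch that takes a pattern string directly instead of a token list, so that it stays self-contained; the match object shape follows the article.

```javascript
// Simplified stand-in for tryMatch: takes a pattern string instead of tokens.
function collectMatches(pattern, flags, text) {
  let regex;
  try {
    regex = new RegExp(pattern, flags);
  } catch (e) {
    // Invalid patterns surface as a value, not as a thrown exception.
    return { ok: false, error: e.message };
  }
  const toMatch = (m) => ({
    start: m.index,
    end: m.index + m[0].length,
    text: m[0],
    groups: m.slice(1),
  });
  const found = regex.global
    ? [...text.matchAll(regex)]
    : [text.match(regex)].filter(Boolean); // match() returns null when nothing matches
  return { ok: true, matches: found.map(toMatch) };
}

const all = collectMatches("(\\d)(\\d)", "g", "12 34");   // every match
const first = collectMatches("(\\d)(\\d)", "", "12 34");  // first match only
const broken = collectMatches("(", "", "anything");       // unclosed group

// Calling matchAll with a non-global regex throws, which is why the branch exists.
let matchAllThrows = false;
try {
  "12".matchAll(/\d/);
} catch (e) {
  matchAllThrows = e instanceof TypeError;
}

console.log(all.matches.length, first.matches.length, broken.ok, matchAllThrows);
```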

Architecture

tokens.js   ← token catalog, compile, tryMatch, segmentText, escapeForRegex
app.js      ← UI glue (catalog rendering, sequence chips, flag sync, live updates)

The dependency runs in one direction only, app.js → tokens.js. Neither document nor window appears in tokens.js. new RegExp() is the same API in the browser and in Node, so tryMatch can also be run in Node tests:

npm test  # 26 tests, 173ms

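The tests are split by granularity: "one token = one or two cases", "composite patterns = a few cases", "matching = one per flag", "segment splitting = boundary conditions", and "catalog integrity". When adding a new token, the flow is to add one line to tests/tokens.test.js first and then update tokens.js.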
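
The article does not show what the "catalog integrity" tests check. As one plausible reading, the sketch below verifies that ids are unique and that each definition has exactly one of pattern or paramFn. CATALOG is a hypothetical excerpt built from tokens shown earlier, and checkCatalog is a name invented for this example.

```javascript
// Hypothetical catalog excerpt using token shapes shown in the article.
const CATALOG = [
  { id: "digit", pattern: "\\d" },
  { id: "any", pattern: "." },
  { id: "q-one-or-more", pattern: "+" },
  { id: "char-class", paramFn: (v) => `[${v}]` },
  { id: "q-exact", paramFn: (v) => `{${v}}` },
];

// Returns a list of human-readable problems; an empty list means the catalog is consistent.
function checkCatalog(defs) {
  const problems = [];
  const seen = new Set();
  for (const def of defs) {
    if (seen.has(def.id)) problems.push(`duplicate id: ${def.id}`);
    seen.add(def.id);
    const hasPattern = typeof def.pattern === "string";
    const hasParamFn = typeof def.paramFn === "function";
    // Exactly one of the two forms must be present.
    if (hasPattern === hasParamFn) {
      problems.push(`${def.id}: needs exactly one of pattern / paramFn`);
    }
  }
  return problems;
}

const catalogProblems = checkCatalog(CATALOG);
console.log(catalogProblems);
```

A check like this needs no DOM, so it fits the same Node test run as the rest of tokens.js.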

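Summary

A regex can be implemented with the simple model of "a concatenation of tokens"
Routing input through the literal token escapes metacharacters automatically, so users do not have to learn regex syntax
Quantifiers "modify what comes before", yet ordered concatenation alone compiles them correctly
Live match highlighting takes two steps: segmentText splits the text into ranges, then the matched ranges are wrapped in <mark>
Wrapping new RegExp in try/catch handles invalid patterns gracefully as UI errors
Collecting all the logic in tokens.js as pure functions lets 26 Node tests verify it without a browser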

Repository: https://github.com/sen-ltd/regex-builder

This tool was built as entry #245 in our OSS portfolio. SEN LLC (SEN 合同会社, Tokyo) continuously publishes small, sharp tools: https://sen.ltd/portfolio/
