
Regular Expressions in JavaScript, Part 3

Posted at 2022-09-12

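Introduction

This is a continuation of the previous part. This time I have summarized the methods related to regular expressions.

Here is the link to the previous part.

The reference article for this part is here.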

Tools

A regex testing tool (with patterns and a library)

Memo

Properties

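RegExp.prototype.lastIndex: used with the g and y flags.
- regex.lastIndex holds the index at which the next exec() or test() call starts searching. It can be read and also assigned.
RegExp.prototype.dotAll: the s flag.
- With s, . also matches line terminators such as \n (it already matches other whitespace without the flag).
RegExp.prototype.hasIndices: the d flag.
- regex.hasIndices tells whether d is set. With d, each match result gets an indices property holding the start and end index of the match and of every capture group.
RegExp.prototype.ignoreCase: the i flag.
RegExp.prototype.global: the g flag.
RegExp.prototype.multiline: the m flag.
- With m, ^ and $ match at the start and end of each line, not only of the whole string.
RegExp.prototype.sticky: the y flag.
RegExp.prototype.unicode: the u flag.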
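
The properties above can be checked directly on a regexp object. The following is a minimal sketch with sample strings and variable names of my own; it assumes a runtime that supports the d flag (ES2022, for example Node.js 16 or later).

```javascript
// Flag properties are read-only booleans that reflect the flags of the regexp.
const re = /a.c/dgs;
const flagState = [re.dotAll, re.hasIndices, re.global, re.sticky];
console.log(flagState); // [ true, true, true, false ]

// s (dotAll): "." also matches a line terminator.
const withoutS = /a.c/.test('a\nc');
const withS = /a.c/s.test('a\nc');
console.log(withoutS, withS); // false true

// d (hasIndices): the match result gains an `indices` array of [start, end] pairs.
const indexed = /b+/d.exec('aabbbcc');
console.log(indexed.indices[0]); // [ 2, 5 ]

// m (multiline): ^ and $ match at the start and end of every line.
const lineStarts = 'one\ntwo'.match(/^\w+/gm);
console.log(lineStarts); // [ 'one', 'two' ]
```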

RegExp

// Syntax
regexp = new RegExp(pattern[, flags])
// or
regexp = /pattern/flags
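
Both forms build equivalent regexps. As a rough sketch, the constructor form is mainly useful when the pattern is only known at run time; note that backslashes must be doubled inside a string. The helper escapeRegExp below is a hypothetical name of my own (it uses a commonly seen escaping pattern) and is not part of the language.

```javascript
// Literal and constructor forms build equivalent regexps.
const literal = /\d+-\d+/g;
const constructed = new RegExp('\\d+-\\d+', 'g'); // backslashes doubled inside a string
console.log(literal.source === constructed.source); // true
console.log(constructed.flags); // g

// escapeRegExp is a hypothetical helper that neutralises regex metacharacters.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Building a regexp from run-time input.
const userInput = '1+1';
const dynamic = new RegExp(escapeRegExp(userInput));
const matchesLiteralText = dynamic.test('1+1=2');
const matchesEleven = dynamic.test('11');
console.log(matchesLiteralText); // true
console.log(matchesEleven); // false (an unescaped "1+1" would match "11")
```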

Methods

RegExp.prototype.exec()

Runs the regular expression against the string argument and returns the match result as an array, or null if nothing matches.

// Syntax
regexp.exec(str)

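When the g or y flag is set, the regexp stores the position where the previous match ended in lastIndex.
With g, the search skips over non-matching parts and continues to the next match anywhere after lastIndex. With y, the match must start exactly at lastIndex; if it does not, exec() returns null and lastIndex is reset to its initial value of 0.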

let str = 'app le orange';

// 'g'
let regexp = /\w+/g;
console.log(regexp.lastIndex);
// 0 // initially lastIndex = 0
console.log(regexp.exec(str));
// [ 'app', index: 0, input: 'app le orange', groups: undefined ]
console.log(regexp.lastIndex);
// 3
console.log(regexp.exec(str));
// [ 'le', index: 4, input: 'app le orange', groups: undefined ]
console.log(regexp.lastIndex);
// 6

// 'y'
regexp = /\w+/y;
console.log(regexp.lastIndex);
// 0 // initially lastIndex = 0
console.log(regexp.exec(str));
// [ 'app', index: 0, input: 'app le orange', groups: undefined ]
console.log(regexp.lastIndex);
// 3
console.log(regexp.exec(str));
// null
console.log(regexp.lastIndex);
// 0

If you assign a value to lastIndex, the search starts from that index with either the g or the y flag.

// lastIndex
let str = 'app le orange';
let regexp = /\w+/g;

regexp.lastIndex = 4;
console.log(regexp.exec(str));
// [ 'le', index: 4, input: 'app le orange', groups: undefined ]
console.log(regexp.lastIndex);
// 6

regexp = /\w+/y;
regexp.lastIndex = 4;
console.log(regexp.exec(str));
// [ 'le', index: 4, input: 'app le orange', groups: undefined ]
console.log(regexp.lastIndex);
// 6

regexp.lastIndex = 3;
console.log(regexp.exec(str));
// null
console.log(regexp.lastIndex);
// 0
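
Because a g regexp advances lastIndex after every successful match, exec() can be called in a loop until it returns null. A minimal sketch of that pattern, with variable names of my own:

```javascript
const text = 'app le orange';
const wordRe = /\w+/g;

const found = [];
let m;
// Each call resumes from wordRe.lastIndex; null ends the loop and resets lastIndex to 0.
while ((m = wordRe.exec(text)) !== null) {
  found.push({ word: m[0], start: m.index, next: wordRe.lastIndex });
}

console.log(found);
// [
//   { word: 'app', start: 0, next: 3 },
//   { word: 'le', start: 4, next: 6 },
//   { word: 'orange', start: 7, next: 13 }
// ]
console.log(wordRe.lastIndex); // 0
```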

RegExp.prototype.test()

Returns true if a match is found, otherwise false.

// Syntax
regexp.test(str)

let str = 'I like HTML and JavaScript';
let regexp = /javascript/i;
console.log(regexp.test(str)); // true

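When debugging, or when you want to find out why a test returned false, the g and y flags are handy.
When you step through a string with the g flag, lastIndex points to the index right after each successful match, and it is reset to 0 once test() returns false.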

// with 'g'
let str = 'ネコとワンちゃん';
let globalRegexp = /\p{scx=Katakana}/gu;

console.log(str.match(/\p{scx=Katakana}/gu));
// [ 'ネ', 'コ', 'ワ', 'ン' ]

console.log(globalRegexp.lastIndex); // 0 // initially lastIndex = 0
console.log(globalRegexp.test(str)); // true // index 0 is true
console.log(globalRegexp.lastIndex); // 1 // next index for test()
console.log(globalRegexp.test(str)); // true
console.log(globalRegexp.lastIndex); // 2 // index 2 is false, jump to next index 3
console.log(globalRegexp.test(str)); // true // index 3 is katakana, return true
console.log(globalRegexp.lastIndex); // 4 
console.log(globalRegexp.test(str)); // true
console.log(globalRegexp.lastIndex); // 5
console.log(globalRegexp.test(str)); // false
console.log(globalRegexp.lastIndex); // 0
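
One consequence of this stateful lastIndex is worth keeping in mind: reusing the same g regexp with test() on different strings can give surprising results. A small sketch with sample strings of my own:

```javascript
const hasDigit = /\d/g;

// The first call matches at index 0 and leaves lastIndex at 1.
const first = hasDigit.test('7 apples');
// The second call starts at index 1 of the new string, so the leading "7" is missed.
const second = hasDigit.test('7 pears');
console.log(first, second); // true false

// Two ways around this: reset lastIndex, or drop the g flag when only a yes/no answer is needed.
hasDigit.lastIndex = 0;
const afterReset = hasDigit.test('7 pears');
const withoutG = /\d/.test('7 pears');
console.log(afterReset, withoutG); // true true
```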

With the y flag you can also set lastIndex, but as soon as a test fails lastIndex goes back to 0, so unlike g it cannot scan through the whole string.

// with 'y'
let last = /\p{scx=Hiragana}/yu;
last.lastIndex = 2
console.log(last.lastIndex); // 2
console.log(last.sticky); // true

console.log(last.test(str)); // true
console.log(last.lastIndex); // 3
console.log(last.test(str)); // false
console.log(last.lastIndex); // 0

last.lastIndex = 5
console.log(last.lastIndex); // 5
console.log(last.test(str)); // true
console.log(last.lastIndex); // 6
console.log(last.test(str)); // true
console.log(last.lastIndex); // 7
console.log(last.test(str)); // true
console.log(last.lastIndex); // 8
console.log(last.test(str)); // false
console.log(last.lastIndex); // 0
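
A typical use of this "match exactly at lastIndex" behavior is tokenizing, where each token has to begin where the previous one ended. The sketch below assumes a tiny arithmetic language that I made up for illustration; tokenRe and tokenize are hypothetical names.

```javascript
// Hypothetical token rules: optional whitespace, then a number or an operator.
const tokenRe = /\s*(?:(?<num>\d+)|(?<op>[-+*\/]))/y;

function tokenize(source) {
  const input = source.trimEnd(); // trailing whitespace would otherwise count as an error
  const tokens = [];
  tokenRe.lastIndex = 0;
  while (tokenRe.lastIndex < input.length) {
    const start = tokenRe.lastIndex;
    const m = tokenRe.exec(input);
    // With y the match must begin exactly at lastIndex, so null means an unexpected character.
    if (m === null) throw new SyntaxError(`Unexpected input at index ${start}`);
    tokens.push(m.groups.num !== undefined ? Number(m.groups.num) : m.groups.op);
  }
  return tokens;
}

console.log(tokenize('12 + 3*4')); // [ 12, '+', 3, '*', 4 ]
```

With the g flag instead of y, the regexp would silently skip over invalid characters, which is usually not what a tokenizer wants.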

Practice

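To validate the structure of email addresses in general (not tied to a specific provider), you could write something like this example: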

let regexp = /[-.\w]+@([\w-]+\.)+[\w-]{2,20}$/g;

let str1 = '-12ab@cd-34.56-ef_';
let str2 = '.@1.ab'
// console.log(regexp.test(str1)) // true

console.log(regexp.lastIndex); // 0
console.log(regexp.test(str2)); // true
console.log(regexp.lastIndex); // 6
console.log(regexp.test(str2)); // false

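Almost anything seems to pass, and this does not tell us whether the address actually exists.

As practice, I wanted to write a validator for one specific kind of email address.
Here the target is a valid Gmail address, so I looked into how Gmail addresses are structured.

I also referred to this.

The username must be 6 to 30 characters long. Letters, digits, and a limited set of special characters can be used.
[a-z], [0-9], and [\.] are allowed.
&, =, _, ', +, -, ,, <, > are not allowed.
A . may appear before the @, but not as the first character, and two consecutive . are not allowed.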

// Single-regex form (length checked separately):
// /^[a-z0-9]+(?:\.[a-z0-9]+)*@g(oogle)?mail\.com$/
function validateGmail(str) {
  let examArr = str.split('@');

  // The 6-30 character limit applies to the username, i.e. the part before '@'
  if (examArr[0].length < 6 || examArr[0].length > 30) return 'Sorry, username must be between 6 and 30 characters.';

  // Anchored on both ends: letters and digits, with single dots only between them
  let usernameReg = /^[a-z0-9]+(?:\.[a-z0-9]+)*$/;
  let gmailReg = /^g(oogle)?mail\.com$/;
  if (usernameReg.test(examArr[0]) && gmailReg.test(examArr[1])) return 'Valid format';

  return 'Invalid format';
}

let email1 = '0.bcdefg@googlemail.com';
let email2 = 'a.b.c.d.e.f.g@gmail.com';
let email3 = '0bcdefg@googlemail.com';
let email4 = '0..bcdefg@gmail.com';
let email5 = 'abcdefghijklmnopqrstuvwxyz1230@gmail.com';
let email6 = '0.b..cdefg@gmail.com';

console.log(validateGmail(email1)); // Valid format
console.log(validateGmail(email2)); // Valid format
console.log(validateGmail(email3)); // Valid format
console.log(validateGmail(email4)); // Invalid format
console.log(validateGmail(email5)); // Valid format
console.log(validateGmail(email6)); // Invalid format

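I wrote this purely as practice, but I think it is safer to validate the address format once before creating an account (sign-up and so on), and then send a verification email.
If I were the email provider, I would also need to check against the database, in cooperation with the backend, to avoid duplicates. I have left that out here.

Below is another practice snippet.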

// check a MAC address: six two-digit hex numbers separated by colons
let regexp = /^[0-9A-F]{2}(:[0-9A-F]{2}){5}$/;

console.log(regexp.test('01:32:54:67:89:AB')); // true

String.prototype.match()

Matches the regular expression against the target string and returns the result.

// Syntax
str.match(regexp)

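The argument of match() is a regular expression object; anything else is implicitly converted to a RegExp instance.
Without the g flag it always returns only the first match (together with its capture groups); with g it returns every match. If nothing is found it returns null.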
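
The different return shapes described above can be seen side by side in a short sketch (the sample strings are my own):

```javascript
const sentence = 'cat bat rat';

// Without g: only the first match, with capture groups and index.
const single = sentence.match(/(\w)at/);
console.log(single[0], single[1], single.index); // cat c 0

// With g: every full match, but no capture groups or index.
const all = sentence.match(/(\w)at/g);
console.log(all); // [ 'cat', 'bat', 'rat' ]

// No match: null rather than an empty array, so guard before using the result.
const none = sentence.match(/dog/g);
console.log(none); // null

// A non-RegExp argument is converted with new RegExp(), so "." acts as a wildcard.
const wildcard = 'axb'.match('a.b');
console.log(wildcard[0]); // axb
```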

Below are some practice snippets.

// find css color
let regexp = /#[a-f0-9]{3}\b|#[a-f0-9]{6}\b/gi;
// let regexp = /#([a-f0-9]{3}){1,2}\b/gi;
let str = 'color: #3f3; background-color: #AA00ef; and: #abcd';
console.log(str.match(regexp));
// [ '#3f3', '#AA00ef' ]

// find all numbers
let regexp = /-?\d+(\.\d+)?/g;
let str = '-1.5 0 2 -123.4.';
console.log(str.match(regexp));
// [ '-1.5', '0', '2', '-123.4' ]

// parse an expression
// let regexp = /-?\d+(\.\d+)?\s*[\+\-\*\/]\s*-?\d+(\.\d+)?/;
// let str = '1.2 + 3.4';
// console.log(str.match(regexp));
// // [
// //   '1.2 + 3.4',
// //   '.2',
// //   '.4',
// //   index: 0,
// //   input: '1.2 + 3.4',
// //   groups: undefined
// // ]

// let regexp = /(-?\d+(?:\.\d+)?)\s*([\+\-\*\/])\s*(-?\d+(?:\.\d+)?)/;
// let str = '1.2 + 3.4';
// console.log(str.match(regexp));
// // [
// //   '1.2 + 3.4',
// //   '1.2',
// //   '+',
// //   '3.4',
// //   index: 0,
// //   input: '1.2 + 3.4',
// //   groups: undefined
// // ]

function parse(expr) {
  // The "?" belongs inside the group so that only the decimal part is optional
  let regexp = /(?<a>-?\d+(?:\.\d+)?)\s*(?<operator>[\+\-\*\/])\s*(?<b>-?\d+(?:\.\d+)?)/;

  let result = expr.match(regexp);
  if (!result) return [];
  result.shift();

  // return [...result]
  return [result.groups.a, result.groups.operator, result.groups.b];
}

console.log(parse('-1.23 * 3.45')); // [ '-1.23', '*', '3.45' ]

String.prototype.matchAll()

The g flag is required; passing a regexp without it throws a TypeError. The return value is an iterator object.

let dateRegexp = /(?<year>[0-9]{4})-(?<month>[0-9]{2})-(?<day>[0-9]{2})/g;
let str = '2019-10-30 2020-01-01';

// use for...of
let results = str.matchAll(dateRegexp);
for (let result of results) {
  let { year, month, day } = result.groups
  console.log(`${day}.${month}.${year}`);
}
// 30.10.2019
// 01.01.2020

// use Array.from()
// Array.from(arrayLike, mapFn)
let results = Array.from(
  str.matchAll(dateRegexp),
  (item) => {
    let { year, month, day } = item.groups;
    console.log(`${day}.${month}.${year}`);
  }
);
// 30.10.2019
// 01.01.2020

// use forEach()
let results = [...str.matchAll(dateRegexp)];
results.forEach((item) => {
  let { year, month, day } = item.groups;
  console.log(`${day}.${month}.${year}`);
})
// 30.10.2019
// 01.01.2020
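
Two details of matchAll() mentioned in this section can be checked with a short sketch: the g flag requirement, and the fact that matchAll() works on an internal copy of the regexp, so the original lastIndex is not modified. Variable names are my own.

```javascript
const dates = '2019-10-30 2020-01-01';

// Without the g flag, matchAll() throws a TypeError.
let errorName = null;
try {
  dates.matchAll(/\d{4}/);
} catch (err) {
  errorName = err.name;
}
console.log(errorName); // TypeError

// The regexp passed in keeps its own lastIndex untouched.
const yearRe = /\d{4}/g;
const years = [...dates.matchAll(yearRe)].map((m) => m[0]);
console.log(years); // [ '2019', '2020' ]
console.log(yearRe.lastIndex); // 0
```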

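The bottom of the reference article explains why matchAll() returns an iterable object rather than an array. Reading it gave me some thoughts. They include my own views and interpretation, so feel free to skip this part.

According to the explanation, an iterator was chosen for optimization: the main benefit is that using only part of the results costs less than finding every match up front. I think this is what iterators are for, for example showing only part of the data in pagination, or appending the latest results returned from asynchronous processing to achieve live updates.
An array has to hold every result in memory at once, whereas an iterator produces each result only when it is requested, so my impression is that it is lighter and faster when only some of the results are needed.

Below is the code that came out of that idea. It still feels incomplete to me, but I am keeping it as a memo.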

[Symbol.matchAll]

let str = '2019-10-30 2020-01-01 2020-02-02 2020-03-03 2020-04-04 2020-05-05 2020-06-06 2020-07-07';

let generatorObj = {
  dateRegexp: /(?<year>[0-9]{4})-(?<month>[0-9]{2})-(?<day>[0-9]{2})/g,

  *[Symbol.matchAll](str) {
    for (const item of str.matchAll(this.dateRegexp)) {
      let { year, month, day } = item.groups
      yield `${day}.${month}.${year}`
    }
  }
};

function getFive(str) {
  let results = [...str.matchAll(generatorObj)];
  let arr = [];
  for (let i = 0; i < 5; i++) {
    arr.push(results[i]);
  }
  return arr;
}
console.log(getFive(str));
// [
//   '30.10.2019',
//   '01.01.2020',
//   '02.02.2020',
//   '03.03.2020',
//   '04.04.2020'
// ]

I also tried writing it as a static method of a class.

let dateRegexp = /(?<year>[0-9]{4})-(?<month>[0-9]{2})-(?<day>[0-9]{2})/g;
let str = '2019-10-30 2020-01-01 2020-02-02 2020-03-03 2020-04-04 2020-05-05';

class GetFiveDate {
  static *[Symbol.matchAll](str, regexp) {
    for (const item of str.matchAll(regexp)) {
      let { year, month, day } = item.groups
      yield `${day}.${month}.${year}`
    }
  }

  getFive(str, regexp) {
    let results = GetFiveDate[Symbol.matchAll](str, regexp)
    let arr = [];
    for (let i = 0; i < 5; i++) {
      arr.push(results.next().value)
    }
    return arr
  }
}

let test = new GetFiveDate();
console.log(test.getFive(str, dateRegexp))
// [
//   '30.10.2019',
//   '01.01.2020',
//   '02.02.2020',
//   '03.03.2020',
//   '04.04.2020'
// ]
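
To actually benefit from the laziness discussed above, the consumer has to stop pulling values once it has enough, instead of spreading everything into an array first. The following is only a sketch of that idea: take and formattedDates are hypothetical helpers of my own, and the produced counter exists purely to show how many matches were computed.

```javascript
const dateRe = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/g;
const input = '2019-10-30 2020-01-01 2020-02-02 2020-03-03 2020-04-04 2020-05-05';

let produced = 0; // counts how many matches were actually pulled from the iterator

// Generator that reformats each match on demand.
function* formattedDates(str, regexp) {
  for (const { groups } of str.matchAll(regexp)) {
    produced += 1;
    yield `${groups.day}.${groups.month}.${groups.year}`;
  }
}

// take is a hypothetical helper: it pulls at most n values and then stops iterating.
function take(iterable, n) {
  const out = [];
  if (n <= 0) return out;
  for (const value of iterable) {
    out.push(value);
    if (out.length >= n) break; // break closes the generator; later matches are never computed
  }
  return out;
}

const firstTwo = take(formattedDates(input, dateRe), 2);
console.log(firstTwo); // [ '30.10.2019', '01.01.2020' ]
console.log(produced); // 2
```

Unlike a fixed counting loop over next(), this version also returns a shorter array, without undefined entries, when the string contains fewer than n dates.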

String.prototype.search()

search() returns the index of the first match, or -1 if nothing is found.

// Syntax
str.search(regexp)

let str = 'A drop of ink may make a million think';
console.log(str.search(/ink/i)); // 10

[Symbol.search]

// Symbol.search
let str = '1234123456'
class MultipleSearch {
  [Symbol.search](str, target) {
    let arr = Array.from(str);
    let indices = []
    for (let i = 0; i < str.length; i++) {
      if (arr[i] === target) {
        indices.push(i)
      }
    }
    return indices;
  }
}

let test = new MultipleSearch()
console.log(test[Symbol.search](str, '1')); // [ 0, 4 ]
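
When an object that has a Symbol.search method is passed to str.search(), that method is called with the string as its only argument, so the search target can live on the instance instead. A sketch under that assumption; IndexFinder is a hypothetical class of my own, and returning an array instead of a single index is a deliberate departure from what the built-in search() does.

```javascript
// IndexFinder is a hypothetical class used only for this illustration.
class IndexFinder {
  constructor(target) {
    this.target = target;
  }

  // Called by String.prototype.search with the string being searched.
  [Symbol.search](str) {
    const indices = [];
    if (this.target === '') return indices; // avoid looping forever on an empty target
    let from = str.indexOf(this.target);
    while (from !== -1) {
      indices.push(from);
      from = str.indexOf(this.target, from + 1);
    }
    return indices;
  }
}

const positions = '1234123456'.search(new IndexFinder('1'));
console.log(positions); // [ 0, 4 ]
```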

String.prototype.replace()

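Replaces the part of the string matched by the first argument with what the second argument specifies. With a string pattern, or a regexp without the g flag, only the first match is replaced.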

// Syntax
str.replace(str|regexp, str|func)

Replacement - $&

$& inserts the matched substring itself. Here the match is kept and new text is appended after it.

console.log('I like HTML'.replace(/html/i, '$& and JavaScript'));
// I like HTML and JavaScript

Replacement - $`

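$` inserts the portion of the string that precedes the match. The match itself is replaced, so any text before the match appears twice in the result.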

console.log('I like HTML'.replace(/html/i, '$` and JavaScript'));
// I like I like  and JavaScript

console.log('I like HTML, React'.replace(/html/i, '$` and JavaScript'));
// I like I like  and JavaScript, React
console.log('HTML, React'.replace(/html/i, '$` and JavaScript'));
//  and JavaScript, React

Replacement - $'

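$' inserts the portion of the string that follows the match. In this example nothing follows the match, so only the match is removed.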

console.log('I like HTML'.replace(/html/i, "$' and JavaScript"));
// I like  and JavaScript

Replacement - $n

$n inserts the n-th capture group (counting from 1).

let str = 'I like HTML and JavaScript';
console.log(str.replace(/(HTML) and (JavaScript)/, '$2 and $1'));
// I like JavaScript and HTML

Replacement - $<name>

$<name> inserts the capture group with the given name.

console.log(str.replace(/(?<element1>HTML) and (?<element2>JavaScript)/, '$<element2> and React'));
// I like JavaScript and React
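
For reference, here is a compact sketch that puts all of the replacement patterns above next to each other, plus $$ for a literal dollar sign. The sample strings and variable names are my own.

```javascript
const source = 'John Snow';
const nameRe = /(?<first>\w+) (?<last>\w+)/;

const swapped = source.replace(nameRe, '$2, $1');          // numbered groups
const named = source.replace(nameRe, '$<last>, $<first>'); // named groups
const wrapped = source.replace(/Snow/, '[$&]');            // the match itself
const before = source.replace(/Snow/, '[$`]');             // text before the match
const after = source.replace(/John/, "[$']");              // text after the match
const dollar = 'cost: 5'.replace(/\d+/, '$$$&');           // $$ is a literal dollar sign

console.log(swapped); // Snow, John
console.log(named);   // Snow, John
console.log(wrapped); // John [Snow]
console.log(before);  // John [John ]
console.log(after);   // [ Snow] Snow
console.log(dollar);  // cost: $5
```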

function replacer()

The replacement can also be computed by a function, which allows custom processing of each match.

// Syntax
function replacer(match, p1, p2, ..., pn, offset, input, groups)
// when the pattern has no capture groups (no parentheses)
function replacer(str, offset, input)
// note: iterable

// function replacer
let str = 'html and css';
let result = str.replace(/html|css/gi, (str) => str.toUpperCase());
console.log(result); // HTML and CSS

let str = 'John Snow';
let result = str.replace(/(\w+) (\w+)/, (match, name, surname) => `${surname}, ${name}`);
console.log(result) // Snow, John
// note: the arguments mirror the result of `match()`: result[0], result[1], result[2]...

let str = 'John Snow';
let result = str.replace(/(?<name>\w+) (?<surname>\w+)/, (...match) => {
  let groups = match.pop(); // take the last element
  return `${groups.surname}, ${groups.name}`;
})
console.log(result) // Snow, John
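
The offset argument from the syntax above is not used in these examples, so here is a small sketch of it. The sample inputs are my own.

```javascript
// Without capture groups the replacer receives (match, offset, input).
const kebab = 'backgroundColorTop'.replace(/[A-Z]/g, (match, offset) =>
  (offset > 0 ? '-' : '') + match.toLowerCase()
);
console.log(kebab); // background-color-top

// With capture groups, the groups come between the match and the offset.
const doubled = 'w=10 h=25'.replace(/(\w)=(\d+)/g, (match, key, value, offset) =>
  `${key}=${Number(value) * 2}@${offset}`
);
console.log(doubled); // w=20@0 h=50@5
```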

String.prototype.replaceAll()

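Replaces every substring that matches the first argument as specified by the second argument. A string pattern needs no g flag; if a regexp is passed, it must have the g flag, otherwise a TypeError is thrown.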

// Syntax
str.replaceAll(str|regexp, str|func)

// str.replaceAll()
console.log('12-34-56-78'.replaceAll('-', ':'));
// 12:34:56:78
console.log('12-34-56-78'.replace('-', ':'));
// 12:34-56-78
console.log('12-34-56-78'.replace(/-/g, ':'));
// 12:34:56:78
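
Two further points about replaceAll(), sketched with sample values of my own (this assumes a runtime that has replaceAll(), for example Node.js 15 or later):

```javascript
// A regexp passed to replaceAll() must carry the g flag; otherwise a TypeError is thrown.
let replaceAllError = null;
try {
  '12-34-56'.replaceAll(/-/, ':');
} catch (err) {
  replaceAllError = err.name;
}
console.log(replaceAllError); // TypeError

// A string pattern is taken literally, so metacharacters need no escaping.
const literalDots = '1.2.3'.replaceAll('.', '-');
const regexDots = '1.2.3'.replace(/./g, '-'); // an unescaped "." matches every character
console.log(literalDots); // 1-2-3
console.log(regexDots);   // -----
```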

String.prototype.split()

Splits the target string using a plain string or a regular expression as the separator and returns the pieces as an array.

// Syntax
str.split(regexp|substr, limit)

console.log('12-34-56'.split('-')) // [ '12', '34', '56' ]
console.log('12, 34, 56'.split(/,\s*/)) // [ '12', '34', '56' ]
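
The limit parameter from the syntax above, and the effect of capture groups in a regexp separator, can be sketched as follows (sample strings are my own):

```javascript
// limit caps the number of pieces returned.
const limited = '12-34-56-78'.split('-', 2);
console.log(limited); // [ '12', '34' ]

// Capture groups in the separator regexp are included in the output.
const withSeparators = '1+2-3'.split(/([+-])/);
console.log(withSeparators); // [ '1', '+', '2', '-', '3' ]

// Without a capture group the separators are dropped.
const withoutSeparators = '1+2-3'.split(/[+-]/);
console.log(withoutSeparators); // [ '1', '2', '3' ]
```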

[Symbol.split]

// Symbol.split
let str = '12-34-56-78';

class SplitBySlash {
  [Symbol.split](separator, str) {
    let arr = Array.from(str, (x) => {
      if (x === separator) { // each x is a single character, so compare directly
        return '\\';
      }
      return x
    })
    return arr.join('');
  }
}

let test = new SplitBySlash();
console.log(test[Symbol.split]('-', str));
// 12\34\56\78
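
An object with a Symbol.split method can also be passed straight to str.split(), which then calls the method with the string and the optional limit. A sketch under that assumption; ChunkSplitter is a hypothetical class of my own that cuts a string into fixed-size chunks.

```javascript
// ChunkSplitter is a hypothetical class used only for this illustration.
class ChunkSplitter {
  constructor(size) {
    if (!Number.isInteger(size) || size <= 0) {
      throw new RangeError('size must be a positive integer');
    }
    this.size = size;
  }

  // Called by String.prototype.split with the string and the optional limit.
  [Symbol.split](str, limit) {
    const chunks = [];
    for (let i = 0; i < str.length; i += this.size) {
      chunks.push(str.slice(i, i + this.size));
    }
    return limit === undefined ? chunks : chunks.slice(0, limit);
  }
}

const chunks = '12345678'.split(new ChunkSplitter(3));
console.log(chunks); // [ '123', '456', '78' ]

const firstChunk = '12345678'.split(new ChunkSplitter(3), 1);
console.log(firstChunk); // [ '123' ]
```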
