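JavaScript regular expressions, part 2

Last updated at 2022-09-12 · Posted at 2022-09-07

Introduction

This is a continuation of the previous article. This time I summarize regular-expression assertions, groups and backreferences (including character classes and ranges), quantifiers, and Unicode property escapes.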

Memo

Assertions

^, $, x(?=y), x(?!y), (?<=y)x, (?<!y)x, \b, \B
Assertions
(?<=y)x and (?<!y)x were not supported in Safari when this article was written (2022); support was added later, in Safari 16.4.

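Metacharacters

^: matches the start of the input (not the start of a line that follows a line break).
- With the m flag, it also matches at the start of each line.
$: matches the end of the input (not the end of a line that precedes a line break).
- With the m flag, it also matches at the end of each line.
- ^...$: matches only when the pattern spans from the start to the end.
- \^ & \$: a preceding \ turns them into ordinary characters (escaping).
\b: matches at a word boundary, that is, a position where a word character (\w) meets a non-word character or the edge of the input.
\B: matches at any position that is not a word boundary.

Lookahead and lookbehind assertions

x(?=y): lookahead assertion
(?<=y)x: lookbehind assertion
x(?!y): negative lookahead assertion
(?<!y)x: negative lookbehind assertion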

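Groups and backreferences

(x), (?:x), (?<Name>x), x|y, [xyz], [^xyz], \Number
Groups and backreferences

Any one of the characters

[...]: matches any one of the listed characters.
- Inside [...], the metacharacters $ . + * / [ { } ( ) lose their special meaning; ^ is special only as the first character.
- [a-z]: - denotes a range.
  - [abcx-z]: a, b, c, x, y, z
- [abc-]/[-abc]: a - at the start or end is an ordinary literal.
- [\][ab]: ], [, a, b (in JavaScript, ] must be escaped inside a class; the POSIX-style [][ab] is parsed as an empty class [] followed by [ab], so it never matches).
[^...]: the opposite of [...]; matches one character that is none of those listed.
- [$^/*+.(){}[]]: ( ) { } are ordinary characters inside [...], but the first unescaped ] closes the class, so this pattern is the class [$^/*+.(){}[] followed by a literal ]. Write \] to put ] inside the class. Outside a class, () is a group and {Number} is a repetition count.

Grouping

(...): splits the match into groups and lists what each group captured.
- (...)+(...): the preceding group occurs one or more times in a row.
- (...)*(...): the preceding group occurs zero or more times in a row.
- (...)?(...): the preceding group occurs zero times or once.
- ([...]{Number}): matches when a character from [] occurs the number of times given in {}.

Other

(?:...): groups without capturing.

x or y

x|y: matches either x or y.

Named capturing groups

(?<Name>...): creates a group with a name.

Backreferences to capturing groups

\1...\n: matches the same text that the n-th capturing group (...) matched.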

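Quantifiers

*, +, ?, x{n}, x{n,}, x{n,m}
Quantifiers

Repetition

*: the preceding character or group occurs zero or more times in a row.
- .*: zero or more of any character (line terminators excluded unless the s flag is set).
+: the preceding character or group occurs one or more times in a row.
?: the preceding character or group occurs zero times or once.
- *?: lazy form of *; zero or more times, but as few as possible.
- +?: lazy form of +; one or more times, but as few as possible.
- ??: lazy form of ?; prefers zero occurrences and takes one only when the rest of the pattern requires it.
x{n}/x{n,}/x{n,m}: x repeated exactly n times / at least n times / between n and m times.
- {n,m}?: lazy form; starts with n repetitions and adds more only when needed.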

Unicode property escapes

\p{UnicodeProperty}, \P{UnicodeProperty}
Unicode property escapes

Unicode

/\p{property}/u: matches a character that has the given property (the u flag is required).
- /\p{UnicodeProperty}/u: Binary properties/General_Category properties
- /\p{gc=UnicodeProperty}/u: General_Category properties
- /\p{sc=ScriptProperty}/u: Script properties
- /\p{scx=ScriptProperty}/u: Script_Extensions properties
/\P{UnicodeProperty}/u: matches a character that does not have the given property.

Unicode Utilities: Character Properties
Unicode Utilities: Character Property Index

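Escape sequences

\^: caret ( ^ )
\$: dollar sign ( $ )
\.: period ( . )
\+: plus ( + )
\*: asterisk ( * )
\/: slash ( / )
\\: backslash ( \ )
\[: square bracket ( [ )
\]: square bracket ( ] )
\{: curly brace ( { )
\}: curly brace ( } )
\(: parenthesis ( ( )
\): parenthesis ( ) )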
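
When a pattern is built from arbitrary text, the escapes listed above can be applied automatically. The following is a minimal sketch; `escapeRegExp` is a hypothetical helper name, not a built-in, and the sample strings are made up for illustration.

```javascript
// Hypothetical helper: put a backslash in front of every regex metacharacter.
function escapeRegExp(text) {
  // $& in the replacement stands for the matched character itself.
  return text.replace(/[.*+?^${}()|[\]\\\/]/g, '\\$&');
}

const literal = '1+1=2 (really?)';
const pattern = new RegExp(escapeRegExp(literal));

console.log(escapeRegExp(literal)); // 1\+1=2 \(really\?\)
console.log(pattern.test('I think 1+1=2 (really?)')); // true
// Without escaping, + ( ) ? keep their special meaning and the search fails.
console.log(new RegExp(literal).test('I think 1+1=2 (really?)')); // false
```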

Assertions

Anchors: string start `^`

^: matches the start of the input (not the start of a line that follows a line break).
With the m flag, it also matches at the start of each line.

// `^` at starting position
console.log('ABCDEF'.match(/^ABC/));
// [ 'ABC', index: 0, input: 'ABCDEF', groups: undefined ]
console.log('ABCDEF'.match(/^DEF/));
// null
console.log('ABC\nDEF'.match(/^DEF/));
// null
console.log('ABC\nDEF'.match(/^DEF/m));
// [ 'DEF', index: 4, input: 'ABC\nDEF', groups: undefined ]
// m: multiline
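
As a small supplementary sketch, combining the m flag with g shows how `^` is tested at every line start. The log text below is made up for illustration.

```javascript
const log = 'ERROR disk full\nINFO started\nERROR timeout';

// Without m, ^ is only tested at index 0.
const firstOnly = log.match(/^ERROR/g);
// With m, ^ is also tested right after every line break.
const everyLine = log.match(/^ERROR/gm);

console.log(firstOnly); // [ 'ERROR' ]
console.log(everyLine); // [ 'ERROR', 'ERROR' ]
```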

Anchors: string end `$`

$: matches the end of the input (not the end of a line that precedes a line break).
With the m flag, it also matches at the end of each line.

// `$` at ending position
console.log('ABCDEF'.match(/DEF$/));
// ['DEF', index: 3, input: 'ABCDEF', groups: undefined]
console.log('ABCDEF\nXYZ'.match(/DEF$/));
// null
console.log('ABCDEF\nXYZ'.match(/DEF$/m));
// [ 'DEF', index: 3, input: 'ABCDEF\nXYZ', groups: undefined ]
// m: multiline

Full match: `^...$`

^...$: matches only when the pattern spans from the start to the end.

console.log('abcde'.match(/^a...e$/));
// [ 'abcde', index: 0, input: 'abcde', groups: undefined ]
console.log('abc\ncd'.match(/^a..d$/m));
// null
console.log('abc\ncd\nabcd'.match(/^a..d$/m));
// [ 'abcd', index: 7, input: 'abc\ncd\nabcd', groups: undefined ]
console.log('abc\nd'.match(/^abc$/m));
// [ 'abc', index: 0, input: 'abc\nd', groups: undefined ]
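
One consequence worth noting when `^...$` is used for validation: with the m flag a single qualifying line is enough. This is a minimal sketch with made-up input.

```javascript
const digitsOnly = /^\d+$/;
const digitsOnlyPerLine = /^\d+$/m;
const input = '123\nabc';

console.log(digitsOnly.test('123')); // true
console.log(digitsOnly.test(input)); // false: the whole input must be digits
console.log(digitsOnlyPerLine.test(input)); // true: the line '123' qualifies on its own
```

For whole-input validation, leaving out the m flag is usually the safer choice.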

Escaping: `\^` & `\$`

\^ & \$: a preceding \ turns them into ordinary characters (escaping).

console.log('^abc'.match(/\^a/));
// [ '^a', index: 0, input: '^abc', groups: undefined ]
console.log('^abc$'.match(/\^abc\$/));
// [ '^abc$', index: 0, input: '^abc$', groups: undefined ]

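Doubling the character (^^ or $$) does not escape it in JavaScript; each ^ or $ is still an anchor, so these patterns do not match a literal ^ or $.

console.log('^abc'.match(/^^a/));
// null
console.log('abc$'.match(/a$$/));
// null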

Word boundary: `\b`

\b: matches at a word boundary, that is, at the start or end of a run of word characters (\w).

// `\b` word boundaries
let m = 'I am trying to figure out how to write regexp.\n And how it works in JavaScript.';
console.log(m.match(/\bre/));
// [
//   're',
//   index: 39,
//   input: 'I am trying to figure out how to write regexp.\n' +
//     ' And how it works in JavaScript.',
//   groups: undefined
// ]
// note: regexp
console.log(m.match(/re\b/));
// [
//   're',
//   index: 19,
//   input: 'I am trying to figure out how to write regexp.\n' +
//   ' And how it works in JavaScript.',
//   groups: undefined
// ]
// note: figure

m = 'I like JavaScript. I want to learn more about programming.';
console.log(m.match(/\bJavaScript\b/));
// [
//   'JavaScript',
//   index: 7,
//   input: 'I like JavaScript. I want to learn more about programming.',
//   groups: undefined
// ]
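
A typical use of `\b` is whole-word replacement. The sketch below uses a made-up sentence. It also shows a limitation: `\w` covers only ASCII letters, digits, and `_`, so `\b` does not find boundaries around Japanese text.

```javascript
const sentence = 'A cat scattered the catalog; the cat left.';

const loose = sentence.replace(/cat/g, 'dog');
const wholeWord = sentence.replace(/\bcat\b/g, 'dog');

console.log(loose); // A dog sdogtered the dogalog; the dog left.
console.log(wholeWord); // A dog scattered the catalog; the dog left.

// 猫 is not a \w character, so there is no word boundary next to it.
const kanjiBoundary = /\b猫\b/.test('猫');
console.log(kanjiBoundary); // false
```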

Word boundary: `\B`

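\B: matches at a position that is not a word boundary (neither the start nor the end of a word).

// `\B` non-word boundaries
// m was reassigned above, so set it back to the first sentence
m = 'I am trying to figure out how to write regexp.\n And how it works in JavaScript.';
console.log(m.match(/\Bva/));
// [
//   'va',
//   index: 70,
//   input: 'I am trying to figure out how to write regexp.\n' +
//     ' And how it works in JavaScript.',
//   groups: undefined
// ]
// note: JavaScript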

Lookahead: `x(?=y)`

x(?=y): lookahead assertion
x(?=y): is x followed by y? If so, x is matched; y is not part of the result.
Use it to get the text that comes right before a specific string.

// `?=` lookahead assertion
console.log('abcdef'.match(/(abc)(?=def)/));
// [ 'abc', 'abc', index: 0, input: 'abcdef', groups: undefined ]
console.log('abcdef'.match(/abc(?=def)/));
// [ 'abc', index: 0, input: 'abcdef', groups: undefined ]
console.log('abcdef'.match(/c(?=def)/));
// [ 'c', index: 2, input: 'abcdef', groups: undefined ]

Because (...) also captures the text it matched, the result of /(abc)(?=def)/ contains two entries.

console.log('abcdef'.match(/(abc)(def)/));
// [
//   'abcdef',
//   'abc',
//   'def',
//   index: 0,
//   input: 'abcdef',
//   groups: undefined
// ]
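
Because a lookahead consumes no characters, several of them can be stacked at the same position. The following sketch checks a few independent conditions at once; the password rules are hypothetical and only serve as an illustration.

```javascript
// Assumed rules: at least 8 characters, with a digit, a lowercase and an uppercase letter.
const strong = /^(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).{8,}$/;

console.log(strong.test('Passw0rdOK')); // true
console.log(strong.test('password1')); // false: no uppercase letter
console.log(strong.test('Ab1')); // false: too short
```

Each `(?=...)` is evaluated from the start of the input, and only `.{8,}` actually consumes text.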

Lookbehind: `(?<=y)x`

(?<=y)x: lookbehind assertion
(?<=y)x: is x preceded by y? If so, x is matched; y is not part of the result.
Use it to get the text that comes right after a specific string.

// `?<=` lookbehind assertion
console.log('abcdef'.match(/(?<=abc)(def)/));
// [ 'def', 'def', index: 3, input: 'abcdef', groups: undefined ]

Negative lookahead: `x(?!y)`

x(?!y): negative lookahead assertion
x(?!y): is x followed by y? If not, x is matched.
Use it to exclude matches that are followed by a specific string.

// `?!` negative lookahead assertion
console.log('abcxyz'.match(/(abc)(?!def)/));
// [ 'abc', 'abc', index: 0, input: 'abcxyz', groups: undefined ]

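Note: lookahead, including negative lookahead, works in Safari. The Safari limitation at the time of writing applied only to the lookbehind forms (?<=y)x and (?<!y)x.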

Negative lookbehind: `(?<!y)x`

(?<!y)x: negative lookbehind assertion
(?<!y)x: is x preceded by y? If not, x is matched.
Use it to exclude matches that are preceded by a specific string.

// `?<!` negative lookbehind assertion
console.log('xyzdef'.match(/(?<!abc)(def)/));
// [ 'def', 'def', index: 3, input: 'xyzdef', groups: undefined ]

Lookbehind was not supported in Safari at the time of writing (2022).
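
Where older engines still have to be supported, one possible approach is to detect lookbehind support at run time and fall back to a capturing group. This is only a sketch; `supportsLookbehind` and `dollarAmount` are hypothetical helper names.

```javascript
// Hypothetical helper: true when the engine can compile a lookbehind.
function supportsLookbehind() {
  try {
    // Building the pattern from a string defers the syntax check to run time.
    new RegExp('(?<=a)b');
    return true;
  } catch (error) {
    return false;
  }
}

// Hypothetical helper: return the digits that follow a literal "$", or null.
function dollarAmount(text) {
  const found = supportsLookbehind()
    ? text.match(new RegExp('(?<=\\$)\\d+'))
    : text.match(/\$(\d+)/); // fallback: capture the digits instead
  if (!found) return null;
  return found[1] !== undefined ? found[1] : found[0];
}

console.log(dollarAmount('1 turkey costs $30')); // 30
console.log(dollarAmount('no price here')); // null
```

A regex literal containing a lookbehind would be a syntax error for the whole script on an engine without support, which is why the sketch builds it with `new RegExp`.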

Practices

// x(?!y)
let str = '2 turkeys cost 60€';
let regexp = /\d+(?!€)/;
console.log(str.match(regexp));
// [ '2', index: 0, input: '2 turkeys cost 60€', groups: undefined ]

// (?<=y)x
// str and regexp are reassigned below; declaring them again with let would be a SyntaxError
str = '1 turkey costs $30';
regexp = /(?<=\$)\d+/;
console.log(str.match(regexp));
// [ '30', index: 16, input: '1 turkey costs $30', groups: undefined ]

// (?<!y)x
str = '2 turkeys cost $60';
regexp = /(?<!\$)\d+/; // \$ is a literal dollar sign; an unescaped $ would be the end anchor
console.log(str.match(regexp));
// [ '2', index: 0, input: '2 turkeys cost $60', groups: undefined ]

// (?=(...))
str = '1 turkey costs 30€';
regexp = /\d+(?=(€|kr))/;
console.log(str.match(regexp));
// [ '30', '€', index: 15, input: '1 turkey costs 30€', groups: undefined ]

// (?<=(...|...))
str = '1 turkey costs $30';
regexp = /(?<=(\$|£))\d+/;
console.log(str.match(regexp));
// [ '30', '$', index: 16, input: '1 turkey costs $30', groups: undefined ]

// find non-negative integers
str = '0 12 -5 123 -18 -1000';
regexp = /(?<![-\d])\d+/g;
console.log(str.match(regexp)); // [ '0', '12', '123' ]
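
As one more practice, `\B`, a lookahead, and a negative lookahead can be combined to insert thousands separators. This is a sketch that assumes the input is a plain string of digits; `withThousandsSeparators` is a hypothetical name.

```javascript
function withThousandsSeparators(digits) {
  // \B: never at the very start or end of the number.
  // (?=(\d{3})+(?!\d)): the digits remaining to the right form complete groups of three.
  return digits.replace(/\B(?=(\d{3})+(?!\d))/g, ',');
}

console.log(withThousandsSeparators('1234567')); // 1,234,567
console.log(withThousandsSeparators('999')); // 999
```

The pattern matches empty positions only, so `replace` inserts commas without removing any digit.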

Groups and backreferences

Set and ranges: `[...]`

[...]: matches any one of the listed characters.

// `[...]` bracket expression
console.log('abcd-abcd'.match(/[a-c]/g));
// ['a', 'b', 'c', 'a', 'b', 'c']

console.log('abcd-abcd'.match(/[abc-]/g));
// [
//   'a', 'b', 'c',
//   '-', 'a', 'b',
//   'c'
// ]
console.log('dd-dd'.match(/[-abc]/));
// [ '-', index: 2, input: 'dd-dd', groups: undefined ]

console.log('abcde'.match(/[^abc]/));
// [ 'd', index: 3, input: 'abcde', groups: undefined ]
console.log('abcdef'.match(/[^abc]*/));
// [ '', index: 0, input: 'abcdef', groups: undefined ]
console.log('abcdef'.match(/[^abc]+/));
// [ 'def', index: 3, input: 'abcdef', groups: undefined ]
console.log('abcde\nf'.match(/[^abc]+/));
// [ 'de\nf', index: 3, input: 'abcde\nf', groups: undefined ]

Excluding ranges: `[^...]`

[^...]: the opposite of [...]; matches one character that is none of those listed.

console.log('abcdef123\na123'.match(/a[^a-z].*/));
// [ 'a123', index: 10, input: 'abcdef123\na123', groups: undefined ]
console.log('abc012[]^$.'.match(/[^a-z0-9](\.*)/g));
// [ '[', ']', '^', '$.' ]
console.log('abc012[]^$.'.match(/[^a-z0-9]\.*/g));
// [ '[', ']', '^', '$.' ]

console.log('<ABC><DEF>'.match(/(<[^>]*>)/));
// [ '<ABC>', '<ABC>', index: 0, input: '<ABC><DEF>', groups: undefined ]
console.log('>ABC><DEF>'.match(/(<[^>]*>)/));
// [ '<DEF>', '<DEF>', index: 5, input: '>ABC><DEF>', groups: undefined ]
console.log('<<ABC>><DEF>'.match(/(<[^>]*>)/));
// [
//   '<<ABC>',
//   '<<ABC>',
//   index: 0,
//   input: '<<ABC>><DEF>',
//   groups: undefined
// ]

`[$^/*+.(){}[]]`

console.log('$^/*+.{}()[]'.match(/[.+*/$^]/g));
// [ '$', '^', '/', '*', '+', '.' ]
console.log('$^/*+.{}()[]'.match(/[(){}[]]/g));
// [ '[]' ]
// note: the first ] closes the class, so this pattern is [(){}[ followed by a literal ]
console.log('()'.match(/[(){}[]]/));
// null
console.log('{}'.match(/[(){}[]]/));
// null
console.log('abcdef'.match(/[(abc){1}]/));
// [ 'a', index: 0, input: 'abcdef', groups: undefined ]

console.log('abcdef'.match(/(abc){1}/));
// [ 'abc', 'abc', index: 0, input: 'abcdef', groups: undefined ]
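
The results above follow from where the class ends: the first unescaped `]` closes it. Here is a short sketch of the difference that escaping makes, using the same test string.

```javascript
const symbols = '$^/*+.{}()[]';

// Unescaped: the class is [(){}[ and the trailing ] is a separate literal.
const unescaped = symbols.match(/[(){}[]]/g);
// Escaped: \] keeps the closing bracket inside the class.
const escaped = symbols.match(/[(){}[\]]/g);

console.log(unescaped); // [ '[]' ]
console.log(escaped); // [ '{', '}', '(', ')', '[', ']' ]
```

So ( ) { } are ordinary characters inside a class; only `]` (and `\`) need escaping there.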

Capturing groups: `(...)`

(...): splits the match into groups and lists what each group captured.

// `(...)` marked pattern (capturing group)
console.log('abcdef'.match(/(abc)(def)/));
// [
//   'abcdef',
//   'abc',
//   'def',
//   index: 0,
//   input: 'abcdef',
//   groups: undefined
// ]

`(...)+(...)`

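(...)+(...): the preceding group occurs one or more times in a row.
+: the preceding character or group must appear at least once. When a group repeats, only its last repetition is kept as the captured value.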

console.log('abcabcabcdef'.match(/(abc)+(def)/));
// [
//   'abcabcabcdef',
//   'abc',
//   'def',
//   index: 0,
//   input: 'abcabcabcdef',
//   groups: undefined
// ]

`(...)*(...)`

(...)*(...): the preceding group occurs zero or more times in a row.
*: matches whether or not the preceding character or group appears.

console.log('abcabcabcdef'.match(/(abc)*(def)/));
// [
//   'abcabcabcdef',
//   'abc',
//   'def',
//   index: 0,
//   input: 'abcabcabcdef',
//   groups: undefined
// ]

console.log('xyzxyzxyzdef'.match(/(abc)*(def)/));
// [
//   'def',
//   undefined,
//   'def',
//   index: 9,
//   input: 'xyzxyzxyzdef',
//   groups: undefined
// ]

Optional groups: `(...)?(...)`

(...)?(...): the preceding group occurs zero times or once.
?: the preceding character or group appears once or not at all.

console.log('abcabcabcdef'.match(/(abc)?(def)/));
// [
//   'abcdef',
//   'abc',
//   'def',
//   index: 6,
//   input: 'abcabcabcdef',
//   groups: undefined
// ]

console.log('xyzxyzxyzdef'.match(/(abc)?(def)/));
// [
//   'def',
//   undefined,
//   'def',
//   index: 9,
//   input: 'xyzxyzxyzdef',
//   groups: undefined
// ]

`([...]{Number})`

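([...]{Number}): matches when a character from [] occurs the number of times given in {}.

console.log('abcabc'.match(/(abc)/));
// [ 'abc', 'abc', index: 0, input: 'abcabc', groups: undefined ]
console.log('abcabc'.match(/(abc)/)[1]);
// abc
// \. matches a literal period; an unescaped . would accept any character as the separator
let yymd = '2022.09.05'.match(/([0-9]{4})\.([0-9]{2})\.([0-9]{2})/);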
console.log(yymd);
// [
//   '2022.09.05',
//   '2022',
//   '09',
//   '05',
//   index: 0,
//   input: '2022.09.05',
//   groups: undefined
// ]
console.log(yymd[1], yymd[2], yymd[3]);
// 2022 09 05
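
The captured groups can be picked up with array destructuring. The sketch below wraps this in a hypothetical `parseDottedDate` helper and also shows why the separator should be written as `\.` rather than `.`.

```javascript
// Hypothetical helper: parse 'YYYY.MM.DD' into numbers, or return null.
function parseDottedDate(text) {
  const found = text.match(/^(\d{4})\.(\d{2})\.(\d{2})$/);
  if (!found) return null;
  // The first array element is the whole match, so skip it.
  const [, year, month, day] = found;
  return { year: Number(year), month: Number(month), day: Number(day) };
}

console.log(parseDottedDate('2022.09.05')); // { year: 2022, month: 9, day: 5 }
console.log(parseDottedDate('2022-09-05')); // null

// An unescaped . matches any character, so '-' is accepted as a separator too.
const looseMatch = /([0-9]{4}).([0-9]{2}).([0-9]{2})/.test('2022-09-05');
console.log(looseMatch); // true
```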

Non-capturing groups: `(?:...)`

(?:...): groups without capturing; the group's text is not added to the result.

console.log('abcdefxyz'.match(/(abc)(?:def)(xyz)/));
// [
//   'abcdefxyz',
//   'abc',
//   'xyz',
//   index: 0,
//   input: 'abcdefxyz',
//   groups: undefined
// ]
console.log('abcdefxyz'.match(/(?:abc)(def)(xyz)/));
// [
//   'abcdefxyz',
//   'def',
//   'xyz',
//   index: 0,
//   input: 'abcdefxyz',
//   groups: undefined
// ]

console.log('abcdefxyz'.match(/abc(?:def)xyz/));
// [ 'abcdefxyz', index: 0, input: 'abcdefxyz', groups: undefined ]
console.log('abcdefxyz'.match(/(?:abc)defxyz/));
// [ 'abcdefxyz', index: 0, input: 'abcdefxyz', groups: undefined ]
console.log('abcdefxyz'.match(/(?:abc)(def)xyz/));
// [ 'abcdefxyz', 'def', index: 0, input: 'abcdefxyz', groups: undefined ]

Alternation (OR): `x|y`

x|y: matches either x or y.

console.log('abbdef'.match(/abc|def/));
// [ 'def', index: 3, input: 'abbdef', groups: undefined ]
console.log('abcdef'.match(/(abc)|(d)/));
// [
//   'abc',
//   'abc',
//   undefined,
//   index: 0,
//   input: 'abcdef',
//   groups: undefined
// ]
console.log('abbdef'.match(/(abc)|(d)/));
// [ 'd', undefined, 'd', index: 3, input: 'abbdef', groups: undefined ]
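
Note that `|` has the lowest precedence, so anchors belong only to the branch they are written in. This short sketch shows how a non-capturing group limits the scope of the alternation.

```javascript
const loose = /^abc|def$/; // means (^abc) or (def$)
const strict = /^(?:abc|def)$/; // means the whole input is abc or def

console.log(loose.test('abcXYZ')); // true: the branch ^abc matched
console.log(strict.test('abcXYZ')); // false
console.log(strict.test('def')); // true
```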

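Named groups: `(?<Name>...)`

(?<Name>...): creates a group with a name; the captured text is available under that name in the groups property of the result.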

console.log('abcdefxyz'.match(/(?<foo>abc)/));
// [
//   'abc',
//   'abc',
//   index: 0,
//   input: 'abcdefxyz',
//   groups: [Object: null prototype] { foo: 'abc' }
// ]

console.log('abcdefxyz'.match(/(?<foo1>abc)+(?<foo2>def)/));
// [
//   'abcdef',
//   'abc',
//   'def',
//   index: 0,
//   input: 'abcdefxyz',
//   groups: [Object: null prototype] { foo1: 'abc', foo2: 'def' }
// ]
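
The named captures can be read from `groups` by destructuring and reused in `replace` through `$<name>`. This is a small sketch with a made-up date pattern.

```javascript
const datePattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;

const found = '2022-09-07'.match(datePattern);
const { year, month, day } = found.groups;
console.log(year, month, day); // 2022 09 07

// $<name> in the replacement string inserts what that group captured.
const reordered = '2022-09-07'.replace(datePattern, '$<day>/$<month>/$<year>');
console.log(reordered); // 07/09/2022
```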

Backreference by number: `\n`

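\n: matches the same text that the n-th capturing group (...) matched. It repeats the captured text, not the group's pattern.
/(A)(B)\2/: \2 must repeat what the 2nd group captured, so here it behaves like /(A)(B)B/.
/(A)(B)\1/: \1 must repeat what the 1st group captured, so here it behaves like /(A)(B)A/.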

console.log("ABB".match(/(A)(B)\2/));
// [ 'ABB', 'A', 'B', index: 0, input: 'ABB', groups: undefined ]
console.log("ABA".match(/(A)(B)\1/));
// [ 'ABA', 'A', 'B', index: 0, input: 'ABA', groups: undefined ]
console.log("ABABCC".match(/(A)(B)\1\2/));
// [ 'ABAB', 'A', 'B', index: 0, input: 'ABABCC', groups: undefined ]
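
The difference between repeating the captured text and repeating the pattern becomes visible once the group contains an alternation. A minimal sketch:

```javascript
const sameTwice = /^(a|b)\1$/; // second character must equal the captured one
const anyTwice = /^(a|b)(a|b)$/; // second character is matched independently

console.log(sameTwice.test('aa')); // true
console.log(sameTwice.test('ab')); // false
console.log(anyTwice.test('ab')); // true
```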

Backreference by name: `\k<name>`

let str = 'He said: \"She\'s the one!\".';
let regexp = /(?<quote>['"])(.*?)\k<quote>/g;
console.log(str.match(regexp)); // [ `"She's the one!"` ]
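
`\k<name>` works like `\n` but refers to the group by name. As another sketch, here is a pattern that finds accidentally doubled words; the sample sentence is made up.

```javascript
const doubled = /\b(?<word>\w+)\s+\k<word>\b/gi;
const text = 'This is is a test of the the regexp.';

const repeats = text.match(doubled);
console.log(repeats); // [ 'is is', 'the the' ]

// $<word> keeps a single copy of the repeated word.
const cleaned = text.replace(doubled, '$<word>');
console.log(cleaned); // This is a test of the regexp.
```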

Practices

let str = 'He said: \"She\'s the one!\".';
// let regexp = /(['"](.*?)["])/g;
// console.log(str.match(regexp)); // [ `"She's the one!"` ]

// \n
// let regexp = /(['"])(.*?)\1/g; // \1 must repeat the same quote character that group 1 captured
// console.log(str.match(regexp)); // [ `"She's the one!"` ]

// x|y
// regexp for time: hh:mm
// str is reassigned below; declaring it again with let would be a SyntaxError
str = '00:00 10:10 23:59 25:59 1:2';
let regexp = /([01]\d|2[0-3]):[0-5]\d/g;
console.log(str.match(regexp)); // [ '00:00', '10:10', '23:59' ]

// find bb-tag pairs
str = `
  [b]hello![/b]
  [quote]
    [url]http://google.com[/url]
  [/quote]
  [url]
    [b]http://google.com[/b]
  [/url]
`;
regexp = /\[(b|url|quote)].*?\[\/\1]/gs;
console.log(str.match(regexp));
// [
//   '[b]hello![/b]',
//   '[quote]\n    [url]http://google.com[/url]\n  [/quote]',
//   '[url]\n    [b]http://google.com[/b]\n  [/url]'
// ]
// note: [ must be escaped as \[, while ] needs no escape outside a class; the s flag lets . match line breaks

// find quoted strings
str = ' .. "test me" .. "Say \\"Hello\\"!" .. "\\\\ \\"" .. ';
regexp = /"(\\.|[^"\\])*"/g;
console.log(str.match(regexp));
// [ '"test me"', '"Say \\"Hello\\"!"', '"\\\\ \\""' ]

Quantifiers

Zero or more: `*`

*: the preceding character or group occurs zero or more times in a row.

// `*` matches the preceding element zero or more times
console.log('ac'.match(/ab*c/));
// [ 'ac', index: 0, input: 'ac', groups: undefined ]
console.log('abc'.match(/ab*c/));
// [ 'abc', index: 0, input: 'abc', groups: undefined ]
console.log('abbbc'.match(/ab*c/));
// [ 'abbbc', index: 0, input: 'abbbc', groups: undefined ]

console.log('x'.match(/[xyz]*/)); // [ 'x', ... ]
console.log('y'.match(/[xyz]*/)); // [ 'y', ... ]
console.log('z'.match(/[xyz]*/)); // [ 'z', ... ]
console.log('zx'.match(/[xyz]*/)); // [ 'zx', ... ]
console.log('zyx'.match(/[xyz]*/)); // [ 'zyx', ... ]
console.log('xyzzy'.match(/[xyz]*/)); // [ 'xyzzy', ... ]

console.log('axcdyfgzi'.match(/[xyz]*/));
// [ '', index: 0, input: 'axcdyfgzi', groups: undefined ]
console.log('axyz'.match(/[xyz]*/));
// [ '', index: 0, input: 'axyz', groups: undefined ]
console.log('xyza'.match(/[xyz]*/));
// [ 'xyz', index: 0, input: 'xyza', groups: undefined ]
console.log('a\nxyz'.match(/[xyz]*/));
// [ '', index: 0, input: 'a\nxyz', groups: undefined ]
// note: [xyz]* also matches the empty string, so the attempt at index 0 always succeeds; the result is '' when no x, y, or z is there

Greedy search: `.*`

.*: zero or more of any character, as many as possible (greedy).

console.log('abcdef'.match(/b.*/));
// [ 'bcdef', index: 1, input: 'abcdef', groups: undefined ]

One or more: `+`

+: the preceding character or group occurs one or more times in a row.

// `+` matches the preceding element one or more times
console.log('ac'.match(/ab+c/));
// null
console.log('abc'.match(/ab+c/));
// [ 'abc', index: 0, input: 'abc', groups: undefined ]
console.log('abbbc'.match(/ab+c/));
// [ 'abbbc', index: 0, input: 'abbbc', groups: undefined ]

console.log('x'.match(/[xyz]+/)); // [ 'x', ... ]
console.log('y'.match(/[xyz]+/)); // [ 'y', ... ]
console.log('z'.match(/[xyz]+/)); // [ 'z', ... ]
console.log('zx'.match(/[xyz]+/)); // [ 'zx', ... ]
console.log('zyx'.match(/[xyz]+/)); // [ 'zyx', ... ]
console.log('xyzzy'.match(/[xyz]+/)); // [ 'xyzzy', ... ]

console.log('axcdyfgzi'.match(/[xyz]+/));
// [ 'x', index: 1, input: 'axcdyfgzi', groups: undefined ]
console.log('axyz'.match(/[xyz]+/));
// [ 'xyz', index: 1, input: 'axyz', groups: undefined ]
console.log('xyza'.match(/[xyz]+/));
// [ 'xyz', index: 0, input: 'xyza', groups: undefined ]
console.log('a\nxyz'.match(/[xyz]+/));
// [ 'xyz', index: 2, input: 'a\nxyz', groups: undefined ]

Greedy search: `.+`

console.log('a "witch" and her "broom" is one'.match(/".+"/g));
// [ '"witch" and her "broom"' ]
console.log('a "witch" and her "broom" is one'.match(/"[^"]+"/g));
// [ '"witch"', '"broom"' ]

Zero or one: `?`

?: the preceding character or group occurs zero times or once.

// `?` matches the preceding element zero or one time
console.log('ac'.match(/ab?c/));
// [ 'ac', index: 0, input: 'ac', groups: undefined ]
console.log('abc'.match(/ab?c/));
// [ 'abc', index: 0, input: 'abc', groups: undefined ]
console.log('apples'.match(/apples?/));
// [ 'apples', index: 0, input: 'apples', groups: undefined ]
console.log('apple'.match(/apples?/));
// [ 'apple', index: 0, input: 'apple', groups: undefined ]
console.log('apple\napples'.match(/apples?/g));
// [ 'apple', 'apples' ]

console.log('123'.match(/-?[0-9]/));
// [ '1', index: 0, input: '123', groups: undefined ]
console.log('-123'.match(/-?[0-9]/));
// [ '-1', index: 0, input: '-123', groups: undefined ]
console.log('123'.match(/-?[0-9]+/));
// [ '123', index: 0, input: '123', groups: undefined ]
console.log('-123'.match(/-?[0-9]+/));
// [ '-123', index: 0, input: '-123', groups: undefined ]

console.log('-123-456'.match(/-?(123)+(-456)/));
// [
//   '-123-456',
//   '123',
//   '-456',
//   index: 0,
//   input: '-123-456',
//   groups: undefined
// ]

Lazy mode: `*?`

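*?: lazy form of *; zero or more times, but as few as possible, so the match ends at the first place where the rest of the pattern fits.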

console.log('<abc><def>'.match(/<.*>/));
// [ '<abc><def>', index: 0, input: '<abc><def>', groups: undefined ]
console.log('<abc><def>'.match(/<.*?>/));
// [ '<abc>', index: 0, input: '<abc><def>', groups: undefined ]
console.log('<<><>>'.match(/<.*>/));
// [ '<<><>>', index: 0, input: '<<><>>', groups: undefined ]
console.log('<<><>>'.match(/<.*?>/));
// [ '<<>', index: 0, input: '<<><>>', groups: undefined ]
console.log('<>'.match(/<.*?>/));
// [ '<>', index: 0, input: '<>', groups: undefined ]

Lazy mode: `+?`

+?: lazy form of +; one or more times, but as few as possible.

console.log('<abc><def>'.match(/<.+>/));
// [ '<abc><def>', index: 0, input: '<abc><def>', groups: undefined ]
console.log('<abc><def>'.match(/<.+?>/));
// [ '<abc>', index: 0, input: '<abc><def>', groups: undefined ]
console.log('<<><>>'.match(/<.+>/));
// [ '<<><>>', index: 0, input: '<<><>>', groups: undefined ]
console.log('<<><>>'.match(/<.+?>/));
// [ '<<>', index: 0, input: '<<><>>', groups: undefined ]
console.log('<>'.match(/<.+?>/));
// null
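
To compare the modes side by side, the sketch below extracts tags from a made-up HTML fragment with a greedy quantifier, a lazy quantifier, and a negated character class.

```javascript
const html = '<b>bold</b> and <i>italic</i>';

const greedy = html.match(/<.+>/g); // runs to the last >
const lazy = html.match(/<.+?>/g); // stops at the first >
const negated = html.match(/<[^>]+>/g); // cannot cross a > at all

console.log(greedy); // [ '<b>bold</b> and <i>italic</i>' ]
console.log(lazy); // [ '<b>', '</b>', '<i>', '</i>' ]
console.log(negated); // [ '<b>', '</b>', '<i>', '</i>' ]
```

The negated class gives the same result as the lazy form here without relying on backtracking.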

Lazy mode: `??`

??: lazy form of ?; prefers zero occurrences and takes one only when the rest of the pattern requires it.

console.log('<abc><def>'.match(/ab?/));
// [ 'ab', index: 1, input: '<abc><def>', groups: undefined ]
console.log('<abc><def>'.match(/ab??/));
// [ 'a', index: 1, input: '<abc><def>', groups: undefined ]
console.log('<ac><def>'.match(/ab??/));
// [ 'a', index: 1, input: '<ac><def>', groups: undefined ]

Quantity: `x{n}`/`x{n,}`/`x{n,m}`

x{n}/x{n,}/x{n,m}: matches x repeated exactly n times / at least n times / between n and m times.

// `x{n}`/`x{n,}`/`x{n,m}` matches the preceding element n~m times
console.log('aaaaaa'.match(/a{3}/));
// [ 'aaa', index: 0, input: 'aaaaaa', groups: undefined ]
console.log('aaaaaa'.match(/a{3,5}/));
// [ 'aaaaa', index: 0, input: 'aaaaaa', groups: undefined ]
console.log('aaaaaa'.match(/a{3,}/));
// [ 'aaaaaa', index: 0, input: 'aaaaaa', groups: undefined ]

console.log('202209'.match(/[0-9]{4}/));
// [ '2022', index: 0, input: '202209', groups: undefined ]
console.log('2022\n09'.match(/[0-9]{5}/));
// null

Lazy mode: `x{n,m}?`

x{n,m}?: lazy form; matches x as few times as possible, starting from n.

console.log('aaaaaa'.match(/a{3,5}/));
// [ 'aaaaa', index: 0, input: 'aaaaaa', groups: undefined ]
console.log('aaaaaa'.match(/a{3,5}?/));
// [ 'aaa', index: 0, input: 'aaaaaa', groups: undefined ]
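
Counted quantifiers are convenient for fixed-width formats. The following sketch assumes a 3-digit, hyphen, 4-digit postal code layout purely for illustration, and then contrasts `{n,}` with its lazy form.

```javascript
// Assumed format for illustration: NNN-NNNN
const postalCode = /^\d{3}-\d{4}$/;
console.log(postalCode.test('100-0001')); // true
console.log(postalCode.test('1000-001')); // false

const greedyRun = 'aaaa'.match(/a{2,}/)[0];
const lazyRun = 'aaaa'.match(/a{2,}?/)[0];
console.log(greedyRun); // aaaa
console.log(lazyRun); // aa
```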

Unicode property escapes

Unicode: `/\p{UnicodeProperty}/u`

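/\p{UnicodeProperty}/u: matches characters that have the given property.
Put the property name inside {} to match the corresponding characters.
However, the General_Category properties alone make it hard to match kanji, hiragana, and katakana separately.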

// L(Letter)
console.log('aA12あぁアァ一二三'.match(/\p{L}/gu));
// [
//   'a', 'A', 'あ',
//   'ぁ', 'ア', 'ァ',
//   '一', '二', '三'
// ]
console.log('aA12あぁアァ一二三'.match(/\p{Ll}/gu));
// [ 'a' ]
console.log('aA12あぁアァ一二三'.match(/\p{Lm}/gu));
// null
console.log('aA12あぁアァ一二三'.match(/\p{Lo}/gu));
// [
//   'あ', 'ぁ',
//   'ア', 'ァ',
//   '一', '二',
//   '三'
// ]
console.log('aA12あぁアァ一二三'.match(/\p{Lt}/gu));
// null
console.log('aA12あぁアァ一二三'.match(/\p{Lu}/gu));
// [ 'A' ]

// N(Number)
console.log('aA12あぁアァ一二三'.match(/\p{N}/gu));
// [ '1', '2' ]
console.log('aA12あぁアァ一二三'.match(/\p{Nd}/gu));
// [ '1', '2' ]
console.log('aA12あぁアァ一二三'.match(/\p{Nl}/gu));
// null
console.log('aA12あぁアァ一二三'.match(/\p{No}/gu));
// null

With script properties instead of General_Category properties, you can match the characters of a specific writing system.

// \p{Script=ScriptProperty} \p{Script_Extensions=ScriptProperty}
console.log('aA12あぁアァ一二三'.match(/\p{Script=Han}/gu));
// [ '一', '二', '三' ]
console.log('aA12あぁアァ一二三'.match(/\p{Script=Hiragana}/gu));
// [ 'あ', 'ぁ' ]
console.log('aA12あぁアァ一二三'.match(/\p{Script=Katakana}/gu));
// [ 'ア', 'ァ' ]

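console.log('ラッキー'.match(/\p{Script_Extensions=Katakana}/gu));
// [ 'ラ', 'ッ', 'キ', 'ー' ]
console.log('カービィ'.match(/\p{Script_Extensions=Katakana}/gu));
// [ 'カ', 'ー', 'ビ', 'ィ' ]
console.log('゛゜'.match(/\p{Script_Extensions=Katakana}/gu));
// [ '゛', '゜' ]

(The marks ー, ゛, and ゜ that accompany katakana are matched by /\p{Script_Extensions=Katakana}/u, even though /\p{Script=Katakana}/u does not match them.)

There are abbreviated forms as well, and since ES9 (ES2018) the property notation has become easier to read.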

// Script => sc, Script_Extensions => scx
console.log('Hello, 你好，こんにちは！'.match(/\p{sc=Han}/ug));
// [ '你', '好' ]
console.log('Hello, 你好，こんにちは！'.match(/\p{scx=Han}/ug));
// [ '你', '好' ]

console.log('あぁアァ゛゜ー'.match(/\p{sc=Katakana}/ug));
// [ 'ア', 'ァ' ]
console.log('あぁアァ゛゜ー'.match(/\p{scx=Katakana}/ug));
// [ 'ア', 'ァ', '゛', '゜', 'ー' ]

// ES9
console.log('Hello, 你好，こんにちは！'.match(/\p{General_Category=Letter}/ug));
console.log('Hello, 你好，こんにちは！'.match(/\p{gc=Letter}/ug));
console.log('Hello, 你好，こんにちは！'.match(/\p{Letter}/ug));
// [
//   'H', 'e', 'l', 'l',
//   'o', '你', '好', 'こ',
//   'ん', 'に', 'ち', 'は'
// ]

// a General_Category value (Other_Symbol) and a binary property (Hex_Digit)
console.log(/\p{Other_Symbol}/u.test('☕')); // true
console.log('0xAF'.match(/0x\p{Hex_Digit}\p{Hex_Digit}/u));
// [ '0xAF', index: 0, input: '0xAF', groups: undefined ]

See the linked reference for details on the properties.

Below is an example from MDN. It includes an emoji, so I am noting it here as well.

console.log('A ticket to 大阪 costs ¥2000 👌.'.match(/\p{Emoji_Presentation}/u));
// [
//   '👌',
//   index: 27,
//   input: 'A ticket to 大阪 costs ¥2000 👌.',
//   groups: undefined
// ]

Unicode: `/\P{UnicodeProperty}/u`

/\P{UnicodeProperty}/u: matches characters that do not have the given property.

// \P{}
console.log('aA12あぁアァ一二三'.match(/\P{Script_Extensions=Katakana}/gu));
// [
//   'a', 'A', '1',
//   '2', 'あ', 'ぁ',
//   '一', '二', '三'
// ]
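
Putting the script properties together, the sketch below labels each character of a string by writing system. `classify` is a hypothetical helper, and the labels are chosen only for illustration.

```javascript
// Hypothetical helper: label a single character by script or category.
function classify(char) {
  if (/^\p{sc=Hiragana}$/u.test(char)) return 'hiragana';
  if (/^\p{sc=Katakana}$/u.test(char)) return 'katakana';
  if (/^\p{sc=Han}$/u.test(char)) return 'kanji';
  if (/^\p{Nd}$/u.test(char)) return 'digit';
  return 'other';
}

// Array.from iterates by code point and passes each character to classify.
const labels = Array.from('あア一1a', classify);
console.log(labels); // [ 'hiragana', 'katakana', 'kanji', 'digit', 'other' ]
```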
