
Regular expressions I occasionally use in PHP

Last updated at 2023-08-12 · Posted at 2023-08-12

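What is a regular expression?

For an explanation of what regular expressions are, see the Wikipedia entry.

Regular expressions in PHP

PHP provides the preg_* family of functions for working with regular expressions.

The one you will probably use most often is preg_match.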

preg_filter
preg_grep
preg_last_error_msg
preg_last_error
preg_match_all
preg_match ← used most often
preg_quote
preg_replace_callback_array
preg_replace_callback
preg_replace
preg_split

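Check whether a URL starts with `https://`

if (preg_match('|^https://|', $url)) {
    // Handling for URLs that start with https://
    // (the ^ anchor is required; without it the pattern matches https:// anywhere in the string)
}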

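To use the part to the right of https://, add a capture group as follows.

if (preg_match('|^https://(.*)|', $url, $m)) {
    // Handling for URLs that start with https://
    // $m[1] holds the part after https://
}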
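
As a minimal sketch of how the captured value might be used, the snippet below wraps the check in a function. The name restAfterHttps is a hypothetical helper, not something from PHP or this article, and the s modifier is an added assumption so that "." also matches newlines.

```php
<?php
// Hypothetical helper: returns the part after "https://",
// or null when the URL does not start with that scheme.
function restAfterHttps(string $url): ?string
{
    // ^ anchors the match at the start of the string.
    // The s modifier lets "." match newlines, so the capture runs to the end.
    if (preg_match('|^https://(.*)|s', $url, $m) === 1) {
        return $m[1]; // first capture group
    }
    return null;
}

var_dump(restAfterHttps('https://example.com/path'));
var_dump(restAfterHttps('http://example.com/'));
```

Comparing the result with 1 keeps a failure (false) from being confused with "no match" (0).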

If you do not need the part to the right of https://, PHP 8 lets you use str_starts_with instead of preg_match.

if (str_starts_with($url, 'https://')) {
    // Handling for URLs that start with https://
}

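Check whether the domain part of an email address is `gmail.com`

if (preg_match('/@gmail\.com\z/', $mailAddress)) {
    // Handling for Gmail addresses
    // (\. matches a literal dot; \z anchors at the very end of the string)
}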

To use the part to the left of @, add a capture group as follows.

if (preg_match('/^(.*)@gmail\.com\z/', $mailAddress, $m)) {
    // Handling for Gmail addresses
    // $m[1] holds the part before @
}
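
The following is a small sketch of extracting the local part. The function name gmailLocalPart is hypothetical, and the stricter [^@]+ group (instead of .*) is a simplifying assumption that rejects any "@" inside the local part.

```php
<?php
// Hypothetical helper: returns the local part of a gmail.com address,
// or null when the address is not a gmail.com address.
function gmailLocalPart(string $mailAddress): ?string
{
    // \A and \z anchor at the absolute start and end of the string.
    // [^@]+ requires a non-empty local part without "@".
    // \. matches a literal dot instead of any character.
    if (preg_match('/\A([^@]+)@gmail\.com\z/', $mailAddress, $m) === 1) {
        return $m[1];
    }
    return null;
}

var_dump(gmailLocalPart('alice@gmail.com'));
var_dump(gmailLocalPart('alice@gmailxcom'));
```

With an unescaped dot, the second call would match, which is why the escape matters.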

If you do not need the part to the left of @, PHP 8 lets you use str_ends_with instead of preg_match.

if (str_ends_with($mailAddress, '@gmail.com')) {
    // Handling for Gmail addresses
}

Check whether the user agent contains `Android`

if (preg_match('/Android/', $userAgent)) {
    // Handling for Android
}

For a fixed substring like this, strpos is probably the better choice.

if (strpos($userAgent, 'Android') !== false) {
    // Handling for Android
}

On PHP 8 and later, str_contains is available.

if (str_contains($userAgent, 'Android')) {
    // Handling for Android
}

Check whether a string is unsuitable as a password

An exact-match check against several values is also sometimes written as a regular expression.

const BAD_PASSWORD_PATTERN = '/^(123456|123456789|Qwerty|Password|12345|12345678|111111|1234567|123123|Qwerty123|1q2w3e|1234567890|DEFAULT|0|Abc123|654321|123321|Qwertyuiop|Iloveyou|666666)$/';

if (preg_match(BAD_PASSWORD_PATTERN, $password)) {
    // Handling for unsuitable passwords
}
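
When the alternatives come from a list rather than being typed by hand, the pattern can be assembled with preg_quote so that special characters are matched literally. This is a sketch under assumptions: the word list and the names $badPasswords, $pattern, and isBadPassword are hypothetical and not part of the article's code.

```php
<?php
// Hypothetical word list; a real one would be much longer.
$badPasswords = ['123456', 'Password', '1q2w3e', 'p@ss.word', '0'];

// Escape each entry (and the "/" delimiter) so that characters such as "."
// are treated as literals rather than regex metacharacters.
$alternatives = array_map(
    fn (string $word): string => preg_quote($word, '/'),
    $badPasswords
);

// \A and \z make this an exact match of the whole string.
$pattern = '/\A(?:' . implode('|', $alternatives) . ')\z/';

function isBadPassword(string $password, string $pattern): bool
{
    return preg_match($pattern, $password) === 1;
}

var_dump(isBadPassword('p@ss.word', $pattern));
var_dump(isBadPassword('p@ssXword', $pattern));
```

Without preg_quote, the "." in 'p@ss.word' would match any character, so 'p@ssXword' would also be rejected.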

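When the list of items to check is very large, in_array or array_key_exists is sometimes used instead.
PCRE limits the size of a compiled pattern, so an overly long regular expression can fail to compile; in that case preg_match emits a warning and returns false.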

const BAD_PASSWORDS = [
	'123456',
	'123456789',
	'Qwerty',
	'Password',
	'12345',
	'12345678',
	'111111',
	'1234567',
	'123123',
	'Qwerty123',
	'1q2w3e',
	'1234567890',
	'DEFAULT',
	'0',
	'Abc123',
	'654321',
	'123321',
	'Qwertyuiop',
	'Iloveyou',
	'666666'
];
	
if (in_array($password, BAD_PASSWORDS, true)) {
    // Handling for unsuitable passwords
    // (strict comparison avoids loose matches such as '0' == '0.0')
}
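
The array_key_exists variant mentioned above might look like the following sketch. The names $badPasswordSet and isListedPassword are hypothetical; flipping the list once turns each password into an array key, so each check is a key lookup instead of a scan of the whole list.

```php
<?php
// Hypothetical word list; a real one would be much longer.
$badPasswords = ['123456', 'Qwerty', 'Password', 'Iloveyou'];

// array_flip turns the values into keys: ['123456' => 0, 'Qwerty' => 1, ...]
$badPasswordSet = array_flip($badPasswords);

function isListedPassword(string $password, array $set): bool
{
    // Key lookup; the comparison is case-sensitive.
    return array_key_exists($password, $set);
}

var_dump(isListedPassword('123456', $badPasswordSet));
var_dump(isListedPassword('qwerty', $badPasswordSet));
```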

Split a string into code points

This splits a string into its individual Unicode code points.

$splitted = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);

'あいうえお' → ['あ', 'い', 'う', 'え', 'お']
'🇯🇵' → ['🇯', '🇵']
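
With the u modifier, preg_split returns false when the input is not valid UTF-8, so it may be worth checking the result. The sketch below assumes PHP 8.0 or later for preg_last_error_msg; the function name splitCodePoints and the choice to throw an exception are assumptions made for illustration.

```php
<?php
// Hypothetical helper: splits a string into code points and
// throws when the input is not valid UTF-8.
function splitCodePoints(string $text): array
{
    $parts = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($parts === false) {
        // preg_last_error_msg() (PHP 8.0+) describes the most recent PCRE error.
        throw new InvalidArgumentException(preg_last_error_msg());
    }
    return $parts;
}

var_dump(splitCodePoints('あいう'));

try {
    splitCodePoints("\xFF"); // a lone 0xFF byte is not valid UTF-8
} catch (InvalidArgumentException $e) {
    echo 'Error: ', $e->getMessage(), PHP_EOL;
}
```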

For an explanation of what a code point is, see MDN or a similar reference.
