23
Help us understand the problem. What are the problem?

More than 5 years have passed since last update.

posted at

updated at

MySQLで文字列型を検索するときのハマりポイントと、その解決方法

この稿で説明すること

※この稿では、MySQL5.5について説明する。

MySQLの文字列型のうちbinary以外の文字セットは、デフォルトの設定で以下2つの仕様がある。

  1. 比較の際に大文字・小文字を区別しない
  2. 比較の際に末尾の半角スペースを考慮しない

これらの仕様は、アプリケーションを作る際に不都合なことがある(と言うか、たいていの場合バグの原因になる)ため、その解決方法についてのアイデアを提案する。

1. 比較の際に大文字・小文字を区別しない

MySQLがサポートしている文字セットを確認してみると、binary以外は"Default collation"の末尾に"_ci"が付いている。
"_ci"はcase insensitiveの略で、比較の際に大文字・小文字を区別しない。
https://dev.mysql.com/doc/refman/5.5/en/charset-charsets.html
参考 : https://dev.mysql.com/doc/refman/5.6/ja/charset-charsets.html

mysql> SHOW CHARACTER SET;
+----------+-----------------------------+---------------------+--------+
| Charset  | Description                 | Default collation   | Maxlen |
+----------+-----------------------------+---------------------+--------+
| big5     | Big5 Traditional Chinese    | big5_chinese_ci     |      2 |
| dec8     | DEC West European           | dec8_swedish_ci     |      1 |
| cp850    | DOS West European           | cp850_general_ci    |      1 |
| hp8      | HP West European            | hp8_english_ci      |      1 |
| koi8r    | KOI8-R Relcom Russian       | koi8r_general_ci    |      1 |
| latin1   | cp1252 West European        | latin1_swedish_ci   |      1 |
| latin2   | ISO 8859-2 Central European | latin2_general_ci   |      1 |
| swe7     | 7bit Swedish                | swe7_swedish_ci     |      1 |
| ascii    | US ASCII                    | ascii_general_ci    |      1 |
| ujis     | EUC-JP Japanese             | ujis_japanese_ci    |      3 |
| sjis     | Shift-JIS Japanese          | sjis_japanese_ci    |      2 |
| hebrew   | ISO 8859-8 Hebrew           | hebrew_general_ci   |      1 |
| tis620   | TIS620 Thai                 | tis620_thai_ci      |      1 |
| euckr    | EUC-KR Korean               | euckr_korean_ci     |      2 |
| koi8u    | KOI8-U Ukrainian            | koi8u_general_ci    |      1 |
| gb2312   | GB2312 Simplified Chinese   | gb2312_chinese_ci   |      2 |
| greek    | ISO 8859-7 Greek            | greek_general_ci    |      1 |
| cp1250   | Windows Central European    | cp1250_general_ci   |      1 |
| gbk      | GBK Simplified Chinese      | gbk_chinese_ci      |      2 |
| latin5   | ISO 8859-9 Turkish          | latin5_turkish_ci   |      1 |
| armscii8 | ARMSCII-8 Armenian          | armscii8_general_ci |      1 |
| utf8     | UTF-8 Unicode               | utf8_general_ci     |      3 |
| ucs2     | UCS-2 Unicode               | ucs2_general_ci     |      2 |
| cp866    | DOS Russian                 | cp866_general_ci    |      1 |
| keybcs2  | DOS Kamenicky Czech-Slovak  | keybcs2_general_ci  |      1 |
| macce    | Mac Central European        | macce_general_ci    |      1 |
| macroman | Mac West European           | macroman_general_ci |      1 |
| cp852    | DOS Central European        | cp852_general_ci    |      1 |
| latin7   | ISO 8859-13 Baltic          | latin7_general_ci   |      1 |
| utf8mb4  | UTF-8 Unicode               | utf8mb4_general_ci  |      4 |
| cp1251   | Windows Cyrillic            | cp1251_general_ci   |      1 |
| utf16    | UTF-16 Unicode              | utf16_general_ci    |      4 |
| cp1256   | Windows Arabic              | cp1256_general_ci   |      1 |
| cp1257   | Windows Baltic              | cp1257_general_ci   |      1 |
| utf32    | UTF-32 Unicode              | utf32_general_ci    |      4 |
| binary   | Binary pseudo charset       | binary              |      1 |
| geostd8  | GEOSTD8 Georgian            | geostd8_general_ci  |      1 |
| cp932    | SJIS for Windows Japanese   | cp932_japanese_ci   |      2 |
| eucjpms  | UJIS for Windows Japanese   | eucjpms_japanese_ci |      3 |
+----------+-----------------------------+---------------------+--------+
39 rows in set (0.00 sec)

解決方法 : 照合順序を変更する

例えば、文字セットcp932にはcp932_japanese_cicp932_binの2つの照合順序がある。

mysql> SHOW COLLATION WHERE `Charset` = 'cp932';
+-------------------+---------+----+---------+----------+---------+
| Collation         | Charset | Id | Default | Compiled | Sortlen |
+-------------------+---------+----+---------+----------+---------+
| cp932_japanese_ci | cp932   | 95 | Yes     | Yes      |       1 |
| cp932_bin         | cp932   | 96 |         | Yes      |       1 |
+-------------------+---------+----+---------+----------+---------+
2 rows in set (0.00 sec)

このうちcp932_binを使用すると、大文字と小文字を考慮した比較を行なうことができる。
cp932_binは非デフォルトの照合順序のため、明示的に指定してやる必要がある。

mysql> CREATE TABLE `collation_test` (
    ->   `col_char` char(64) DEFAULT NULL,
    ->   `col_varchar` varchar(64) DEFAULT NULL,
    ->   `col_text` text,
    ->   `col_char_bin` char(64) CHARACTER SET cp932 COLLATE cp932_bin DEFAULT NULL,
    ->   `col_varchar_bin` varchar(64) CHARACTER SET cp932 COLLATE cp932_bin DEFAULT NULL,
    ->   `col_text_bin` text CHARACTER SET cp932 COLLATE cp932_bin
    -> ) ENGINE=InnoDB DEFAULT CHARSET=cp932;
Query OK, 0 rows affected (0.02 sec)

mysql> INSERT INTO `collation_test` VALUES ('abc', 'abc', 'abc', 'abc', 'abc', 'abc');
Query OK, 1 row affected (0.00 sec)


mysql> SELECT * FROM `collation_test` WHERE `col_char` = 'ABC';
+----------+-------------+----------+--------------+-----------------+--------------+
| col_char | col_varchar | col_text | col_char_bin | col_varchar_bin | col_text_bin |
+----------+-------------+----------+--------------+-----------------+--------------+
| abc      | abc         | abc      | abc          | abc             | abc          |
+----------+-------------+----------+--------------+-----------------+--------------+
1 row in set (0.00 sec)

mysql> SELECT * FROM `collation_test` WHERE `col_varchar` = 'ABC';
+----------+-------------+----------+--------------+-----------------+--------------+
| col_char | col_varchar | col_text | col_char_bin | col_varchar_bin | col_text_bin |
+----------+-------------+----------+--------------+-----------------+--------------+
| abc      | abc         | abc      | abc          | abc             | abc          |
+----------+-------------+----------+--------------+-----------------+--------------+
1 row in set (0.00 sec)

mysql> SELECT * FROM `collation_test` WHERE `col_text` = 'ABC';
+----------+-------------+----------+--------------+-----------------+--------------+
| col_char | col_varchar | col_text | col_char_bin | col_varchar_bin | col_text_bin |
+----------+-------------+----------+--------------+-----------------+--------------+
| abc      | abc         | abc      | abc          | abc             | abc          |
+----------+-------------+----------+--------------+-----------------+--------------+
1 row in set (0.00 sec)

mysql> SELECT * FROM `collation_test` WHERE `col_char_bin` = 'ABC';
Empty set (0.00 sec)

mysql> SELECT * FROM `collation_test` WHERE `col_varchar_bin` = 'ABC';
Empty set (0.00 sec)

mysql> SELECT * FROM `collation_test` WHERE `col_text_bin` = 'ABC';
Empty set (0.00 sec)

照合順序がデフォルトのcp932_japanese_ciに設定されているカラムは比較の際に大文字・小文字を区別しないのに対して、cp932_binに設定したカラムは、大文字・小文字を区別していることが分かる。

2. 比較の際に末尾の半角スペースを考慮しない

以下のマニュアルに記載されている通り、MySQLの照合順序は末尾のスペース(半角スペース)を考慮せず、しかもその動作はMySQLの設定で変更することができない。

All MySQL collations are of type PADSPACE. This means that all CHAR, VARCHAR, and TEXT values in MySQL are compared without regard to any trailing spaces.
...
This is true for all MySQL versions, and is not affected by the server SQL mode.

https://dev.mysql.com/doc/refman/5.5/en/char.html
参考 : https://dev.mysql.com/doc/refman/5.6/ja/char.html

解決方法 : LIKE句で検索する

実は、先ほどのマニュアルには以下の記述もある。

“Comparison” in this context does not include the LIKE pattern-matching operator, for which trailing spaces are significant.

LIKE句を使用すると、末尾のスペースを考慮して比較することができる。

mysql> SELECT * FROM `collation_test` WHERE `col_char` = 'abc            ';
+----------+-------------+----------+--------------+-----------------+--------------+
| col_char | col_varchar | col_text | col_char_bin | col_varchar_bin | col_text_bin |
+----------+-------------+----------+--------------+-----------------+--------------+
| abc      | abc         | abc      | abc          | abc             | abc          |
+----------+-------------+----------+--------------+-----------------+--------------+
1 row in set (0.00 sec)

mysql> SELECT * FROM `collation_test` WHERE `col_varchar` = 'abc            ';
+----------+-------------+----------+--------------+-----------------+--------------+
| col_char | col_varchar | col_text | col_char_bin | col_varchar_bin | col_text_bin |
+----------+-------------+----------+--------------+-----------------+--------------+
| abc      | abc         | abc      | abc          | abc             | abc          |
+----------+-------------+----------+--------------+-----------------+--------------+
1 row in set (0.00 sec)

mysql> SELECT * FROM `collation_test` WHERE `col_text` = 'abc            ';
+----------+-------------+----------+--------------+-----------------+--------------+
| col_char | col_varchar | col_text | col_char_bin | col_varchar_bin | col_text_bin |
+----------+-------------+----------+--------------+-----------------+--------------+
| abc      | abc         | abc      | abc          | abc             | abc          |
+----------+-------------+----------+--------------+-----------------+--------------+
1 row in set (0.00 sec)

mysql> SELECT * FROM `collation_test` WHERE `col_char` LIKE 'abc            ';
Empty set (0.00 sec)

mysql> SELECT * FROM `collation_test` WHERE `col_varchar` LIKE 'abc            ';
Empty set (0.00 sec)

mysql> SELECT * FROM `collation_test` WHERE `col_text` LIKE 'abc            ';
Empty set (0.00 sec)

問題点

LIKE句を使用する場合は、"%"と"_"をエスケープする必要がある。

まとめると…

カラムの照合順序を変更し、"%"と"_"をエスケープしてLIKE句で検索する。
これで、MySQLでも大文字と小文字を区別し、かつ末尾のスペースを考慮した比較を行なうことができるようになる。

もう一つの解決方法

BINARY演算子を使用すると、"1."と"2."の両方を同時に解決することができる。
http://dev.mysql.com/doc/refman/5.5/en/cast-functions.html
参考 : http://dev.mysql.com/doc/refman/5.6/ja/cast-functions.html

mysql> SELECT * FROM `collation_test` WHERE `col_char` = BINARY 'abc';
+----------+-------------+----------+--------------+-----------------+--------------+
| col_char | col_varchar | col_text | col_char_bin | col_varchar_bin | col_text_bin |
+----------+-------------+----------+--------------+-----------------+--------------+
| abc      | abc         | abc      | abc          | abc             | abc          |
+----------+-------------+----------+--------------+-----------------+--------------+
1 row in set (0.00 sec)

mysql> SELECT * FROM `collation_test` WHERE `col_varchar` = BINARY 'abc';
+----------+-------------+----------+--------------+-----------------+--------------+
| col_char | col_varchar | col_text | col_char_bin | col_varchar_bin | col_text_bin |
+----------+-------------+----------+--------------+-----------------+--------------+
| abc      | abc         | abc      | abc          | abc             | abc          |
+----------+-------------+----------+--------------+-----------------+--------------+
1 row in set (0.00 sec)

mysql> SELECT * FROM `collation_test` WHERE `col_text` = BINARY 'abc';
+----------+-------------+----------+--------------+-----------------+--------------+
| col_char | col_varchar | col_text | col_char_bin | col_varchar_bin | col_text_bin |
+----------+-------------+----------+--------------+-----------------+--------------+
| abc      | abc         | abc      | abc          | abc             | abc          |
+----------+-------------+----------+--------------+-----------------+--------------+
1 row in set (0.00 sec)

mysql> SELECT * FROM `collation_test` WHERE `col_char` = BINARY 'ABC';
Empty set (0.00 sec)

mysql> SELECT * FROM `collation_test` WHERE `col_varchar` = BINARY 'ABC';
Empty set (0.00 sec)

mysql> SELECT * FROM `collation_test` WHERE `col_text` = BINARY 'ABC';
Empty set (0.00 sec)

mysql> SELECT * FROM `collation_test` WHERE `col_char` = BINARY 'abc            ';
Empty set (0.00 sec)

mysql> SELECT * FROM `collation_test` WHERE `col_varchar` = BINARY 'abc            ';
Empty set (0.00 sec)

mysql> SELECT * FROM `collation_test` WHERE `col_text` = BINARY 'abc            ';
Empty set (0.00 sec)

しかも、この方法なら"%"と"_"をエスケープする必要がない。

注意事項

BINARY演算子はカラム名と比較対象文字のどちらにも付けることができるが、カラム名に付けた場合、INDEXが効かなくなる。

mysql> ALTER TABLE `collation_test` ADD INDEX (`col_char`), ADD INDEX (`col_varchar`), ADD INDEX (`col_text`(64));
Query OK, 0 rows affected (0.15 sec)
Records: 0  Duplicates: 0  Warnings: 0


mysql> EXPLAIN SELECT * FROM `collation_test` WHERE `col_char` = BINARY 'abc';
+----+-------------+----------------+-------+---------------+----------+---------+------+------+-------------+
| id | select_type | table          | type  | possible_keys | key      | key_len | ref  | rows | Extra       |
+----+-------------+----------------+-------+---------------+----------+---------+------+------+-------------+
|  1 | SIMPLE      | collation_test | range | col_char      | col_char | 129     | NULL |    1 | Using where |
+----+-------------+----------------+-------+---------------+----------+---------+------+------+-------------+
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT * FROM `collation_test` WHERE `col_varchar` = BINARY 'abc';
+----+-------------+----------------+-------+---------------+-------------+---------+------+------+-------------+
| id | select_type | table          | type  | possible_keys | key         | key_len | ref  | rows | Extra       |
+----+-------------+----------------+-------+---------------+-------------+---------+------+------+-------------+
|  1 | SIMPLE      | collation_test | range | col_varchar   | col_varchar | 131     | NULL |    1 | Using where |
+----+-------------+----------------+-------+---------------+-------------+---------+------+------+-------------+
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT * FROM `collation_test` WHERE `col_text` = BINARY 'abc';
+----+-------------+----------------+-------+---------------+----------+---------+------+------+-------------+
| id | select_type | table          | type  | possible_keys | key      | key_len | ref  | rows | Extra       |
+----+-------------+----------------+-------+---------------+----------+---------+------+------+-------------+
|  1 | SIMPLE      | collation_test | range | col_text      | col_text | 131     | NULL |    1 | Using where |
+----+-------------+----------------+-------+---------------+----------+---------+------+------+-------------+
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT * FROM `collation_test` WHERE BINARY `col_char` = 'abc';
+----+-------------+----------------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table          | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+----------------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | collation_test | ALL  | NULL          | NULL | NULL    | NULL |    1 | Using where |
+----+-------------+----------------+------+---------------+------+---------+------+------+-------------+
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT * FROM `collation_test` WHERE BINARY `col_varchar` = 'abc';
+----+-------------+----------------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table          | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+----------------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | collation_test | ALL  | NULL          | NULL | NULL    | NULL |    1 | Using where |
+----+-------------+----------------+------+---------------+------+---------+------+------+-------------+
1 row in set (0.00 sec)

mysql> EXPLAIN SELECT * FROM `collation_test` WHERE BINARY `col_text` = 'abc';
+----+-------------+----------------+------+---------------+------+---------+------+------+-------------+
| id | select_type | table          | type | possible_keys | key  | key_len | ref  | rows | Extra       |
+----+-------------+----------------+------+---------------+------+---------+------+------+-------------+
|  1 | SIMPLE      | collation_test | ALL  | NULL          | NULL | NULL    | NULL |    1 | Using where |
+----+-------------+----------------+------+---------------+------+---------+------+------+-------------+
1 row in set (0.00 sec)
Why not register and get more from Qiita?
  1. We will deliver articles that match you
    By following users and tags, you can catch up information on technical fields that you are interested in as a whole
  2. you can read useful information later efficiently
    By "stocking" the articles you like, you can search right away
Sign upLogin
23
Help us understand the problem. What are the problem?