1
0

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?

TiDBのMulti Valued Indexの活用

Last updated at Posted at 2024-12-21

はじめに

MySQL(TiDBも)にはJSON型があり、JSONを保存することができます。このJSON型では、JSON_CONTAINSや、JSON_OVERLAPSといった複合条件の検索に便利な関数を使うことができます。MySQLには配列型はありませんが、配列型と同じように使うこともできます。

本記事ではTiDBにおけるJSON型の利用と、それを高速化するMulti Value Indexについて説明します。この機能は比較的新しい機能で、v7.1.0でGAになったものです。

※ MySQLバージョンは こちら

利用したTiDBはTiDB Playground v8.5.0 です。

配列検索の例: ブログ記事をタグで検索する

次のようなブログ記事を保存するテーブルを考えます。

CREATE TABLE blog_articles (
    id INT AUTO_INCREMENT PRIMARY KEY,
    author VARCHAR(255) NOT NULL,
    content TEXT NOT NULL,
    tags JSON NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

INSERT INTO blog_articles (author, content, tags) VALUES
('Jack Black', 'Combining Python, JavaScript, and Ruby.', '{"lang":["python","javascript","ruby"],"db":["mysql"]}'),
('Karen White', 'Using MySQL, PostgreSQL, and SQLite together.', '{"lang":["python"],"db":["mysql","postgresql","sqlite"]}'),
('Charlie Davis', 'Introduction to Rust programming.', '{"lang":["rust"],"db":["postgresql"]}'),
('Eve White', 'Using Python with SQLite.', '{"lang":["python"],"db":["sqlite"]}'),
('John Doe', 'This is the first blog post.', '{"lang":["ruby"],"db":["mysql"]}'),
('Jane Smith', 'Exploring the new features in MySQL 8.0.', '{"lang":[],"db":["mysql"]}'),
('Frank Green', 'JavaScript and MongoDB for beginners.', '{"lang":["javascript"],"db":["mongodb"]}'),
('Grace Lee', 'Building web apps with Ruby on Rails.', '{"lang":["ruby"],"db":["postgresql"]}'),
('Hank Miller', 'Django and PostgreSQL integration.', '{"lang":["python"],"db":["postgresql"]}'),
('Ivy Wilson', 'Node.js and MySQL: A perfect match.', '{"lang":["javascript"],"db":["mysql"]}');
mysql> select * from blog_articles;
+----+---------------+-----------------------------------------------+---------------------------------------------------------------+---------------------+
| id | author        | content                                       | tags                                                          | created_at          |
+----+---------------+-----------------------------------------------+---------------------------------------------------------------+---------------------+
|  1 | Jack Black    | Combining Python, JavaScript, and Ruby.       | {"db": ["mysql"], "lang": ["python", "javascript", "ruby"]}   | 2024-12-21 16:19:06 |
|  2 | Karen White   | Using MySQL, PostgreSQL, and SQLite together. | {"db": ["mysql", "postgresql", "sqlite"], "lang": ["python"]} | 2024-12-21 16:19:06 |
|  3 | Charlie Davis | Introduction to Rust programming.             | {"db": ["postgresql"], "lang": ["rust"]}                      | 2024-12-21 16:19:06 |
|  4 | Eve White     | Using Python with SQLite.                     | {"db": ["sqlite"], "lang": ["python"]}                        | 2024-12-21 16:19:06 |
|  5 | John Doe      | This is the first blog post.                  | {"db": ["mysql"], "lang": ["ruby"]}                           | 2024-12-21 16:19:06 |
|  6 | Jane Smith    | Exploring the new features in MySQL 8.0.      | {"db": ["mysql"], "lang": []}                                 | 2024-12-21 16:19:06 |
|  7 | Frank Green   | JavaScript and MongoDB for beginners.         | {"db": ["mongodb"], "lang": ["javascript"]}                   | 2024-12-21 16:19:06 |
|  8 | Grace Lee     | Building web apps with Ruby on Rails.         | {"db": ["postgresql"], "lang": ["ruby"]}                      | 2024-12-21 16:19:06 |
|  9 | Hank Miller   | Django and PostgreSQL integration.            | {"db": ["postgresql"], "lang": ["python"]}                    | 2024-12-21 16:19:06 |
| 10 | Ivy Wilson    | Node.js and MySQL: A perfect match.           | {"db": ["mysql"], "lang": ["javascript"]}                     | 2024-12-21 16:19:06 |
+----+---------------+-----------------------------------------------+---------------------------------------------------------------+---------------------+
10 rows in set (0.00 sec)

このとき、タグで絞り込みをかけたくなりますね。これをやってみましょう。

一つのタグで絞り込み

例えば、pythonを含むブログエントリを探すクエリは次のように書けます。

mysql> SELECT * FROM blog_articles WHERE JSON_CONTAINS(tags, '["python"]', '$.lang');
+----+-------------+-----------------------------------------------+---------------------------------------------------------------+---------------------+
| id | author      | content                                       | tags                                                          | created_at          |
+----+-------------+-----------------------------------------------+---------------------------------------------------------------+---------------------+
|  1 | Jack Black  | Combining Python, JavaScript, and Ruby.       | {"db": ["mysql"], "lang": ["python", "javascript", "ruby"]}   | 2024-12-21 16:19:06 |
|  2 | Karen White | Using MySQL, PostgreSQL, and SQLite together. | {"db": ["mysql", "postgresql", "sqlite"], "lang": ["python"]} | 2024-12-21 16:19:06 |
|  4 | Eve White   | Using Python with SQLite.                     | {"db": ["sqlite"], "lang": ["python"]}                        | 2024-12-21 16:19:06 |
|  9 | Hank Miller | Django and PostgreSQL integration.            | {"db": ["postgresql"], "lang": ["python"]}                    | 2024-12-21 16:19:06 |
+----+-------------+-----------------------------------------------+---------------------------------------------------------------+---------------------+
4 rows in set (0.00 sec)

-- または
SELECT * FROM blog_articles WHERE 'python' MEMBER OF(tags->'$.lang');
SELECT * FROM blog_articles WHERE JSON_CONTAINS(tags, '{"lang":["python"]}');
-- 結果は同じなので省略

JSON_CONTAINS はJSONを指定して、そのJSONが含まれる時真を返す関数です。これの素晴らしいところは、配列要素に値が含まれていれば真を返すところです。JSONの要素は、JSON PATH形式で指定できます。
単一の値の検索でははMEMBER OFを使うこともできます。

複数のタグで絞り込み(AND)

では、pythonとmysqlに関するblogを探してみましょう。次の様に書けます

mysql> SELECT * FROM blog_articles WHERE JSON_CONTAINS(tags, '{"lang":["python"],"db":["mysql"]}');
+----+-------------+-----------------------------------------------+---------------------------------------------------------------+---------------------+
| id | author      | content                                       | tags                                                          | created_at          |
+----+-------------+-----------------------------------------------+---------------------------------------------------------------+---------------------+
|  1 | Jack Black  | Combining Python, JavaScript, and Ruby.       | {"db": ["mysql"], "lang": ["python", "javascript", "ruby"]}   | 2024-12-21 16:19:06 |
|  2 | Karen White | Using MySQL, PostgreSQL, and SQLite together. | {"db": ["mysql", "postgresql", "sqlite"], "lang": ["python"]} | 2024-12-21 16:19:06 |
+----+-------------+-----------------------------------------------+---------------------------------------------------------------+---------------------+
2 rows in set (0.00 sec)

検索条件をかなり直感的に記述できるのが分かると思います。

複数のタグで絞り込み(OR)

では、"python または javascript"に関するblogを抽出するにはどうすれば良いでしょうか? ORで条件をつなぐこともできますが、JSON_OVERLAPSという便利な関数もあります。これは配列要素に対して、文字通り重複部分があるかどうかを判定します。

tags->'$.lang'JSON_EXTRACT(tags,'$.lang')の省略記法で、要素を抽出するものです。ここでは言語の配列を取得するのに利用しています。

mysql> SELECT * FROM blog_articles WHERE JSON_OVERLAPS(tags->'$.lang', '["python","javascript"]');
+----+-------------+-----------------------------------------------+---------------------------------------------------------------+---------------------+
| id | author      | content                                       | tags                                                          | created_at          |
+----+-------------+-----------------------------------------------+---------------------------------------------------------------+---------------------+
|  1 | Jack Black  | Combining Python, JavaScript, and Ruby.       | {"db": ["mysql"], "lang": ["python", "javascript", "ruby"]}   | 2024-12-21 16:19:06 |
|  2 | Karen White | Using MySQL, PostgreSQL, and SQLite together. | {"db": ["mysql", "postgresql", "sqlite"], "lang": ["python"]} | 2024-12-21 16:19:06 |
|  4 | Eve White   | Using Python with SQLite.                     | {"db": ["sqlite"], "lang": ["python"]}                        | 2024-12-21 16:19:06 |
|  7 | Frank Green | JavaScript and MongoDB for beginners.         | {"db": ["mongodb"], "lang": ["javascript"]}                   | 2024-12-21 16:19:06 |
|  9 | Hank Miller | Django and PostgreSQL integration.            | {"db": ["postgresql"], "lang": ["python"]}                    | 2024-12-21 16:19:06 |
| 10 | Ivy Wilson  | Node.js and MySQL: A perfect match.           | {"db": ["mysql"], "lang": ["javascript"]}                     | 2024-12-21 16:19:06 |
+----+-------------+-----------------------------------------------+---------------------------------------------------------------+---------------------+
6 rows in set (0.00 sec)

JSON_OVERLAPSの注意として、JSONオブジェクトに対して適用した場合は、共通のkey,valueのペアがあるかを調べるということです。valueは完全に一致しないといけません。先程の例でJSONオブジェクトで指定してみます。

mysql> SELECT * FROM blog_articles WHERE JSON_OVERLAPS(tags, '{"lang":["javascript","python"]}');
Empty set (0.00 sec)

このようになります。

同様に、"mysql"もしくは"javascript"に言及したblogを探すのにOVERLAPSを使うと期待と異なることになります。

mysql> SELECT * FROM blog_articles WHERE JSON_OVERLAPS(tags, '{"lang":["javascript"],"db":["mongodb"]}');
+----+-------------+---------------------------------------+---------------------------------------------+---------------------+
| id | author      | content                               | tags                                        | created_at          |
+----+-------------+---------------------------------------+---------------------------------------------+---------------------+
|  7 | Frank Green | JavaScript and MongoDB for beginners. | {"db": ["mongodb"], "lang": ["javascript"]} | 2024-12-21 16:19:06 |
| 10 | Ivy Wilson  | Node.js and MySQL: A perfect match.   | {"db": ["mysql"], "lang": ["javascript"]}   | 2024-12-21 16:19:06 |
+----+-------------+---------------------------------------+---------------------------------------------+---------------------+
2 rows in set (0.00 sec)

一見よさそうですが、{"db": ["mysql"], "lang": ["python", "javascript", "ruby"]} のようにlangに複数要素があるものがヒットしません。

OVELAPSは配列に対して利用するのが良さそうですね。

Multi Value Indexes

さて、このような検索の効率はどうなのでしょうか。普通に考えるとFull Scanとなります。

mysql> explain SELECT * FROM blog_articles WHERE JSON_CONTAINS(tags->'$.lang','["python"]');
+-------------------------+---------+-----------+---------------------+-------------------------------------------------------------------------------------------------+
| id                      | estRows | task      | access object       | operator info                                                                                   |
+-------------------------+---------+-----------+---------------------+-------------------------------------------------------------------------------------------------+
| TableReader_7           | 8.00    | root      |                     | data:Selection_6                                                                                |
| └─Selection_6           | 8.00    | cop[tikv] |                     | json_contains(json_extract(test.blog_articles.tags, "$.lang"), cast("["python"]", json BINARY)) |
|   └─TableFullScan_5     | 10.00   | cop[tikv] | table:blog_articles | keep order:false, stats:pseudo                                                                  |
+-------------------------+---------+-----------+---------------------+-------------------------------------------------------------------------------------------------+
3 rows in set (0.00 sec)

予想通りです。

TiDBにはこのような配列に利用できる、Multi Valued Indexesというインデックスがあります。

早速作ってみます。

mysql> CREATE INDEX idx_lang ON blog_articles((CAST(tags->'$.lang' AS char(20) ARRAY)));
Query OK, 0 rows affected (1.97 sec)

先に出てきたいくつかの例を試してみましょう。

mysql> explain SELECT * FROM blog_articles WHERE 'python' MEMBER OF(tags->'$.lang');
+-------------------------------+---------+-----------+-----------------------------------------------------------------------------------------------------+-----------------------------------------------------------+
| id                            | estRows | task      | access object                                                                                       | operator info                                             |
+-------------------------------+---------+-----------+-----------------------------------------------------------------------------------------------------+-----------------------------------------------------------+
| IndexMerge_10                 | 0.01    | root      |                                                                                                     | type: union                                               |
| ├─IndexRangeScan_8(Build)     | 0.01    | cop[tikv] | table:blog_articles, index:idx_lang(cast(json_extract(`tags`, _utf8mb4'$.lang') as char(20) array)) | range:["python","python"], keep order:false, stats:pseudo |
| └─TableRowIDScan_9(Probe)     | 0.01    | cop[tikv] | table:blog_articles                                                                                 | keep order:false, stats:pseudo                            |
+-------------------------------+---------+-----------+-----------------------------------------------------------------------------------------------------+-----------------------------------------------------------+
3 rows in set (0.01 sec)

-- 同じ
explain SELECT * FROM blog_articles WHERE JSON_CONTAINS(tags->'$.lang','["python"]');

-- OVERLAPSもインデックスを利用する
mysql> explain SELECT * FROM blog_articles WHERE JSON_OVERLAPS(tags->'$.lang','["python","javascript"]');
+----------------------------------+---------+-----------+-----------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------+
| id                               | estRows | task      | access object                                                                                       | operator info                                                                                                |
+----------------------------------+---------+-----------+-----------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------+
| Selection_5                      | 0.02    | root      |                                                                                                     | json_overlaps(json_extract(test.blog_articles.tags, "$.lang"), cast("["python","javascript"]", json BINARY)) |
| └─IndexMerge_11                  | 0.02    | root      |                                                                                                     | type: union                                                                                                  |
|   ├─IndexRangeScan_8(Build)      | 0.01    | cop[tikv] | table:blog_articles, index:idx_lang(cast(json_extract(`tags`, _utf8mb4'$.lang') as char(20) array)) | range:["python","python"], keep order:false, stats:pseudo                                                    |
|   ├─IndexRangeScan_9(Build)      | 0.01    | cop[tikv] | table:blog_articles, index:idx_lang(cast(json_extract(`tags`, _utf8mb4'$.lang') as char(20) array)) | range:["javascript","javascript"], keep order:false, stats:pseudo                                            |
|   └─TableRowIDScan_10(Probe)     | 0.02    | cop[tikv] | table:blog_articles                                                                                 | keep order:false, stats:pseudo                                                                               |
+----------------------------------+---------+-----------+-----------------------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------+
5 rows in set, 1 warning (0.01 sec)

-- オブジェクトに対しては利用されない。残念。
mysql> explain SELECT * FROM blog_articles WHERE JSON_CONTAINS(tags, '{"lang":["python"],"db":["mysql"]}');
+-------------------------+---------+-----------+---------------------+-------------------------------------------------------------------------------------------------+
| id                      | estRows | task      | access object       | operator info                                                                                   |
+-------------------------+---------+-----------+---------------------+-------------------------------------------------------------------------------------------------+
| TableReader_7           | 8.00    | root      |                     | data:Selection_6                                                                                |
| └─Selection_6           | 8.00    | cop[tikv] |                     | json_contains(test.blog_articles.tags, cast("{"lang":["python"],"db":["mysql"]}", json BINARY)) |
|   └─TableFullScan_5     | 10.00   | cop[tikv] | table:blog_articles | keep order:false, stats:pseudo                                                                  |
+-------------------------+---------+-----------+---------------------+-------------------------------------------------------------------------------------------------+
3 rows in set (0.00 sec)

--

このように、JSON関数に対してindexが機能するのがわかります。特にJSON型に大きなオブジェクトを保管しているときなどは、Full Table Scanとの差は大きくなるでしょう。

TiDBの場合、Multi Valued Indexesはインデックスマージという機能を利用して実現されています。これはOVERLAPSの例をみると分かるように、インデックスを複数回ひいてその結果をマージしているものです。

おわりに

TiDBにおけるJSON型/JSON配列を使う例について紹介しました。配列のような可変長項目や、複数選択がある選択肢による抽出などに効果がありそうです。

MySQLとの違いについても参考にしていただけると幸いです。

1
0
0

Register as a new user and use Qiita more conveniently

  1. You get articles that match your needs
  2. You can efficiently read back useful information
  3. You can use dark theme
What you can do with signing up
1
0

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?