2
0

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?

TiDBAdvent Calendar 2024

Day 20

履歴テーブルから最新の1件を取ってくる方法 (TiDB)

Posted at

はじめに

こちらのそーだいさんの記事にある、履歴テーブルから最新の1件を取ってくる方法のTiDB(MySQL)版です。

TiDB/MySQLは特殊な関数がないので、オーソドックスにWindow関数を使ったやり方を試してみます。
それだけだと面白くないので、本記事ではより大きなデータセットを用いて、実行計画の改善を試し、改善度合いを見てみることにします。また、これはTiFlash(HTAP)のアクセラレーションの良い例でもあるので、TiFlashも使ってみました。

準備

まずは、大量のデータセットを用意します。テーブル定義はそーだいさんのブログに記載のテーブル定義をそのまま利用し、それにあわせたデータを作成するプログラムを作成しました。

言語は何でもいいと思いますが、ダミーデータを作成するライブラリがあると楽です。ここではgo言語の gofakeit を使いました。

大量データの場合、多くのケースにおいて直接INSERTするよりCSV等をバルクインサートする方が速いです。
このプログラムもCSVを書き出しているだけです。

main.go
package main

import (
	"encoding/csv"
	"fmt"
	"math/rand"
	"os"
	"strconv"
	"time"

	"github.com/brianvoe/gofakeit/v7"
)

func main() {
	if len(os.Args) != 5 {
		fmt.Println("Usage: datagen <seed> <number_of_rows> <max_id> <output_file>")
		return
	}

	seed, err := strconv.ParseInt(os.Args[1], 10, 64)
	if err != nil {
		fmt.Println("Invalid seed:", os.Args[1])
		return
	}

	numRows, err := strconv.Atoi(os.Args[2])
	if err != nil {
		fmt.Println("Invalid number of rows:", os.Args[2])
		return
	}

	maxID, err := strconv.Atoi(os.Args[3])
	if err != nil {
		fmt.Println("Invalid max_id:", os.Args[3])
		return
	}

	outputFile := os.Args[4]

	gofakeit.Seed(seed)

	file, err := os.Create(outputFile)
	if err != nil {
		fmt.Println("Could not create file:", outputFile)
		return
	}
	defer file.Close()

	writer := csv.NewWriter(file)
	defer writer.Flush()

	// Write CSV header
	writer.Write([]string{"id", "user_id", "data", "created_at"})

	for i := 0; i < numRows; i++ {
		id := i + 1
		user_id := rand.Intn(maxID) + 1
		data := gofakeit.Sentence(10)
		created_at := gofakeit.Date().Format(time.RFC3339)

		writer.Write([]string{strconv.Itoa(id), strconv.Itoa(user_id), data, created_at})
	}
}

とりあえずこれで10000ユーザー/1千万行くらいのCSVファイルを生成します。900MB程度ありますが、M1Macでの生成時間は数分といったところです。これをS3に保存します。

TiDB Cloud側であらかじめテーブルを作成しておきます。テーブル定義はそーだいさんのブログのものをそのまま利用しました。

history.sql
-- MySQL
CREATE TABLE history (
    id INT AUTO_INCREMENT PRIMARY KEY,
    user_id INT NOT NULL,
    data TEXT,
    created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP
);

このテーブルに対し、S3からデータをimportしておきます。これで準備は完了です。

ベースラインのクエリパフォーマンス

では、何もせずそのままの状態でクエリを実行してみることにします。10000ユーザーのそれぞれの最新の1件を取得するクエリです。10000件出力すると長いので、100件にしぼります。

window 関数

window.sql
SELECT *
FROM (SELECT history.*,
             ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC) AS ranking
      FROM history) latest
WHERE ranking = 1
limit 100;

+---------+---------+---------------------------------------------------------------------------------+---------------------+---------+
| id      | user_id | data                                                                            | created_at          | ranking |
+---------+---------+---------------------------------------------------------------------------------+---------------------+---------+
| 8644123 |       3 | You some however enchanted example troop riches Eastern this along.             | 2024-12-11 20:17:55 |       1 |
| 4604397 |      14 | It is what recently colorful out dynasty water besides very.                    | 2024-12-07 09:03:51 |       1 |
| 8960784 |      24 | Tense whereas Italian yearly first sandwich yourself data it about.             | 2024-12-29 09:05:30 |       1 |
...

| 4613743 |     494 | Example always you each occasionally out down troupe hers thoughtfully.         | 2024-11-15 03:28:30 |       1 |
| 7343243 |     498 | Result those many joy his were today in you substantial.                        | 2024-12-30 11:11:45 |       1 |
| 7143736 |     501 | Plane where often in upon across batch whom point near.                         | 2024-11-19 03:43:41 |       1 |
+---------+---------+---------------------------------------------------------------------------------+---------------------+---------+
100 rows in set (23.42 sec)

23.42秒かかっています。

実行計画も見てみます。

mysql> explain analyze SELECT * FROM (SELECT history.*,              ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC) AS ranking       FROM history) latest WHERE ranking = 1 limit 100;
+------------------------------------+-------------+----------+-----------+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+----------+---------+
| id                                 | estRows     | actRows  | task      | access object | execution info                                                                                                                                                                                                                                                                                                                                                                                                                                             | operator info                                                                                                                                           | memory   | disk    |
+------------------------------------+-------------+----------+-----------+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+----------+---------+
| Limit_14                           | 100.00      | 100      | root      |               | time:16.6s, loops:2, RU:20481.047892                                                                                                                                                                                                                                                                                                                                                                                                                       | offset:?, count:?                                                                                                                                       | N/A      | N/A     |
| └─Selection_15                     | 100.00      | 100      | root      |               | time:16.6s, loops:1                                                                                                                                                                                                                                                                                                                                                                                                                                        | eq(Column#6, ?)                                                                                                                                         | 122.2 KB | N/A     |
|   └─Shuffle_20                     | 100.00      | 99328    | root      |               | time:16.6s, loops:97, ShuffleConcurrency:5                                                                                                                                                                                                                                                                                                                                                                                                                 | execution info: concurrency:5, data sources:[TableReader_18]                                                                                            | N/A      | N/A     |
|     └─Window_16                    | 100.00      | 103424   | root      |               | time:1m41.4s, loops:102                                                                                                                                                                                                                                                                                                                                                                                                                                    | row_number()->Column#6 over(partition by firsthist.history.user_id order by firsthist.history.created_at desc rows between current row and current row) | N/A      | N/A     |
|       └─Sort_19                    | 10000000.00 | 107520   | root      |               | time:1m41.4s, loops:106                                                                                                                                                                                                                                                                                                                                                                                                                                    | firsthist.history.user_id, firsthist.history.created_at:desc                                                                                            | 248.3 MB | 0 Bytes |
|         └─ShuffleReceiver_21       | 10000000.00 | 10000000 | root      |               | time:1m8.7s, loops:9774                                                                                                                                                                                                                                                                                                                                                                                                                                    |                                                                                                                                                         | N/A      | N/A     |
|           └─TableReader_18         | 10000000.00 | 10000000 | root      |               | time:53.5ms, loops:9782, cop_task: {num: 291, max: 68.6ms, min: 1.05ms, avg: 39.1ms, p95: 59.3ms, max_proc_keys: 50144, p95_proc_keys: 50144, tot_proc: 8.71s, tot_wait: 25ms, copr_cache: disabled, build_task_duration: 31.5µs, max_distsql_concurrency: 13}, rpc_info:{Cop:{num_rpc:291, total_time:11.4s}}                                                                                                                                             | data:TableFullScan_17                                                                                                                                   | 49.6 MB  | N/A     |
|             └─TableFullScan_17     | 10000000.00 | 10000000 | cop[tikv] | table:history | tikv_task:{proc max:50ms, min:0s, avg: 26.3ms, p80:40ms, p95:50ms, iters:10920, tasks:291}, scan_detail: {total_process_keys: 10000000, total_process_keys_size: 1142807139, total_keys: 10000291, get_snapshot_time: 18.4ms, rocksdb: {key_skipped_count: 10000000, block: {cache_hit_count: 39879}}}, time_detail: {total_process_time: 8.71s, total_suspend_time: 18.6ms, total_wait_time: 25ms, total_kv_read_wall_time: 7.64s, tikv_wall_time: 8.79s} | keep order:false                                                                                                                                        | N/A      | N/A     |
+------------------------------------+-------------+----------+-----------+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+----------+---------+
8 rows in set (22.65 sec)

ここで、cop[tikv] はTiKVレイヤーの処理を、root はTiDBレイヤーでの処理を示しています。全量TiKVから読み出して、TiDBサーバ側でフィルタ処理をしているようです。元のSQLからみても想像がつく実行計画ではないでしょうか。

集計関数

今回のように、日付型など大小比較が可能なものについては、集計演算を使う方法もあります(Window関数がそれほど広まってなかったときは、こちらのやり方が主流だったと思います)
ただこちらは厳密には先のwindow関数とは異なっていて、最新のレコードが複数あった場合、複数出てきてしまいます。

max.sql
select h1.*
from history h1 join 
  (select  user_id, max(created_at) as latest from history group by user_id) h2 
on h1.created_at = h2.latest and h1.user_id = h2.user_id
limit 100;

+---------+---------+---------------------------------------------------------------------------------+---------------------+
| id      | user_id | data                                                                            | created_at          |
+---------+---------+---------------------------------------------------------------------------------+---------------------+
| 1894623 |    2150 | Nevertheless of will an hers late it anyway tomorrow so.                        | 2024-12-05 09:49:01 |
| 5684905 |    9067 | It how nest place he rather muster electricity soon where.                      | 2024-07-12 18:13:00 |
|  948625 |     878 | Party this caravan untie mob Congolese these turn kiss library.                 | 2024-12-04 09:58:19 |
...
|  999449 |    5354 | Gossip heavily child lean may next class hmm it others.                         | 2024-12-10 01:56:49 |
|  999484 |    2252 | It dance hundreds respects regularly from had tonight that where.               | 2024-12-15 12:05:14 |
| 1003066 |    9095 | Now you then handsome consequently neither hence oops daily whenever.           | 2024-11-30 04:09:12 |
+---------+---------+---------------------------------------------------------------------------------+---------------------+
100 rows in set (1.43 sec)

1.43秒です。なんか間違ってるんじゃないかと思ってcount(distinct)してみましたが、結果は正しかったです。

こちらも実行計画をみてみます。

mysql> explain analyze select h1.* from history h1 join    (select  user_id, max(created_at) as latest from history group by user_id) h2  on h1.created_at = h2.latest and h1.user_id = h2.user_id limit 100;
+------------------------------------+-------------+----------+-----------+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------+----------+---------+
| id                                 | estRows     | actRows  | task      | access object | execution info                                                                                                                                                                                                                                                                                                                                                                                                                                             | operator info                                                                                                                            | memory   | disk    |
+------------------------------------+-------------+----------+-----------+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------+----------+---------+
| Limit_16                           | 100.00      | 100      | root      |               | time:1.39s, loops:2, RU:30174.050516                                                                                                                                                                                                                                                                                                                                                                                                                       | offset:?, count:?                                                                                                                        | N/A      | N/A     |
| └─HashJoin_17                      | 100.00      | 1024     | root      |               | time:1.39s, loops:1, build_hash_table:{total:806.9ms, fetch:804ms, build:2.93ms}, probe:{concurrency:5, total:6.96s, max:1.39s, probe:952.7ms, fetch and wait:6.01s}                                                                                                                                                                                                                                                                                       | inner join, equal:[eq(firsthist.history.created_at, Column#9) eq(firsthist.history.user_id, firsthist.history.user_id)]                  | 878.7 KB | 0 Bytes |
|   ├─Selection_21(Build)            | 8000.00     | 10000    | root      |               | time:804.4ms, loops:11                                                                                                                                                                                                                                                                                                                                                                                                                                     | not(isnull(Column#9))                                                                                                                    | 19.0 KB  | N/A     |
|    └─HashAgg_26                   | 10000.00    | 10000    | root      |               | time:803.7ms, loops:12, partial_worker:{wall_time:801.523879ms, concurrency:5, task_num:88, tot_wait:3.896902186s, tot_exec:110.531283ms, tot_time:4.007506382s, max:801.508052ms, p95:801.508052ms}, final_worker:{wall_time:804.546583ms, concurrency:5, task_num:25, tot_wait:17.448µs, tot_exec:9.896434ms, tot_time:4.020676143s, max:804.534267ms, p95:804.534267ms}                                                                                 | group by:firsthist.history.user_id, funcs:max(Column#10)->Column#9, funcs:firstrow(firsthist.history.user_id)->firsthist.history.user_id | 12.5 MB  | N/A     |
|      └─TableReader_27             | 10000.00    | 332777   | root      |               | time:798.6ms, loops:89, cop_task: {num: 90, max: 721.9ms, min: 1.36ms, avg: 90ms, p95: 713.4ms, max_proc_keys: 915864, p95_proc_keys: 915737, tot_proc: 7.75s, tot_wait: 14.9ms, copr_cache: disabled, build_task_duration: 15.7µs, max_distsql_concurrency: 13}, rpc_info:{Cop:{num_rpc:90, total_time:8.09s}}                                                                                                                                            | data:HashAgg_22                                                                                                                          | 340.6 KB | N/A     |
|        └─HashAgg_22               | 10000.00    | 332777   | cop[tikv] |               | tikv_task:{proc max:720ms, min:0s, avg: 86ms, p80:10ms, p95:710ms, iters:9774, tasks:90}, scan_detail: {total_process_keys: 10000000, total_process_keys_size: 1142807139, total_keys: 10000090, get_snapshot_time: 13.2ms, rocksdb: {key_skipped_count: 10000000, block: {cache_hit_count: 39153}}}, time_detail: {total_process_time: 7.75s, total_suspend_time: 14.2ms, total_wait_time: 14.9ms, total_kv_read_wall_time: 6.75s, tikv_wall_time: 7.79s} | group by:firsthist.history.user_id, funcs:max(firsthist.history.created_at)->Column#10                                                   | N/A      | N/A     |
|          └─TableFullScan_25       | 10000000.00 | 10000000 | cop[tikv] | table:history | tikv_task:{proc max:649ms, min:0s, avg: 75ms, p80:10ms, p95:630ms, iters:9774, tasks:90}                                                                                                                                                                                                                                                                                                                                                                   | keep order:false                                                                                                                         | N/A      | N/A     |
|   └─TableReader_20(Probe)          | 125000.00   | 4875872  | root      |               | time:402ms, loops:4767, cop_task: {num: 183, max: 71.1ms, min: 983µs, avg: 34ms, p95: 66.5ms, max_proc_keys: 50144, p95_proc_keys: 50144, tot_proc: 4.4s, tot_wait: 20.5ms, copr_cache: disabled, build_task_duration: 34.1µs, max_distsql_concurrency: 13}, rpc_info:{Cop:{num_rpc:183, total_time:6.22s}}                                                                                                                                                | data:TableFullScan_19                                                                                                                    | 18.0 MB  | N/A     |
|     └─TableFullScan_19             | 125000.00   | 4905568  | cop[tikv] | table:h1      | tikv_task:{proc max:50ms, min:0s, avg: 20.8ms, p80:40ms, p95:50ms, iters:5503, tasks:183}, scan_detail: {total_process_keys: 4905568, total_process_keys_size: 560603018, total_keys: 4905751, get_snapshot_time: 16.3ms, rocksdb: {key_skipped_count: 4905568, block: {cache_hit_count: 19694}}}, time_detail: {total_process_time: 4.4s, total_suspend_time: 9.64ms, total_wait_time: 20.5ms, total_kv_read_wall_time: 3.81s, tikv_wall_time: 4.49s}     | keep order:false                                                                                                                         | N/A      | N/A     |
+------------------------------------+-------------+----------+-----------+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------+----------+---------+
9 rows in set (1.40 sec)

こちらは、サブクエリ部分の集計演算と、それを元にHashJoinを行う二段階に分かれています。TiKVは集計演算をpushdownで処理できる機能を備えており、集計演算は効率的に分散処理されます。HashJoinも同様に分散処理されます。

このようにうまくTiKVに処理を散らすことで、高速に処理ができたのではないかと考えられます。

パフォーマンス改善

次に、パフォーマンスの改善方法について考えてみます。今回の例は created_at で最新のものを取るという簡単な検索条件ですので、そーだいさんのブログで言及されているように、インデックスの効果がありそうです。
とりあえずどちらのクエリでも使われている {created_at, user_id} のインデックスを作ってみます。

index.sql
mysql> alter table history add index idx_created_at_user_id (created_at, user_id);
Query OK, 0 rows affected (48.05 sec)

window関数

window関数の実行計画に変更はなく、実行時間も変わりません。

mysql> explain analyze SELECT * FROM (SELECT history.*,              ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC) AS ranking       FROM history) latest WHERE ranking = 1 limit 100;
+------------------------------------+-------------+----------+-----------+---------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+----------+---------+
| id                                 | estRows     | actRows  | task      | access object | execution info                                                                                                                                                                                                                                                                                                                                                                                                                                               | operator info                                                                                                                                           | memory   | disk    |
+------------------------------------+-------------+----------+-----------+---------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+----------+---------+
| Limit_14                           | 100.00      | 100      | root      |               | time:17s, loops:2, RU:20467.833327                                                                                                                                                                                                                                                                                                                                                                                                                           | offset:?, count:?                                                                                                                                       | N/A      | N/A     |
| └─Selection_15                     | 100.00      | 100      | root      |               | time:17s, loops:1                                                                                                                                                                                                                                                                                                                                                                                                                                            | eq(Column#6, ?)                                                                                                                                         | 122.2 KB | N/A     |
|   └─Shuffle_20                     | 100.00      | 99328    | root      |               | time:17s, loops:97, ShuffleConcurrency:5                                                                                                                                                                                                                                                                                                                                                                                                                     | execution info: concurrency:5, data sources:[TableReader_18]                                                                                            | N/A      | N/A     |
|     └─Window_16                    | 100.00      | 103424   | root      |               | time:1m44.4s, loops:102                                                                                                                                                                                                                                                                                                                                                                                                                                      | row_number()->Column#6 over(partition by firsthist.history.user_id order by firsthist.history.created_at desc rows between current row and current row) | N/A      | N/A     |
|       └─Sort_19                    | 10000000.00 | 107520   | root      |               | time:1m44.4s, loops:106                                                                                                                                                                                                                                                                                                                                                                                                                                      | firsthist.history.user_id, firsthist.history.created_at:desc                                                                                            | 248.3 MB | 0 Bytes |
|         └─ShuffleReceiver_21       | 10000000.00 | 10000000 | root      |               | time:1m9.8s, loops:9774                                                                                                                                                                                                                                                                                                                                                                                                                                      |                                                                                                                                                         | N/A      | N/A     |
|           └─TableReader_18         | 10000000.00 | 10000000 | root      |               | time:49ms, loops:9782, cop_task: {num: 291, max: 63.9ms, min: 483.8µs, avg: 38.8ms, p95: 58.9ms, max_proc_keys: 50144, p95_proc_keys: 50144, tot_proc: 8.68s, tot_wait: 11.9ms, copr_cache: disabled, build_task_duration: 34.6µs, max_distsql_concurrency: 13}, rpc_info:{Cop:{num_rpc:291, total_time:11.3s}}                                                                                                                                              | data:TableFullScan_17                                                                                                                                   | 49.6 MB  | N/A     |
|             └─TableFullScan_17     | 10000000.00 | 10000000 | cop[tikv] | table:history | tikv_task:{proc max:50ms, min:0s, avg: 26.7ms, p80:40ms, p95:50ms, iters:10920, tasks:291}, scan_detail: {total_process_keys: 10000000, total_process_keys_size: 1142807139, total_keys: 10000291, get_snapshot_time: 5.32ms, rocksdb: {key_skipped_count: 10000000, block: {cache_hit_count: 39879}}}, time_detail: {total_process_time: 8.68s, total_suspend_time: 18.9ms, total_wait_time: 11.9ms, total_kv_read_wall_time: 7.78s, tikv_wall_time: 8.74s} | keep order:false                                                                                                                                        | N/A      | N/A     |
+------------------------------------+-------------+----------+-----------+---------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------+----------+---------+
8 rows in set (23.38 sec)

集計関数

集計関数では、作成したインデックスが利用されていました。

mysql> alter table history drop index idx_user_id_created_at;
Query OK, 0 rows affected (0.55 sec)

mysql> explain analyze select h1.* from history h1 join    (select  user_id, max(created_at) as latest from history group by user_id) h2  on h1.created_at = h2.latest and h1.user_id = h2.user_id limit 100;
+------------------------------------+-------------+----------+-----------+------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+------+
| id                                 | estRows     | actRows  | task      | access object                                                    | execution info                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     | operator info                                                                                                                                                                                                                                                       | memory   | disk |
+------------------------------------+-------------+----------+-----------+------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+------+
| Limit_16                           | 100.00      | 100      | root      |                                                                  | time:1.13s, loops:2, RU:11177.269251                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               | offset:?, count:?                                                                                                                                                                                                                                                   | N/A      | N/A  |
| └─IndexHashJoin_22                 | 100.00      | 512      | root      |                                                                  | time:1.13s, loops:1, inner:{total:122.5ms, concurrency:5, task:8, construct:13ms, fetch:108.7ms, build:2.3ms, join:777.7µs}                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        | inner join, inner:IndexLookUp_19, outer key:Column#9, firsthist.history.user_id, inner key:firsthist.history.created_at, firsthist.history.user_id, equal cond:eq(Column#9, firsthist.history.created_at), eq(firsthist.history.user_id, firsthist.history.user_id) | 3.53 MB  | N/A  |
|   ├─Selection_30(Build)            | 100.00      | 10000    | root      |                                                                  | time:1.1s, loops:15                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                | not(isnull(Column#9))                                                                                                                                                                                                                                               | 19.0 KB  | N/A  |
|    └─HashAgg_36                   | 100.00      | 10000    | root      |                                                                  | time:1.1s, loops:13, partial_worker:{wall_time:1.10178037s, concurrency:5, task_num:98, tot_wait:5.395027701s, tot_exec:113.363927ms, tot_time:5.508474641s, max:1.101718142s, p95:1.101718142s}, final_worker:{wall_time:1.10493501s, concurrency:5, task_num:25, tot_wait:16.524µs, tot_exec:10.326862ms, tot_time:5.522236628s, max:1.104834178s, p95:1.104834178s}                                                                                                                                                                                                                                                                                                                                                                                                             | group by:firsthist.history.user_id, funcs:max(Column#10)->Column#9, funcs:firstrow(firsthist.history.user_id)->firsthist.history.user_id                                                                                                                            | 12.5 MB  | N/A  |
|      └─IndexReader_37             | 100.00      | 363373   | root      |                                                                  | time:1.1s, loops:99, cop_task: {num: 100, max: 1.06s, min: 1.44ms, avg: 83.1ms, p95: 748.1ms, max_proc_keys: 1292773, p95_proc_keys: 928256, tot_proc: 8.12s, tot_wait: 21.8ms, copr_cache: disabled, build_task_duration: 27.1ms, max_distsql_concurrency: 15}, rpc_info:{Cop:{num_rpc:100, total_time:8.3s}}                                                                                                                                                                                                                                                                                                                                                                                                                                                                     | index:HashAgg_31                                                                                                                                                                                                                                                    | 312.9 KB | N/A  |
|        └─HashAgg_31               | 100.00      | 363373   | cop[tikv] |                                                                  | tikv_task:{proc max:1.06s, min:0s, avg: 80.4ms, p80:10ms, p95:750ms, iters:9776, tasks:100}, scan_detail: {total_process_keys: 10000000, total_process_keys_size: 550000000, total_keys: 10000100, get_snapshot_time: 19.7ms, rocksdb: {key_skipped_count: 10000000, block: {cache_hit_count: 712, read_count: 19000, read_byte: 175.8 MB, read_time: 120.9ms}}}, time_detail: {total_process_time: 8.12s, total_suspend_time: 18.6ms, total_wait_time: 21.8ms, total_kv_read_wall_time: 7.06s, tikv_wall_time: 8.18s}                                                                                                                                                                                                                                                             | group by:firsthist.history.user_id, funcs:max(firsthist.history.created_at)->Column#10                                                                                                                                                                              | N/A      | N/A  |
|          └─IndexFullScan_35       | 10000000.00 | 10000000 | cop[tikv] | table:history, index:idx_created_at_user_id(created_at, user_id) | tikv_task:{proc max:910ms, min:0s, avg: 70.6ms, p80:10ms, p95:650ms, iters:9776, tasks:100}                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        | keep order:false                                                                                                                                                                                                                                                    | N/A      | N/A  |
|   └─IndexLookUp_19(Probe)          | 100.00      | 960      | root      |                                                                  | time:94.6ms, loops:12, index_task: {total_time: 37.4ms, fetch_handle: 37.4ms, build: 5.42µs, wait: 12.4µs}, table_task: {total_time: 22.4ms, num: 4, concurrency: 5}, next: {wait_index: 37.8ms, wait_table_lookup_build: 863.5µs, wait_table_lookup_resp: 21.5ms}                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |                                                                                                                                                                                                                                                                     | 32.4 KB  | N/A  |
|     ├─IndexRangeScan_17(Build)     | 100.00      | 1408     | cop[tikv] | table:h1, index:idx_created_at_user_id(created_at, user_id)      | time:37.3ms, loops:12, cop_task: {num: 9, max: 8.75ms, min: 0s, avg: 5.09ms, p95: 8.75ms, max_proc_keys: 288, p95_proc_keys: 288, tot_proc: 28.8ms, tot_wait: 270.3µs, copr_cache: disabled, build_task_duration: 268µs, max_distsql_concurrency: 3}, rpc_info:{Cop:{num_rpc:9, total_time:59ms}, rpc_errors:{context canceled:2}}, tikv_task:{proc max:10ms, min:0s, avg: 6.25ms, p80:10ms, p95:10ms, iters:23, tasks:8}, scan_detail: {total_process_keys: 1184, total_process_keys_size: 65120, total_keys: 2370, get_snapshot_time: 113.3µs, rocksdb: {key_skipped_count: 1184, block: {cache_hit_count: 8302}}}, time_detail: {total_process_time: 28.8ms, total_suspend_time: 479.1µs, total_wait_time: 270.3µs, total_kv_read_wall_time: 50ms, tikv_wall_time: 31.6ms}      | range: decided by [eq(firsthist.history.created_at, Column#9) eq(firsthist.history.user_id, firsthist.history.user_id)], keep order:false                                                                                                                           | N/A      | N/A  |
|     └─TableRowIDScan_18(Probe)     | 100.00      | 960      | cop[tikv] | table:h1                                                         | time:21.4ms, loops:8, cop_task: {num: 44, max: 5.3ms, min: 0s, avg: 899.6µs, p95: 3ms, max_proc_keys: 59, p95_proc_keys: 53, tot_proc: 12.7ms, tot_wait: 18.4ms, copr_cache: disabled, build_task_duration: 262.1µs, max_distsql_concurrency: 10, max_extra_concurrency: 1, store_batch_num: 24}, rpc_info:{Cop:{num_rpc:20, total_time:39.3ms}}, tikv_task:{proc max:10ms, min:0s, avg: 227.3µs, p80:0s, p95:0s, iters:54, tasks:44}, scan_detail: {total_process_keys: 960, total_process_keys_size: 109651, total_keys: 960, get_snapshot_time: 17.6ms, rocksdb: {block: {cache_hit_count: 1979}}}, time_detail: {total_process_time: 12.7ms, total_wait_time: 18.4ms, total_kv_read_wall_time: 10ms, tikv_wall_time: 16.5ms}                                                   | keep order:false                                                                                                                                                                                                                                                    | N/A      | N/A  |
+------------------------------------+-------------+----------+-----------+------------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+------+
10 rows in set (1.17 sec)

今回の例では、インデックスは集計でもJOINのキーとしても利用されているので、IndexHashJoinの両側で使われているのがわかります。

ただ、性能はそこまで変わりません。TiDBの場合、IndexもKey-Valueとして保存されるので、今回のようにデータ量がそんなに変わらなければ、Index Full ScanもTable Full Scanもコストはほぼ一緒です。実際のレコードはもっと大きいのでIndex Full ScanとTable full Scanの差は出るかもしれません。

TiFlash(HTAP)

さて、TiDBはTiFlashというカラムナストレージも持っています。これは特に集計演算などの列を軸とする演算に向いています。また、多くの関数を実装しておりpushdownも可能です。こちらの効果を見てみましょう。

まずはTiflashにテーブルをレプリケーションします。

mysql> alter table history set tiflash replica 1;

window関数

mysql> explain analyze SELECT * FROM (SELECT history.*,              ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at DESC) AS ranking       FROM history) latest WHERE ranking = 1 limit 100;
+--------------------------------------+-------------+---------+--------------+---------------+--------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+------+
| id                                   | estRows     | actRows | task         | access object | execution info                                                                       | operator info                                                                                                                                                            | memory   | disk |
+--------------------------------------+-------------+---------+--------------+---------------+--------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+------+
| Limit_14                             | 100.00      | 100     | root         |               | time:1.17s, loops:2, RU:0.000000                                                     | offset:?, count:?                                                                                                                                                        | N/A      | N/A  |
| └─TableReader_35                     | 100.00      | 1024    | root         |               | time:1.17s, loops:1, cop_task: {num: 1, max: 0s, proc_keys: 0, copr_cache: disabled} | MppVersion: 2, data:ExchangeSender_34                                                                                                                                    | 105.3 KB | N/A  |
|   └─ExchangeSender_34                | 100.00      | 0       | mpp[tiflash] |               |                                                                                      | ExchangeType: PassThrough                                                                                                                                                | N/A      | N/A  |
|     └─Selection_17                   | 100.00      | 0       | mpp[tiflash] |               |                                                                                      | eq(Column#6, ?), stream_count: 8                                                                                                                                         | N/A      | N/A  |
|       └─Window_33                    | 100.00      | 0       | mpp[tiflash] |               |                                                                                      | row_number()->Column#6 over(partition by firsthist.history.user_id order by firsthist.history.created_at desc rows between current row and current row), stream_count: 8 | N/A      | N/A  |
|         └─Sort_23                    | 10000000.00 | 0       | mpp[tiflash] |               |                                                                                      | firsthist.history.user_id, firsthist.history.created_at:desc, stream_count: 8                                                                                            | N/A      | N/A  |
|           └─ExchangeReceiver_22      | 10000000.00 | 0       | mpp[tiflash] |               |                                                                                      | stream_count: 8                                                                                                                                                          | N/A      | N/A  |
|             └─ExchangeSender_21      | 10000000.00 | 0       | mpp[tiflash] |               |                                                                                      | ExchangeType: HashPartition, Compression: FAST, Hash Cols: [name: firsthist.history.user_id, collate: binary], stream_count: 8                                           | N/A      | N/A  |
|               └─TableFullScan_20     | 10000000.00 | 0       | mpp[tiflash] | table:history |                                                                                      | keep order:false                                                                                                                                                         | N/A      | N/A  |
+--------------------------------------+-------------+---------+--------------+---------------+--------------------------------------------------------------------------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------+------+
9 rows in set (1.25 sec)

TiKV、TiDBで行っていた処理のほとんどをTiFlashで実行しています。処理時間も20秒あまりから1.25秒と、劇的に改善しました。

集計関数

mysql> explain analyze select h1.* from history h1 join    (select  user_id, max(created_at) as latest from history group by user_id) h2  on h1.created_at = h2.latest and h1.user_id = h2.user_id limit 100;
+--------------------------------------------+-------------+----------+--------------+-------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+------+
| id                                         | estRows     | actRows  | task         | access object                                               | execution info                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     | operator info                                                                                                                                                                                                                                                       | memory  | disk |
+--------------------------------------------+-------------+----------+--------------+-------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+------+
| Limit_16                                   | 100.00      | 100      | root         |                                                             | time:158.4ms, loops:2, RU:2033.104218                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              | offset:?, count:?                                                                                                                                                                                                                                                   | N/A     | N/A  |
| └─IndexHashJoin_24                         | 100.00      | 1024     | root         |                                                             | time:158.4ms, loops:1, inner:{total:362.5ms, concurrency:5, task:6, construct:9.05ms, fetch:352.2ms, build:1.62ms, join:1.15ms}                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    | inner join, inner:IndexLookUp_21, outer key:Column#9, firsthist.history.user_id, inner key:firsthist.history.created_at, firsthist.history.user_id, equal cond:eq(Column#9, firsthist.history.created_at), eq(firsthist.history.user_id, firsthist.history.user_id) | 2.77 MB | N/A  |
|   ├─TableReader_90(Build)                  | 100.00      | 10000    | root         |                                                             | time:88.8ms, loops:18, cop_task: {num: 9, max: 0s, min: 0s, avg: 0s, p95: 0s, copr_cache: disabled}                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                | MppVersion: 2, data:ExchangeSender_89                                                                                                                                                                                                                               | 19.7 KB | N/A  |
|    └─ExchangeSender_89                    | 100.00      | 10000    | mpp[tiflash] |                                                             | tiflash_task:{time:80.2ms, loops:8, threads:8}                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     | ExchangeType: PassThrough                                                                                                                                                                                                                                           | N/A     | N/A  |
|      └─Selection_51                       | 100.00      | 10000    | mpp[tiflash] |                                                             | tiflash_task:{time:80.2ms, loops:8, threads:8}                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     | not(isnull(Column#9))                                                                                                                                                                                                                                               | N/A     | N/A  |
|        └─Projection_85                    | 100.00      | 10000    | mpp[tiflash] |                                                             | tiflash_task:{time:80.2ms, loops:8, threads:8}                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     | Column#9, firsthist.history.user_id                                                                                                                                                                                                                                 | N/A     | N/A  |
|          └─HashAgg_86                     | 100.00      | 10000    | mpp[tiflash] |                                                             | tiflash_task:{time:80.2ms, loops:8, threads:8}                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     | group by:firsthist.history.user_id, funcs:max(Column#18)->Column#9, funcs:firstrow(firsthist.history.user_id)->firsthist.history.user_id, stream_count: 8                                                                                                           | N/A     | N/A  |
|            └─ExchangeReceiver_88          | 100.00      | 10000    | mpp[tiflash] |                                                             | tiflash_task:{time:80.2ms, loops:8, threads:8}                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     | stream_count: 8                                                                                                                                                                                                                                                     | N/A     | N/A  |
|              └─ExchangeSender_87          | 100.00      | 10000    | mpp[tiflash] |                                                             | tiflash_task:{time:86ms, loops:1, threads:1}                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       | ExchangeType: HashPartition, Compression: FAST, Hash Cols: [name: firsthist.history.user_id, collate: binary], stream_count: 8                                                                                                                                      | N/A     | N/A  |
|                └─HashAgg_83               | 100.00      | 10000    | mpp[tiflash] |                                                             | tiflash_task:{time:86ms, loops:1, threads:1}                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       | group by:firsthist.history.user_id, funcs:max(firsthist.history.created_at)->Column#18                                                                                                                                                                              | N/A     | N/A  |
|                  └─TableFullScan_41       | 10000000.00 | 10000000 | mpp[tiflash] | table:history                                               | tiflash_task:{time:36ms, loops:159, threads:8}, tiflash_scan:{mvcc_input_rows:0, mvcc_input_bytes:0, mvcc_output_rows:0, lm_skip_rows:0, local_regions:13, remote_regions:0, tot_learner_read:2ms, region_balance:{instance_num: 1, max/min: 13/13=1.000000}, delta_rows:0, delta_bytes:0, segments:11, stale_read_regions:0, tot_build_snapshot:0ms, tot_build_bitmap:0ms, tot_build_inputstream:1ms, min_local_stream:9ms, max_local_stream:29ms, dtfile:{data_scanned_rows:10000000, data_skipped_rows:0, mvcc_scanned_rows:0, mvcc_skipped_rows:0, lm_filter_scanned_rows:0, lm_filter_skipped_rows:0, tot_rs_index_check:0ms, tot_read:119ms}}                                                                                                                                                                                | keep order:false                                                                                                                                                                                                                                                    | N/A     | N/A  |
|   └─IndexLookUp_21(Probe)                  | 100.00      | 1250     | root         |                                                             | time:340.3ms, loops:8, index_task: {total_time: 172.8ms, fetch_handle: 172.8ms, build: 3.65µs, wait: 16.6µs}, table_task: {total_time: 10.9ms, num: 3, concurrency: 5}, next: {wait_index: 107.9ms, wait_table_lookup_build: 930.4µs, wait_table_lookup_resp: 7.8ms}                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               |                                                                                                                                                                                                                                                                     | 73.2 KB | N/A  |
|     ├─IndexRangeScan_19(Build)             | 100.00      | 4387     | cop[tikv]    | table:h1, index:idx_created_at_user_id(created_at, user_id) | time:172.7ms, loops:7, cop_task: {num: 17, max: 44.6ms, min: 0s, avg: 19.8ms, p95: 44.6ms, max_proc_keys: 480, p95_proc_keys: 480, tot_proc: 84.9ms, tot_wait: 5.8ms, copr_cache: disabled, build_task_duration: 9.08ms, max_distsql_concurrency: 3}, rpc_info:{Cop:{num_rpc:17, total_time:344ms}, rpc_errors:{context canceled:1}}, tikv_task:{proc max:10ms, min:0s, avg: 6.47ms, p80:10ms, p95:10ms, iters:50, tasks:17}, scan_detail: {total_process_keys: 3907, total_process_keys_size: 214885, total_keys: 7823, get_snapshot_time: 5.43ms, rocksdb: {key_skipped_count: 3907, block: {cache_hit_count: 19577, read_count: 1, read_byte: 32.0 KB, read_time: 46µs}}}, time_detail: {total_process_time: 84.9ms, total_suspend_time: 3ms, total_wait_time: 5.8ms, total_kv_read_wall_time: 100ms, tikv_wall_time: 103.9ms}  | range: decided by [eq(firsthist.history.created_at, Column#9) eq(firsthist.history.user_id, firsthist.history.user_id)], keep order:false                                                                                                                           | N/A     | N/A  |
|     └─TableRowIDScan_20(Probe)             | 100.00      | 1250     | cop[tikv]    | table:h1                                                    | time:8.96ms, loops:5, cop_task: {num: 33, max: 3.92ms, min: 0s, avg: 1.09ms, p95: 3.63ms, max_proc_keys: 112, p95_proc_keys: 104, tot_proc: 15.4ms, tot_wait: 13.4ms, copr_cache: disabled, build_task_duration: 230.5µs, max_distsql_concurrency: 11, store_batch_num: 8}, rpc_info:{Cop:{num_rpc:25, total_time:48.3ms}, rpc_errors:{context canceled:11}}, tikv_task:{proc max:0s, min:0s, avg: 0s, p80:0s, p95:0s, iters:38, tasks:22}, scan_detail: {total_process_keys: 1250, total_process_keys_size: 142970, total_keys: 1250, get_snapshot_time: 12.7ms, rocksdb: {block: {cache_hit_count: 3239, read_count: 1, read_byte: 32.0 KB, read_time: 16.1µs}}}, time_detail: {total_process_time: 15.4ms, total_suspend_time: 75.3µs, total_wait_time: 13.4ms, tikv_wall_time: 19.8ms}                                         | keep order:false                                                                                                                                                                                                                                                    | N/A     | N/A  |
+--------------------------------------------+-------------+----------+--------------+-------------------------------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+------+
14 rows in set (0.21 sec)

なんと200msになりました。集計演算はTiFlashで行い、HashJoinのテーブル側はTiKVで行うという合せ技です。TiFlashのデータ断面はTiKVと一致しているため、同じ実行計画の中で両方のエンジンを使うことも可能です。

おわりに

最新のレコードを引くというテーマをTiDBで実行してみました。今回おこなった改善案はごく一部に過ぎませんし、汎用的なものではありませんが、様々な効率化手段を試してみるきっかけになればと思います。

2
0
0

Register as a new user and use Qiita more conveniently

  1. You get articles that match your needs
  2. You can efficiently read back useful information
  3. You can use dark theme
What you can do with signing up
2
0

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?