はじめに
「SQL 第2版 ゼロからはじめるデータベース操作」でMySQLを学習をしていたところ、HAVING句よりWHERE句の方が実行速度が早いという記述があったため、実際に実行時間を測定してみた。
実験環境
Windows 11 Home
Ubuntu 22.04.3
MySQL 8.0.35
実験準備
DB作成
料理に関するデータベースをサンプルとし実験を行う。
まずは、データベースを作成する。
CREATE DATABASE foods;
USE foods;
テーブル作成
次に、以下テーブルを作成する。
- food_genres:料理のジャンルを管理
- foods :料理に関する情報を管理
CREATE TABLE food_genres
(
food_genre_id INT AUTO_INCREMENT PRIMARY KEY,
food_genre VARCHAR(100) NOT NULL
);
CREATE TABLE foods
(
food_id INT AUTO_INCREMENT PRIMARY KEY,
food_name VARCHAR(100) NOT NULL,
food_genre_id INT NOT NULL,
FOREIGN KEY (food_genre_id) REFERENCES food_genres(food_genre_id)
);
サンプルデータ挿入
- food_genres:3種類
- foods :14,348,907件の料理データ
-- food_genresテーブル
INSERT INTO food_genres (food_genre) VALUES
('和食'),
('イタリアン'),
('中華料理');
-- foodsテーブル
INSERT INTO foods (food_name, food_genre_id)
SELECT
CONCAT('Food', LPAD(FLOOR(RAND() * 1000000), 7, '0')),
FLOOR(RAND() * 3) + 1
FROM
(SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3) AS n1,
(SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3) AS n2,
(SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3) AS n3,
(SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3) AS n4,
(SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3) AS n5,
(SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3) AS n6,
(SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3) AS n7,
(SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3) AS n8,
(SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3) AS n9,
(SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3) AS n10,
(SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3) AS n11,
(SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3) AS n12,
(SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3) AS n13,
(SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3) AS n14,
(SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3) AS n15;
サンプルデータ確認
-- food_genresテーブル
mysql> SELECT * FROM food_genres;
+---------------+-----------------+
| food_genre_id | food_genre |
+---------------+-----------------+
| 1 | 和食 |
| 2 | イタリアン |
| 3 | 中華料理 |
+---------------+-----------------+
-- foodsテーブル
mysql> SELECT COUNT(*) FROM foods;
+----------+
| count(*) |
+----------+
| 14348907 |
+----------+
mysql> SELECT * FROM foods WHERE food_id BETWEEN 1 AND 10;
+---------+-------------+---------------+
| food_id | food_name | food_genre_id |
+---------+-------------+---------------+
| 1 | Food0865599 | 2 |
| 2 | Food0737612 | 1 |
| 3 | Food0224238 | 1 |
| 4 | Food0622018 | 1 |
| 5 | Food0800200 | 1 |
| 6 | Food0605639 | 1 |
| 7 | Food0305349 | 2 |
| 8 | Food0350885 | 2 |
| 9 | Food0918687 | 2 |
| 10 | Food0297095 | 1 |
+---------+-------------+---------------+
実験内容
同じ実行結果を出力する以下の2つのSELECT文に対して、EXPLAIN ANALYZEコマンドを実行し、実行時間の測定及び比較を実施する。
-- WHERE句
SELECT food_genre_id, COUNT(*) FROM foods WHERE food_genre_id = 1 GROUP BY food_genre_id;
-- HAVING句
SELECT food_genre_id, COUNT(*) FROM foods GROUP BY food_genre_id HAVING food_genre_id = 1 ;
実験結果
WHERE句
mysql> EXPLAIN ANALYZE SELECT food_genre_id, COUNT(*) FROM foods WHERE food_genre_id = 1 GROUP BY food_genre_id;
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN |
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Group aggregate: count(0) (cost=1.4e+6 rows=6.98e+6) (actual time=1070..1070 rows=1 loops=1)
-> Covering index lookup on foods using food_genre_id (food_genre_id=1) (cost=704152 rows=6.98e+6) (actual time=0.505..913 rows=4.79e+6 loops=1)
|
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (1.08 sec)
HAVING句
mysql> EXPLAIN ANALYZE SELECT food_genre_id, COUNT(*) FROM foods GROUP BY food_genre_id HAVING food_genre_id = 1 ;
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| EXPLAIN |
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| -> Filter: (foods.food_genre_id = 1) (cost=2.82e+6 rows=14e+6) (actual time=997..3164 rows=1 loops=1)
-> Group aggregate: count(0) (cost=2.82e+6 rows=14e+6) (actual time=997..3164 rows=3 loops=1)
-> Covering index scan on foods using food_genre_id (cost=1.43e+6 rows=14e+6) (actual time=0.204..2645 rows=14.3e+6 loops=1)
|
+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (3.16 sec)
実行時間の比較
コマンド名 | 実行時間(sec) |
---|---|
WHERE | 1.08 |
HAVING | 3.16 |
まとめ
HAVING句よりWHERE句を使用して事前にレコードを絞り込をした方が早いということを身をもって知ることが出来た。データ数がより大量になるほど、結果が顕著に現れるため注意したい。
参考文献
この記事は以下の情報を参考にして執筆しました。