More than 5 years have passed since last update.

ツリー構造のデータをRDBで扱う

Last updated at 2018-12-10Posted at 2018-12-10

Page 1 of 17

例

カテゴリー

考慮する事項

ツリー全体の参照（ソート）
ツリーの一部の参照（ソート）
挿入・更新
削除

データをどのように定義するか考える

古くから推奨されているのは、親のIDを定義するパターン。
隣接リストと呼ばれる。

テーブル定義

create table category (
  category_id int(11) not null auto_increment,
  name varchar(40) not null,
  parent_id int(11),
  primary key (category_id),
  key parent_id_idx (parent_id)
    FOREIGN KEY (parent_id) 
    REFERENCES parent(category_id)
)

参照の問題

階層が深くなると、一度のクエリで取得するのが難しくなる。

select c1.*, c2.*, c3.*
from category c1 left outer join category c2
  on c1.category_id = c2.parent_id
left outer join category c3
  on c2.category_id = c3.parent_id
where c1.category_id = 2
;

アプリで処理する場合も、データが増えるほどクエリの発行回数が増えてしまう。

select category_id from category where name = 'アパレル'; -- 5
select category_id from category where parent_id = 5; -- 7, 8
select category_id from category where parent_id = 7; -- なし
select category_id from category where parent_id = 8; -- なし

挿入・更新・削除

挿入・更新は容易だが、削除は末端ノード以外だと面倒になる。

insert into category values (9, 'トートバッグ', 6);
update category set parent_id = 2 where cateogory_id = 5;

-- ノードの一部分を子ノードのも含めて削除
select category_id from category where name = 'アパレル'; -- 5
select category_id from category where parent_id = 5; -- 7, 8
select category_id from category where parent_id = 7; -- なし
select category_id from category where parent_id = 8; -- なし
-- 参照整合性制約のため、子のデータから順番に削除する必要がある
delete from cateogry where category_id in (7, 8);
delete from cateogry where category_id in (5);

ただし、 on delete cascade 修飾子で自動削除も可能。

末端でないノードを削除。ただし、子ノードは残す

select parent_id from category where name = 'アパレル'; -- 5
select category_id from category where parent_id = 5; -- 7, 8
update category set parent_id = 2 where cateogory_id in (7, 8);
delete from category where category_id = 5;

隣接リストの特徴

階層が増えると一度のクエリで取得が困難になる
削除の時など、やりたいことに対して、クエリやアプリケーション側のコードが複雑になりやすい

隣接リストを用いたツリー構造は、ナイーブツリーと呼ばれるアンチパターン。（ただしDBの実装によって例外あり）

解決策

代わりのツリーモデルを使用する。
ただし、どの構造も隣接リストに比べると複雑なので、事前に比較検討すること。

経路列挙

隣接リストの親ノードの取得の問題を解決するデータモデル。
親ノードの経路（ディレクトリパス /usr/local/path/ のようなもの）を表す文字列をデータとして含める。

テーブル定義

create table category (
  category_id int(11) not null auto_increment,
  name varchar(40) not null,
  path varchar(1000),
  primary key (category_id),
  key parent_id_idx (parent_id)
)

参照


-- サブツリー全体を取得するときのクエリ
select * from category where path like '1/3/%'

-- 親のノードを取得するときのクエリ
select * from category where '1/3/5/8' like concat(path, '%')

挿入・更新・削除

-- 挿入は親のパスをコピーし、新規のノードに付与する
insert into category values (9, 'トートバッグ', '1/3/');

-- 親ノードの更新は、一度に取得せずに、先に対象を取得してからパスをアプリケーション側で作成してから更新する。
-- 一時的に整合性が取れない状態が発生してしまう。
select * from category where name = 'アパレル'; -- 5
-- パスの値はアプリ側で生成
select category_id from category where path like concat('1/3/5', '/', '%')); -- 7, 8
update category set path = '1/2/' where category_id = 5;
update category set path = '1/2/5' where category_id in (7, 8);

-- 子ノード全体の削除は参照時の検索条件がそのまま流用できる
delete from category where path like '1/3/%';

経路列挙のまとめ

子やツリーに対しての参照は容易
挿入・削除も難しくはないが、参照整合性を維持することができない

閉包テーブル

直接の親子関係だけではなく、ツリー全体のパスを別テーブルに格納する方式。

create table category (
  category_id int(11) not null auto_increment,
  name varchar(40) not null,
  primary key (category_id),
  key parent_id_idx (parent_id)
);

create table category_tree(
  category_tree_id int(11) not null auto_increment,
  ancestor_category_id int(11) not null,
  descendent_category_id int(11) not null,
  primary key tree_id,
  key ancestor_category_id_idx (ancestor_category_id),
    FOREIGN KEY (ancestor_category_id) 
    REFERENCES category(category_id),
  key descendent_category_id_idx (descendent_category_id),
    FOREIGN KEY (descendent_category_id) 
    REFERENCES category(category_id)
);

閉包テーブルでは、 category 自身にツリー構造を格納するのではなく、 category_tree にツリー情報を格納する。
また、treeには自分自身を参照する行も追加する。

参照

-- 子のツリー全体を取得するSQL
select c.* from category c inner join category_tree ct on c.category_id = ct.descendent_category_id
where ct.ancestor_category_id = 3;

-- 子のツリー全体を取得するSQL
select c.* from category c inner join category_tree ct on c.category_id = ct.ancestor_category_id
where ct.descendent_category_id = 3;

挿入

-- バッグ（category_id = 6）の子にトートバッグを挿入。
-- categoryにトートバッグを追加後、バッグの全ての親ノードにトートバッグとのひも付けと自己参照する行をcategory_treeに挿入
insert into category values (9, 'トートバッグ');
insert into category_tree (ancestor_category_id, descendent_category_id)
values select ct.ancestor_category_id, 9 -- トートバッグのid
  from category_tree ct where ct.category_id = 6 -- バッグのid
union all select 9, 9;

削除

-- 末端のノードを削除
-- 子孫のcategory_idに削除対象のcategory_idを指定して削除し、その後category自身を削除する
delete from category_tree where descendent_category_id = 4;
delete from category where category_id = 4; -- テーブルに`ON DELETE CASCADE` 指定をしておくとこの操作も不要になる

-- サブツリー全体の削除
-- 削除対象のカテゴリを子孫に持つカテゴリと、削除対象のカテゴリの子孫を子孫として参照する全ての行を削除
delete from category_tree
where descendent_category_id (
  select x.id from (
    select descendent_category_id id from category_tree where ancestor_category_id = 5) x);

閉包テーブルのまとめ

子やツリーに対しての参照は容易で、インデックスも効かせやすい
挿入・削除も難しくないし、参照整合性を保つこともできる
そもそもの構造が複雑であるのと、データの容量が増えやすい

まとめ

ツリー構造のデータを作る時、
- 親のIDを持つだけの隣接リストはアンチパターンとなることがある。ただしDBの実装によっては問題なくなることもある。
- 経路列挙は参照や挿入・削除も容易だが、参照整合性が維持できない
- 閉包テーブルは参照や挿入・削除も容易だが、参照整合性が維持できるが、構造が複雑でデータが増えやすい

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up