
LMDB With Dupsort

Posted at 2022-03-02, last updated at 2022-03-02

Background

When read and write speed matter, the key-value database LMDB (Lightning Memory-Mapped Database) is a common choice. A key-value database stores data as a collection of key-value pairs. The key serves as an identifier, and a single key can hold multiple data items.

Duplicate keys

Many people assume that keys in a key-value database must be unique, but LMDB also supports duplicate keys when a database is opened in Dupsort mode.

Dupsort = False

Key	Value
K1	Data1,Data2,Data3
K2	Data4,Data5,Data6

Dupsort = True

Key	Value
K1	Data1
K1	Data2
K1	Data3
K2	Data4
K2	Data5
K2	Data6

Performance

If you pack a large data set into the value of a single key, read and write times go through the roof as the value grows.
In that case, using Dupsort can improve things considerably.

Related APIs

open_db(key=None, txn=None, reverse_key=False, dupsort=False, create=True, integerkey=False, integerdup=False, dupfixed=False)
dupsort:
 Duplicate keys may be used in the database. (Or, from another perspective, keys may have multiple data items, stored in sorted order.) By default keys must be unique and may have only a single data item.

getmulti(keys, dupdata=False, dupfixed_bytes=None, keyfixed=False)
dupdata:
 If True and database was opened with dupsort=True, read all duplicate values for each matching key.
dupfixed_bytes:
 If database was opened with dupsort=True and dupfixed=True, accepts the size of each value, in bytes, and applies an optimization reducing the number of database lookups.

putmulti(items, dupdata=True, overwrite=True, append=False)
dupdata:
 If True and database was opened with dupsort=True, add pair as a duplicate if the given key already exists. Otherwise overwrite any existing matching key.
