More than 5 years have passed since last update.

PostgreSQLのrow-level lockの概要

Last updated at 2020-09-06Posted at 2020-09-06

この記事では、振る舞いやregular lock(table-level lock)との関係性が非常にわかりづらいPostgreSQLのrow-level lockについての解説を行います。
大まかな概要を掴むための記事なのですべての動作を網羅しているわけではないですし、一部解説に誤りが含まれていたり、正確ではない可能性があるのでご注意ください。
なお、クエリの実行例などはすべてPostgreSQL 12.3上での動作結果となります。

公式ドキュメントの記述

PostgreSQLのドキュメントではrow-level lockは次のように解説されています。

13.3.2. Row-Level Locks
In addition to table-level locks, there are row-level locks, which are listed as below with the contexts in which they are used automatically by PostgreSQL. See Table 13.3 for a complete table of row-level lock conflicts. Note that a transaction can hold conflicting locks on the same row, even in different subtransactions; but other than that, two transactions can never hold conflicting locks on the same row. Row-level locks do not affect data querying; they block only writers and lockers to the same row. Row-level locks are released at transaction end or during savepoint rollback, just like table-level locks.
Row-Level Lock Modes
FOR UPDATE
FOR UPDATE causes the rows retrieved by the SELECT statement to be locked as though for update. This prevents them from being locked, modified or deleted by other transactions until the current transaction ends. That is, other transactions that attempt UPDATE, DELETE, SELECT FOR UPDATE, SELECT FOR NO KEY UPDATE, SELECT FOR SHARE or SELECT FOR KEY SHARE of these rows will be blocked until the current transaction ends; conversely, SELECT FOR UPDATE will wait for a concurrent transaction that has run any of those commands on the same row, and will then lock and return the updated row (or no row, if the row was deleted). Within a REPEATABLE READ or SERIALIZABLE transaction, however, an error will be thrown if a row to be locked has changed since the transaction started. For further discussion see Section 13.4.
The FOR UPDATE lock mode is also acquired by any DELETE on a row, and also by an UPDATE that modifies the values on certain columns. Currently, the set of columns considered for the UPDATE case are those that have a unique index on them that can be used in a foreign key (so partial indexes and expressional indexes are not considered), but this may change in the future.
FOR NO KEY UPDATE
Behaves similarly to FOR UPDATE, except that the lock acquired is weaker: this lock will not block SELECT FOR KEY SHARE commands that attempt to acquire a lock on the same rows. This lock mode is also acquired by any UPDATE that does not acquire a FOR UPDATE lock.
FOR SHARE
Behaves similarly to FOR NO KEY UPDATE, except that it acquires a shared lock rather than exclusive lock on each retrieved row. A shared lock blocks other transactions from performing UPDATE, DELETE, SELECT FOR UPDATE or SELECT FOR NO KEY UPDATE on these rows, but it does not prevent them from performing SELECT FOR SHARE or SELECT FOR KEY SHARE.
FOR KEY SHARE
Behaves similarly to FOR SHARE, except that the lock is weaker: SELECT FOR UPDATE is blocked, but not SELECT FOR NO KEY UPDATE. A key-shared lock blocks other transactions from performing DELETE or any UPDATE that changes the key values, but not other UPDATE, and neither does it prevent SELECT FOR NO KEY UPDATE, SELECT FOR SHARE, or SELECT FOR KEY SHARE.
PostgreSQL doesn't remember any information about modified rows in memory, so there is no limit on the number of rows locked at one time. However, locking a row might cause a disk write, e.g., SELECT FOR UPDATE modifies selected rows to mark them locked, and so will result in disk writes.

確認する機会が多いであろう、pg_locksで確認可能なtable-level lockとは全く異なる異なる体系を持っていることがわかります。

また、ドキュメントの先頭には次の記述があります。

To examine a list of the currently outstanding locks in a database server, use the pg_locks system view. For more information on monitoring the status of the lock manager subsystem, refer to Chapter 27.

しかしながら、実際に行ロックの競合が発生しているはずの状況でpg_locksを確認しても前述の行ロックの情報を確認することはできないです。

session1

postgres=# SELECT * FROM hoge FOR UPDATE;
 id | name
----+------
  1 |
  2 |
(2 rows)

session2

BEGIN
postgres=# SELECT * FROM hoge FOR UPDATE;

session3

postgres=# SELECT locktype, database, relation, page, tuple, pid, mode, granted  FROM pg_locks ORDER BY granted;
   locktype    | database | relation | page | tuple |  pid  |        mode         | granted
---------------+----------+----------+------+-------+-------+---------------------+---------
 transactionid |          |          |      |       | 23596 | ShareLock           | f
 relation      |    12677 |    32851 |      |       | 23462 | RowShareLock        | t
 virtualxid    |          |          |      |       | 23462 | ExclusiveLock       | t
 relation      |    12677 |    12143 |      |       | 23621 | AccessShareLock     | t
 virtualxid    |          |          |      |       | 23621 | ExclusiveLock       | t
 relation      |    12677 |    32858 |      |       | 23596 | RowShareLock        | t
 relation      |    12677 |    32851 |      |       | 23596 | RowShareLock        | t
 virtualxid    |          |          |      |       | 23596 | ExclusiveLock       | t
 transactionid |          |          |      |       | 23462 | ExclusiveLock       | t
 relation      |    12677 |    32858 |      |       | 23462 | RowShareLock        | t
 tuple         |    12677 |    32851 |    0 |     1 | 23596 | AccessExclusiveLock | t
(11 rows)

pg_locksにはtable-level lockしか存在せず、row-level lockが存在しないことがわかります。

row-level lockの確認方法

公式ドキュメントには記述はないですが、PostgreSQLではrow-level lockを確認するための拡張機能であるpgrowlocksが提供されています。

The pgrowlocks module provides a function to show row locking information for a specified table.

By default use is restricted to superusers, members of the pg_stat_scan_tables role, and users with SELECT permissions on the table.

F.28.1. Overview
pgrowlocks(text) returns setof record
The parameter is the name of a table. The result is a set of records, with one row for each locked row within the table.

前述の関数の定義より、「テーブル毎」に取得する情報であることがわかります。
実施に前述の例でpgrowlocks関数を利用してrow-level lockを取得した例は以下の通りです。
ctidが(0,1)および(0,2)の行にて、pid 23462によってFor Updateのロックが取得されていることがわかります。

postgres=# SELECT * FROM pgrowlocks('hoge');
 locked_row | locker | multi |   xids   |     modes      |  pids
------------+--------+-------+----------+----------------+---------
 (0,1)      | 309032 | f     | {309032} | {"For Update"} | {23462}
 (0,2)      | 309032 | f     | {309032} | {"For Update"} | {23462}
(2 rows)

以上の検証結果から、PostgreSQLではTable-level lockとrow-level lockはそれぞれ独立した世界となっていることがわかります。

table-level lockとrow-level lockの比較

次に本題であるTable-level lockとrow-level lockの関係性について説明する前段として、両者の定義をもう少し実装よりで比較してみます。
table-level lock(regular lock), row-level lockの詳細については、以下のPostgreSQLのREADMEにて非常に詳しく説明されています。

Regular locks (a/k/a heavyweight locks). The regular lock manager
supports a variety of lock modes with table-driven semantics, and it has
full deadlock detection and automatic release at transaction end.
Regular locks should be used for all user-driven lock requests.

Locking tuples

Locking tuples is not as easy as locking tables or other database objects.
The problem is that transactions might want to lock large numbers of tuples at
any one time, so it's not possible to keep the locks objects in shared memory.
To work around this limitation, we use a two-level mechanism. The first level
is implemented by storing locking information in the tuple header: a tuple is
marked as locked by setting the current transaction's XID as its XMAX, and
setting additional infomask bits to distinguish this case from the more normal
case of having deleted the tuple. When multiple transactions concurrently
lock a tuple, a MultiXact is used; see below. This mechanism can accommodate
arbitrarily large numbers of tuples being locked simultaneously.

それぞれのREADMEの記述およびファイルが配置されているパスから、両者は目的としているオブジェクトやレイヤーが大きく異なることがわかります。

Table-level lock
- テーブル、インデックス等のSQLオブジェクト(≒table-driven semantics)で利用される
- full deadlock detection, automatic release at transaction endの機能が提供される
row-level lock
- heapテーブルで利用される、個別のaccess methodのレイヤーで動作する
- 行レベルのロックの競合の解消に利用される
- lockerが一つの場合には行に直接情報を埋め込み、lockerが複数の場合にはMultiXactというメカニズムを用いて、メモリを枯渇させることなく大量のロック情報を扱うことが可能

もう少し動作の具体的なイメージを掴むために、実際に更新処理で競合が発生する例をコードを追いながら見てみます。

二つのUPDATE文で行ロックの競合が発生する例

以下の二つのUPDATE文で行ロックの競合が発生する事例をコードレベルで見てみます。

session1

postgres=# SELECT pg_backend_pid();
 pg_backend_pid
----------------
             35
(1 row)
postgres=# BEGIN;
BEGIN
postgres=# UPDATE hoge SET name = 'test';
UPDATE 2
postgres=# SELECT txid_current_snapshot();
 txid_current_snapshot
-----------------------
 309037:309037:
(1 row)

postgres=# SELECT txid_current();
 txid_current
--------------
       309037
(1 row)

この時点で、各行は次のように行ロックが取得されています。

session2

postgres=# SELECT pgrowlocks('hoge');
                       pgrowlocks
--------------------------------------------------------
 ("(0,1)",309037,f,{309037},"{""No Key Update""}",{35})
 ("(0,2)",309037,f,{309037},"{""No Key Update""}",{35})
(2 rows)

また、pg_locksの状態は次の通りです。
session1(pid=35)は先勝ちで行ロックを取得できているので、特にregular lockの世界では追加でロックを取得する必要がないことがわかります。
一方で、session2(pid=56)に関しては、relation hogeのctid(0, 1)に対してtupleのExclusiveLockを取得しており、かつsession1のtransactionidに対してSharedLockの取得を試みており、granted=false(ロックの開放待ち)であることがわかります。

postgres=# SELECT locktype, database, relation,transactionid, page, tuple, pid, mode, granted  FROM pg_locks ORDER BY pid, granted;
   locktype    | database | relation | transactionid | page | tuple | pid |       mode       | granted
---------------+----------+----------+---------------+------+-------+-----+------------------+---------
 relation      |    12677 |    32858 |               |      |       |  35 | RowExclusiveLock | t
 transactionid |          |          |        309037 |      |       |  35 | ExclusiveLock    | t
 virtualxid    |          |          |               |      |       |  35 | ExclusiveLock    | t
 relation      |    12677 |    32851 |               |      |       |  35 | RowExclusiveLock | t
 transactionid |          |          |        309037 |      |       |  56 | ShareLock        | f
 transactionid |          |          |        309038 |      |       |  56 | ExclusiveLock    | t
 relation      |    12677 |    32851 |               |      |       |  56 | RowExclusiveLock | t
 relation      |    12677 |    32858 |               |      |       |  56 | RowExclusiveLock | t
 virtualxid    |          |          |               |      |       |  56 | ExclusiveLock    | t
 tuple         |    12677 |    32851 |               |    0 |     1 |  56 | ExclusiveLock    | t
 relation      |    12677 |    12143 |               |      |       | 447 | AccessShareLock  | t
 virtualxid    |          |          |               |      |       | 447 | ExclusiveLock    | t
(12 rows)

ドキュメントの仕様通り、No Key Updateの行ロックがpid 35のセッションから取得されていることがわかります。
この状態で次のように別セッションからもUPDATE文を実行します。
No Key Update同士はコンフリクトが発生するので、先にロックしているセッションのトランザクションが完了するまで待たないと先に進めません。

postgres=# BEGIN;
BEGIN
postgres=# UPDATE hoge SET name = 'test';

この時のsession2のstacktraceは次の通りです。

# 0  0x00007fcc92e2eb77 in epoll_wait (epfd=19, events=0x557c88f19cc8, maxevents=1, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30
# 1  0x0000557c86e2bf4a in WaitEventSetWaitBlock (set=0x557c88f19c50, cur_timeout=-1, occurred_events=0x7fff09d94610, nevents=1) at latch.c:1080
# 2  0x0000557c86e2be0f in WaitEventSetWait (set=0x557c88f19c50, timeout=-1, occurred_events=0x7fff09d94610, nevents=1, wait_event_info=50331652)
    at latch.c:1032
# 3  0x0000557c86e2b679 in WaitLatchOrSocket (latch=0x7fcc925c05c4, wakeEvents=33, sock=-1, timeout=-1, wait_event_info=50331652) at latch.c:407
# 4  0x0000557c86e2b535 in WaitLatch (latch=0x7fcc925c05c4, wakeEvents=33, timeout=0, wait_event_info=50331652) at latch.c:347
# 5  0x0000557c86e41e66 in ProcSleep (locallock=0x557c88e77448, lockMethodTable=0x557c87491260 <default_lockmethod>) at proc.c:1289
# 6  0x0000557c86e3bbf5 in WaitOnLock (locallock=0x557c88e77448, owner=0x557c88e8b340) at lock.c:1768
# 7  0x0000557c86e3ac37 in LockAcquireExtended (locktag=0x7fff09d94a30, lockmode=5, sessionLock=false, dontWait=false, reportMemoryError=true, locallockp=0x0)
    at lock.c:1050
# 8  0x0000557c86e3a24c in LockAcquire (locktag=0x7fff09d94a30, lockmode=5, sessionLock=false, dontWait=false) at lock.c:713
# 9  0x0000557c86e38e10 in XactLockTableWait (xid=309037, rel=0x7fcc89a18140, ctid=0x7fff09d94b94, oper=XLTW_Update) at lmgr.c:658
# 10 0x0000557c86a670eb in heap_update (relation=0x7fcc89a18140, otid=0x7fff09d94dda, newtup=0x557c88f19aa0, cid=0, crosscheck=0x0, wait=true,
    tmfd=0x7fff09d94d80, lockmode=0x7fff09d94d24) at heapam.c:3228
# 11 0x0000557c86a72322 in heapam_tuple_update (relation=0x7fcc89a18140, otid=0x7fff09d94dda, slot=0x557c88f193e0, cid=0, snapshot=0x557c88f09ec0,
    crosscheck=0x0, wait=true, tmfd=0x7fff09d94d80, lockmode=0x7fff09d94d24, update_indexes=0x7fff09d94d21) at heapam_handler.c:332
# 12 0x0000557c86c9df01 in table_tuple_update (rel=0x7fcc89a18140, otid=0x7fff09d94dda, slot=0x557c88f193e0, cid=0, snapshot=0x557c88f09ec0, crosscheck=0x0,
    wait=true, tmfd=0x7fff09d94d80, lockmode=0x7fff09d94d24, update_indexes=0x7fff09d94d21) at ../../../src/include/access/tableam.h:1275
# 13 0x0000557c86c9fe84 in ExecUpdate (mtstate=0x557c88f18420, tupleid=0x7fff09d94dda, oldtuple=0x0, slot=0x557c88f193e0, planSlot=0x557c88f19280,
    epqstate=0x557c88f18518, estate=0x557c88f180a0, canSetTag=true) at nodeModifyTable.c:1318
# 14 0x0000557c86ca1410 in ExecModifyTable (pstate=0x557c88f18420) at nodeModifyTable.c:2229
# 15 0x0000557c86c740fd in ExecProcNodeFirst (node=0x557c88f18420) at execProcnode.c:445
# 16 0x0000557c86c69b18 in ExecProcNode (node=0x557c88f18420) at ../../../src/include/executor/executor.h:239
# 17 0x0000557c86c6bfbe in ExecutePlan (estate=0x557c88f180a0, planstate=0x557c88f18420, use_parallel_mode=false, operation=CMD_UPDATE, sendTuples=false,
    numberTuples=0, direction=ForwardScanDirection, dest=0x557c88e7fb38, execute_once=true) at execMain.c:1646
# 18 0x0000557c86c6a02c in standard_ExecutorRun (queryDesc=0x557c88f17000, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:364
# 19 0x0000557c86c69ed1 in ExecutorRun (queryDesc=0x557c88f17000, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:308
# 20 0x0000557c86e5bc49 in ProcessQuery (plan=0x557c88e7fa58, sourceText=0x557c88e589f0 "UPDATE hoge SET name = 'test';", params=0x0, queryEnv=0x0,
    dest=0x557c88e7fb38, completionTag=0x7fff09d95250 "") at pquery.c:161
# 21 0x0000557c86e5d52a in PortalRunMulti (portal=0x557c88ebff10, isTopLevel=true, setHoldSnapshot=false, dest=0x557c88e7fb38, altdest=0x557c88e7fb38,
    completionTag=0x7fff09d95250 "") at pquery.c:1283
# 22 0x0000557c86e5cae1 in PortalRun (portal=0x557c88ebff10, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x557c88e7fb38,
    altdest=0x557c88e7fb38, completionTag=0x7fff09d95250 "") at pquery.c:796
# 23 0x0000557c86e5670c in exec_simple_query (query_string=0x557c88e589f0 "UPDATE hoge SET name = 'test';") at postgres.c:1215
# 24 0x0000557c86e5ac77 in PostgresMain (argc=1, argv=0x557c88e84050, dbname=0x557c88e83f18 "postgres", username=0x557c88e83ef8 "postgres") at postgres.c:4247
# 25 0x0000557c86db788c in BackendRun (port=0x557c88e7cbb0) at postmaster.c:4448
# 26 0x0000557c86db6f90 in BackendStartup (port=0x557c88e7cbb0) at postmaster.c:4139
# 27 0x0000557c86db32c5 in ServerLoop () at postmaster.c:1704
# 28 0x0000557c86db2a74 in PostmasterMain (argc=3, argv=0x557c88e525e0) at postmaster.c:1377
# 29 0x0000557c86cd5799 in main (argc=3, argv=0x557c88e525e0) at main.c:228

以下はheap_update関数の引用です。
更新対象tuple状態をHeapTupleSatisfiesUpdate関数で調べてTM_BeingModified(他のセッションで更新中、まだcommit/rollbackされていない)場合には、いったん対象バッファ(page)をunlockしたうえで、regular lockのレイヤーでのTupleに対するロックを取得(heap_acquire_tuplock)してから、すでに当該tupleを更新しているトランザクションの完了待ち(XactLockTableWait)を行っていることがわかります。

/*
 *	heap_update - replace a tuple
 *
 * See table_tuple_update() for an explanation of the parameters, except that
 * this routine directly takes a tuple rather than a slot.
 *
 * In the failure cases, the routine fills *tmfd with the tuple's t_ctid,
 * t_xmax (resolving a possible MultiXact, if necessary), and t_cmax (the last
 * only for TM_SelfModified, since we cannot obtain cmax from a combocid
 * generated by another transaction).
 */
TM_Result
heap_update(Relation relation, ItemPointer otid, HeapTuple newtup,
			CommandId cid, Snapshot crosscheck, bool wait,
			TM_FailureData *tmfd, LockTupleMode *lockmode)
...
	result = HeapTupleSatisfiesUpdate(&oldtup, cid, buffer);
...
	else if (result == TM_BeingModified && wait)
	{
		TransactionId xwait;
		uint16		infomask;
		bool		can_continue = false;
...
		else
		{
			/*
			 * Wait for regular transaction to end; but first, acquire tuple
			 * lock.
			 */
			LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
			heap_acquire_tuplock(relation, &(oldtup.t_self), *lockmode,
								 LockWaitBlock, &have_tuple_lock);
			XactLockTableWait(xwait, relation, &oldtup.t_self,
							  XLTW_Update);
			checked_lockers = true;
			LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);

各関数と前述のpg_locksの行との対比は次の通りです。

heap_acquire_tuplock関数で取得するロック

   locktype    | database | relation | transactionid | page | tuple | pid |       mode       | granted
---------------+----------+----------+---------------+------+-------+-----+------------------+---------
 tuple         |    12677 |    32851 |               |    0 |     1 |  56 | ExclusiveLock    | t

XactLockTableWait関数で取得するロック

   locktype    | database | relation | transactionid | page | tuple | pid |       mode       | granted
---------------+----------+----------+---------------+------+-------+-----+------------------+---------
 transactionid |          |          |        309037 |      |       |  56 | ShareLock        | f

ここでregular lock(table-level lock)の役割について振り返ってみます。

Table-level lock
- テーブル、インデックス等のSQLオブジェクト(≒table-driven semantics)で利用される
- full deadlock detection, automatic release at transaction endの機能が提供される

要するに、行ロックの競合の判定についてはheap tableのレイヤーで閉じていて、そのレイヤー内でハンドリングが行われます。
しかしながら、一旦行ロックの競合が発生した場合には、heap tableのレイヤーから必要に応じてregular lockを呼び出して、regular lockのレイヤー内でのTupleに対するロックの登録および他のトランザクションの実行完了待ちを行うことになります。
※ややこしいですが、もちろん、regular lockのレイヤー内部でのTupleロック同士の競合やデッドロックも発生する可能性があります

PostgreSQLではこのようにして、row-level lockとtable-level lock(regular lock)の世界が共存しているわけです。

まとめ

以下はこの記事のまとめです。

PostgreSQLではrow-level lockとtable-level lock(regular lock)が存在し、共存しています
table-level lockはpg_locks、row-level lockはpgrowlocksで確認する必要があります
実際のトランザクション処理にてheap tableのレイヤーで行ロックの競合を検知した場合には、table-level lockのレイヤーでTupleに対するロックの登録および他のトランザクションの実行完了待ちを行うことになります
- どちらの存在も意識する必要があります

散々悩んだ末に薄っぺらい内容になってしまいましたが、次回はより実装に踏み込んだ記事を作成予定です。

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up