ð ã¯ããã«
ããã«ã¡ã¯ã
ç§ãéçºãšãµãŒããŒéçšãæ
åœããŠãããXD.GROWTHãã¯PostgreSQLã®åæ£æ¡åŒµã§ãã Citus ãå©çšããŠããŒã¿ããŒã¹ãæ°Žå¹³åæ£ããŠããŸãããã®èšäºã§ã¯ãCitusç°å¢ã®ãããåŠçã§ Distributed DeadlockïŒåæ£ãããããã¯ïŒãé »çºããé㮠調æ»éçšãšæçµçãªå¯Ÿç ã«ã€ããŠãå®éçšã§åŸãããç¥èŠãå
±æããŸãã
ERROR: canceling the transaction since it was involved in a distributed deadlock
ãåãã¬ã³ãŒããæŽæ°ããŠããããã§ããªãã®ã«ããªããããããã¯ãèµ·ãããã âŠâŠïŒããšé·ããã¢ã€ã¢ã€ããŠããã®ã§ãããåå ãè °ãæ®ããŠè¿œããããŠã¿ãããCitusã®ã¢ãŒããã¯ãã£ã«æ ¹å·®ããåãããããæ§é ã§ããã
察象èªè ã¯ä»¥äžãæ³å®ããŠããŸãã
- PostgreSQL ã®éçšçµéšãããæ¹
- Citusæ¡åŒµãéçšããŠããããŸãã¯å°å ¥ãæ€èšããŠããæ¹
- 忣ãããããã¯ã«æ©ãŸãããŠããæ¹
æ¬èšäºã®ã¹ã³ãŒã
æ¬èšäºã¯ Citusç°å¢ã§èµ·ããDistributed Deadlock ã®çºçã¡ã«ããºã ãšãã¢ããªã±ãŒã·ã§ã³ããžãã¯ã§ã®åé¿ç ã«çŠç¹ãåœãŠãŠããŸããCitusã®åºæ¬çãªä»çµã¿ãã»ããã¢ããã¯æ±ããŸãããåºæ¬ã«ã€ããŠã¯Cituså
¬åŒããã¥ã¡ã³ã ãåç
§ããŠãã ããã
âïž åäœç°å¢ïŒåèïŒ
| é ç® | ããŒãžã§ã³ |
|---|---|
| PostgreSQL | 17.5 |
| Citus | 13.1.0 |
ðŠ èµ·ããŠããããš
ãã£ãã以äžã®ãããªç¶æ³ã§ãã
-
ããŒãã«æ§æ:
-
profilesããŒãã«: ãŠãŒã¶ãŒã®ãããã¡ã€ã«æ å ±ïŒPK:tenant_id,profile_idïŒ -
profile_relationsããŒãã«: ãããã¡ã€ã«ã«çŽã¥ãã»ãã·ã§ã³æ å ±ïŒPK:tenant_id,profile_id,session_idïŒ
-
-
忣ããŒ: äž¡ããŒãã«ãšã
profile_idã§åæ£ïŒco-locatedïŒ - çç¶: ç°ãªãããã³ãã®ãããåŠçã䞊åã«å®è¡ãããšãDistributed Deadlock ãçºç
- èŠæš¡æ: 1ãããã§10äžã200äžä»¶çšåºŠã®æŽæ°
ãããã³ãIDãéãããæ±ã£ãŠããããŒã¿ãéãã¯ããªã®ã«ãäœã§ã¶ã€ãããã ïŒããšããã®ãæåã®çåã§ããã
ð§ ä»®èª¬ãšæ€èšŒïŒçŽäœæ²æïŒ
æåã¯ãšã£æ£ããã£ã仮説ãããã€ãç«ãŠãŠã¯æœ°ãããšããæéãéãããŸããã
â 仮説1: Workerã§ã·ã£ãŒãåäœã®ããŒãã«ããã¯ãåãããŠããã®ã§ã¯ïŒ
ãããã³ããéã£ãŠãããã·ã¥è¡çªã§åãã·ã£ãŒãã«åå± ããŠãã·ã£ãŒãã¬ãã«ã®ããã¯ã§ç«¶åããŠããã®ã§ã¯ïŒããšçããŸããã
çµè«: å€ãã
Citusã®ã·ã£ãŒãã¯åWorkeräžã§ã¯æ®éã®PostgreSQLããŒãã«ã§ãéåžžã®DMLïŒINSERT/UPDATE/DELETEïŒã§ã¯ RowExclusiveLock ããåããŸãããã·ã£ãŒãå
šäœãæä»ããã¯ããã®ã¯DDLã»VACUUM FULLã»ãªãã©ã³ã¹ãªã©ç¹æ®ãªæäœã ãã§ãã
â 仮説2: FKå¶çŽã«ãã芪ããŒãã«ãžã®ããã¯ã§ã¯ïŒ
çµè«: å€ãã
ããããFKå¶çŽã¯åŒµã£ãŠããŸããã§ããã
â 仮説3: åç §ããŒãã«ïŒreference tableïŒãžã®æžã蟌ã¿ã§ã¯ïŒ
åç §ããŒãã«ãžã®æžã蟌ã¿ã¯å šWorkerã«2PCã§äŒæããã®ã§ãDistributed Deadlock ã®å žåçãªçºçæºã§ãã
çµè«: å€ãã
ãã¹ã¿ããŒã¿ã¯å¥ã®ä»çµã¿ïŒmetadata管çïŒã§æã£ãŠããŠãåç
§ããŒãã«ã¯äœ¿ã£ãŠããŸããã§ããã
â 仮説4: INSERT ... ON CONFLICT DO UPDATE ã«ããç«¶åã§ã¯ïŒ
UPSERTã¯ãŠããŒã¯ã€ã³ããã¯ã¹ã®ããŒãžã¬ãã«ããã¯ãšè¡ããã¯ã®ååŸã¿ã€ãã³ã°ãå ¥ãçµãã§ããŠããããããã¯ãçã¿ããããã¿ãŒã³ã§ãã
çµè«: å€ãã
INSERT㯠ON CONFLICT DO NOTHING ã䜿ã£ãŠãããè¡ããã¯ã®ä¿ææéãé·ããªã DO UPDATE ã¯äœ¿ã£ãŠããŸããã§ããã
â 仮説5: ãããã INSERT ãããªã㊠DELETE ã ã£ã
å®éã®ãããããã¯ãã°ãå床ããèŠçŽãããšããããšã©ãŒãåºãŠããã®ã¯ INSERTã§ã¯ãªãDELETEæ ã§ããã
DELETE FROM profile_relations
WHERE tenant_id = $1 AND user_id <> $2 AND session_id = $3
RETURNING profile_id
ãã»ãã·ã§ã³IDã«çŽã¥ããŠãããªã¬ãŒã·ã§ã³ã®ãã¡ãä»åã®ãŠãŒã¶ãŒãããªããã®ãæ¶ãããšããã»ãã·ã§ã³ä¹ãæãçšã®åŠçã§ãã
ãããã話ã倧ããåããŸããã
ð åå ã®ç¹å®ïŒEXPLAIN ãæ±ºå®æ
ãã®DELETEã EXPLAIN ããŠã¿ããšãããæ±ºå®çãªæ
å ±ãåºãŠããŸããã
Custom Scan (Citus Adaptive)
Task Count: 32
Tasks Shown: One of 32
-> Task
Node: host=citus-worker1 ...
-> Delete on profile_relations_<shard_id>
-> Seq Scan on profile_relations_<shard_id>_<partition>
Filter: ((user_id <> ...) AND (tenant_id = ...) AND (session_id = ...))
Task Count: 32 ã§ãã32ã·ã£ãŒãå
šãŠã«å¯ŸããŠäžŠåã«DELETEãçºè¡ãããŠãã ããšãåãããŸããã
ð ïž ãªããããªãã®ã
ãã®ããŒãã«ã®åæ£ããŒã¯ profile_id ã§ãããšãããDELETEã®WHEREå¥ãèŠããšïŒ
WHERE tenant_id = $1 AND user_id <> $2 AND session_id = $3
忣ã㌠profile_id ã®æ¡ä»¶ãå«ãŸããŠããŸãããCitusã¯ã©ã®ã·ã£ãŒãã«å¯Ÿè±¡ããŒã¿ããããå€å®ã§ããªããããå
šã·ã£ãŒãã«ã¯ãšãªãæãã«ãããããªãããšããããã§ãã
ã€ãŸãã1ããã = 1ããã³ãããšãããã©ã³ã¶ã¯ã·ã§ã³èšèšã§ãããã®ãã©ã³ã¶ã¯ã·ã§ã³èªäœãå šWorkerã«è·šã æ§é ã«ãªã£ãŠããŸããã
ããããã€ã³ã
Citusã®åæ£ããŒãã«ã§ã¯ã忣ããŒãWHEREå¥ã«å«ãŸããŠããªãã¯ãšãªãããã³åæ£ããŒã«å¯Ÿã㊠<>ã>ã<ãLIKEãNOT IN ãªã©ã® éç䟡æŒç®å ã䜿ãã¯ãšãªã¯ãã·ã£ãŒããã«ãŒãã³ã°ãå¹ããŸãããä»åã¯åè
ã®ãã¿ãŒã³ïŒåæ£ããŒèªäœãWHEREã«ç¡ãïŒã§ããã
â Distributed Deadlock ãæç«ããã¡ã«ããºã
ãããåããã°ããããããã¯ã®çºçã¡ã«ããºã ã¯åçŽã§ãã
- ããã³ãAã®ãããã32ã·ã£ãŒãå šãŠã«æžã蟌ã¿ããã¯ãåãã«è¡ã
- ããã³ãBã®ãããã32ã·ã£ãŒãå šãŠã«æžã蟌ã¿ããã¯ãåãã«è¡ã
- åWorkeräžã§ãAãšBãããã¯ååŸé ãé転ããããšåŸ ã¡åããçºçãã
- åŸ ã¡åããWorkerè·šãã§åŸªç°ãããšãPostgreSQLæ¬äœã®æ€åºåšïŒåäžã€ã³ã¹ã¿ã³ã¹å ããèŠãªãïŒã§ã¯æ€åºã§ããªã
- Citusã®CoordinatoråŽã§åããŠãã Distributed Deadlock Detector ã埪ç°ãçºèŠãããã©ã³ã¶ã¯ã·ã§ã³ããã£ã³ã»ã«
ãããã³ããéãã®ã«ãããããã¯ãã®æ£äœã¯ããããã³ãããšã®ãã©ã³ã¶ã¯ã·ã§ã³ãå šWorkerã«è·šã£ãŠããŠãããã䞊åã«èµ°ããšã¶ã€ããããšããæ§é ã§ããã
ð èšèšèæ¯ïŒãªã profile_id 忣ãªã®ã
ããããããã³ãID忣ã«ããã°ïŒããšããæ¡ãåœç¶æµ®ãã³ãŸãããããã¡ãã¯éå»ã«æ€èšæžã¿ã§ãåŽäžããçµç·¯ããããŸãã
- ããŒã¿ã¹ãã¥ãŒãã²ã©ã: å€§èŠæš¡ããã³ãã®ããŒã¿ãç¹å®ã·ã£ãŒãã«éäžããŠãããã¹ãããå
- SELECTæ§èœãåºãªã: ããã³ãåäœã®ã¯ãšãªãåäžWorkerã«éäžããŠæããªã
profile_id 忣ã¯ã
- ããŒã¿ãã»ãŒåçã«åæ£ããã
- ãããã¡ã€ã«åäœã®ã¯ãšãªïŒæé »ã¢ã¯ã»ã¹ïŒãé«é
ãšããã¡ãªãããããåé¢ããã»ãã·ã§ã³IDèµ·ç¹ã®æ€çŽ¢ãªã©ã忣ããŒãæå®ã§ããªãã¯ãšãªããã«ãã·ã£ãŒãåããããšãããã¡ãªãããæ±ããŠããŸããä»åã®DELETEã¯ãŸãã«ãã®ãã¡ãªãããé¡åšåããã±ãŒã¹ã§ããã
ã€ãŸãã忣ããŒèšèšãã®ãã®ãå€ããã®ã§ã¯ãªããåé¡ã®ã¯ãšãªã工倫ããŠãã®ãã¡ãªããã軜æžããããšããæ¹åã§å¯ŸçãèããŸãã
â 察çïŒäºæ®µéåŠçãžã®æžãæã
æ¡çšããã®ã¯ãå
ã«SELECTã§å¯Ÿè±¡ profile_id ãååŸãããããæç€ºããŠDELETEããããšããäºæ®µéåŠçã§ãã
BeforeïŒãã«ãã·ã£ãŒãDELETEïŒ
DELETE FROM profile_relations
WHERE tenant_id = $1 AND user_id <> $2 AND session_id = $3
RETURNING profile_id
AfterïŒäºæ®µéåŠçïŒ
-- ã¹ããã1: 察象 profile_id ãååŸïŒèªã¿åãå°çšãªã®ã§é·æããã¯ãåããªãïŒ
SELECT profile_id FROM profile_relations
WHERE tenant_id = $1 AND user_id <> $2 AND session_id = $3;
-- ã¹ããã2: ååŸãã profile_id ã忣ããŒã«å«ããŠDELETE
DELETE FROM profile_relations
WHERE profile_id = ANY($1::bigint[])
AND tenant_id = $2
AND user_id <> $3
AND session_id = $4
RETURNING profile_id;
ãã€ã³ãã¯ä»¥äžã®2ç¹ã§ãã
-
profile_id = ANY(...)㯠ã·ã£ãŒããã«ãŒãã³ã°å¯èœ ãªåœ¢ãªã®ã§ã察象ã·ã£ãŒãã ãã«DELETEãçºè¡ããã - ã¹ããã1ã¯SELECTãªã®ã§ãé·æã®æžã蟌ã¿ããã¯ãåããããããããã¯ã®èµ·ç¹ã«ãªãã«ãã
ð SELECT ... FOR UPDATE ãä»ããªãããšã«ã€ããŠ
ãã¹ããã1ãšã¹ããã2ã®éã«ä»ã®ãã©ã³ã¶ã¯ã·ã§ã³ãå²ã蟌ãã ãã©ãããïŒããšããçåããããããããŸãããããæ¬ã±ãŒã¹ã§ã¯ FOR UPDATE 㯠ä»ããŠããŸãããçç±ã¯ä»¥äžã§ãã
-
Citusã®
SELECT ... FOR UPDATEã«ã¯å¶çŽããããã·ã³ã°ã«ã·ã£ãŒãã¯ãšãªã§ã¯åããŸããããã«ãã·ã£ãŒãã¯ãšãªã§ã¯å¶éããããæ¬ã±ãŒã¹ïŒãã«ãã·ã£ãŒãSELECTïŒã§ã¯äœ¿ããªããæåãèªã¿ã«ãã - ã¹ããã2ã®DELETEã§WHEREæ¡ä»¶ãåè©äŸ¡ãããã®ã§ãSELECTãDELETEéã«ç¶æ ãå€ãã£ãŠããDELETEã¯çŸåšã®ç¶æ ã«åºã¥ããŠæ£ããåã
- ãŠãŒã¹ã±ãŒã¹çã«äºéå®è¡ã埮åŠãªç«¶åã®å®³ãå°ããïŒã»ãã·ã§ã³ä¹ãæãåŠçã¯åªççã«æ¯ãèããïŒ
æä»å¶åŸ¡ãã©ãããŠãå
¥ãããå Žåã¯ãCitusã®å¶çŽãšç¡é¢ä¿ã«åã pg_advisory_xact_lock ã䜿ãã®ãçŸå®çããªãšæããŸãã
ð å¹æã®æ€èšŒ
å®éã«æžãæã㊠EXPLAIN ãåãçŽãããšããã
Task Countã SELECT ã§ååŸããã¬ã³ãŒãæ°ãšäžèŽãã
ããã«ãªããŸãããæ¬ã±ãŒã¹ã§ã¯éåžž1ã5ä»¶ãã察象ããªãã®ã§ãTask Count ã1ã5ã§ãã
| ææš | Before | After |
|---|---|---|
| Task Count | 32ïŒåºå®ïŒ | 1ã5ïŒå¯Ÿè±¡ä»¶æ°ã«æ¯äŸïŒ |
| ããã¯ååŸã·ã£ãŒãæ° | 32 | 1ã5 |
| 䞊åãããéã®ç«¶å確ç | ã»ãŒ100% | (5/32)² â 2.4% çšåºŠ |
| ã¹ãã£ã³ç¯å² | å šã·ã£ãŒã Ã å šããŒãã£ã·ã§ã³ | 該åœã·ã£ãŒãã®è©²åœããŒãã£ã·ã§ã³ã®ã¿ |
䞊åããããåãã·ã£ãŒããåãåã確çãåçã«äžããã®ã§ãDistributed Deadlock ã¯å®çšäžã»ãŒèµ·ããªããªãèŠèŸŒã¿ã§ããã¬ã€ãã³ã·ãæ¹åãæåŸ ã§ããŸãã
ð¡ïž æ¢åã®ç·©åçãšã®äœçœ®ã¥ã
æžãæãåãããéçšäžã®å¿æ¥åŠçœ®ãšããŠä»¥äžãå ¥ããŠããŸããã
- ãããã®äžŠåå®è¡æ°ãèµ·åæã«å¶åŸ¡ïŒäžŠå1ã2ã«æå¶ïŒ
- ãããããã¯çºçæã®èªåãªãã©ã€
ãããã¯æžãæãåŸã å®å šããããšããŠæ®ã ã®ãããããã§ããæžãæãã§æ ¹æ¬å¯Ÿå¿ã¯ã§ããŸãããæ³å®å€ã®ã±ãŒã¹ã§äœããèµ·ãããšãã®ä¿éºãšããŠæ©èœããŸããæžãæãåŸã¯ãªãã©ã€ãã»ãŒçºåããªããªãã¯ããªã®ã§ãã¡ããªã¯ã¹ãšããŠèŠ³æž¬ããŠãããš æ¹å广ã®å¯èŠå ã«ã䜿ããŸãã
ð ãŸãšã
ä»åã®èª¿æ»ãéããŠã®åŠã³ã§ãã
- ãšã©ãŒã¡ãã»ãŒãžã§
distributed deadlockãšæèšãããŠããå Žåãå¿ ãè€æ°Workerã«è·šãæäœãå«ãŸããŠãããã1ããã³ã1ãã©ã³ã¶ã¯ã·ã§ã³ã ããåäžWorkerã®ã¯ãããšæã蟌ãŸãã«ãã¯ãšãªã忣ããŒã§çµããŠããããçããŸãããã - 忣ããŒãWHEREå¥ã«ç¡ãã¯ãšãªã»éç䟡æŒç®åã䜿ãã¯ãšãªã¯èŠèŠæããããã¯ããã·ã¥åæ£ããŒãã«ã§ã¯ã·ã£ãŒããã«ãŒãã³ã°äžèœã§ãã
-
ã·ã£ãŒããã«ãŒãã³ã°ã®å¹ãã¯
EXPLAINã®Task Countã§å¿ ã確èªã§ããŸããTask Count: 1ãªãã·ã³ã°ã«ã·ã£ãŒãããã以å€ã¯ãã«ãã·ã£ãŒãã§ãã - 忣ããŒèšèšã¯ãã¬ãŒããªããã©ã®èšèšãéžãã§ã代åã¯ããã®ã§ã代åãé¡åšåããã¯ãšãªãã¢ããªã±ãŒã·ã§ã³ããžãã¯ã®å·¥å€«ã§è»œæžããããšããã¢ãããŒããçŸå®çã§ãã
Citusã¯åŒ·åã§ãããPostgreSQLåäœãšã¯éãèãæ¹ãå¿ èŠãªå Žé¢ã確ãã«ãããŸããåããããªäºè±¡ã«ééãããæ¹ã®å©ãã«ãªãã°å¬ããã§ãã
æåŸãŸã§èªãã§ããã ãããããšãããããŸãã ![]()
Cituséçšã§ã® Distributed Deadlock 察çã«ã€ããŠããã®èšäºãåèã«ãªãã°å¹žãã§ãã ![]()