0
0

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?

GitLab(v17.3.1)で2025-03-01に発生したバックグラウンドジョブ滞留問題

Posted at

この記事では2025/03/01に発生したGitLabのバックグラウンドジョブ滞留問題の障害対応内容を記載しています。

事象

  • Admin areaのBackground jobsのダッシュボード上利用者への影響がで始めて以降ジョブの完了件数が0のまま処理されていない。
    image.png

  • Admin areaのBackground jobsのキューに待機状態のジョブ件数が処理されずに増え続ける
    image.png

  • Admin areaのBackground jobsのCronタブにアクセスすると500エラーの画面が表示される
    image.png

利用者への影響

  • 以下のGitLab機能が動作しない
    • Merge Request作成、マージ
    • ProjectのFork
    • Pipeline実行

原因

  • Redisに2025-02-29という存在しない日時のデータが保存されており、参照しているSidekiqジョブで日付の処理エラーが発生している模様

対応

  • ログの調査
    • /var/log/gitlab/sidekiq/currentの内容を調査
    • 以下のログが続けて出力されている。
    • invalid dateが出力されている場合は本事象と判断して良い。
{"severity":"INFO","time":"2025-03-03T09:40:01.312Z","message":"Starting cluster with 1 processes"}
{"severity":"INFO","time":"2025-03-03T09:40:01.320Z","message":"Starting metrics server on port 8082"}
2025-03-03T09:40:17.307Z pid=343709 tid=80zt INFO: {:message=>"started", :memwd_handler_class=>"Gitlab::Memory::Watchdog::Handlers::SidekiqHandler", :memwd_sleep_time_s=>3, :pid=>343709, :worker_id=>"sidekiq_0", :memwd_rss_bytes=>430551040}
{"severity":"INFO","time":"2025-03-03T09:40:18.702Z","message":"Listening on queues [\"default,1\", \"mailers,1\"]"}
{"severity":"INFO","time":"2025-03-03T09:40:20.301Z","message":"GitLab reliable fetch activated!"}
{"severity":"INFO","time":"2025-03-03T09:40:20.301Z","message":"Sidekiq 7.1.6 connecting to Redis with options {:size=\u003e25, :pool_name=\u003e\"internal\", :path=\u003e\"/var/opt/gitlab/redis/redis.socket\", :custom=\u003e{:instrumentation_class=\u003e\"Queues\"}, :url=\u003enil}"}
invalid date
/opt/gitlab/embedded/lib/ruby/gems/3.1.0/gems/sidekiq-cron-1.12.0/lib/sidekiq/cron/job.rb:631:in `parse'
/opt/gitlab/embedded/lib/ruby/gems/3.1.0/gems/sidekiq-cron-1.12.0/lib/sidekiq/cron/job.rb:631:in `rescue in parse_enqueue_time'
/opt/gitlab/embedded/lib/ruby/gems/3.1.0/gems/sidekiq-cron-1.12.0/lib/sidekiq/cron/job.rb:628:in `parse_enqueue_time'
/opt/gitlab/embedded/lib/ruby/gems/3.1.0/gems/sidekiq-cron-1.12.0/lib/sidekiq/cron/job.rb:290:in `initialize'
/opt/gitlab/embedded/lib/ruby/gems/3.1.0/gems/sidekiq-cron-1.12.0/lib/sidekiq/cron/job.rb:228:in `new'
/opt/gitlab/embedded/lib/ruby/gems/3.1.0/gems/sidekiq-cron-1.12.0/lib/sidekiq/cron/job.rb:228:in `block in all'
/opt/gitlab/embedded/lib/ruby/gems/3.1.0/gems/sidekiq-cron-1.12.0/lib/sidekiq/cron/job.rb:226:in `collect'
/opt/gitlab/embedded/lib/ruby/gems/3.1.0/gems/sidekiq-cron-1.12.0/lib/sidekiq/cron/job.rb:226:in `all'
/opt/gitlab/embedded/lib/ruby/gems/3.1.0/gems/sidekiq-cron-1.12.0/lib/sidekiq/cron/job.rb:534:in `destroy_removed_jobs'
/opt/gitlab/embedded/lib/ruby/gems/3.1.0/gems/sidekiq-cron-1.12.0/lib/sidekiq/cron/job.rb:177:in `load_from_hash!'
/opt/gitlab/embedded/service/gitlab-rails/config/initializers/sidekiq.rb:9:in `load_cron_jobs!'
/opt/gitlab/embedded/service/gitlab-rails/config/initializers/sidekiq.rb:120:in `block in <top (required)>'
/opt/gitlab/embedded/service/gitlab-rails/vendor/gems/sidekiq-7.1.6/lib/sidekiq.rb:98:in `configure_server'
/opt/gitlab/embedded/service/gitlab-rails/config/initializers/sidekiq.rb:46:in `<top (required)>'
/opt/gitlab/embedded/lib/ruby/gems/3.1.0/gems/railties-7.0.8.4/lib/rails/engine.rb:667:in `load'
/opt/gitlab/embedded/lib/ruby/gems/3.1.0/gems/railties-7.0.8.4/lib/rails/engine.rb:667:in `block in load_config_initializer'
/opt/gitlab/embedded/lib/ruby/gems/3.1.0/gems/activesupport-7.0.8.4/lib/active_support/notifications.rb:208:in `instrument'
/opt/gitlab/embedded/lib/ruby/gems/3.1.0/gems/railties-7.0.8.4/lib/rails/engine.rb:666:in `load_config_initializer'
/opt/gitlab/embedded/lib/ruby/gems/3.1.0/gems/railties-7.0.8.4/lib/rails/engine.rb:620:in `block (2 levels) in <class:Engine>'
/opt/gitlab/embedded/lib/ruby/gems/3.1.0/gems/railties-7.0.8.4/lib/rails/engine.rb:619:in `each'
/opt/gitlab/embedded/lib/ruby/gems/3.1.0/gems/railties-7.0.8.4/lib/rails/engine.rb:619:in `block in <class:Engine>'
/opt/gitlab/embedded/lib/ruby/gems/3.1.0/gems/railties-7.0.8.4/lib/rails/initializable.rb:32:in `instance_exec'
/opt/gitlab/embedded/lib/ruby/gems/3.1.0/gems/railties-7.0.8.4/lib/rails/initializable.rb:32:in `run'
/opt/gitlab/embedded/lib/ruby/gems/3.1.0/gems/railties-7.0.8.4/lib/rails/initializable.rb:61:in `block in run_initializers'
/opt/gitlab/embedded/lib/ruby/3.1.0/tsort.rb:228:in `block in tsort_each'
/opt/gitlab/embedded/lib/ruby/3.1.0/tsort.rb:350:in `block (2 levels) in each_strongly_connected_component'
/opt/gitlab/embedded/lib/ruby/3.1.0/tsort.rb:422:in `block (2 levels) in each_strongly_connected_component_from'
/opt/gitlab/embedded/lib/ruby/3.1.0/tsort.rb:431:in `each_strongly_connected_component_from'
/opt/gitlab/embedded/lib/ruby/3.1.0/tsort.rb:421:in `block in each_strongly_connected_component_from'
/opt/gitlab/embedded/lib/ruby/gems/3.1.0/gems/railties-7.0.8.4/lib/rails/initializable.rb:50:in `each'
/opt/gitlab/embedded/lib/ruby/gems/3.1.0/gems/railties-7.0.8.4/lib/rails/initializable.rb:50:in `tsort_each_child'
/opt/gitlab/embedded/lib/ruby/3.1.0/tsort.rb:415:in `call'
/opt/gitlab/embedded/lib/ruby/3.1.0/tsort.rb:415:in `each_strongly_connected_component_from'
/opt/gitlab/embedded/lib/ruby/3.1.0/tsort.rb:349:in `block in each_strongly_connected_component'
/opt/gitlab/embedded/lib/ruby/3.1.0/tsort.rb:347:in `each'
/opt/gitlab/embedded/lib/ruby/3.1.0/tsort.rb:347:in `call'
/opt/gitlab/embedded/lib/ruby/3.1.0/tsort.rb:347:in `each_strongly_connected_component'
/opt/gitlab/embedded/lib/ruby/3.1.0/tsort.rb:226:in `tsort_each'
/opt/gitlab/embedded/lib/ruby/3.1.0/tsort.rb:205:in `tsort_each'
/opt/gitlab/embedded/lib/ruby/gems/3.1.0/gems/railties-7.0.8.4/lib/rails/initializable.rb:60:in `run_initializers'
/opt/gitlab/embedded/lib/ruby/gems/3.1.0/gems/railties-7.0.8.4/lib/rails/application.rb:372:in `initialize!'
/opt/gitlab/embedded/service/gitlab-rails/config/environment.rb:7:in `<top (required)>'
<internal:/opt/gitlab/embedded/lib/ruby/site_ruby/3.1.0/rubygems/core_ext/kernel_require.rb>:37:in `require'
<internal:/opt/gitlab/embedded/lib/ruby/site_ruby/3.1.0/rubygems/core_ext/kernel_require.rb>:37:in `require'
/opt/gitlab/embedded/service/gitlab-rails/vendor/gems/sidekiq-7.1.6/lib/sidekiq/cli.rb:303:in `boot_application'
/opt/gitlab/embedded/service/gitlab-rails/vendor/gems/sidekiq-7.1.6/lib/sidekiq/cli.rb:42:in `run'
/opt/gitlab/embedded/service/gitlab-rails/vendor/gems/sidekiq-7.1.6/bin/sidekiq:31:in `<top (required)>'
/opt/gitlab/embedded/lib/ruby/gems/3.1.0/bin/sidekiq:25:in `load'
/opt/gitlab/embedded/lib/ruby/gems/3.1.0/bin/sidekiq:25:in `<top (required)>'
/opt/gitlab/embedded/lib/ruby/site_ruby/3.1.0/bundler/cli/exec.rb:58:in `load'
/opt/gitlab/embedded/lib/ruby/site_ruby/3.1.0/bundler/cli/exec.rb:58:in `kernel_load'
/opt/gitlab/embedded/lib/ruby/site_ruby/3.1.0/bundler/cli/exec.rb:23:in `run'
/opt/gitlab/embedded/lib/ruby/site_ruby/3.1.0/bundler/cli.rb:455:in `exec'
/opt/gitlab/embedded/lib/ruby/site_ruby/3.1.0/bundler/vendor/thor/lib/thor/command.rb:28:in `run'
/opt/gitlab/embedded/lib/ruby/site_ruby/3.1.0/bundler/vendor/thor/lib/thor/invocation.rb:127:in `invoke_command'
/opt/gitlab/embedded/lib/ruby/site_ruby/3.1.0/bundler/vendor/thor/lib/thor.rb:527:in `dispatch'
/opt/gitlab/embedded/lib/ruby/site_ruby/3.1.0/bundler/cli.rb:35:in `dispatch'
/opt/gitlab/embedded/lib/ruby/site_ruby/3.1.0/bundler/vendor/thor/lib/thor/base.rb:584:in `start'
/opt/gitlab/embedded/lib/ruby/site_ruby/3.1.0/bundler/cli.rb:29:in `start'
/opt/gitlab/embedded/lib/ruby/gems/3.1.0/gems/bundler-2.5.11/exe/bundle:28:in `block in <top (required)>'
/opt/gitlab/embedded/lib/ruby/site_ruby/3.1.0/bundler/friendly_errors.rb:117:in `with_friendly_errors'
/opt/gitlab/embedded/lib/ruby/gems/3.1.0/gems/bundler-2.5.11/exe/bundle:20:in `<top (required)>'
/opt/gitlab/embedded/bin/bundle:25:in `load'
/opt/gitlab/embedded/bin/bundle:25:in `<main>'
{"severity":"INFO","time":"2025-03-03T09:40:20.330Z","message":"stopped","memwd_reason":"background task stopped","memwd_handler_class":"Gitlab::Memory::Watchdog::Handlers::SidekiqHandler","memwd_sleep_time_s":3,"pid":343709,"worker_id":"sidekiq_0","memwd_rss_bytes":488927232,"retry":0}
{"severity":"INFO","time":"2025-03-03T09:40:21.323Z","message":"A worker terminated, shutting down the cluster"}
  • rails consoleでSidekiq::Cron::Job.allを実行し、同じエラーが出ることを確認する。
[root@dcvmn01]:[/etc/gitlab]# gitlab-rails console
--------------------------------------------------------------------------------
 Ruby:         ruby 3.1.5p253 (2024-04-023 revision 1945f8dc0e) [x86_64-linux]
 GitLab:       17.3.1-ee (df01858216e) EE
 GitLab Shell: 14.38.0
 PostgreSQL:   14.11
------------------------------------------------------------[ booted in 42.06s ]
WARNING: Active Record does not support composite primary key.

security_findings has composite primary key. Composite primary key is ignored.
Loading production environment (Rails 7.0.8.4)
irb(main):001:0> Sidekiq::Cron::Job.all
/opt/gitlab/embedded/lib/ruby/gems/3.1.0/gems/sidekiq-cron-1.12.0/lib/sidekiq/cron/job.rb:631:in `parse': invalid date (Date::Error)
	from /opt/gitlab/embedded/lib/ruby/gems/3.1.0/gems/sidekiq-cron-1.12.0/lib/sidekiq/cron/job.rb:631:in `rescue in parse_enqueue_time'
	from /opt/gitlab/embedded/lib/ruby/gems/3.1.0/gems/sidekiq-cron-1.12.0/lib/sidekiq/cron/job.rb:628:in `parse_enqueue_time'
	from /opt/gitlab/embedded/lib/ruby/gems/3.1.0/gems/sidekiq-cron-1.12.0/lib/sidekiq/cron/job.rb:290:in `initialize'
	from /opt/gitlab/embedded/lib/ruby/gems/3.1.0/gems/sidekiq-cron-1.12.0/lib/sidekiq/cron/job.rb:228:in `new'
	from /opt/gitlab/embedded/lib/ruby/gems/3.1.0/gems/sidekiq-cron-1.12.0/lib/sidekiq/cron/job.rb:228:in `block in all'
	from /opt/gitlab/embedded/lib/ruby/gems/3.1.0/gems/sidekiq-cron-1.12.0/lib/sidekiq/cron/job.rb:226:in `collect'
	from /opt/gitlab/embedded/lib/ruby/gems/3.1.0/gems/sidekiq-cron-1.12.0/lib/sidekiq/cron/job.rb:226:in `all'
	from (irb):1:in `<main>'
	from /opt/gitlab/embedded/lib/ruby/gems/3.1.0/gems/railties-7.0.8.4/lib/rails/commands/console/console_command.rb:74:in `start'
	from /opt/gitlab/embedded/lib/ruby/gems/3.1.0/gems/railties-7.0.8.4/lib/rails/commands/console/console_command.rb:19:in `start'
	from /opt/gitlab/embedded/lib/ruby/gems/3.1.0/gems/railties-7.0.8.4/lib/rails/commands/console/console_command.rb:106:in `perform'
	from /opt/gitlab/embedded/lib/ruby/gems/3.1.0/gems/thor-1.3.1/lib/thor/command.rb:28:in `run'
	from /opt/gitlab/embedded/lib/ruby/gems/3.1.0/gems/thor-1.3.1/lib/thor/invocation.rb:127:in `invoke_command'
	from /opt/gitlab/embedded/lib/ruby/gems/3.1.0/gems/thor-1.3.1/lib/thor.rb:527:in `dispatch'
	from /opt/gitlab/embedded/lib/ruby/gems/3.1.0/gems/railties-7.0.8.4/lib/rails/command/base.rb:87:in `perform'
	from /opt/gitlab/embedded/lib/ruby/gems/3.1.0/gems/railties-7.0.8.4/lib/rails/command.rb:48:in `invoke'
	... 4 levels...
  • ソースコードの内容から参照しているRedisを特定しRedis Cliを使用してジョブ一覧を取得
[root@dcvmn01]:[/root]# /opt/gitlab/embedded/bin/redis-cli -s /var/opt/gitlab/redis/redis.socket 
redis /var/opt/gitlab/redis/redis.socket> SMEMBERS cron_jobs
  1) "cron_job:ensure_merge_requests_prepared_worker"
 省略
151) "cron_job:loose_foreign_keys_cleanup_worker"
  • 上記の151個のジョブのlast_enqueue_timeを取得(以下cron_job:analytics_cycle_analytics_consistency_workerのlast_enqueue_timeが対象)
redis /var/opt/gitlab/redis/redis.socket> 
[root@dcvmn01]:[/etc/gitlab]# sudo /opt/gitlab/embedded/bin/redis-cli -s /var/opt/gitlab/redis/redis.socket
redis /var/opt/gitlab/redis/redis.socket> HGET cron_job:analytics_cycle_analytics_consistency_worker last_enqueue_time
"2025-02-28 15:00:08 +0000"
redis /var/opt/gitlab/redis/redis.socket>
  • ExcelでSMEMBERS cron_jobsの結果を元にジョブ名からHGETコマンドとHSETコマンドを生成する
  • ↑の結果が2025-02-29になっているものを確認
  • 確認した結果NGだったものに対して“2025-02-28 15:00:08 +0000"でフィールドを更新するためのHSETを実行
[root@dcvmn01]:[/etc/gitlab]# sudo /opt/gitlab/embedded/bin/redis-cli -s /var/opt/gitlab/redis/redis.socket
redis /var/opt/gitlab/redis/redis.socket> HSET cron_job:search_elastic_metrics_update_cron_worker  last_enqueue_time "2025-02-28 15:00:08 +0000"
(integer) 0
redis /var/opt/gitlab/redis/redis.socket> HGET cron_job:search_elastic_metrics_update_cron_worker last_enqueue_time
"2025-02-28 15:00:08 +0000"
redis /var/opt/gitlab/redis/redis.socket> HSET cron_job:search_elastic_metrics_update_cron_worker  last_enqueue_time "2025-02-28 15:00:08 +0000"
(integer) 0
redis /var/opt/gitlab/redis/redis.socket> HGET cron_job:stuck_ci_jobs_worker last_enqueue_time
"2025-02-29 15:00:08 +0000"
<以下省略>
  • 稼働確認としてrails_consoleからSidekiq::Cron::Job.allを再実行しinvalid dateのエラーが表示されなければOK
irb(main):003:0> Sidekiq::Cron::Job.all
=> 
[#<Sidekiq::Cron::Job:0x00007ff17601f440
  @active_job=false,
  @active_job_queue_name_delimiter="",
  @active_job_queue_name_prefix="",
  @args=[],
  @cron="0 */4 * * *",
  @date_as_argument=false,
  @description="",
  @fetch_missing_args=false,
  @klass="BulkImports::StaleImportWorker",
  @last_enqueue_time=2025-02-28 12:00:06 UTC,
  @message="{\"retry\":false,\"queue\":\"default\",\"version\":0,\"store\":null,\"queue_namespace\":\"cronjob\",\"class\":\"BulkImports::StaleImportWorker\",\"args\":[]}",
  @name="bulk_imports_stale_import_worker",
  @queue="default",
  @queue_name_with_prefix="default",
  @source="dynamic",
  @status="enabled",
  @symbolize_args=false>,
<以下省略>
  • ジョブの再生成は定期的に実行されるため、しばらく待機してから前述の事象が解決していることを確認する。待機状態のジョブの件数が減り始めればOK。

  • background_jobsからCronタブを確認
    image.png

  • ジョブの滞留も解消したことを確認
    image.png

  • パイプライン実行画面のVariablesが表示されることを確認
    image.png

  • MRのマージが完了していることを確認
    image.png

  • 2025/3/1の発生時はGitLab再起動も実施していたので上記で解決しない場合は再起動を実施

GitLab Supportへの問い合わせ

サポートに問い合わせた結果Sidekiqの障害に起因しているとのこと。
この障害が解消されない限り毎年同様の問題が発生すると思われる。
https://gitlab.com/gitlab-org/gitlab/-/issues/523081#top

0
0
0

Register as a new user and use Qiita more conveniently

  1. You get articles that match your needs
  2. You can efficiently read back useful information
  3. You can use dark theme
What you can do with signing up
0
0

Delete article

Deleted articles cannot be recovered.

Draft of this article would be also deleted.

Are you sure you want to delete this article?