Rails
Linux
Gem

Restarting DelayedJob without killing jobs in progress

Introduction

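When you restart delayed_job, for example during a deploy, any worker that is in the middle of a job gets forcibly killed without mercy. This is a memo on how to restart the workers while avoiding that.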

Reproducing the problem

  • Start three workers
$ RAILS_ENV=production bundle exec bin/delayed_job -n3 start
delayed_job.0: process with pid 5829 started.
delayed_job.1: process with pid 5836 started.
delayed_job.2: process with pid 5843 started.
  • Run restart or stop while delayed_job.2 is processing a job
$ RAILS_ENV=production bundle exec bin/delayed_job -n3 restart
delayed_job.0: trying to stop process with pid 5829...
delayed_job.0: process with pid 5829 successfully stopped.
delayed_job.0: process with pid 16768 started.
delayed_job.1: trying to stop process with pid 5836...
delayed_job.1: process with pid 5836 successfully stopped.
delayed_job.1: process with pid 16780 started.
delayed_job.2: process with pid 5843 wont stop, we forcefully kill it...
delayed_job.2: process with pid 5843 successfully stopped.
delayed_job.2: process with pid 16792 started.

$ RAILS_ENV=production bundle exec bin/delayed_job -n3 stop
delayed_job.0: trying to stop process with pid 5829...
delayed_job.0: process with pid 5829 successfully stopped.
delayed_job.1: trying to stop process with pid 5836...
delayed_job.1: process with pid 5836 successfully stopped.
delayed_job.2: process with pid 5843 wont stop, we forcefully kill it...
delayed_job.2: process with pid 5843 successfully stopped.

delayed_job.0 and delayed_job.1, which were idle, stop without any trouble. delayed_job.2, which was in the middle of a job, reports "won't stop, we forcefully kill it...": it does not exit when asked to stop, so it gets forcibly killed.
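Why only the busy worker gets killed: as far as I understand, a delayed_job worker treats TERM as "stop after the current job", while the stop/restart command force-kills any worker that has not exited after a while. The plain-Ruby sketch below shows the graceful half of that pattern. GracefulWorker is a hypothetical class for illustration, not delayed_job's actual code.

```ruby
# Sketch of graceful SIGTERM handling (NOT delayed_job's source):
# the trap only sets a flag, and the loop checks that flag between jobs,
# so a job that has already started always runs to completion.
class GracefulWorker
  attr_reader :completed, :pending

  def initialize(jobs)
    @pending = jobs.dup # hypothetical queue: an Array of callables
    @completed = []
    @stop = false
  end

  def start
    # Keep the handler tiny: just record that a stop was requested.
    previous = Signal.trap('TERM') { @stop = true }
    begin
      until @stop || @pending.empty?
        job = @pending.shift
        job.call # TERM arriving here does not interrupt the job
        @completed << job
      end
    ensure
      Signal.trap('TERM', previous) # restore the previous handler
    end
    self
  end
end

# Usage: the first job receives TERM halfway through but still finishes;
# the second job is never started.
first  = -> { Process.kill('TERM', Process.pid); sleep 0.2 }
second = -> { puts 'never reached' }
worker = GracefulWorker.new([first, second]).start
puts worker.completed.size # => 1
puts worker.pending.size   # => 1
```

An idle worker has no job to finish, so it sees the flag right away and exits, which matches what happens to delayed_job.0 and delayed_job.1 above.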

irb(main):010:0> Delayed::Job.first
=> #<Delayed::Backend::ActiveRecord::Job id: 212, priority: 0, attempts: 0, handler: "--- !ruby/object:Delayed::PerformableMethod\nobject...", last_error: nil, run_at: "2017-11-10 05:11:08", locked_at: "2017-11-10 05:11:08", failed_at: nil, locked_by: "delayed_job.2 host:type1 pid:5843", queue: nil, created_at: "2017-11-10 05:11:08", updated_at: "2017-11-10 05:11:08", tenant: "test1">

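Because the process was killed outright, the job record is not marked as failed either (last_error and failed_at are nil), and it is still locked by the dead worker (locked_by still shows pid 5843).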
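Since the lock is left pointing at the killed process, one way to spot such leftovers is to check whether the pid in locked_by still exists. A minimal sketch, assuming the locked_by format shown above and that the check runs on the same host as the worker; parse_locked_by and process_alive? are hypothetical helpers, not part of delayed_job.

```ruby
# Hypothetical helpers (not part of delayed_job).
# Parse a locked_by value such as "delayed_job.2 host:type1 pid:5843".
LockOwner = Struct.new(:name, :host, :pid)

def parse_locked_by(locked_by)
  m = /\A(?<name>\S+) host:(?<host>\S+) pid:(?<pid>\d+)\z/.match(locked_by)
  m && LockOwner.new(m[:name], m[:host], Integer(m[:pid])) # nil if no match
end

# Signal 0 only checks that the process exists; nothing is delivered.
def process_alive?(pid)
  Process.kill(0, pid)
  true
rescue Errno::ESRCH
  false # no such process: the worker holding the lock is gone
rescue Errno::EPERM
  true  # the process exists but belongs to another user
end

owner = parse_locked_by('delayed_job.2 host:type1 pid:5843')
p owner.name # => "delayed_job.2"
p owner.pid  # => 5843
```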

Workaround

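restart force-kills any busy worker, and I can't keep timing the restart for a moment when every worker happens to be idle, so I worked around it like this:

  1. Stop workers with kill -SIGTERM (the worker finishes its current job and then stops on its own)
  2. Start each DelayedJob worker with --identifier (the process is named delayed_job.{identifier})
  3. Have a batch job check that the workers are up and start any that are not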
$ RAILS_ENV=production bundle exec bin/delayed_job --identifier=1  start
delayed_job.1: process with pid 29186 started.
$ RAILS_ENV=production bundle exec bin/delayed_job --identifier=2  start
delayed_job.2: process with pid 29225 started.
$ RAILS_ENV=production bundle exec bin/delayed_job --identifier=3  start
delayed_job.3: process with pid 29265 started.
$ kill -SIGTERM 29225
$ kill -SIGTERM 29186
$ kill -SIGTERM 29265
$ RAILS_ENV=production bundle exec bin/delayed_job --identifier=1  start
delayed_job.1: process with pid 6144 started.
$ RAILS_ENV=production bundle exec bin/delayed_job --identifier=2  start
delayed_job.2: process with pid 6188 started.
$ RAILS_ENV=production bundle exec bin/delayed_job --identifier=3  start
ERROR: there is already one or more instance(s) of the program running # still processing a job; the process exits once the job finishes
$ RAILS_ENV=production bundle exec bin/delayed_job --identifier=3  start
delayed_job.3: process with pid 7523 started.  
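Sending TERM by hand means looking up each pid first. The sketch below does step 1 for every worker at once. It assumes pid files named delayed_job.<identifier>.pid in delayed_job's pid directory (tmp/pids by default, as far as I know); term_workers is a hypothetical helper.

```ruby
# Hypothetical helper: send SIGTERM to every delayed_job worker that has a
# pid file, so each one stops after finishing its current job.
# ASSUMPTION: pid files are <pid_dir>/delayed_job.<identifier>.pid
def term_workers(pid_dir)
  Dir.glob(File.join(pid_dir, 'delayed_job.*.pid')).sort.map do |path|
    pid = Integer(File.read(path).strip)
    begin
      Process.kill('TERM', pid)
      [pid, :signaled]
    rescue Errno::ESRCH
      [pid, :not_running] # stale pid file: nothing to stop
    end
  end
end

# Usage, from the Rails root: term_workers('tmp/pids')
```

Workers that are still busy keep running until their job is done, so starting them again right away fails for those identifiers, exactly as in the delayed_job.3 output above.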

Batch job

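class Tasks::DelayedJob::BootCheck
  DELAYED_JOB_WORKER_NUM = 3

  class << self
    # Start every worker (delayed_job.1 .. delayed_job.N) that is not running
    def execute
      1.upto(DELAYED_JOB_WORKER_NUM) do |num|
        # "not running" also contains "running", so match ": running" instead
        next if `#{command(num)} status`.match?(/: running\b/)
        `#{command(num)} start`
      end
    end

    private

    # Absolute path, so the batch works whatever the current directory is
    def command(num)
      "RAILS_ENV=#{Rails.env} #{Rails.root.join('bin', 'delayed_job')} --identifier=#{num}"
    end
  end
end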
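The batch decides whether to start a worker by reading the status output. As far as I can tell, the daemons gem that bin/delayed_job runs on prints lines like "delayed_job.1: running [pid 29186]", "delayed_job.1: not running" or "delayed_job.1: no instances running", so a bare include?('running') would match the stopped cases too. A minimal stricter check, assuming that output format (worker_running? is a hypothetical helper):

```ruby
# Hypothetical helper. ASSUMPTION: status lines look like
#   "delayed_job.1: running [pid 29186]"
#   "delayed_job.1: not running"
#   "delayed_job.1: no instances running"
# Anchoring on ": running" accepts only the first form.
def worker_running?(status_output)
  status_output.each_line.any? { |line| line.match?(/: running\b/) }
end

p worker_running?("delayed_job.1: running [pid 29186]\n") # => true
p worker_running?("delayed_job.1: not running\n")         # => false
```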

If cron runs this check every minute, the next run starts before the previous one has finished, and processes keep piling up... Running it about every three minutes seems about right.
If you really have to run it every minute rather than every three minutes or so, use mutual exclusion (a file lock) so that runs never overlap.

Mutual exclusion

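class Tasks::DelayedJob::BootCheck
  DELAYED_JOB_WORKER_NUM = 3

  class << self
    def execute
      # Do nothing if a previous run still holds the lock
      exclude_process do
        1.upto(DELAYED_JOB_WORKER_NUM) do |num|
          next if `#{command(num)} status`.match?(/: running\b/)
          `#{command(num)} start`
        end
      end
    end

    private

    # Run the block only if the lock can be taken without waiting.
    # An overlapping run just logs and returns; other errors are no longer
    # swallowed by a blanket rescue StandardError.
    def exclude_process
      File.open(lock_file_path, 'w') do |lock_file|
        if lock_file.flock(File::LOCK_EX | File::LOCK_NB)
          yield
        else
          Rails.logger.info I18n.t('logging.excluding')
        end
      end # closing the file releases the lock
    end

    def command(num)
      "RAILS_ENV=#{Rails.env} #{Rails.root.join('bin', 'delayed_job')} --identifier=#{num}"
    end

    def lock_file_path
      Rails.root.join('tmp', 'delayed_job')
    end
  end
end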
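To see what File::LOCK_NB buys here, this standalone sketch (plain Ruby, no Rails; try_lock and the lock file name are hypothetical) opens the same lock file twice. While the first handle holds LOCK_EX, the non-blocking attempt returns false immediately instead of waiting, which is what lets an overlapping cron run give up.

```ruby
require 'tmpdir'

# Hypothetical helper: returns the open File if the exclusive lock was
# acquired (keep it open to keep the lock), or nil if someone else holds it.
def try_lock(path)
  file = File.open(path, File::RDWR | File::CREAT, 0o644)
  if file.flock(File::LOCK_EX | File::LOCK_NB)
    file
  else
    file.close
    nil
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'boot_check.lock') # hypothetical lock file name
  first = try_lock(path)
  second = try_lock(path)
  p first.nil?  # => false (the first run got the lock)
  p second.nil? # => true  (the overlapping run gives up at once)
  first.close   # closing the file releases the lock
  third = try_lock(path)
  p third.nil?  # => false
  third.close
end
```

Without LOCK_NB, flock would block until the lock is released, so overlapping runs would queue up behind each other instead of being skipped.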