Querying BigQuery with google-api-ruby-client eats a huge amount of memory and kills the process

Posted at 2016-09-29

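Problem

When you query BigQuery through google-api-ruby-client, the client uses the representable gem internally and tries to convert the entire response into model objects, which consumes a large amount of CPU and memory.

For example, in my environment, fetching a 2.4 MB result bloated the Ruby process to 500 MB.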
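To get a feel for why wrapping every cell in an object is expensive, here is a rough sketch that counts object allocations. It does not use representable itself; the `SketchRow` and `SketchCell` structs and the `allocated_objects` helper are hypothetical stand-ins for the generated model classes, so the numbers only illustrate the trend.

```ruby
require 'json'

# Hypothetical stand-ins for the model classes built from a response.
SketchCell = Struct.new(:v)
SketchRow  = Struct.new(:f)

# Count how many Ruby objects the given block allocates.
def allocated_objects
  GC.disable
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
ensure
  GC.enable
end

# A fake payload shaped like BigQuery rows: {"f" => [{"v" => value}, ...]}.
payload = JSON.generate(
  'rows' => Array.new(1_000) { |i| { 'f' => [{ 'v' => i.to_s }, { 'v' => "name#{i}" }] } }
)

# Keeping the body as text costs almost nothing.
raw_cost = allocated_objects { payload.dup }

# Parsing and wrapping every row and cell allocates thousands of objects.
model_cost = allocated_objects do
  JSON.parse(payload)['rows'].map do |row|
    SketchRow.new(row['f'].map { |cell| SketchCell.new(cell['v']) })
  end
end

puts "raw: #{raw_cost} objects, modelled: #{model_cost} objects"
```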

Workaround

EDIT: As of 0.11.0, you can specify the skip_deserialization option to skip the conversion into objects. See also issues/475
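On 0.11.0 or later, the option can be passed per call through `options:`. The following is a minimal sketch, assuming `service` is an authorized `Google::Apis::BigqueryV2::BigqueryService`, `request` is a `Google::Apis::BigqueryV2::QueryRequest`, and the call returns the response body as JSON text when deserialization is skipped. The helper name `query_raw` is hypothetical.

```ruby
require 'json'

# Hypothetical helper: run a query without building model objects.
# Assumes google-api-client >= 0.11.0, where skip_deserialization
# is accepted as a request option.
def query_raw(service, project_id, request)
  body = service.query_job(project_id, request,
                           options: { skip_deserialization: true })
  # Assumption: body is the raw JSON text of the response.
  JSON.parse(body)
end
```

Because the service object is passed in, the helper can be exercised with a stub in tests before pointing it at a real project.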

I wanted the client to stop creating objects needlessly, so I ended up monkey patching it.

require 'google/apis/bigquery_v2'

# Monkey patch: do not create representable objects, which consume a lot of memory
module Google
  module Apis
    module BigqueryV2
      class BigqueryService < Google::Apis::Core::BaseService
        def query_job(project_id, query_request_object = nil, fields: nil, quota_user: nil, user_ip: nil, options: nil, &block)
          command = make_simple_command(:post, 'projects/{projectId}/queries', options)
          command.request_representation = Google::Apis::BigqueryV2::QueryRequest::Representation
          command.request_object = query_request_object
          # command.response_representation = Google::Apis::BigqueryV2::QueryResponse::Representation # monkey patch
          command.response_class = Google::Apis::BigqueryV2::QueryResponse
          command.params['projectId'] = project_id unless project_id.nil?
          command.query['fields'] = fields unless fields.nil?
          command.query['quotaUser'] = quota_user unless quota_user.nil?
          command.query['userIp'] = user_ip unless user_ip.nil?
          execute_or_queue_command(command, &block)
        end

        def get_job_query_results(project_id, job_id, max_results: nil, page_token: nil, start_index: nil, timeout_ms: nil, fields: nil, quota_user: nil, user_ip: nil, options: nil, &block)
          command = make_simple_command(:get, 'projects/{projectId}/queries/{jobId}', options)
          # command.response_representation = Google::Apis::BigqueryV2::GetQueryResultsResponse::Representation # monkey patch
          command.response_class = Google::Apis::BigqueryV2::GetQueryResultsResponse
          command.params['projectId'] = project_id unless project_id.nil?
          command.params['jobId'] = job_id unless job_id.nil?
          command.query['maxResults'] = max_results unless max_results.nil?
          command.query['pageToken'] = page_token unless page_token.nil?
          command.query['startIndex'] = start_index unless start_index.nil?
          command.query['timeoutMs'] = timeout_ms unless timeout_ms.nil?
          command.query['fields'] = fields unless fields.nil?
          command.query['quotaUser'] = quota_user unless quota_user.nil?
          command.query['userIp'] = user_ip unless user_ip.nil?
          execute_or_queue_command(command, &block)
        end
      end
    end
  end
end

When command.response_representation is left nil, the client returns the raw JSON text without passing it through representable, so I took advantage of that. See lib/google/apis/core/api_command.rb#L69
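Once the response arrives as raw JSON text, it has to be parsed by hand. The sketch below assumes the body follows the BigQuery REST response layout (`schema.fields` for column names, `rows[].f[].v` for cell values); the helper name `rows_from_raw_json` and the sample payload are made up for illustration.

```ruby
require 'json'

# Turn the raw JSON text of a query response into an array of plain hashes.
# Column names come from schema.fields; each row looks like
# {"f" => [{"v" => value}, ...]}.
def rows_from_raw_json(json_text)
  response = JSON.parse(json_text)
  names = response.fetch('schema', {}).fetch('fields', []).map { |f| f['name'] }
  response.fetch('rows', []).map do |row|
    values = row['f'].map { |cell| cell['v'] }
    names.zip(values).to_h
  end
end

# Made-up sample in the shape of a query response.
sample = <<~JSON
  {"jobComplete": true,
   "schema": {"fields": [{"name": "id", "type": "INTEGER"},
                         {"name": "name", "type": "STRING"}]},
   "rows": [{"f": [{"v": "1"}, {"v": "alice"}]},
            {"f": [{"v": "2"}, {"v": "bob"}]}]}
JSON

p rows_from_raw_json(sample)
```

Cell values stay as strings here; converting them according to each field's `type` is left to the caller.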

Conclusion

Hmm, Google...
