More than 5 years have passed since last update.

アクティブなプロセス数が多ければリクエストは簡単に詰まる

負荷対策

Last updated at 2020-02-08Posted at 2020-02-08

前置き

httpのリクエストを受けたときアクセスが詰まることがあります。

アクセスが詰まるというのは普段は速いはずのリクエストが、急に遅くなるということを指しています。
普段10msでレスポンスを返すリクエストがたまに100msで返すようなことがある、といった性能劣化のことです。

で、長いこと、webの運用してましたが、なんでこれが発生するのかあんま理解してませんでした。
恥ずかしながら、考えるな、感じろ、で運用していたというわけです。
このアクセス詰まりのほとんどが短期的なネットワーク障害や、データ量が大きくなった場合の性能劣化や、キャッシュが期限切れになったりした場合に発生すると思っていたためです。

しかし最近障害などなくても、リクエストを受けすぎるとアクティブなプロセス(CPUを利用している状態のプロセス)が増えるせいで、1リクエストあたりに割り当てられるCPU時間が減ることに気づきました。
障害によるレスポンス遅延でも同様の事象は発生するので、障害が原因としても当てはまりはするのですが、より一般的にはこちらが正しいのかなと思います。
かなり初歩的な話ですが、なんというか、並列度上げすぎるとCPU食ってまずいくらいにしか意識してませんでした。

前置きが長くなりましたが、普段速いリクエストが、並列度を上げると簡単に遅くなる実験をしようと思います。

構成

nginx - php-fpm
設定を一応載せますが、ここは読み飛ばしていいところです
php-fpmは50プロセスくらい受けられるようにしておきます

nginx.conf

user  nginx;
worker_processes  2;

error_log  /var/log/nginx/error.log warn;
pid        /var/run/nginx.pid;


events {
  worker_connections  2048;
}


http {
  include       /etc/nginx/mime.types;
  default_type  application/octet-stream;

  log_format  main  '$remote_addr - $remote_user [$time_local] "$request" '
    '$status $body_bytes_sent "$http_referer" '
    '"$http_user_agent" "$http_x_forwarded_for" "$request_time"';

  access_log  /var/log/nginx/access.log  main;

  sendfile        on;

  keepalive_timeout  65;

  include /etc/nginx/conf.d/*.conf;
}

conf.d/default.conf

server {
    index index.php index.html;
    server_name localhost;
    error_log  /var/log/nginx/error.log;
    access_log /var/log/nginx/access.log main;
    root /var/www/html;

    location / {
        try_files $uri $uri/ /index.php$is_args$args;
    }

    location ~ \.php$ {
        fastcgi_pass unix:/tmp/load-of-web.sock;
        fastcgi_index index.php;
        fastcgi_param SCRIPT_FILENAME $document_root/$fastcgi_script_name;
        include fastcgi_params;
    }
}

php-fpm.conf

[global]
daemonize = no
rlimit_files = 4096
include=etc/php-fpm.d/*.conf

php-fpm.d/docker.conf

[www]
user = www-data
listen = /tmp/load-of-web.sock
listen.mode = 0666

pm = dynamic
pm.max_children = 80
pm.start_servers = 50
pm.min_spare_servers = 30
pm.max_spare_servers = 70

詳しくはwebで

100msかかるレスポンス

通常は100msかかるレスポンスを用意しておきます。

ひとつはsleepでcpuを休ませながらリクエストを受けるもの。
もう一方はループ + 計算でCPUをガンガン使うもの。

として、ふたつ用意します。

calc-100ms.phpのほうは手元の環境で100msになるようにループ回数を調整しています。

calc-100ms.php

<?php

$j = 0;
$num_of_loop = 3300000;
for($i = 0; $i < $num_of_loop; $i++){
    $j++;
}

sleep-100ms.php

<?php

usleep(100 * 1000);

これらを置いて、試しにcurlしてみて

$ curl -s -v http://localhost/calc-100ms.php
$ curl -s -v http://localhost/sleep-100ms.php

アクセスログを見てみると、だいたい100msになっています。

"GET /sleep-100ms.php HTTP/1.1" 200 5 "-" "curl/7.58.0" "-" "0.102"
"GET /calc-100ms.php HTTP/1.1" 200 5 "-" "curl/7.58.0" "-" "0.102"

並列度を上げてリクエストしてみる1

まずはsleepで100msかかる方です

$ ab -c 40 -n 100 http://localhost/sleep-100ms.php
This is ApacheBench, Version 2.3 <$Revision: 1807734 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient).....done


Server Software:        nginx/1.17.8
Server Hostname:        localhost
Server Port:            80

Document Path:          /sleep-100ms.php
Document Length:        0 bytes

Concurrency Level:      40
Time taken for tests:   0.354 seconds
Complete requests:      100
Failed requests:        0
Total transferred:      16300 bytes
HTML transferred:       0 bytes
Requests per second:    282.71 [#/sec] (mean)
Time per request:       141.488 [ms] (mean)
Time per request:       3.537 [ms] (mean, across all concurrent requests)
Transfer rate:          45.00 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    1   1.4      0       5
Processing:   101  118  16.4    111     149
Waiting:      101  118  16.4    111     148
Total:        101  120  17.1    111     150

Percentage of the requests served within a certain time (ms)
  50%    111
  66%    128
  75%    140
  80%    143
  90%    146
  95%    148
  98%    150
  99%    150
 100%    150 (longest request)

110～120msでだいたい返っているので、まあ性能劣化はあまりありません。
ログを見てみるとほとんどが0.101-0.105なので10-20msはクライアントまでのオーバーヘッドといったところでしょうか。

並列度を上げてリクエストしてみる2

はい、本題のループしてインクリメントする処理を行う方ですが以下の結果になりました。

$ ab -c 40 -n 100 http://localhost/calc-100ms.php
This is ApacheBench, Version 2.3 <$Revision: 1807734 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking localhost (be patient).....done


Server Software:        nginx/1.17.8
Server Hostname:        localhost
Server Port:            80

Document Path:          /calc-100ms.php
Document Length:        0 bytes

Concurrency Level:      40
Time taken for tests:   3.607 seconds
Complete requests:      100
Failed requests:        0
Total transferred:      16300 bytes
HTML transferred:       0 bytes
Requests per second:    27.72 [#/sec] (mean)
Time per request:       1442.963 [ms] (mean)
Time per request:       36.074 [ms] (mean, across all concurrent requests)
Transfer rate:          4.41 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    1   1.8      0       5
Processing:   596 1319 321.9   1413    1774
Waiting:      596 1319 321.9   1412    1774
Total:        596 1320 322.6   1415    1778

Percentage of the requests served within a certain time (ms)
  50%   1415
  66%   1496
  75%   1537
  80%   1543
  90%   1666
  95%   1701
  98%   1755
  99%   1778
 100%   1778 (longest request)

1リクエストあたりの平均が1300-1400msになっています。

"GET /calc-100ms.php HTTP/1.0" 200 0 "-" "ApacheBench/2.3" "-" "1.420"
"GET /calc-100ms.php HTTP/1.0" 200 0 "-" "ApacheBench/2.3" "-" "1.375"
"GET /calc-100ms.php HTTP/1.0" 200 0 "-" "ApacheBench/2.3" "-" "1.355"

アクセスログをいくつか抜粋してもレスポンスタイムが1.4秒とかかかっていることが分かります。

何で遅延が起きるのか？

sleepしてる方はほぼ劣化が起きてないので、CPUのコアが足りてないからということが分かりますね。
実験環境でのCPUコア数に関して書いてませんでしたが、論理コア数は4です。

$ grep processor /proc/cpuinfo| wc -l
4

40並列でリクエストしているので、php-fpm側のアクティブなプロセス数も40です。
そこでCPUを休ませずプロセスを動かすと、ほとんど均等に40分割でプロセスを動かすことになるはずです。
※CPUとプロセスの割り当てのロジックによる誤差はたぶんあるはず

イメージとしては

一つのプロセスを10ms進めたら別のプロセスにCPUを明け渡す
これを40プロセス分やらないと10ms進めたプロセスにはCPUがもう一度割り当てられない
40プロセス分やるのに4コアだから100msかかる
上記を10回やれば1プロセスの処理が終わる
なので1秒以上かかる

といったところでしょうか。

終わりに

並列度があまりにも高すぎるとCPUのコア数が足りずに頭打ちになります。
そしてこの詰まりは瞬間的に発生すると通常のグラフ上には表れにくい問題です。(平均されたりサンプリングが1分間だったりするから)
90%タイルのメトリクスで負荷を監視するとある程度そういう事象は確認できると思います。

CPUのコア数の問題としましたが、厳密に根拠を示せてるわけじゃないため証明できているわけではありません。
誤りがある場合は教えていただければ幸いです。

ここまでで何が言いたかったかというと、
CPUをたくさん使う時間のかかるリクエストがあり、
たまたま同じサーバにそのリクエストが割り当てられると、
他のリクエストが受けきれずエラーが返ってしまう場合があるということです。
特にサーバ台数が少ないと重いリクエストが同じサーバに割り当てられる可能性が高くなります。
なので気を付けましょうということです。

今までテキトーに考えてたところをちゃんと考えられるようになれてすっきりしました。

You get articles that match your needs
You can efficiently read back useful information
You can use dark theme

What you can do with signing up