
Recommending Prometheus - Process Monitoring -


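I want to monitor processes

In the previous articles I got as far as raising alerts for both black-box (external) monitoring and resource monitoring.
The remaining items I want are process monitoring and log monitoring. From what I have read, the standard approach is to install a dedicated exporter for each piece of middleware,
but for now I only want to monitor processes, so I looked around, found "process-exporter", and will try it here.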

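Installing process-exporter (binary)

I will follow the README.md in the GitHub repository.
GitHub - ncabatoff/process-exporter

It can apparently run under Docker, but as usual I will use the binary.
(I would rather not use Docker much in a production environment.)

The latest releases are on the download page below.
https://github.com/ncabatoff/process-exporter/releases

I am installing it on many servers, so I deploy it with Ansible.
(That said, the first time I always download it by hand and feel my way around with --help.)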

process-exporter.yml
---
- hosts: exsamplehost
  remote_user: exsampleuser
  become: yes
  tasks:
  # Record whether the binary is already installed
  - name: check exist file
    stat:
      path: /usr/bin/process-exporter
    register: file
  # Download the release tarball to /tmp
  - name: wget process-exporter
    get_url:
      url: https://github.com/ncabatoff/process-exporter/releases/download/v0.2.11/process-exporter_0.2.11_linux_amd64.tar.gz
      dest: /tmp/process-exporter_0.2.11_linux_amd64.tar.gz
  - name: unarchive process-exporter
    unarchive:
      src: /tmp/process-exporter_0.2.11_linux_amd64.tar.gz
      remote_src: yes
      dest: /tmp/
  # Install the binary only when it is not there yet
  - name: copy process-exporter binary
    shell: cp /tmp/process-exporter /usr/bin/process-exporter
    when: not file.stat.exists
  - name: create process-exporter systemd
    blockinfile:
      path: /etc/systemd/system/process-exporter.service
      create: yes
      block: |
        [Unit]
        Description=process-exporter for Prometheus

        [Service]
        Restart=always
        ExecStart=/usr/bin/process-exporter
        ExecReload=/bin/kill -HUP $MAINPID
        TimeoutStopSec=20s
        SendSIGKILL=no

        [Install]
        WantedBy=multi-user.target
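
The playbook above only writes the unit file; nothing reloads systemd or starts the service. A minimal sketch of a task that could follow, assuming Ansible 2.2 or later (where the systemd module is available) and the unit name process-exporter taken from the file created above:

```yaml
  # Hypothetical follow-up task; append it to the tasks list of the play above.
  - name: reload systemd and start process-exporter
    systemd:
      name: process-exporter
      daemon_reload: yes   # pick up the newly written unit file
      enabled: yes         # start on boot
      state: started
```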

The config is hard...

The README gives a sample config and a "here is how I use it" example, but
while I more or less get comm, I cannot make sense of exe, cmdline, or name!

Excerpt: https://github.com/ncabatoff/process-exporter#config

process_names:
 # comm is the second field of /proc/<pid>/stat minus parens.
 # It is the base executable name, truncated at 15 chars.  
 # It cannot be modified by the program, unlike exe.
 - comm:
   - bash
   
 # exe is argv[0]. If no slashes, only basename of argv[0] need match.
 # If exe contains slashes, argv[0] must match exactly.
 - exe: 
   - postgres
   - /usr/local/bin/prometheus

 # cmdline is a list of regexps applied to argv.
 # Each must match, and any captures are added to the .Matches map.
 - name: "{{.ExeFull}}:{{.Matches.Cfgfile}}"
   exe: 
   - /usr/local/bin/process-exporter
   cmdline: 
   - -config.path\s+(?P<Cfgfile>\S+)
   

Here's the config I use on my home machine:


process_names:
 - comm: 
   - chromium-browse
   - bash
   - prometheus
   - gvim
 - exe: 
   - /sbin/upstart
   cmdline: 
   - --user
   name: upstart:-user

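Trying various things

My faint hope is that if I write various configs and check each one with curl on the spot, I will come to understand them.
The curl output has too many lines to read easily, so I narrow it down to the process counts.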

curl http://localhost:9256/metrics | grep "namedprocess_namegroup_num_procs"

comm only

config.yaml
process_names:
  - comm:
    - prometheus
    - php-fpm7.0
    - nginx

curl output

# TYPE namedprocess_namegroup_num_procs gauge
namedprocess_namegroup_num_procs{groupname="nginx -g daemon on; master_process on"} 1
namedprocess_namegroup_num_procs{groupname="nginx: worker process"} 1
namedprocess_namegroup_num_procs{groupname="php-fpm.conf)"} 1
namedprocess_namegroup_num_procs{groupname="php-fpm: pool www"} 2
namedprocess_namegroup_num_procs{groupname="prometheus"} 1

exe only

config.yaml
process_names:
  - exe:
    - /usr/bin/prometheus
    - /usr/sbin/nginx
    - /usr/sbin/php-fpm7.0

curl output

# TYPE namedprocess_namegroup_num_procs gauge
namedprocess_namegroup_num_procs{groupname="prometheus"} 1

Is my syntax wrong? Changing it to the following made no difference to the result.

config.yaml
process_names:
  - exe:
    - /usr/bin/prometheus
  - exe:
    - /usr/sbin/nginx

cmdline only (match everything)

config.yaml
process_names:
  - cmdline:
    - .+

curl output

# TYPE namedprocess_namegroup_num_procs gauge
namedprocess_namegroup_num_procs{groupname="(sd-pam)"} 1
namedprocess_namegroup_num_procs{groupname="-bash"} 3
namedprocess_namegroup_num_procs{groupname="-su"} 3
namedprocess_namegroup_num_procs{groupname="0"} 1
namedprocess_namegroup_num_procs{groupname="1"} 1
namedprocess_namegroup_num_procs{groupname="2"} 1
namedprocess_namegroup_num_procs{groupname="accounts-daemon"} 1
namedprocess_namegroup_num_procs{groupname="acpid"} 1
namedprocess_namegroup_num_procs{groupname="agetty"} 2
namedprocess_namegroup_num_procs{groupname="atd"} 1
namedprocess_namegroup_num_procs{groupname="blackbox_exporter"} 1
namedprocess_namegroup_num_procs{groupname="cron"} 1
namedprocess_namegroup_num_procs{groupname="curl"} 1
namedprocess_namegroup_num_procs{groupname="dbus-daemon"} 1
namedprocess_namegroup_num_procs{groupname="dhclient"} 1
namedprocess_namegroup_num_procs{groupname="grafana-server"} 1
namedprocess_namegroup_num_procs{groupname="grep"} 1
namedprocess_namegroup_num_procs{groupname="init"} 1
namedprocess_namegroup_num_procs{groupname="iscsid"} 2
namedprocess_namegroup_num_procs{groupname="lvmetad"} 1
namedprocess_namegroup_num_procs{groupname="lxcfs"} 1
namedprocess_namegroup_num_procs{groupname="master"} 1
namedprocess_namegroup_num_procs{groupname="mdadm"} 1
namedprocess_namegroup_num_procs{groupname="memcached"} 1
namedprocess_namegroup_num_procs{groupname="nginx -g daemon on; master_process on"} 1
namedprocess_namegroup_num_procs{groupname="nginx: worker process"} 1
namedprocess_namegroup_num_procs{groupname="node_exporter"} 1
namedprocess_namegroup_num_procs{groupname="php-fpm.conf)"} 1
namedprocess_namegroup_num_procs{groupname="php-fpm: pool www"} 2
namedprocess_namegroup_num_procs{groupname="pickup"} 1
namedprocess_namegroup_num_procs{groupname="polkitd"} 1
namedprocess_namegroup_num_procs{groupname="process-exporter"} 1
namedprocess_namegroup_num_procs{groupname="prometheus"} 1
namedprocess_namegroup_num_procs{groupname="qmgr"} 1
namedprocess_namegroup_num_procs{groupname="rsyslogd"} 1
namedprocess_namegroup_num_procs{groupname="snapd"} 1
namedprocess_namegroup_num_procs{groupname="sshd"} 1
namedprocess_namegroup_num_procs{groupname="sshd: da-dev [priv]"} 3
namedprocess_namegroup_num_procs{groupname="su"} 3
namedprocess_namegroup_num_procs{groupname="sudo"} 3
namedprocess_namegroup_num_procs{groupname="systemd"} 1
namedprocess_namegroup_num_procs{groupname="systemd-journald"} 1
namedprocess_namegroup_num_procs{groupname="systemd-logind"} 1
namedprocess_namegroup_num_procs{groupname="systemd-timesyncd"} 1
namedprocess_namegroup_num_procs{groupname="systemd-udevd"} 1

Combining comm and cmdline

config.yaml
process_names:
  - comm:
    - nginx
    cmdline:
    - .worker

curl output

# TYPE namedprocess_namegroup_num_procs gauge
namedprocess_namegroup_num_procs{groupname="nginx: worker process"} 1

Two combinations

config.yaml
process_names:
  - comm:
    - nginx
    cmdline:
    - master
  - comm:
    - php-fpm7.0
    cmdline:
    - pool

curl output

# TYPE namedprocess_namegroup_num_procs gauge
namedprocess_namegroup_num_procs{groupname="nginx -g daemon on; master_process on"} 2
namedprocess_namegroup_num_procs{groupname="php-fpm: pool www"} 2

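Going with this config for now

In the end I never figured out how to use exe, but if all I need is to check whether a process exists,
combining comm and cmdline looks sufficient.
It would be nice to merge the parent and child processes, which show up as separate groups, into one, but I am letting that go for now.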

config.yaml
process_names:
  - comm:
    - nginx
    cmdline:
    - master
  - comm:
    - php-fpm7.0
    cmdline:
    - pool
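
On merging the parent and child processes: the group names in the results above (for example "php-fpm.conf)") look like the base name of argv[0], which the README describes as the default name, and nginx and php-fpm rewrite their argv[0] at runtime. That would also explain why the exe entries with full paths did not match. If that reading is right, giving each entry a fixed name, as the upstream example does with name: upstart:-user, should collapse master and workers into one group. The following is an untested sketch; the group names nginx and php-fpm are arbitrary choices for illustration:

```yaml
process_names:
  # Every process whose comm is nginx (master and workers) goes into group "nginx"
  - name: nginx
    comm:
      - nginx
  # Same idea for php-fpm; the group name is arbitrary
  - name: php-fpm
    comm:
      - php-fpm7.0
```

With this, namedprocess_namegroup_num_procs would be expected to report one series per service instead of one per process title.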

Updating the Ansible playbook

I turned config.yaml into a template so that the processes to monitor can be changed.

process-exporter.yml
---
- hosts: exsamplehost
  remote_user: exsampleuser
  become: yes
  vars:
    # Processes to monitor: comm name plus a regex applied to the command line
    proc_names:
      - { name: nginx, regex: master }
  tasks:
  # Record whether the binary is already installed
  - name: check exist file
    stat:
      path: /usr/bin/process-exporter
    register: file
  - name: wget process-exporter
    get_url:
      url: https://github.com/ncabatoff/process-exporter/releases/download/v0.2.11/process-exporter_0.2.11_linux_amd64.tar.gz
      dest: /tmp/process-exporter_0.2.11_linux_amd64.tar.gz
  - name: unarchive process-exporter
    unarchive:
      src: /tmp/process-exporter_0.2.11_linux_amd64.tar.gz
      remote_src: yes
      dest: /tmp/
  # Install the binary only when it is not there yet
  - name: copy process-exporter binary
    shell: cp /tmp/process-exporter /usr/bin/process-exporter
    when: not file.stat.exists
  - name: create process-exporter dir
    file:
      state: directory
      group: root
      owner: root
      path: /usr/local/process-exporter
      recurse: yes
  # Render config.yaml from the template using proc_names
  - name: create config.yaml
    template:
      src: /data/ansible/template/process-exporter/config.yaml.j2
      dest: /usr/local/process-exporter/config.yaml
      owner: root
      group: root
      mode: "0644"
  - name: create process-exporter systemd
    blockinfile:
      path: /etc/systemd/system/process-exporter.service
      create: yes
      block: |
        [Unit]
        Description=process-exporter for Prometheus

        [Service]
        Restart=always
        ExecStart=/usr/bin/process-exporter -config.path=/usr/local/process-exporter/config.yaml
        ExecReload=/bin/kill -HUP $MAINPID
        TimeoutStopSec=20s
        SendSIGKILL=no

        [Install]
        WantedBy=multi-user.target
config.yaml.j2
process_names:
{% for proc_name in proc_names %}
  - comm:
    - {{proc_name.name}}
    cmdline:
    - {{proc_name.regex}}
{% endfor %}
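
As a rough illustration of how the template expands, suppose proc_names held two entries. The second entry is a hypothetical addition that mirrors the php-fpm setting used earlier; with Ansible's default template settings the output should look roughly like the second document below:

```yaml
# Playbook side: vars (the php-fpm7.0 entry is hypothetical)
proc_names:
  - { name: nginx, regex: master }
  - { name: php-fpm7.0, regex: pool }
---
# Expected rendering of /usr/local/process-exporter/config.yaml
process_names:
  - comm:
    - nginx
    cmdline:
    - master
  - comm:
    - php-fpm7.0
    cmdline:
    - pool
```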

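Prometheus-side configuration

Adding a target
I plan to install node_exporter on every server by default, so those targets are added via service discovery.
Process monitoring is not mandatory, so I add those targets through file_sd_configs instead.
As for labels, my current Grafana dashboards use the "instance" label with instance name = host name, so as long as that matches, everything should fit together nicely.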

prometheus.yml (excerpt)
  - job_name: "Process monitoring"
    file_sd_configs:
      - files:
        - /usr/local/prometheus-2.3.1/filesd/process.yml
    relabel_configs:
      # Copy the scrape address (host:port) into a working label
      - source_labels: [__address__]
        target_label: __param_target
      # Keep only the host part as the instance label
      - source_labels: [__param_target]
        regex: '(.*):(.*)'
        target_label: instance
        replacement: '${1}'
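
The two relabel rules above first copy __address__ into __param_target and then cut the port off. If the only goal is an instance label without the port, a single rule that reads __address__ directly may be enough. A sketch, assuming nothing else depends on __param_target:

```yaml
    relabel_configs:
      # Take "host:port" from __address__ and keep only the host as instance
      - source_labels: [__address__]
        regex: '([^:]+):\d+'
        target_label: instance
        replacement: '${1}'
```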
process.yml
- targets:
    # If the server to monitor can be resolved by name, list it here
    - ap-prometheus01:9256

# If it cannot be resolved, set the instance label by hand
#- targets:
#    - 35.200.32.78
#  labels:
#    instance: 
#

Viewing it in Grafana

Per-server detail dashboard

(Screenshot: proc.PNG)

Graph for alerting

(Screenshot: image.png)
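
For the alert itself, a rule on namedprocess_namegroup_num_procs is one option. Below is a minimal sketch of a Prometheus 2.x rule file. The alert names, the severity labels, the 1m duration, and the choice of group are all assumptions for illustration, and the file would still need to be listed under rule_files in prometheus.yml:

```yaml
groups:
  - name: process
    rules:
      # Fires when a reported process group has no running processes
      - alert: ProcessDown
        expr: namedprocess_namegroup_num_procs < 1
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "No {{ $labels.groupname }} process on {{ $labels.instance }}"
      # Fires when the series is missing altogether,
      # e.g. the exporter has not seen the process since it started
      - alert: ProcessMetricMissing
        expr: 'absent(namedprocess_namegroup_num_procs{groupname="php-fpm: pool www"})'
        for: 1m
        labels:
          severity: warning
        annotations:
          summary: "No process metric for php-fpm: pool www"
```

The second rule is there because a threshold on the count cannot fire if the series does not exist at all.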

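Previous Prometheus articles
Recommending Prometheus - Liveness monitoring with blackbox_exporter -
Recommending Prometheus - I want a list of users on monitored hosts in Grafana -
Recommending Prometheus - Initial setup -
Recommending Prometheus - Service Discovery -
Recommending Prometheus - Installing an exporter: node-exporter (apt-get) -
Recommending Prometheus - Installing an exporter: node-exporter (binary) -
Recommending Prometheus - Service Discovery - The endpoint becomes "http://:9100/metrics" and points at the server itself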
