Munin
Temperature sensor
nvme

Reading NVMe SSD temperatures with munin's default hddtemp_smartctl plugin

Overview

In my current environment (Linux, munin 2.0.34, smartmontools 6.6), configuring plugin-conf.d alone was not enough to get this working, so here are my notes.

Procedure

Forcing the drive list

When env.drives is not specified, the plugin builds the drive list like this:

hddtemp_smartctl
  my @drivesSCSI;
  if (-d '/sys/block/') {
    opendir(SCSI, '/sys/block/');
    @drivesSCSI = grep /sd[a-z]/, readdir SCSI;
    closedir(SCSI);
   }

That is how the list is built on Linux. Naturally, device names such as nvme0n1 do not match.
Setting env.drives explicitly works around this.

/etc/munin/plugin-conf.d/munin-node
[hddtemp_smartctl]
user root
group disk
env.drives sda sdb nvme0n1
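
To see why nvme0n1 is skipped by the default scan, the same sd[a-z] pattern can be tried against a few device names. This is only a minimal sketch: the function name default_drives and the sample names are illustrative assumptions, and the plugin itself does this in Perl rather than in shell.

```shell
#!/bin/sh
# Keep only names matching sd[a-z], the same pattern the plugin's Perl grep uses.
# default_drives is a hypothetical helper name, not part of munin.
default_drives() {
    grep 'sd[a-z]'
}

# Sample names are illustrative; on a real host try: ls /sys/block | default_drives
printf '%s\n' sda sdb nvme0n1 loop0 | default_drives
```

Only sda and sdb survive the filter, which is why nvme0n1 has to be listed in env.drives by hand.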

Handling the temperature line match conditions

The plugin picks the temperature line out of the smartctl output with the following code:

hddtemp_smartctl
  if ($output =~ /Current Drive Temperature:\s*(\d+)/) {
    print "$drive.value $1\n";
  } elsif ($output =~ /^(194 Temperature_Celsius.*)/m) {
    my @F = split /\s+/, $1;
    print "$drive.value $F[9]\n";
  } elsif ($output =~ /^(231 Temperature_Celsius.*)/m) {
    my @F = split ' ', $1;
    print "$drive.value $F[9]\n";
  } elsif ($output =~ /^(190 Airflow_Temperature_Cel.*)/m) {
    my @F = split ' ', $1;
    print "$drive.value $F[9]\n";
  } else {
      print "$drive.value U\n";
      print "$drive.extinfo Temperature not detected in smartctl output\n";
  }

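The first branch looks for a "Current Drive Temperature:" line; the other three take the tenth whitespace-separated field (the raw value) of an ATA attribute line.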
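
The field lookup in the attribute branches can be reproduced in shell for a quick check. This is a rough sketch under assumptions: the sample attribute line is made up for illustration (it is not from my drives), raw_value is a hypothetical helper name, and awk's tenth field stands in for the Perl expression $F[9].

```shell
#!/bin/sh
# Print the tenth whitespace-separated field of an ATA attribute line.
# awk fields are 1-based, so $10 here corresponds to $F[9] in the plugin's Perl code.
raw_value() {
    awk '{ print $10 }'
}

# Illustrative sample line, not taken from a real drive.
line='194 Temperature_Celsius     0x0022   032   045   000    Old_age   Always       -       32'
printf '%s\n' "$line" | raw_value
```

The NVMe output has no such attribute table, so none of these three branches can match it.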
Meanwhile, this is what smartctl reports when reading S.M.A.R.T. data from an NVMe SSD:

$ sudo smartctl -A /dev/nvme0n1
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.13.12-gentoo] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF SMART DATA SECTION ===
SMART/Health Information (NVMe Log 0x02, NSID 0x1)
Critical Warning:                   0x00
Temperature:                        46 Celsius
Available Spare:                    100%
Available Spare Threshold:          10%
Percentage Used:                    0%
Data Units Read:                    1,941,482 [994 GB]
Data Units Written:                 2,372,659 [1.21 TB]
Host Read Commands:                 39,349,367
Host Write Commands:                48,090,812
Controller Busy Time:               341
Power Cycles:                       154
Power On Hours:                     16,814
Unsafe Shutdowns:                   52
Media and Data Integrity Errors:    0
Error Information Log Entries:      39

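Here the temperature sits on the line matching "^Temperature:", and a small rewrite of that line should make it match the first condition.
The output format above is defined in smartmontools' nvmeprint.cpp as follows: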

nvmeprint.cpp
  pout("SMART/Health Information (NVMe Log 0x02, NSID 0x%x)\n", nsid);
  pout("Critical Warning:                   0x%02x\n", smart_log.critical_warning);
  pout("Temperature:                        %s\n",
       kelvin_to_str(buf, le16_to_uint(smart_log.temperature)));
  pout("Available Spare:                    %u%%\n", smart_log.avail_spare);
  pout("Available Spare Threshold:          %u%%\n", smart_log.spare_thresh);
  pout("Percentage Used:                    %u%%\n", smart_log.percent_used);
  pout("Data Units Read:                    %s\n", le128_to_str(buf, smart_log.data_units_read, 1000*512));
  pout("Data Units Written:                 %s\n", le128_to_str(buf, smart_log.data_units_written, 1000*512));
  pout("Host Read Commands:                 %s\n", le128_to_str(buf, smart_log.host_reads));
  pout("Host Write Commands:                %s\n", le128_to_str(buf, smart_log.host_writes));
  pout("Controller Busy Time:               %s\n", le128_to_str(buf, smart_log.ctrl_busy_time));
  pout("Power Cycles:                       %s\n", le128_to_str(buf, smart_log.power_cycles));
  pout("Power On Hours:                     %s\n", le128_to_str(buf, smart_log.power_on_hours));
  pout("Unsafe Shutdowns:                   %s\n", le128_to_str(buf, smart_log.unsafe_shutdowns));
  pout("Media and Data Integrity Errors:    %s\n", le128_to_str(buf, smart_log.media_errors));
  pout("Error Information Log Entries:      %s\n", le128_to_str(buf, smart_log.num_err_log_entries));

Patching this code would work, but for now I handle it ad hoc with sed.

/usr/local/bin/smartctl
#!/bin/sh
/usr/sbin/smartctl "$@" |sed '/^Temperature:/s/^/Current Drive /'

With a wrapper script like this in place, set env.smartctl to point at it.

/etc/munin/plugin-conf.d/munin-node
[hddtemp_smartctl]
user root
group disk
env.smartctl /usr/local/bin/smartctl
env.drives sda sdb nvme0n1
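
The sed rewrite can be checked in isolation before involving munin at all. The sketch below feeds it the Temperature line from the output above and then applies a rough shell equivalent of the plugin's first regex. The function names rewrite_temperature and extract_temperature are hypothetical, and the second sed expression is my approximation of the Perl pattern, not code from the plugin.

```shell
#!/bin/sh
# Same sed expression as the wrapper: prefix the NVMe "Temperature:" line
# with "Current Drive " and leave every other line untouched.
rewrite_temperature() {
    sed '/^Temperature:/s/^/Current Drive /'
}

# Approximation of the plugin's first regex
# (/Current Drive Temperature:\s*(\d+)/), printing only the captured digits.
extract_temperature() {
    sed -n 's/.*Current Drive Temperature:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# The Temperature line as it appears in the smartctl output shown earlier.
sample='Temperature:                        46 Celsius'
printf '%s\n' "$sample" | rewrite_temperature | extract_temperature
```

If this prints the temperature, the wrapper's output should be acceptable to the plugin's first branch.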

Running munin-run confirms that a value for nvme0n1 is now reported.

$ sudo munin-run hddtemp_smartctl
nvme0n1.value 46
sda.value 32
sdb.value 35

Other solutions

  1. Modify munin's hddtemp_smartctl
  2. Modify smartmontools' smartctl
  3. Change udev's naming rules for NVMe devices

Option 2 can be done with a patch like this:

--- a/nvmeprint.cpp
+++ b/nvmeprint.cpp
@@ -297,7 +297,7 @@ static void print_smart_log(const nvme_smart_log & smart_log, unsigned nsid,
   char buf[64];
   pout("SMART/Health Information (NVMe Log 0x02, NSID 0x%x)\n", nsid);
   pout("Critical Warning:                   0x%02x\n", smart_log.critical_warning);
-  pout("Temperature:                        %s\n",
+  pout("Current Drive Temperature:          %s\n",
        kelvin_to_str(buf, le16_to_uint(smart_log.temperature)));
   pout("Available Spare:                    %u%%\n", smart_log.avail_spare);
   pout("Available Spare Threshold:          %u%%\n", smart_log.spare_thresh);

Environment

$ munin-run --version
Version:
    This is munin-run (munin-node) v2.0.34

    $Id$

Copyright:
    Copyright (C) 2002-2009 Audun Ytterdal, Jimmy Olsen, Tore Anderson,
    Nicolai Langfeldt / Linpro AS.

    This is free software; see the source for copying conditions. There is
    NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
    PURPOSE.

    This program is released under the GNU General Public License


$ smartctl --version
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.13.12-gentoo] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

smartctl comes with ABSOLUTELY NO WARRANTY. This is free
software, and you are welcome to redistribute it under
the terms of the GNU General Public License; either
version 2, or (at your option) any later version.
See http://www.gnu.org for further details.

smartmontools release 6.6 dated 2017-11-05 at 15:20:58 UTC
smartmontools SVN rev 4594 dated 2017-11-05 at 15:21:35
smartmontools build host: x86_64-pc-linux-gnu
smartmontools build with: C++14, GCC 7.2.0
smartmontools configure arguments: '--prefix=/usr' '--build=x86_64-pc-linux-gnu' '--host=x86_64-pc-linux-gnu' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--datadir=/usr/share' '--sysconfdir=/etc' '--localstatedir=/var/lib' '--disable-dependency-tracking' '--disable-silent-rules' '--htmldir=/usr/share/doc/smartmontools-6.6/html' '--libdir=/usr/lib64' '--docdir=/usr/share/doc/smartmontools-6.6' '--with-drivedbdir=/var/db/smartmontools' '--with-initscriptdir=/etc/init.d' '--with-libcap-ng' '--without-selinux' '--with-systemdsystemunitdir=/lib/systemd/system' '--without-gnupg' '--without-update-smart-drivedb' 'build_alias=x86_64-pc-linux-gnu' 'host_alias=x86_64-pc-linux-gnu' 'CXXFLAGS=-O2 -march=native -pipe' 'LDFLAGS=-Wl,-O1 -Wl,--as-needed' 'CFLAGS=-O2 -march=native -pipe' 'PKG_CONFIG_PATH=/usr/lib64/pkgconfig'

$ uname -o
GNU/Linux