Home server build report: Work log: A failed backup of Proxmox VE guests

Preamble

This continues the home server build that began with "Home server build report: Preview".

A backup of my guests failed. This is the record of that failure, though this time it is mostly a work memo for my own use.

References

I was trying to run the backup described in the article below when it failed.

Backup

I ran the backup in snapshot mode.

Proxmox host machine
vzdump --all 1 --storage local --compress gzip --mode snapshot

The output looked like this.

Proxmox host machine
# vzdump --all 1 --storage local --compress gzip --mode snapshot
INFO: starting new backup job: vzdump --compress gzip --all 1 --storage local --mode snapshot
INFO: filesystem type on dumpdir is 'zfs' -using /var/tmp/vzdumptmp1465850_100 for temporary files
INFO: Starting Backup of VM 100 (lxc)
INFO: Backup started at 2026-02-07 14:36:52
INFO: status = stopped
INFO: backup mode: stop
INFO: ionice priority: 7
INFO: CT Name: constructor
INFO: including mount point rootfs ('/') in backup
INFO: creating vzdump archive '/var/lib/vz/dump/vzdump-lxc-100-2026_02_07-14_36_52.tar.gz'
INFO: Total bytes written: 1529436160 (1.5GiB, 40MiB/s)
INFO: archive file size: 335MB
INFO: Finished Backup of VM 100 (00:00:37)
INFO: Backup finished at 2026-02-07 14:37:29
INFO: filesystem type on dumpdir is 'zfs' -using /var/tmp/vzdumptmp1465850_101 for temporary files
INFO: Starting Backup of VM 101 (lxc)
INFO: Backup started at 2026-02-07 14:37:29
INFO: status = running
INFO: CT Name: network
INFO: including mount point rootfs ('/') in backup
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: create storage snapshot 'vzdump'
INFO: creating vzdump archive '/var/lib/vz/dump/vzdump-lxc-101-2026_02_07-14_37_29.tar.gz'
INFO: Total bytes written: 1033799680 (986MiB, 35MiB/s)
INFO: archive file size: 316MB
INFO: cleanup temporary 'vzdump' snapshot
INFO: Finished Backup of VM 101 (00:00:28)
INFO: Backup finished at 2026-02-07 14:37:57
INFO: filesystem type on dumpdir is 'zfs' -using /var/tmp/vzdumptmp1465850_102 for temporary files
INFO: Starting Backup of VM 102 (lxc)
INFO: Backup started at 2026-02-07 14:37:57
INFO: status = running
INFO: CT Name: mail
INFO: including mount point rootfs ('/') in backup
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: create storage snapshot 'vzdump'
INFO: creating vzdump archive '/var/lib/vz/dump/vzdump-lxc-102-2026_02_07-14_37_57.tar.gz'
INFO: Total bytes written: 998144000 (952MiB, 35MiB/s)
INFO: archive file size: 311MB
INFO: cleanup temporary 'vzdump' snapshot
INFO: Finished Backup of VM 102 (00:00:28)
INFO: Backup finished at 2026-02-07 14:38:25
INFO: filesystem type on dumpdir is 'zfs' -using /var/tmp/vzdumptmp1465850_103 for temporary files
INFO: Starting Backup of VM 103 (lxc)
INFO: Backup started at 2026-02-07 14:38:25
INFO: status = running
INFO: CT Name: proxy
INFO: including mount point rootfs ('/') in backup
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: create storage snapshot 'vzdump'
INFO: creating vzdump archive '/var/lib/vz/dump/vzdump-lxc-103-2026_02_07-14_38_25.tar.gz'
INFO: Total bytes written: 953108480 (909MiB, 34MiB/s)
INFO: archive file size: 305MB
INFO: cleanup temporary 'vzdump' snapshot
INFO: Finished Backup of VM 103 (00:00:28)
INFO: Backup finished at 2026-02-07 14:38:53
INFO: Starting Backup of VM 104 (qemu)
INFO: Backup started at 2026-02-07 14:38:53
INFO: status = running
INFO: VM Name: docker
INFO: include disk 'scsi0' 'local-zfs:vm-104-disk-0' 256G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating vzdump archive '/var/lib/vz/dump/vzdump-qemu-104-2026_02_07-14_38_53.vma.gz'
INFO: started backup task '1ac3fb8f-c137-4cb9-b652-9e03577af5da'
INFO: resuming VM again
INFO:   0% (246.5 MiB of 256.0 GiB) in 3s, read: 82.2 MiB/s, write: 34.0 MiB/s
INFO:   1% (2.6 GiB of 256.0 GiB) in 28s, read: 95.3 MiB/s, write: 36.6 MiB/s
INFO:   2% (5.2 GiB of 256.0 GiB) in 1m 31s, read: 42.2 MiB/s, write: 39.8 MiB/s
INFO:   3% (7.7 GiB of 256.0 GiB) in 2m 47s, read: 34.1 MiB/s, write: 32.2 MiB/s
INFO:   4% (11.0 GiB of 256.0 GiB) in 3m 11s, read: 140.2 MiB/s, write: 36.3 MiB/s
INFO:   5% (14.3 GiB of 256.0 GiB) in 3m 14s, read: 1.1 GiB/s, write: 46.3 MiB/s
INFO:   5% (15.2 GiB of 256.0 GiB) in 3m 45s, read: 29.9 MiB/s, write: 29.8 MiB/s
ERROR: job failed with err -5 - Input/output error
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 104 failed - job failed with err -5 - Input/output error
INFO: Failed at 2026-02-07 14:42:38
INFO: Backup job finished with errors
job errors

So, the backup of the last guest, vmid=104 (a QEMU virtual machine), failed.
I will try again with just that one.

Proxmox host machine
vzdump 104 --storage local --compress gzip --mode snapshot

The output looked like this.

Proxmox host machine
# vzdump 104 --storage local --compress gzip --mode snapshot
INFO: starting new backup job: vzdump 104 --storage local --mode snapshot --compress gzip
INFO: Starting Backup of VM 104 (qemu)
INFO: Backup started at 2026-02-07 14:49:15
INFO: status = running
INFO: VM Name: docker
INFO: include disk 'scsi0' 'local-zfs:vm-104-disk-0' 256G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating vzdump archive '/var/lib/vz/dump/vzdump-qemu-104-2026_02_07-14_49_15.vma.gz'
INFO: started backup task '205fc172-8faf-4938-9526-66572a178e10'
INFO: resuming VM again
INFO:   0% (232.0 MiB of 256.0 GiB) in 3s, read: 77.3 MiB/s, write: 31.0 MiB/s
INFO:   1% (2.6 GiB of 256.0 GiB) in 29s, read: 93.7 MiB/s, write: 37.1 MiB/s
INFO:   2% (5.2 GiB of 256.0 GiB) in 1m 31s, read: 42.1 MiB/s, write: 39.6 MiB/s
INFO:   3% (7.7 GiB of 256.0 GiB) in 2m 47s, read: 34.2 MiB/s, write: 32.3 MiB/s
INFO:   4% (12.6 GiB of 256.0 GiB) in 3m 12s, read: 200.1 MiB/s, write: 35.1 MiB/s
INFO:   5% (14.3 GiB of 256.0 GiB) in 3m 15s, read: 579.8 MiB/s, write: 49.9 MiB/s
INFO:   5% (15.2 GiB of 256.0 GiB) in 3m 45s, read: 30.1 MiB/s, write: 29.9 MiB/s
ERROR: job failed with err -5 - Input/output error
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 104 failed - job failed with err -5 - Input/output error
INFO: Failed at 2026-02-07 14:53:00
INFO: Backup job finished with errors
job errors

As expected, it fails at about the same point in the progress.
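To check that both runs really stop at the same place, the last progress line before the first ERROR can be pulled out of each log. This is only a minimal sketch, assuming the vzdump output was saved to a file; `failure_point` and the log file names are hypothetical and not part of Proxmox VE.

```shell
# failure_point: read vzdump output on stdin and print the
# "X of Y" offset from the last progress line seen before the first ERROR.
# (Hypothetical helper for illustration only.)
failure_point() {
  awk '
    /^INFO: +[0-9]+% \(/ {
      s = $0
      sub(/^[^(]*\(/, "", s)   # drop everything up to and including "("
      sub(/\).*$/, "", s)      # drop ")" and the rest of the line
      last = s
    }
    /^ERROR:/ { if (last != "") print last; exit }
  '
}

# Usage (file names are assumptions):
#   failure_point < vzdump-run1.log
#   failure_point < vzdump-run2.log
```

For the two logs above, both runs report 15.2 GiB of 256.0 GiB as the last progress before the error, which fits a read failing at a fixed spot in the disk image rather than at a random moment.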

At this point I asked Copilot to look at the log, and it pointed out a possible block error on the disk. So I gathered more information as instructed.

Proxmox host machine
# zpool status
  pool: rpool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 00:00:16 with 0 errors on Sun May 11 00:24:17 2025
config:

        NAME                                                                                                   STATE     READ WRITE CKSUM
        rpool                                                                                                  ONLINE       0     0     0
          nvme-nvme.1e4b-3330303936363830333630-53554e454153542053453930304e564733203230343847-00000001-part3  ONLINE       0     0     4

errors: 1 data errors, use '-v' for a list
Proxmox host machine
# zpool status -v
  pool: rpool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 00:00:16 with 0 errors on Sun May 11 00:24:17 2025
config:

        NAME                                                                                                   STATE     READ WRITE CKSUM
        rpool                                                                                                  ONLINE       0     0     0
          nvme-nvme.1e4b-3330303936363830333630-53554e454153542053453930304e564733203230343847-00000001-part3  ONLINE       0     0     4

errors: Permanent errors have been detected in the following files:

        rpool/data/vm-104-disk-0:<0x1>

Sure enough, a block error had occurred on ZFS, and it apparently landed on virtual machine 104.
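The mapping from the `zpool status -v` error list to a guest can be sketched as a small filter. This assumes the error list has the layout shown above and that zvols are named `vm-<vmid>-disk-<n>`, as in this log; `permanent_error_entries` and `vmid_from_entry` are hypothetical helper names.

```shell
# permanent_error_entries: read `zpool status -v` output on stdin and print
# each entry listed under "errors: Permanent errors ...", one per line.
permanent_error_entries() {
  awk '
    /^ *pool:/                  { in_list = 0 }        # next pool starts
    /^errors: Permanent errors/ { in_list = 1; next }
    in_list && NF               { sub(/^[ \t]+/, ""); print }
  '
}

# vmid_from_entry: print the VMID for entries that look like
# ".../vm-<vmid>-disk-<n>..." (naming assumed from the log in this article).
vmid_from_entry() {
  sed -n 's/.*vm-\([0-9][0-9]*\)-disk-.*/\1/p'
}

# Usage:
#   zpool status -v | permanent_error_entries | vmid_from_entry
```

Entries that do not follow the `vm-<vmid>-disk-<n>` pattern (ordinary files, for example) are printed by the first function but silently skipped by the second.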

And since ZFS reporting checksum (CKSUM) errors is said to be a typical symptom of disk degradation, I gathered still more information.

Proxmox host machine
# smartctl -a /dev/nvme0
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.8.12-5-pve] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       SUNEAST SE900NVG3 2048G
Serial Number:                      30096680360
Firmware Version:                   SN10514
PCI Vendor/Subsystem ID:            0x1e4b
IEEE OUI Identifier:                0x000000
Total NVM Capacity:                 2,048,408,248,320 [2.04 TB]
Unallocated NVM Capacity:           0
Controller ID:                      0
NVMe Version:                       1.4
Number of Namespaces:               1
Namespace 1 Size/Capacity:          2,048,408,248,320 [2.04 TB]
Namespace 1 Utilization:            103,166,911,488 [103 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            000000 0000000001
Local Time is:                      Sat Feb  7 15:02:41 2026 JST
Firmware Updates (0x1a):            5 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x001f):     Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat
Log Page Attributes (0x02):         Cmd_Eff_Lg
Maximum Data Transfer Size:         128 Pages
Warning  Comp. Temp. Threshold:     90 Celsius
Critical Comp. Temp. Threshold:     95 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     6.50W       -        -    0  0  0  0        0       0
 1 +     5.80W       -        -    1  1  1  1        0       0
 2 +     3.60W       -        -    2  2  2  2        0       0
 3 -   0.0500W       -        -    3  3  3  3     5000   10000
 4 -   0.0025W       -        -    4  4  4  4     8000   45000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        27 Celsius
Available Spare:                    100%
Available Spare Threshold:          1%
Percentage Used:                    2%
Data Units Read:                    54,764,493 [28.0 TB]
Data Units Written:                 134,422,592 [68.8 TB]
Host Read Commands:                 254,426,441
Host Write Commands:                1,641,797,644
Controller Busy Time:               2,673
Power Cycles:                       109
Power On Hours:                     7,163
Unsafe Shutdowns:                   47
Media and Data Integrity Errors:    0
Error Information Log Entries:      248
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               27 Celsius
Temperature Sensor 2:               34 Celsius

Error Information (NVMe Log 0x01, 16 of 64 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS
  0        248     0  0x6018  0x2002  0x000            0     0     -
  1        247     0  0x001b  0x2002  0x000            0     0     -

Judging from this output, the physical disk is apparently not broken.

  • Media and Data Integrity Errors: 0
  • Critical Warning: 0x00
  • Temperature: normal
  • Percentage Used: 2% (practically new)
  • Available Spare: 100%
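The indicators listed above can be pulled out of the full `smartctl -a` report with a small filter. This is only a sketch: `smart_summary` is a hypothetical helper, and the field names are taken from the NVMe output shown in this article, so other drives or smartctl versions may label them differently.

```shell
# smart_summary: read `smartctl -a` output on stdin and print selected
# NVMe health fields as "name=value" lines.
# (Field names assumed from the report shown in this article.)
smart_summary() {
  awk -F': +' '
    $1 == "Critical Warning" ||
    $1 == "Available Spare" ||
    $1 == "Percentage Used" ||
    $1 == "Unsafe Shutdowns" ||
    $1 == "Media and Data Integrity Errors" ||
    $1 == "Error Information Log Entries" { print $1 "=" $2 }
  '
}

# Usage:
#   smartctl -a /dev/nvme0 | smart_summary
```

Matching on the exact field name keeps "Available Spare Threshold" out of the summary while still picking up "Available Spare".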

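So this does not look like physical NAND damage or wear-out. However, the report also says "Error Information Log Entries: 248", so I/O errors do seem to have occurred. Since Media Errors (physical damage) is 0, I was told the cause could be something like the following.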

  • A temporary I/O timeout
  • A temporary controller malfunction
  • An I/O error on the OS side
  • ZFS metadata corruption
  • Effects of power loss (Unsafe Shutdowns: 47)

I gave up digging any further at this point.

Wrapping up

What this brings to my mind is the Intel CPU instability problem (the Vmin shift issue).

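In any case, the physical disk does not seem to be broken, and the damage appears to be purely logical. Since I have barely put the server to real use yet, I will probably be happier giving up on this install and starting over.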

So next time I will reinstall Proxmox VE. A new version apparently came out last summer anyway.
