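Preamble
This continues the home server build announced in "Home Server Build Report: Preview".
The backup of the guests failed. This post is the record of that, though this time it is mostly a set of working notes for myself.
References
I was trying to carry out the backup described in the article below when it failed.
Backup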
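I ran the backup with --mode snapshot. (Container 100 was stopped at the time, so vzdump backed that one up in stop mode, as the log shows.)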
vzdump --all 1 --storage local --compress gzip --mode snapshot
The output looked like this.
# vzdump --all 1 --storage local --compress gzip --mode snapshot
INFO: starting new backup job: vzdump --compress gzip --all 1 --storage local --mode snapshot
INFO: filesystem type on dumpdir is 'zfs' -using /var/tmp/vzdumptmp1465850_100 for temporary files
INFO: Starting Backup of VM 100 (lxc)
INFO: Backup started at 2026-02-07 14:36:52
INFO: status = stopped
INFO: backup mode: stop
INFO: ionice priority: 7
INFO: CT Name: constructor
INFO: including mount point rootfs ('/') in backup
INFO: creating vzdump archive '/var/lib/vz/dump/vzdump-lxc-100-2026_02_07-14_36_52.tar.gz'
INFO: Total bytes written: 1529436160 (1.5GiB, 40MiB/s)
INFO: archive file size: 335MB
INFO: Finished Backup of VM 100 (00:00:37)
INFO: Backup finished at 2026-02-07 14:37:29
INFO: filesystem type on dumpdir is 'zfs' -using /var/tmp/vzdumptmp1465850_101 for temporary files
INFO: Starting Backup of VM 101 (lxc)
INFO: Backup started at 2026-02-07 14:37:29
INFO: status = running
INFO: CT Name: network
INFO: including mount point rootfs ('/') in backup
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: create storage snapshot 'vzdump'
INFO: creating vzdump archive '/var/lib/vz/dump/vzdump-lxc-101-2026_02_07-14_37_29.tar.gz'
INFO: Total bytes written: 1033799680 (986MiB, 35MiB/s)
INFO: archive file size: 316MB
INFO: cleanup temporary 'vzdump' snapshot
INFO: Finished Backup of VM 101 (00:00:28)
INFO: Backup finished at 2026-02-07 14:37:57
INFO: filesystem type on dumpdir is 'zfs' -using /var/tmp/vzdumptmp1465850_102 for temporary files
INFO: Starting Backup of VM 102 (lxc)
INFO: Backup started at 2026-02-07 14:37:57
INFO: status = running
INFO: CT Name: mail
INFO: including mount point rootfs ('/') in backup
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: create storage snapshot 'vzdump'
INFO: creating vzdump archive '/var/lib/vz/dump/vzdump-lxc-102-2026_02_07-14_37_57.tar.gz'
INFO: Total bytes written: 998144000 (952MiB, 35MiB/s)
INFO: archive file size: 311MB
INFO: cleanup temporary 'vzdump' snapshot
INFO: Finished Backup of VM 102 (00:00:28)
INFO: Backup finished at 2026-02-07 14:38:25
INFO: filesystem type on dumpdir is 'zfs' -using /var/tmp/vzdumptmp1465850_103 for temporary files
INFO: Starting Backup of VM 103 (lxc)
INFO: Backup started at 2026-02-07 14:38:25
INFO: status = running
INFO: CT Name: proxy
INFO: including mount point rootfs ('/') in backup
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: create storage snapshot 'vzdump'
INFO: creating vzdump archive '/var/lib/vz/dump/vzdump-lxc-103-2026_02_07-14_38_25.tar.gz'
INFO: Total bytes written: 953108480 (909MiB, 34MiB/s)
INFO: archive file size: 305MB
INFO: cleanup temporary 'vzdump' snapshot
INFO: Finished Backup of VM 103 (00:00:28)
INFO: Backup finished at 2026-02-07 14:38:53
INFO: Starting Backup of VM 104 (qemu)
INFO: Backup started at 2026-02-07 14:38:53
INFO: status = running
INFO: VM Name: docker
INFO: include disk 'scsi0' 'local-zfs:vm-104-disk-0' 256G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating vzdump archive '/var/lib/vz/dump/vzdump-qemu-104-2026_02_07-14_38_53.vma.gz'
INFO: started backup task '1ac3fb8f-c137-4cb9-b652-9e03577af5da'
INFO: resuming VM again
INFO: 0% (246.5 MiB of 256.0 GiB) in 3s, read: 82.2 MiB/s, write: 34.0 MiB/s
INFO: 1% (2.6 GiB of 256.0 GiB) in 28s, read: 95.3 MiB/s, write: 36.6 MiB/s
INFO: 2% (5.2 GiB of 256.0 GiB) in 1m 31s, read: 42.2 MiB/s, write: 39.8 MiB/s
INFO: 3% (7.7 GiB of 256.0 GiB) in 2m 47s, read: 34.1 MiB/s, write: 32.2 MiB/s
INFO: 4% (11.0 GiB of 256.0 GiB) in 3m 11s, read: 140.2 MiB/s, write: 36.3 MiB/s
INFO: 5% (14.3 GiB of 256.0 GiB) in 3m 14s, read: 1.1 GiB/s, write: 46.3 MiB/s
INFO: 5% (15.2 GiB of 256.0 GiB) in 3m 45s, read: 29.9 MiB/s, write: 29.8 MiB/s
ERROR: job failed with err -5 - Input/output error
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 104 failed - job failed with err -5 - Input/output error
INFO: Failed at 2026-02-07 14:42:38
INFO: Backup job finished with errors
job errors
Right. The backup of the last guest, vmid=104 (a QEMU virtual machine), failed.
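With a longer job log it can be tedious to spot which guests failed. The following is a minimal sketch, assuming the vzdump output has been saved to a file; the function name failed_vmids and the file name backup.log are made up for illustration and are not part of vzdump.

```shell
#!/bin/sh
# failed_vmids (hypothetical helper): read a vzdump log on stdin and print
# the ID of every guest whose backup ended with an ERROR line.
failed_vmids() {
    sed -n 's/^ERROR: Backup of VM \([0-9][0-9]*\) failed.*/\1/p'
}

# Example (backup.log is an assumed file name):
#   failed_vmids < backup.log
# The IDs could then be retried one by one with the same options as above:
#   for id in $(failed_vmids < backup.log); do
#       vzdump "$id" --storage local --compress gzip --mode snapshot
#   done
```

For the log above this would print only 104.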
I'll retry just this one.
vzdump 104 --storage local --compress gzip --mode snapshot
The output looked like this.
# vzdump 104 --storage local --compress gzip --mode snapshot
INFO: starting new backup job: vzdump 104 --storage local --mode snapshot --compress gzip
INFO: Starting Backup of VM 104 (qemu)
INFO: Backup started at 2026-02-07 14:49:15
INFO: status = running
INFO: VM Name: docker
INFO: include disk 'scsi0' 'local-zfs:vm-104-disk-0' 256G
INFO: backup mode: snapshot
INFO: ionice priority: 7
INFO: creating vzdump archive '/var/lib/vz/dump/vzdump-qemu-104-2026_02_07-14_49_15.vma.gz'
INFO: started backup task '205fc172-8faf-4938-9526-66572a178e10'
INFO: resuming VM again
INFO: 0% (232.0 MiB of 256.0 GiB) in 3s, read: 77.3 MiB/s, write: 31.0 MiB/s
INFO: 1% (2.6 GiB of 256.0 GiB) in 29s, read: 93.7 MiB/s, write: 37.1 MiB/s
INFO: 2% (5.2 GiB of 256.0 GiB) in 1m 31s, read: 42.1 MiB/s, write: 39.6 MiB/s
INFO: 3% (7.7 GiB of 256.0 GiB) in 2m 47s, read: 34.2 MiB/s, write: 32.3 MiB/s
INFO: 4% (12.6 GiB of 256.0 GiB) in 3m 12s, read: 200.1 MiB/s, write: 35.1 MiB/s
INFO: 5% (14.3 GiB of 256.0 GiB) in 3m 15s, read: 579.8 MiB/s, write: 49.9 MiB/s
INFO: 5% (15.2 GiB of 256.0 GiB) in 3m 45s, read: 30.1 MiB/s, write: 29.9 MiB/s
ERROR: job failed with err -5 - Input/output error
INFO: aborting backup job
INFO: resuming VM again
ERROR: Backup of VM 104 failed - job failed with err -5 - Input/output error
INFO: Failed at 2026-02-07 14:53:00
INFO: Backup job finished with errors
job errors
Sure enough, it fails at about the same point in the progress.
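To compare where each run stops, the last progress line before the first ERROR can be pulled out of a saved log. This is a minimal sketch, assuming each run's output was saved to its own file; the function name last_progress and the file names are hypothetical.

```shell
#!/bin/sh
# last_progress (hypothetical helper): read a vzdump log on stdin and print
# the last "INFO: N% (...)" progress line that appears before the first ERROR.
last_progress() {
    awk '
        /^ERROR:/                   { stop = 1 }
        !stop && /^INFO: +[0-9]+% / { last = $0 }
        END                         { if (last != "") print last }
    '
}

# Example (file names are assumptions):
#   last_progress < run1.log
#   last_progress < run2.log
```

In the two runs above, both stop right after the line reporting 5% (15.2 GiB of 256.0 GiB) at 3m 45s, which suggests the failure is tied to a fixed location on the disk rather than to timing.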
At this point I asked Copilot to look at the log, and it pointed out that this could be a block error on the disk. So I gathered information as it instructed.
# zpool status
pool: rpool
state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
scan: scrub repaired 0B in 00:00:16 with 0 errors on Sun May 11 00:24:17 2025
config:
NAME STATE READ WRITE CKSUM
rpool ONLINE 0 0 0
nvme-nvme.1e4b-3330303936363830333630-53554e454153542053453930304e564733203230343847-00000001-part3 ONLINE 0 0 4
errors: 1 data errors, use '-v' for a list
# zpool status -v
pool: rpool
state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption. Applications may be affected.
action: Restore the file in question if possible. Otherwise restore the
entire pool from backup.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-8A
scan: scrub repaired 0B in 00:00:16 with 0 errors on Sun May 11 00:24:17 2025
config:
NAME STATE READ WRITE CKSUM
rpool ONLINE 0 0 0
nvme-nvme.1e4b-3330303936363830333630-53554e454153542053453930304e564733203230343847-00000001-part3 ONLINE 0 0 4
errors: Permanent errors have been detected in the following files:
rpool/data/vm-104-disk-0:<0x1>
Sure enough, a block-level error has occurred on ZFS, and it appears to have landed on virtual machine 104's disk.
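The affected objects can also be extracted from the zpool status -v output with a small filter. This is only a sketch: the function name permanent_errors is hypothetical, and it assumes the output format shown above, where the affected entries follow the "Permanent errors" line.

```shell
#!/bin/sh
# permanent_errors (hypothetical helper): read `zpool status -v` output on
# stdin and print each entry listed under the "Permanent errors" heading.
permanent_errors() {
    awk '
        /^ *pool:/  { found = 0 }                      # a new pool section begins
        found && NF { sub(/^[ \t]+/, ""); print }      # entry line, indentation stripped
        /^errors: Permanent errors have been detected/ { found = 1 }
    '
}

# Example:
#   zpool status -v rpool | permanent_errors
```

For the output above this prints rpool/data/vm-104-disk-0:<0x1>, the zvol that backs VM 104's scsi0 disk.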
I was also told that ZFS reporting checksum errors (CKSUM) is a typical symptom of a deteriorating disk, so I gathered more information.
# smartctl -a /dev/nvme0
smartctl 7.3 2022-02-28 r5338 [x86_64-linux-6.8.12-5-pve] (local build)
Copyright (C) 2002-22, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Number: SUNEAST SE900NVG3 2048G
Serial Number: 30096680360
Firmware Version: SN10514
PCI Vendor/Subsystem ID: 0x1e4b
IEEE OUI Identifier: 0x000000
Total NVM Capacity: 2,048,408,248,320 [2.04 TB]
Unallocated NVM Capacity: 0
Controller ID: 0
NVMe Version: 1.4
Number of Namespaces: 1
Namespace 1 Size/Capacity: 2,048,408,248,320 [2.04 TB]
Namespace 1 Utilization: 103,166,911,488 [103 GB]
Namespace 1 Formatted LBA Size: 512
Namespace 1 IEEE EUI-64: 000000 0000000001
Local Time is: Sat Feb 7 15:02:41 2026 JST
Firmware Updates (0x1a): 5 Slots, no Reset required
Optional Admin Commands (0x0017): Security Format Frmw_DL Self_Test
Optional NVM Commands (0x001f): Comp Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat
Log Page Attributes (0x02): Cmd_Eff_Lg
Maximum Data Transfer Size: 128 Pages
Warning Comp. Temp. Threshold: 90 Celsius
Critical Comp. Temp. Threshold: 95 Celsius
Supported Power States
St Op Max Active Idle RL RT WL WT Ent_Lat Ex_Lat
0 + 6.50W - - 0 0 0 0 0 0
1 + 5.80W - - 1 1 1 1 0 0
2 + 3.60W - - 2 2 2 2 0 0
3 - 0.0500W - - 3 3 3 3 5000 10000
4 - 0.0025W - - 4 4 4 4 8000 45000
Supported LBA Sizes (NSID 0x1)
Id Fmt Data Metadt Rel_Perf
0 + 512 0 0
=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED
SMART/Health Information (NVMe Log 0x02)
Critical Warning: 0x00
Temperature: 27 Celsius
Available Spare: 100%
Available Spare Threshold: 1%
Percentage Used: 2%
Data Units Read: 54,764,493 [28.0 TB]
Data Units Written: 134,422,592 [68.8 TB]
Host Read Commands: 254,426,441
Host Write Commands: 1,641,797,644
Controller Busy Time: 2,673
Power Cycles: 109
Power On Hours: 7,163
Unsafe Shutdowns: 47
Media and Data Integrity Errors: 0
Error Information Log Entries: 248
Warning Comp. Temperature Time: 0
Critical Comp. Temperature Time: 0
Temperature Sensor 1: 27 Celsius
Temperature Sensor 2: 34 Celsius
Error Information (NVMe Log 0x01, 16 of 64 entries)
Num ErrCount SQId CmdId Status PELoc LBA NSID VS
0 248 0 0x6018 0x2002 0x000 0 0 -
1 247 0 0x001b 0x2002 0x000 0 0 -
Judging from this output, I'm told the physical disk is not broken.
- Media and Data Integrity Errors: 0
- Critical Warning: 0x00
- Temperature: normal
- Percentage Used: 2% (practically new)
- Available Spare: 100%
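So this does not look like physical NAND damage or wear-out. However, "Error Information Log Entries: 248" shows that the drive has certainly logged errors of some kind. As for the cause, since the media error count (physical damage) is 0, I'm told that possibilities include the following.
- A temporary I/O timeout
- A temporary controller glitch
- An I/O error on the OS side
- ZFS metadata corruption
- The effects of power loss (Unsafe Shutdowns: 47)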
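The counters discussed above can be picked out of the smartctl output without reading the whole report. This is a minimal sketch, assuming the "Label: value" layout shown above; the function name smart_field is made up for illustration.

```shell
#!/bin/sh
# smart_field (hypothetical helper): read `smartctl -a` output on stdin and
# print the value of the first line whose label equals $1 exactly.
smart_field() {
    awk -F': *' -v key="$1" '
        $1 == key && !done { print $2; done = 1 }
    '
}

# Example (the device path is the one used above):
#   smartctl -a /dev/nvme0 > smart.txt
#   for f in "Critical Warning" "Media and Data Integrity Errors" \
#            "Percentage Used" "Available Spare" "Unsafe Shutdowns" \
#            "Error Information Log Entries"; do
#       printf '%s = %s\n' "$f" "$(smart_field "$f" < smart.txt)"
#   done
```

Saving the report to a file first keeps the loop from querying the drive once per field.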
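I gave up digging any further at this point.
Wrapping up
What this brings to mind is the Intel CPU instability problem (the Vmin shift issue).
In any case, the physical disk apparently is not broken and the damage is only logical. That being so, and since I have not really put the server to proper use yet, I expect I'll be happier giving up here and starting over.
So next time I'll reinstall Proxmox VE. The version apparently went up last summer, too.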