Hard Disk Replacement
This document describes how to replace a hard disk that has bad sectors on a Proxmox server.
When a hard disk develops bad sectors, the smartd daemon normally sends the administrator a notification email such as the following:
host name: hostname
DNS domain: infosec.unamur.be

The following warning/error was logged by the smartd daemon:

Device: /dev/sdd [SAT], 3 Currently unreadable (pending) sectors

Device info:
XYZ XYZXYZXYZX-XYZXYZX, S/N:XYZXYZXY, XYZ:1-123456-789999999, FW:01.01M03, 4.00 TB
Here the device /dev/sdd reports 3 currently unreadable (pending) sectors.
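- Run the SMART test
We can use the smartctl command to start a self-test on the damaged device (use -t short instead for a quicker, less thorough test):
smartctl -t long /dev/sdd
The test runs in the background. Once it has finished, view the result with this command: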
smartctl -a /dev/sdd
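To check only the pending-sector count instead of reading the full report, the SMART attribute table can be filtered. This is a minimal sketch, assuming an ATA/SATA drive whose attribute table lists Current_Pending_Sector with the raw value in the tenth column. The helper name pending_sectors is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads "smartctl -A" output on stdin and prints the
# raw value of the Current_Pending_Sector attribute.
# Assumption: ATA-style attribute table, raw value in column 10.
pending_sectors() {
    awk '$2 == "Current_Pending_Sector" { print $10; exit }'
}
```

It would be used as: smartctl -A /dev/sdd | pending_sectors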
- Get the serial number
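Note down the serial number of the disk, as we will use it to identify the physical drive to pull out. It appears as S/N in the notification email and as Serial Number in the smartctl -a output.
We can also find the error message in the log file /var/log/syslog: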
grep sdd /var/log/syslog
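The serial number can also be extracted directly from the drive's identity information. This is a minimal sketch, assuming the smartctl -i output contains a line starting with "Serial Number:". The helper name disk_serial is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads "smartctl -i" output on stdin and prints the
# value of the "Serial Number:" line.
disk_serial() {
    awk -F': *' '/^Serial Number:/ { print $2; exit }'
}
```

It would be used as: smartctl -i /dev/sdd | disk_serial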
- Shut down all VMs
- Shut down the Proxmox server
- Replace the disk with a new one
- Power up the server
- In a terminal, run this command to check the disk pool:
sudo zpool status dpool
The pool state will be DEGRADED.
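For scripted checks, the state can be read from the status output. This is a minimal sketch, assuming the usual zpool status layout with a "state:" line. The helper name pool_state is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads "zpool status <pool>" output on stdin and
# prints the value of the first "state:" line (e.g. ONLINE or DEGRADED).
pool_state() {
    awk '$1 == "state:" { print $2; exit }'
}
```

It would be used as: zpool status dpool | pool_state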
- Replace the disk
zpool replace -f <pool> <old zfs partition> <new zfs partition>
Because the new hard disk takes the same device name as the old one, we run this command:
zpool replace -f dpool /dev/sdd /dev/sdd
ZFS then resilvers the new disk, copying the pool data onto it from the healthy mirror member.
- Check if the disk is being rebuilt
# zpool status dpool
  pool: dpool
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Wed Jul 6 11:23:10 2022
        440G scanned at 50.6M/s, 356G issued at 40.9M/s, 1.03T total
        360G resilvered, 33.70% done, 04:52:11 to go
config:

        NAME                       STATE     READ WRITE CKSUM
        dpool                      DEGRADED     0     0     0
          mirror-0                 DEGRADED     0     0     0
            sdc                    ONLINE       0     0     0
            replacing-1            DEGRADED     0     0     0
              5251453276787027011  UNAVAIL      0     0     0  was /dev/sdd1/old
              sdd                  ONLINE       0     0     0  (resilvering)

errors: No known data errors
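To follow the rebuild without reading the whole report, the completion percentage can be pulled from the scan section. This is a minimal sketch, assuming the "resilvered, NN.NN% done" wording shown in the output above. The helper name resilver_progress is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads "zpool status <pool>" output on stdin and
# prints the resilver completion percentage, or nothing when no resilver
# progress line is present.
resilver_progress() {
    sed -n 's/.*resilvered, \([0-9.]*%\) done.*/\1/p'
}
```

It would be used as: zpool status dpool | resilver_progress
When the resilver has completed, the helper prints nothing and the pool state should return to ONLINE.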