Hard Disk Replacement

This document describes how to replace a hard disk that has bad sectors on a Proxmox server.

Generally, when a hard disk develops bad sectors, the administrator receives a notification email such as the following:

The following warning/error was logged by the smartd daemon:

Device: /dev/sdd [SAT], 3 Currently unreadable (pending) sectors

Here, the device /dev/sdd has 3 pending sectors, i.e. sectors that could not be read and are waiting to be remapped.

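  • Run the SMART test

We can use the smartctl command to start a self-test on the damaged device:

 smartctl -t short /dev/sdd

The test runs in the background on the drive. Once it has finished, view the self-test log with this command:

 smartctl -l selftest /dev/sdd

To view the full SMART report, including the test results, run this command:

 smartctl -a /dev/sdd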
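
The pending-sector count reported in the smartd email can also be read directly from the drive's attribute table. The following is a minimal sketch, assuming an ATA/SATA disk whose smartctl -A output lists the Current_Pending_Sector attribute with its raw value in the tenth column. The function name pending_sectors is a hypothetical helper, not part of smartmontools.

```shell
# Hypothetical helper: print the raw Current_Pending_Sector value
# from `smartctl -A` output supplied on stdin.
# Assumes the usual ATA attribute table layout, where the
# attribute name is column 2 and RAW_VALUE is column 10.
pending_sectors() {
    awk '$2 == "Current_Pending_Sector" { print $10 }'
}

# Example (run as root on the Proxmox host):
#   smartctl -A /dev/sdd | pending_sectors
```

A value greater than zero should match the count in the smartd message. Drives that report SMART data in another format (for example SAS or NVMe) would need a different filter.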
  • Get the serial number

Note down the serial number of the disk shown in the smartctl output, as we will use it to identify the physical device when replacing it.

We can also find the error message in the log file /var/log/syslog:

 grep sdd /var/log/syslog
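
To avoid copying the serial number by hand, it can be filtered out of the smartctl output. This is a minimal sketch, assuming the ATA-style "Serial Number:" line that smartctl -i prints for SATA disks. The function name disk_serial is hypothetical.

```shell
# Hypothetical helper: print the serial number from `smartctl -i`
# output supplied on stdin.
# Assumes a line of the form "Serial Number:    <value>".
disk_serial() {
    awk -F': *' '/^Serial Number:/ { print $2 }'
}

# Example (run as root):
#   smartctl -i /dev/sdd | disk_serial
# The persistent names under /dev/disk/by-id usually embed the same
# serial, so this shows which by-id entries point at sdd:
#   ls -l /dev/disk/by-id/ | grep 'sdd$'
```

The serial number is normally also printed on the drive label, which helps to pull the right disk from the chassis.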
  • Shut down all VMs
  • Shut down the Proxmox server
  • Replace the disk with a new one
  • Power up the server
  • In the terminal, run this command to check the disk pool
 zpool status dpool

The pool state will be DEGRADED, because the old disk is no longer present.
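
When this check is scripted, the state can be picked out of the status output. This is a minimal sketch, assuming the "state:" line format that zpool status prints. The function name pool_state is a hypothetical helper.

```shell
# Hypothetical helper: print the pool state from `zpool status`
# output supplied on stdin (the line that starts with "state:").
pool_state() {
    awk '$1 == "state:" { print $2; exit }'
}

# Example:
#   if [ "$(zpool status dpool | pool_state)" != "ONLINE" ]; then
#       echo "dpool needs attention"
#   fi
```

As an alternative that avoids parsing, zpool list -H -o health dpool should print the same value on its own line.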

  • Replace the disk
 zpool replace -f <pool> <old zfs partition> <new zfs partition>

Because the new hard disk takes the same device name as the old one, we just run this command:

 zpool replace -f dpool /dev/sdd /dev/sdd

ZFS will then resilver the new disk, i.e. copy the data from the healthy mirror member onto it.

  • Check if the disk is being rebuilt
# zpool status dpool
  pool: dpool
 state: DEGRADED
status: One or more devices is currently being resilvered.  The pool will
	continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
  scan: resilver in progress since Wed Jul  6 11:23:10 2022
	440G scanned at 50.6M/s, 356G issued at 40.9M/s, 1.03T total
	360G resilvered, 33.70% done, 04:52:11 to go

	NAME                       STATE     READ WRITE CKSUM
	dpool                      DEGRADED     0     0     0
	  mirror-0                 DEGRADED     0     0     0
	    sdc                    ONLINE       0     0     0
	    replacing-1            DEGRADED     0     0     0
	      5251453276787027011  UNAVAIL      0     0     0  was /dev/sdd1/old
	      sdd                  ONLINE       0     0     0  (resilvering)

errors: No known data errors
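
The progress figure can be extracted from this output to follow the rebuild without reading the whole report. This is a minimal sketch, assuming the "NN.NN% done" wording shown in the output above. The function name resilver_percent is hypothetical.

```shell
# Hypothetical helper: print the resilver completion percentage
# from `zpool status` output supplied on stdin.
# Assumes a line containing "<number>% done".
resilver_percent() {
    sed -n 's/.* \([0-9.][0-9.]*\)% done.*/\1/p'
}

# Example: poll once a minute until no resilver is in progress.
#   while zpool status dpool | grep -q 'resilver in progress'; do
#       echo "resilvered: $(zpool status dpool | resilver_percent)%"
#       sleep 60
#   done
```

On OpenZFS 2.0 or later, zpool wait -t resilver dpool should block until the resilver finishes, if the installed version provides it. Once the resilver is complete, zpool status dpool is expected to report the pool and both mirror members as ONLINE.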