Saturday, March 20, 2010

Replacing a disk in a Sun D2 Array

The Sun D2 Array is hot-swap capable. Because we manage the D2s with DiskSuite (a.k.a. SVM, Solaris Volume Manager), a disk in a D2 can be replaced without shutting the server down.

When you receive notification that a disk has died, first run metastat to determine which disk has been swapped out:
# metastat
d0: Mirror
    Submirror 0: d10
      State: Okay        
    Submirror 1: d20
      State: Okay        
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 18876726 blocks


d10: Submirror of d0
    State: Okay        
    Size: 18876726 blocks
    Stripe 0:
        Device              Start Block  Dbase State        Hot Spare
        c1t0d0s0                   0     No    Okay        




d20: Submirror of d0
    State: Okay        
    Size: 18876726 blocks
    Stripe 0:
        Device              Start Block  Dbase State        Hot Spare
        c1t1d0s0                   0     No    Okay        




d1: Mirror
    Submirror 0: d11
      State: Okay        
    Submirror 1: d21
      State: Okay        
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 33555735 blocks


d11: Submirror of d1
    State: Okay        
    Size: 33555735 blocks
    Stripe 0:
        Device              Start Block  Dbase State        Hot Spare
        c1t0d0s1                   0     No    Okay        




d21: Submirror of d1
    State: Okay        
    Size: 33555735 blocks
    Stripe 0:
        Device              Start Block  Dbase State        Hot Spare
        c1t1d0s1                   0     No    Okay        




d3: Mirror
    Submirror 0: d30
      State: Okay        
    Submirror 1: d31
      State: Okay        
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 355604121 blocks


d30: Submirror of d3
    State: Okay        
    Hot spare pool: hsp000
    Size: 355604121 blocks
    Stripe 0: (interlace: 64 blocks)
        Device              Start Block  Dbase State        Hot Spare
        c2t0d0s0                   0     No    Okay        
        c2t1d0s0                2889     No    Okay        
        c2t2d0s0                2889     No    Okay        
        c2t3d0s0                2889     No    Okay        
        c2t4d0s0                2889     No    Okay        




d31: Submirror of d3
    State: Okay        
    Hot spare pool: hsp000
    Size: 355604121 blocks
    Stripe 0: (interlace: 64 blocks)
        Device              Start Block  Dbase State        Hot Spare
        c2t5d0s0                   0     No    Okay         c2t12d0s0
        c2t8d0s0                2889     No    Okay        
        c2t9d0s0                2889     No    Okay        
        c2t10d0s0               2889     No    Okay        
        c2t11d0s0               2889     No    Okay        




hsp000: 2 hot spares
        c2t12d0s0               In use          71124291 blocks
        c2t13d0s0               Available       71124291 blocks


#
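The metastat output is long, so a small filter can help pick out the components that currently have a hot spare standing in for them. The following is only a sketch: the function name find_spared is made up for illustration, and it assumes an awk that understands POSIX regular expressions (on Solaris that would be nawk or /usr/xpg4/bin/awk rather than the old /usr/bin/awk). The sample lines in the demo are copied from the output above.

```shell
#!/bin/sh
# find_spared (hypothetical helper): read metastat output on stdin and print
# "component hot_spare" for each stripe component whose Hot Spare column is filled in.
# A line qualifies when both its first and its last field look like cXtYdZsN
# and it has at least five fields (Device, Start Block, Dbase, State, Hot Spare).
find_spared() {
    awk '
        $1  ~ /^c[0-9]+t[0-9]+d[0-9]+s[0-9]+$/ &&
        $NF ~ /^c[0-9]+t[0-9]+d[0-9]+s[0-9]+$/ &&
        NF >= 5 { print $1, $NF }
    '
}

# Demo on a few lines taken from the metastat output in this post.
find_spared <<'EOF'
        c2t5d0s0                   0     No    Okay         c2t12d0s0
        c2t8d0s0                2889     No    Okay        
hsp000: 2 hot spares
        c2t12d0s0               In use          71124291 blocks
        c2t13d0s0               Available       71124291 blocks
EOF
# On a live system the equivalent would be: metastat | find_spared
```

For the sample lines the filter prints "c2t5d0s0 c2t12d0s0", i.e. the failed component followed by the hot spare that replaced it.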

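OK, the Hot Spare column shows that c2t12d0s0 has taken over from c2t5d0s0, and hsp000 lists c2t12d0s0 as "In use". c2t5d0s0 is the sixth disk from the left in the array - or, to put that another way, it is the left one of the two middle disks!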
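If you would rather not count on your fingers, the slot arithmetic can be sketched in a few lines of shell. This is an assumption-laden illustration, not something taken from Sun documentation: it assumes the array is populated left to right with targets 0-5 in slots 1-6 and targets 8-13 in slots 7-12, which matches the target numbers seen in the metastat output and the "sixth from the left" position of target 5, but you should check it against your own array. The function name d2_slot is hypothetical.

```shell
#!/bin/sh
# d2_slot (hypothetical helper): map a cXtYdZsN device name to a slot position,
# counted from the left.
# ASSUMPTION: targets 0-5 sit in slots 1-6 and targets 8-13 sit in slots 7-12.
d2_slot() {
    # Pull the target number (the digits after "t") out of the device name.
    target=$(echo "$1" | sed -n 's/^c[0-9][0-9]*t\([0-9][0-9]*\)d.*/\1/p')
    if [ -z "$target" ]; then
        echo "unrecognised device name: $1" >&2
        return 1
    fi
    if [ "$target" -le 5 ]; then
        echo $((target + 1))
    elif [ "$target" -ge 8 ] && [ "$target" -le 13 ]; then
        echo $((target - 1))
    else
        echo "target $target is not used in the assumed layout" >&2
        return 1
    fi
}

d2_slot c2t5d0s0    # the failed disk in this post
```

Under the stated assumption this prints 6 for c2t5d0s0, which agrees with the position given above.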

You can run format->analyze->read to determine whether the disk is really dead:
# format
Searching for disks...done




AVAILABLE DISK SELECTIONS:
       0. c1t0d0
          /pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w21000004cffde452,0
       1. c1t1d0
          /pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w21000004cffde367,0
       2. c2t0d0
          /pci@8,600000/pci@1/scsi@4/sd@0,0
       3. c2t1d0
          /pci@8,600000/pci@1/scsi@4/sd@1,0
       4. c2t2d0
          /pci@8,600000/pci@1/scsi@4/sd@2,0
       5. c2t3d0
          /pci@8,600000/pci@1/scsi@4/sd@3,0
       6. c2t4d0
          /pci@8,600000/pci@1/scsi@4/sd@4,0
       7. c2t5d0
          /pci@8,600000/pci@1/scsi@4/sd@5,0
       8. c2t8d0
          /pci@8,600000/pci@1/scsi@4/sd@8,0
       9. c2t9d0
          /pci@8,600000/pci@1/scsi@4/sd@9,0
      10. c2t10d0
          /pci@8,600000/pci@1/scsi@4/sd@a,0
      11. c2t11d0
          /pci@8,600000/pci@1/scsi@4/sd@b,0
      12. c2t12d0
          /pci@8,600000/pci@1/scsi@4/sd@c,0
      13. c2t13d0
          /pci@8,600000/pci@1/scsi@4/sd@d,0
Specify disk (enter its number): 7
selecting c2t5d0
[disk formatted]


FORMAT MENU:
        disk       - select a disk
        type       - select (define) a disk type
        partition  - select (define) a partition table
        current    - describe the current disk
        format     - format and analyze the disk
        repair     - repair a defective sector
        label      - write label to the disk
        analyze    - surface analysis
        defect     - defect list management
        backup     - search for backup labels
        verify     - read and display labels
        save       - save new disk/partition definitions
        inquiry    - show vendor, product and revision
        volname    - set 8-character volume name
        !<cmd>     - execute <cmd>, then return
        quit
format> anal


ANALYZE MENU:
        read     - read only test   (doesn't harm SunOS)
        refresh  - read then write  (doesn't harm data)
        test     - pattern testing  (doesn't harm data)
        write    - write then read      (corrupts data)
        compare  - write, read, compare (corrupts data)
        purge    - write, read, write   (corrupts data)
        verify   - write entire disk, then verify (corrupts data)
        print    - display data buffer
        setup    - set analysis parameters
        config   - show analysis parameters
        !<cmd>   - execute <cmd>, then return
        quit
analyze> read
Ready to analyze (won't harm SunOS). This takes a long time,
but is interruptable with CTRL-C. Continue? y

        pass 0
   24619/26/53 

        pass 1
   24619/26/53 

Total of 0 defective blocks repaired.
analyze> q


FORMAT MENU:
        disk       - select a disk
        type       - select (define) a disk type
        partition  - select (define) a partition table
        current    - describe the current disk
        format     - format and analyze the disk
        repair     - repair a defective sector
        label      - write label to the disk
        analyze    - surface analysis
        defect     - defect list management
        backup     - search for backup labels
        verify     - read and display labels
        save       - save new disk/partition definitions
        inquiry    - show vendor, product and revision
        volname    - set 8-character volume name
        !<cmd>     - execute <cmd>, then return
        quit
format> q
#

The excerpt above didn't show any repairs, but there may be some when you run the command.



Raise a call for the new disk under your support contract or, failing that, search the server room for a compatible disk - Good Luck!

As long as a hot spare has jumped in, the dead disk can just be removed from the array and the new one inserted. The disk will spin up immediately.

Wait for the green light to come on. And Bob's your uncle!

Run format to apply a disk label. You won't be able to format the disk until you do.
# format
Searching for disks...done

c2t5d0: configured with capacity of 33.92GB


AVAILABLE DISK SELECTIONS:
       0. c1t0d0
          /pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w21000004cffde452,0
       1. c1t1d0
          /pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w21000004cffde367,0
       2. c2t0d0
          /pci@8,600000/pci@1/scsi@4/sd@0,0
       3. c2t1d0
          /pci@8,600000/pci@1/scsi@4/sd@1,0
       4. c2t2d0
          /pci@8,600000/pci@1/scsi@4/sd@2,0
       5. c2t3d0
          /pci@8,600000/pci@1/scsi@4/sd@3,0
       6. c2t4d0
          /pci@8,600000/pci@1/scsi@4/sd@4,0
       7. c2t5d0
          /pci@8,600000/pci@1/scsi@4/sd@5,0
       8. c2t8d0
          /pci@8,600000/pci@1/scsi@4/sd@8,0
       9. c2t9d0
          /pci@8,600000/pci@1/scsi@4/sd@9,0
      10. c2t10d0
          /pci@8,600000/pci@1/scsi@4/sd@a,0
      11. c2t11d0
          /pci@8,600000/pci@1/scsi@4/sd@b,0
      12. c2t12d0
          /pci@8,600000/pci@1/scsi@4/sd@c,0
      13. c2t13d0
          /pci@8,600000/pci@1/scsi@4/sd@d,0
Specify disk (enter its number): 7
selecting c2t5d0
[disk formatted]
Disk not labeled.  Label it now? y


FORMAT MENU:
        disk       - select a disk
        type       - select (define) a disk type
        partition  - select (define) a partition table
        current    - describe the current disk
        format     - format and analyze the disk
        repair     - repair a defective sector
        label      - write label to the disk
        analyze    - surface analysis
        defect     - defect list management
        backup     - search for backup labels
        verify     - read and display labels
        save       - save new disk/partition definitions
        inquiry    - show vendor, product and revision
        volname    - set 8-character volume name
        !<cmd>     - execute <cmd>, then return
        quit
format> q
#


Enter a prtvtoc/fmthard command combination to ensure the disk has the same slices as the replaced disk. In the command below I use the disk that is in the equivalent position on the other side of the mirror as the source of the configuration.
# prtvtoc /dev/rdsk/c2t0d0s0 | fmthard -s - /dev/rdsk/c2t5d0s0
fmthard:  New volume table of contents now in place.
#
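Before going through the format menus, a quick way to confirm that the copy worked is to compare the prtvtoc output of the two disks. The sketch below assumes you have saved each disk's prtvtoc output to a file first; the function names vtoc_slices and same_vtoc and the /tmp file names are made up for illustration. It ignores the header lines that prtvtoc prefixes with "*", since those include the device name and will always differ.

```shell
#!/bin/sh
# vtoc_slices (hypothetical helper): drop the "*" comment header from prtvtoc
# output on stdin, leaving only the slice lines.
vtoc_slices() {
    grep -v '^\*'
}

# same_vtoc (hypothetical helper): succeed if two saved prtvtoc outputs
# describe identical slices.
same_vtoc() {
    [ "$(vtoc_slices < "$1")" = "$(vtoc_slices < "$2")" ]
}

# Intended use on the server (file names are placeholders):
#   prtvtoc /dev/rdsk/c2t0d0s0 > /tmp/source.vtoc
#   prtvtoc /dev/rdsk/c2t5d0s0 > /tmp/new.vtoc
#   same_vtoc /tmp/source.vtoc /tmp/new.vtoc && echo "slices match"
```

Comparing saved files rather than piping two commands keeps the check simple and leaves a record of what the source layout looked like.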



Check the disk format:
# format
Searching for disks...done


AVAILABLE DISK SELECTIONS:
       0. c1t0d0
          /pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w21000004cffde452,0
       1. c1t1d0
          /pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w21000004cffde367,0
       2. c2t0d0
          /pci@8,600000/pci@1/scsi@4/sd@0,0
       3. c2t1d0
          /pci@8,600000/pci@1/scsi@4/sd@1,0
       4. c2t2d0
          /pci@8,600000/pci@1/scsi@4/sd@2,0
       5. c2t3d0
          /pci@8,600000/pci@1/scsi@4/sd@3,0
       6. c2t4d0
          /pci@8,600000/pci@1/scsi@4/sd@4,0
       7. c2t5d0
          /pci@8,600000/pci@1/scsi@4/sd@5,0
       8. c2t8d0
          /pci@8,600000/pci@1/scsi@4/sd@8,0
       9. c2t9d0
          /pci@8,600000/pci@1/scsi@4/sd@9,0
      10. c2t10d0
          /pci@8,600000/pci@1/scsi@4/sd@a,0
      11. c2t11d0
          /pci@8,600000/pci@1/scsi@4/sd@b,0
      12. c2t12d0
          /pci@8,600000/pci@1/scsi@4/sd@c,0
      13. c2t13d0
          /pci@8,600000/pci@1/scsi@4/sd@d,0
Specify disk (enter its number): 7
selecting c2t5d0
[disk formatted]


FORMAT MENU:
        disk       - select a disk
        type       - select (define) a disk type
        partition  - select (define) a partition table
        current    - describe the current disk
        format     - format and analyze the disk
        repair     - repair a defective sector
        label      - write label to the disk
        analyze    - surface analysis
        defect     - defect list management
        backup     - search for backup labels
        verify     - read and display labels
        save       - save new disk/partition definitions
        inquiry    - show vendor, product and revision
        volname    - set 8-character volume name
        !<cmd>     - execute <cmd>, then return
        quit
format> p


PARTITION MENU:
        0      - change `0' partition
        1      - change `1' partition
        2      - change `2' partition
        3      - change `3' partition
        4      - change `4' partition
        5      - change `5' partition
        6      - change `6' partition
        7      - change `7' partition
        select - select a predefined table
        modify - modify a predefined partition table
        name   - name the current table
        print  - display the current table
        label  - write partition map and label to the disk
        !<cmd> - execute <cmd>, then return
        quit
partition> p
Current partition table (original):
Total disk cylinders available: 24620 + 2 (reserved cylinders)

Part      Tag    Flag     Cylinders         Size            Blocks
  0       root    wm       0 - 24618       33.91GB    (24619/0/0) 71124291
  1 unassigned    wu       0                0         (0/0/0)            0
  2     backup    wu       0 - 24619       33.92GB    (24620/0/0) 71127180
  3 unassigned    wu       0                0         (0/0/0)            0
  4 unassigned    wu       0                0         (0/0/0)            0
  5 unassigned    wu       0                0         (0/0/0)            0
  6 unassigned    wu       0                0         (0/0/0)            0
  7 unassigned    wm   24619 - 24619        1.41MB    (1/0/0)         2889

partition> q


FORMAT MENU:
        disk       - select a disk
        type       - select (define) a disk type
        partition  - select (define) a partition table
        current    - describe the current disk
        format     - format and analyze the disk
        repair     - repair a defective sector
        label      - write label to the disk
        analyze    - surface analysis
        defect     - defect list management
        backup     - search for backup labels
        verify     - read and display labels
        save       - save new disk/partition definitions
        inquiry    - show vendor, product and revision
        volname    - set 8-character volume name
        !<cmd>     - execute <cmd>, then return
        quit
format> q

#


Swap the hot spare back out for the disk in the original slot. Note that the replacement command works on the mirror (d3), not on the submirror (d31) that actually has the failed disk!
# metareplace -e d3 c2t5d0s0
d3: device c2t5d0s0 is enabled
#


Use metastat to check the state of the mirror, and metastat | grep to see when it has finished resyncing:
# metastat
d0: Mirror
    Submirror 0: d10
      State: Okay        
    Submirror 1: d20
      State: Okay        
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 18876726 blocks

d10: Submirror of d0
    State: Okay        
    Size: 18876726 blocks
    Stripe 0:
        Device              Start Block  Dbase State        Hot Spare
        c1t0d0s0                   0     No    Okay        


d20: Submirror of d0
    State: Okay        
    Size: 18876726 blocks
    Stripe 0:
        Device              Start Block  Dbase State        Hot Spare
        c1t1d0s0                   0     No    Okay        


d1: Mirror
    Submirror 0: d11
      State: Okay        
    Submirror 1: d21
      State: Okay        
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 33555735 blocks

d11: Submirror of d1
    State: Okay        
    Size: 33555735 blocks
    Stripe 0:
        Device              Start Block  Dbase State        Hot Spare
        c1t0d0s1                   0     No    Okay        


d21: Submirror of d1
    State: Okay        
    Size: 33555735 blocks
    Stripe 0:
        Device              Start Block  Dbase State        Hot Spare
        c1t1d0s1                   0     No    Okay        


d3: Mirror
    Submirror 0: d30
      State: Okay        
    Submirror 1: d31
      State: Resyncing   
    Resync in progress: 0 % done
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 355604121 blocks

d30: Submirror of d3
    State: Okay        
    Hot spare pool: hsp000
    Size: 355604121 blocks
    Stripe 0: (interlace: 64 blocks)
        Device              Start Block  Dbase State        Hot Spare
        c2t0d0s0                   0     No    Okay        
        c2t1d0s0                2889     No    Okay        
        c2t2d0s0                2889     No    Okay        
        c2t3d0s0                2889     No    Okay        
        c2t4d0s0                2889     No    Okay        


d31: Submirror of d3
    State: Resyncing   
    Hot spare pool: hsp000
    Size: 355604121 blocks
    Stripe 0: (interlace: 64 blocks)
        Device              Start Block  Dbase State        Hot Spare
        c2t5d0s0                   0     No    Resyncing   
        c2t8d0s0                2889     No    Okay        
        c2t9d0s0                2889     No    Okay        
        c2t10d0s0               2889     No    Okay        
        c2t11d0s0               2889     No    Okay        


hsp000: 2 hot spares
        c2t12d0s0               Available       71124291 blocks
        c2t13d0s0               Available       71124291 blocks

# metastat d3 | grep "Resync in progress"
    Resync in progress: 5 % done
# metastat d3 | grep "Resync in progress"
    Resync in progress: 5 % done
# metastat d3 | grep "Resync in progress"
    Resync in progress: 6 % done
# metastat d3 | grep "Resync in progress"
    Resync in progress: 6 % done
# metastat d3 | grep "Resync in progress"
    Resync in progress: 8 % done
# metastat d3 | grep "Resync in progress"
    Resync in progress: 51 % done
# metastat d3 | grep "Resync in progress"
    Resync in progress: 60 % done
#



Just repeat that last metastat command until you get no output; at that point the resync has completed.
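If you don't fancy retyping the command, the repetition can be wrapped in a small loop. This is a rough sketch rather than a polished tool: the function name wait_for_resync, the METASTAT variable and the 60 second default interval are all my own choices. One caveat is that it treats "no matching output" as "finished", so a mistyped mirror name will also look like a completed resync.

```shell
#!/bin/sh
# METASTAT lets the command be overridden (for example when testing on a
# machine without SVM); it defaults to the real metastat.
METASTAT=${METASTAT:-metastat}

# wait_for_resync (hypothetical helper): poll a mirror until metastat no
# longer reports "Resync in progress".
#   $1 - mirror name, e.g. d3
#   $2 - seconds between polls (default 60)
wait_for_resync() {
    mirror=$1
    interval=${2:-60}
    # grep prints the progress line and succeeds while a resync is running;
    # once nothing matches, grep fails and the loop ends.
    while "$METASTAT" "$mirror" | grep "Resync in progress"; do
        sleep "$interval"
    done
    echo "$mirror: resync complete"
}

# Intended use on the server:
#   wait_for_resync d3
```

Each pass prints the current progress line, so leaving this running in a terminal gives much the same view as the manual session above.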


And that's that!
