Monday, March 22, 2010

Replacing the hardware of the NIS+ master

The file system on the NIS+ master had become corrupt.

Not the disks. The root fs and swap were both mirrored with disksuite. And metastat reported everything to be in order!

But files and directories all over the shop were "missing" (an I/O error was actually reported on the command line), and a quick look in the messages file revealed the ugly truth.

There are a couple of good sites out on the internet which describe how to recover from this sort of problem:
SUN
Solaris FAQ

Luckily, I have a script, run several times a day, which copies off the NIS_COLD_START, passwd, shadow and .rootkey files from /etc, executes a nisbackup -a command and tars the whole lot up into a file on a file server. How often this script needs to run depends upon the TTL of the domain. My domain still has the default of 12 hours, so in theory the script only needs to run twice a day, but 3 or 4 times would be better. I'm not sure it is worthwhile keeping all of the resulting tarfiles, but the last couple, maybe.
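
For illustration only, here is a minimal sketch of what such a backup script might look like. It is not my actual script: the function name, the scratch directory layout and the tarball path are all made up for the example. The nisbackup step is skipped on hosts where the command is not installed, and the target directory is created first because nisbackup expects it to exist.

```shell
#!/bin/sh
# Sketch only. bundle_nis_backup and its three arguments are hypothetical.
#   $1 = directory holding the files to copy (the post uses /etc)
#   $2 = scratch directory to assemble the backup in
#   $3 = tarball to write
bundle_nis_backup() {
    src_etc=$1
    work=$2
    out=$3

    # nisbackup needs its target directory to exist already
    mkdir -p "$work/etc" "$work/nisplus_backup" || return 1

    # Copy whichever of the named files are present
    for f in NIS_COLD_START passwd shadow .rootkey; do
        [ -f "$src_etc/$f" ] && cp -p "$src_etc/$f" "$work/etc/"
    done

    # Only attempt the NIS+ dump where the command is available
    if command -v nisbackup >/dev/null 2>&1; then
        nisbackup -a "$work/nisplus_backup" || return 1
    fi

    # Tar the whole lot up, then confirm the archive can be listed
    (cd "$work" && tar cf - etc nisplus_backup) > "$out" || return 1
    tar tf "$out" >/dev/null
}
```

A call might look like bundle_nis_backup /etc /var/tmp/nisbk /var/tmp/nisbk.tar, followed by a copy of the tarball to the file server. Those paths are placeholders.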

As luck would have it, the server crashed while running this script, just after writing the tarfile to the file server. The script also tests the validity of the backup by extracting all the files to a temporary directory. I guess the corrupt filesystem just decided it couldn't handle that.

I grabbed hold of an old Sun Blade 150 that was in the store cupboard for just such an eventuality, and changed its identity to be the same as the failed NIS+ master. I changed
/etc/hosts
/etc/hostname.eri0
/etc/nodename
/etc/net/ticlts/hosts
/etc/net/ticots/hosts
/etc/net/ticotsord/hosts
and entered hostname
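
With that many files to touch it is easy to miss one. As a rough sketch (the function name and the optional root prefix are invented for this example), the following lists which of the files above still mention a given host name, so it can be run with the old name of the spare box to see what is left to edit. It is a plain substring match, so a short name may also match longer ones.

```shell
#!/bin/sh
# Sketch only. identity_files_with is a hypothetical helper.
#   $1 = host name to look for
#   $2 = optional prefix for testing against a copy of the tree
identity_files_with() {
    name=$1
    root=${2:-}
    for f in /etc/hosts /etc/hostname.eri0 /etc/nodename \
             /etc/net/ticlts/hosts /etc/net/ticots/hosts \
             /etc/net/ticotsord/hosts; do
        # grep -l prints the file name when the name is found in it
        [ -f "$root$f" ] && grep -l "$name" "$root$f"
    done
    return 0
}
```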

I ftp-ed my tarfile backup into /tmp and untar-ed it.

I copied the passwd, shadow and .rootkey files into /etc overwriting any existing files.

And then I entered nisrestore -f -a /tmp/nisplus_backup ignoring any output.

And then I shut down the server, moved it into the racks and restarted it.

As a test on the new server when it was back up, I ran nisping -C -a and also ran the script which backs up all the data.

The backup command failed!

Aarrgghh!

Luckily the problem was clear from the messages file. The directory you tell nisbackup to write to must exist before you enter the command.

Phew!

And that's that!

Sunday, March 21, 2010

Symantec EndPoint Protection

My company uses Symantec Endpoint Protection on all the Windows servers. I've known for some time that there was a Linux client, but over the last week Nessus security scans were run against some really old legacy Solaris servers, the Linux servers and also the Windows servers.

Now the Windows servers were protected by EndPoint and received a clean bill of health.

The Linux servers all have iptables firewalls and SELinux in enforcing mode, and so generated a few false positives, but were generally clean. The worst was that a few web servers hadn't had the TraceEnable Off parameter added to their configuration.
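
Checking for that directive is easy to script. A minimal sketch, assuming an Apache-style configuration file; the function name is made up and the path in the usage line is only the usual CentOS location, not necessarily where yours lives:

```shell
#!/bin/sh
# Sketch only. trace_disabled is a hypothetical helper.
# Succeeds if the file contains an uncommented "TraceEnable Off" line
# (case-insensitive, leading whitespace allowed).
trace_disabled() {
    grep -Eiq '^[[:space:]]*TraceEnable[[:space:]]+off([[:space:]]|$)' "$1"
}
```

For example: trace_disabled /etc/httpd/conf/httpd.conf || echo "TraceEnable Off is missing"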

The Solaris servers fared worse, simply due to their age and the fact that they had been used in a development environment.

The thing about EndPoint which I hadn't previously realised was that it detected attempted intrusions and refused further connections from the hosts originating the attacks. In this way it seemed to be operating much like one of the modes that can be configured in PortSentry. (It is really surprising to think that the last release of PortSentry is almost seven years old now!) Consequently, I began lobbying for additional budget to purchase licences for the additional platforms.

The ability to have a single "management station" control the security protection across a heterogeneous server environment is incredible.

That's that for now!

Saturday, March 20, 2010

Replacing a disk in a Sun D2 Array

The Sun D2 Array is hot swap capable. As we control the D2s using Disksuite, a.k.a. SVM, it is possible to replace a disk in a D2 without requiring that the server be shut down.

When you receive notification that a disk has died, first run metastat to determine which disk has been swapped out:
# metastat
d0: Mirror
    Submirror 0: d10
      State: Okay        
    Submirror 1: d20
      State: Okay        
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 18876726 blocks


d10: Submirror of d0
    State: Okay        
    Size: 18876726 blocks
    Stripe 0:
        Device              Start Block  Dbase State        Hot Spare
        c1t0d0s0                   0     No    Okay        




d20: Submirror of d0
    State: Okay        
    Size: 18876726 blocks
    Stripe 0:
        Device              Start Block  Dbase State        Hot Spare
        c1t1d0s0                   0     No    Okay        




d1: Mirror
    Submirror 0: d11
      State: Okay        
    Submirror 1: d21
      State: Okay        
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 33555735 blocks


d11: Submirror of d1
    State: Okay        
    Size: 33555735 blocks
    Stripe 0:
        Device              Start Block  Dbase State        Hot Spare
        c1t0d0s1                   0     No    Okay        




d21: Submirror of d1
    State: Okay        
    Size: 33555735 blocks
    Stripe 0:
        Device              Start Block  Dbase State        Hot Spare
        c1t1d0s1                   0     No    Okay        




d3: Mirror
    Submirror 0: d30
      State: Okay        
    Submirror 1: d31
      State: Okay        
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 355604121 blocks


d30: Submirror of d3
    State: Okay        
    Hot spare pool: hsp000
    Size: 355604121 blocks
    Stripe 0: (interlace: 64 blocks)
        Device              Start Block  Dbase State        Hot Spare
        c2t0d0s0                   0     No    Okay        
        c2t1d0s0                2889     No    Okay        
        c2t2d0s0                2889     No    Okay        
        c2t3d0s0                2889     No    Okay        
        c2t4d0s0                2889     No    Okay        




d31: Submirror of d3
    State: Okay        
    Hot spare pool: hsp000
    Size: 355604121 blocks
    Stripe 0: (interlace: 64 blocks)
        Device              Start Block  Dbase State        Hot Spare
        c2t5d0s0                   0     No    Okay         c2t12d0s0
        c2t8d0s0                2889     No    Okay        
        c2t9d0s0                2889     No    Okay        
        c2t10d0s0               2889     No    Okay        
        c2t11d0s0               2889     No    Okay        




hsp000: 2 hot spares
        c2t12d0s0               In use          71124291 blocks
        c2t13d0s0               Available       71124291 blocks


#

OK, c2t5d0s0 is the sixth disk from the left in the array - or to put that another way it is the left one of the two middle disks!
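
Rather than reading the whole listing by eye, the hot-spared slice can be picked out with a one-line filter. This is only a sketch that matches the column layout shown above (device, start block, dbase, a one-word state, hot spare); the function name is invented, and other states or layouts would need a different pattern.

```shell
#!/bin/sh
# Sketch only. spared_slices is a hypothetical helper.
# Reads metastat output on stdin and prints "slice hot-spare" for every
# component line whose Hot Spare column is filled in.
spared_slices() {
    awk '$1 ~ /^c[0-9]+t[0-9]+d[0-9]+s[0-9]+$/ && NF == 5 && $5 ~ /^c[0-9]/ {
             print $1, $5
         }'
}
```

It would be used as metastat | spared_slices.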

You can run format->analyse->read which will determine whether the disk is really dead.
# format
Searching for disks...done




AVAILABLE DISK SELECTIONS:
       0. c1t0d0
          /pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w21000004cffde452,0
       1. c1t1d0

          /pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w21000004cffde367,0
       2. c2t0d0

          /pci@8,600000/pci@1/scsi@4/sd@0,0
       3. c2t1d0

          /pci@8,600000/pci@1/scsi@4/sd@1,0
       4. c2t2d0

          /pci@8,600000/pci@1/scsi@4/sd@2,0
       5. c2t3d0

          /pci@8,600000/pci@1/scsi@4/sd@3,0
       6. c2t4d0

          /pci@8,600000/pci@1/scsi@4/sd@4,0
       7. c2t5d0

          /pci@8,600000/pci@1/scsi@4/sd@5,0
       8. c2t8d0

          /pci@8,600000/pci@1/scsi@4/sd@8,0
       9. c2t9d0

          /pci@8,600000/pci@1/scsi@4/sd@9,0
      10. c2t10d0

          /pci@8,600000/pci@1/scsi@4/sd@a,0
      11. c2t11d0

          /pci@8,600000/pci@1/scsi@4/sd@b,0
      12. c2t12d0

          /pci@8,600000/pci@1/scsi@4/sd@c,0
      13. c2t13d0

          /pci@8,600000/pci@1/scsi@4/sd@d,0
Specify disk (enter its number): 7
selecting c2t5d0
[disk formatted]


FORMAT MENU:
        disk       - select a disk
        type       - select (define) a disk type
        partition  - select (define) a partition table
        current    - describe the current disk
        format     - format and analyze the disk
        repair     - repair a defective sector
        label      - write label to the disk
        analyze    - surface analysis
        defect     - defect list management
        backup     - search for backup labels
        verify     - read and display labels
        save       - save new disk/partition definitions
        inquiry    - show vendor, product and revision
        volname    - set 8-character volume name
        !<cmd>     - execute <cmd>, then return
        quit
format> anal


ANALYZE MENU:
        read     - read only test   (doesn't harm SunOS)
        refresh  - read then write  (doesn't harm data)
        test     - pattern testing  (doesn't harm data)
        write    - write then read      (corrupts data)
        compare  - write, read, compare (corrupts data)
        purge    - write, read, write   (corrupts data)
        verify   - write entire disk, then verify (corrupts data)
        print    - display data buffer
        setup    - set analysis parameters
        config   - show analysis parameters
        !<cmd>   - execute <cmd>, then return
        quit
analyze> read
Ready to analyze (won't harm SunOS). This takes a long time,
but is interruptable with CTRL-C. Continue? y

        pass 0
   24619/26/53 

        pass 1
   24619/26/53 

Total of 0 defective blocks repaired.
analyze> q


FORMAT MENU:
        disk       - select a disk
        type       - select (define) a disk type
        partition  - select (define) a partition table
        current    - describe the current disk
        format     - format and analyze the disk
        repair     - repair a defective sector
        label      - write label to the disk
        analyze    - surface analysis
        defect     - defect list management
        backup     - search for backup labels
        verify     - read and display labels
        save       - save new disk/partition definitions
        inquiry    - show vendor, product and revision
        volname    - set 8-character volume name
        !<cmd>     - execute <cmd>, then return
        quit
format> q
#

The excerpt above didn't show any repairs, but there may be some when you run the command.



Raise a call for the new disk under your support contract. Or failing that search the server room for a compatible disk - Good Luck!

As long as a hot spare has jumped in, the dead disk can just be removed from the array and the new one inserted. The disk will spin up immediately.

Wait for the green light to come on. And Bob's your uncle!

Run format to apply a disk label. You won't be able to format the disk until you do.
# format
Searching for disks...done

c2t5d0: configured with capacity of 33.92GB


AVAILABLE DISK SELECTIONS:
       0. c1t0d0

          /pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w21000004cffde452,0
       1. c1t1d0

          /pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w21000004cffde367,0
       2. c2t0d0

          /pci@8,600000/pci@1/scsi@4/sd@0,0
       3. c2t1d0

          /pci@8,600000/pci@1/scsi@4/sd@1,0
       4. c2t2d0

          /pci@8,600000/pci@1/scsi@4/sd@2,0
       5. c2t3d0

          /pci@8,600000/pci@1/scsi@4/sd@3,0
       6. c2t4d0

          /pci@8,600000/pci@1/scsi@4/sd@4,0
       7. c2t5d0

          /pci@8,600000/pci@1/scsi@4/sd@5,0
       8. c2t8d0

          /pci@8,600000/pci@1/scsi@4/sd@8,0
       9. c2t9d0

          /pci@8,600000/pci@1/scsi@4/sd@9,0
      10. c2t10d0

          /pci@8,600000/pci@1/scsi@4/sd@a,0
      11. c2t11d0

          /pci@8,600000/pci@1/scsi@4/sd@b,0
      12. c2t12d0

          /pci@8,600000/pci@1/scsi@4/sd@c,0
      13. c2t13d0

          /pci@8,600000/pci@1/scsi@4/sd@d,0
Specify disk (enter its number): 7
selecting c2t5d0
[disk formatted]
Disk not labeled.  Label it now? y


FORMAT MENU:
        disk       - select a disk
        type       - select (define) a disk type
        partition  - select (define) a partition table
        current    - describe the current disk
        format     - format and analyze the disk
        repair     - repair a defective sector
        label      - write label to the disk
        analyze    - surface analysis
        defect     - defect list management
        backup     - search for backup labels
        verify     - read and display labels
        save       - save new disk/partition definitions
        inquiry    - show vendor, product and revision
        volname    - set 8-character volume name
        !<cmd>     - execute <cmd>, then return
        quit
format> q
#


Enter a prtvtoc/fmthard command combination to ensure the disk has the same slices as the replaced disk. In the command below I use the disk that is in the equivalent position on the other side of the mirror as the source of the configuration.
# prtvtoc /dev/rdsk/c2t0d0s0 | fmthard -s - /dev/rdsk/c2t5d0s0
fmthard:  New volume table of contents now in place.
#
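
A non-interactive way to confirm that the copy took is to compare the two VTOCs with the comment lines stripped out (prtvtoc prefixes its header lines with an asterisk). A rough sketch, with made-up function names, working on prtvtoc output saved to files:

```shell
#!/bin/sh
# Sketch only. vtoc_body and same_vtoc are hypothetical helpers.

# Print a saved prtvtoc listing without its "*" comment lines
vtoc_body() {
    grep -v '^\*' "$1"
}

# Succeeds when two saved listings describe the same slices
same_vtoc() {
    [ "$(vtoc_body "$1")" = "$(vtoc_body "$2")" ]
}
```

For example, save prtvtoc /dev/rdsk/c2t0d0s0 and prtvtoc /dev/rdsk/c2t5d0s0 to two temporary files and run same_vtoc on them. The file names are up to you.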



Check the disk format:
# format
Searching for disks...done


AVAILABLE DISK SELECTIONS:
       0. c1t0d0

          /pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w21000004cffde452,0
       1. c1t1d0

          /pci@9,600000/SUNW,qlc@2/fp@0,0/ssd@w21000004cffde367,0
       2. c2t0d0

          /pci@8,600000/pci@1/scsi@4/sd@0,0
       3. c2t1d0

          /pci@8,600000/pci@1/scsi@4/sd@1,0
       4. c2t2d0

          /pci@8,600000/pci@1/scsi@4/sd@2,0
       5. c2t3d0

          /pci@8,600000/pci@1/scsi@4/sd@3,0
       6. c2t4d0

          /pci@8,600000/pci@1/scsi@4/sd@4,0
       7. c2t5d0

          /pci@8,600000/pci@1/scsi@4/sd@5,0
       8. c2t8d0

          /pci@8,600000/pci@1/scsi@4/sd@8,0
       9. c2t9d0

          /pci@8,600000/pci@1/scsi@4/sd@9,0
      10. c2t10d0

          /pci@8,600000/pci@1/scsi@4/sd@a,0
      11. c2t11d0

          /pci@8,600000/pci@1/scsi@4/sd@b,0
      12. c2t12d0

          /pci@8,600000/pci@1/scsi@4/sd@c,0
      13. c2t13d0

          /pci@8,600000/pci@1/scsi@4/sd@d,0
Specify disk (enter its number): 7
selecting c2t5d0
[disk formatted]


FORMAT MENU:
        disk       - select a disk
        type       - select (define) a disk type
        partition  - select (define) a partition table
        current    - describe the current disk
        format     - format and analyze the disk
        repair     - repair a defective sector
        label      - write label to the disk
        analyze    - surface analysis
        defect     - defect list management
        backup     - search for backup labels
        verify     - read and display labels
        save       - save new disk/partition definitions
        inquiry    - show vendor, product and revision
        volname    - set 8-character volume name
        !<cmd>     - execute <cmd>, then return
        quit
format> p


PARTITION MENU:
        0      - change `0' partition
        1      - change `1' partition
        2      - change `2' partition
        3      - change `3' partition
        4      - change `4' partition
        5      - change `5' partition
        6      - change `6' partition
        7      - change `7' partition
        select - select a predefined table
        modify - modify a predefined partition table
        name   - name the current table
        print  - display the current table
        label  - write partition map and label to the disk
        !<cmd> - execute <cmd>, then return
        quit
partition> p
Current partition table (original):
Total disk cylinders available: 24620 + 2 (reserved cylinders)

Part      Tag    Flag     Cylinders         Size            Blocks
  0       root    wm       0 - 24618       33.91GB    (24619/0/0) 71124291
  1 unassigned    wu       0                0         (0/0/0)            0
  2     backup    wu       0 - 24619       33.92GB    (24620/0/0) 71127180
  3 unassigned    wu       0                0         (0/0/0)            0
  4 unassigned    wu       0                0         (0/0/0)            0
  5 unassigned    wu       0                0         (0/0/0)            0
  6 unassigned    wu       0                0         (0/0/0)            0
  7 unassigned    wm   24619 - 24619        1.41MB    (1/0/0)         2889

partition> q


FORMAT MENU:
        disk       - select a disk
        type       - select (define) a disk type
        partition  - select (define) a partition table
        current    - describe the current disk
        format     - format and analyze the disk
        repair     - repair a defective sector
        label      - write label to the disk
        analyze    - surface analysis
        defect     - defect list management
        backup     - search for backup labels
        verify     - read and display labels
        save       - save new disk/partition definitions
        inquiry    - show vendor, product and revision
        volname    - set 8-character volume name
        !<cmd>     - execute <cmd>, then return
        quit
format> q

#


Replace the hot spare with the new disk in the original slot. Note that the metareplace command works on the mirror (d3), not on the submirror (d31) which actually has the failed disk:
# metareplace -e d3 c2t5d0s0
#


Use metastat | grep to check when the mirror has finished resync-ing
#  metareplace -e d3 c2t5d0s0
d3: device c2t5d0s0 is enabled
# metastat
d0: Mirror
    Submirror 0: d10
      State: Okay        
    Submirror 1: d20
      State: Okay        
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 18876726 blocks

d10: Submirror of d0
    State: Okay        
    Size: 18876726 blocks
    Stripe 0:
        Device              Start Block  Dbase State        Hot Spare
        c1t0d0s0                   0     No    Okay        


d20: Submirror of d0
    State: Okay        
    Size: 18876726 blocks
    Stripe 0:
        Device              Start Block  Dbase State        Hot Spare
        c1t1d0s0                   0     No    Okay        


d1: Mirror
    Submirror 0: d11
      State: Okay        
    Submirror 1: d21
      State: Okay        
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 33555735 blocks

d11: Submirror of d1
    State: Okay        
    Size: 33555735 blocks
    Stripe 0:
        Device              Start Block  Dbase State        Hot Spare
        c1t0d0s1                   0     No    Okay        


d21: Submirror of d1
    State: Okay        
    Size: 33555735 blocks
    Stripe 0:
        Device              Start Block  Dbase State        Hot Spare
        c1t1d0s1                   0     No    Okay        


d3: Mirror
    Submirror 0: d30
      State: Okay        
    Submirror 1: d31
      State: Resyncing   
    Resync in progress: 0 % done
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 355604121 blocks

d30: Submirror of d3
    State: Okay        
    Hot spare pool: hsp000
    Size: 355604121 blocks
    Stripe 0: (interlace: 64 blocks)
        Device              Start Block  Dbase State        Hot Spare
        c2t0d0s0                   0     No    Okay        
        c2t1d0s0                2889     No    Okay        
        c2t2d0s0                2889     No    Okay        
        c2t3d0s0                2889     No    Okay        
        c2t4d0s0                2889     No    Okay        


d31: Submirror of d3
    State: Resyncing   
    Hot spare pool: hsp000
    Size: 355604121 blocks
    Stripe 0: (interlace: 64 blocks)
        Device              Start Block  Dbase State        Hot Spare
        c2t5d0s0                   0     No    Resyncing   
        c2t8d0s0                2889     No    Okay        
        c2t9d0s0                2889     No    Okay        
        c2t10d0s0               2889     No    Okay        
        c2t11d0s0               2889     No    Okay        


hsp000: 2 hot spares
        c2t12d0s0               Available       71124291 blocks
        c2t13d0s0               Available       71124291 blocks

# metastat d3 | grep "Resync in progress"
    Resync in progress: 5 % done
# metastat d3 | grep "Resync in progress"
    Resync in progress: 5 % done
# metastat d3 | grep "Resync in progress"
    Resync in progress: 6 % done
# metastat d3 | grep "Resync in progress"
    Resync in progress: 6 % done
# metastat d3 | grep "Resync in progress"
    Resync in progress: 8 % done
# metastat d3 | grep "Resync in progress"
    Resync in progress: 51 % done
# metastat d3 | grep "Resync in progress"
    Resync in progress: 60 % done
#



Just repeat that last metastat command until you get no output and the resync-ing will have completed.
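
If you would rather not keep retyping it, the polling can be wrapped in a small loop. This is a sketch under a few assumptions: the function names are invented, the default interval of 60 seconds is arbitrary, and STATUS_CMD is a hypothetical override that exists only so the loop can be tried on a machine without metastat.

```shell
#!/bin/sh
# Sketch only. resync_pct, wait_for_resync and STATUS_CMD are hypothetical.

# Print the percentage from a "Resync in progress" line on stdin,
# or nothing if there is no such line.
resync_pct() {
    sed -n 's/.*Resync in progress: *\([0-9][0-9]*\) *% done.*/\1/p'
}

# Poll the mirror until no resync line is reported.
#   $1 = mirror name, e.g. d3
#   $2 = seconds between polls (default 60)
wait_for_resync() {
    mirror=$1
    interval=${2:-60}
    while :; do
        # Give up if the status command itself fails
        out=$(${STATUS_CMD:-metastat} "$mirror") || return 1
        pct=$(printf '%s\n' "$out" | resync_pct)
        [ -z "$pct" ] && break
        echo "$mirror: resync $pct% done"
        sleep "$interval"
    done
    echo "$mirror: resync complete"
}
```

Running wait_for_resync d3 300 would check every five minutes and return once the "Resync in progress" line has gone.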


And that's that!

Sunday, March 14, 2010

Further Good Links/Tools

In a previous post, I described some tools that I had found useful for documenting the system environment.

A pretty comprehensive system description of many versions of the Windows OS can be generated by SIW - Systems Information for Windows, which was written by Gabriel Topala. It is only free for personal use. The pricing model for Businesses is in the right sort of ball park.

If you are a Legato Networker user and you need to generate reports for either your own benefit or the benefit of others, then the Networker Reporting Utility is well worth a look. The download mechanism is rather painful, but it is worthwhile. There is a product from Legato which does a similar job, but for our environment we were looking at having to spend upwards of £30,000 to £40,000! That decision was pretty much a no-brainer!

And that's that for now!

Friday, March 12, 2010

What a PITA!!

This week I've been configuring a couple of new HP servers, ProLiant DL 360 G6s to be precise. They were configured to have 8 internal disks so the internal CD/DVD-ROM had to be sacrificed.

So to configure them, I temporarily attached a USB CD-ROM drive, keyboard and mouse and attached a monitor.

Having installed the latest CentOS x86_64 Linux from CD on the first one, I removed the CD-ROM and rebooted. It hung coming up configuring the USB storage driver!

So I spent a day googling for help. I re-installed Linux half a dozen or so times. I downloaded the DVD and followed the instructions to make that available via both HTTP and NFS and tried using the first CD to boot from and then installing across the network. Actually that was considerably faster than the CD method!


Nothing changed the hang during boot. Always on the USB storage driver installation.

As a punt with nothing else to lose, I unplugged the USB keyboard and mouse and rebooted.

The b*$#*^d box booted all the way.

I plugged the keyboard in and it was recognized without fuss and was immediately useful.

So it goes! Some days are just like that.