My Mileage Did Vary!


I made a minor configuration change to a Solaris zone and rebooted the zone.

On the way back up it hung with an error message about booting.

I entered zoneadm list -cv, with the following results:

root@server01 # zoneadm list -cv
  ID NAME             STATUS     PATH                               BRAND    IP
   0 global           running    /                                  native   shared
   1 licence3-zone    running    /export/home/zones/licence3-zone   native   shared
   2 licence2-zone    running    /export/home/zones/licence2-zone   native   shared
   5 licence-zone     running    /export/home/zones/licence-zone    native   shared
   7 licence1-zone    running    /export/home/zones/licence1-zone   native   shared
  21 la-zone          running    /export/home/zones/la-zone         native   shared
  23 build-machine    down       /export/home/zones/build-machine   native   shared
  24 build-2          running    /export/home/zones/build-2         native   shared
   - build-1          installed  /export/home/zones/build-1         native   shared
root@server01 #

Its state was shown as down!
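While waiting on a zone like this, it can be handy to pull out just its STATUS field. A minimal sketch: zone_status is a hypothetical helper, and it assumes the column layout shown in the listing above (ID NAME STATUS PATH BRAND IP) with a single header line.

```sh
#!/bin/sh
# Hypothetical helper: read "zoneadm list -cv" output on stdin and print the
# STATUS column for the zone named in $1. Assumes the columns are
# ID NAME STATUS PATH BRAND IP, with a single header line.
# Exits non-zero if the zone does not appear in the listing.
zone_status() {
    awk -v zone="$1" '
        NR > 1 && $2 == zone { print $3; found = 1 }
        END { exit !found }
    '
}

# Usage in the global zone:
#   zoneadm list -cv | zone_status build-machine
```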

I entered zoneadm -z build-machine halt, but it failed with a message saying that the zone's /tmp directory couldn't be unmounted.

Entering zoneadm -z build-machine reboot failed with a similar message.

Entering zoneadm -z build-machine boot did the same.

So I traversed the zone's file system from the global zone. I had downloaded some files to the zone's /tmp directory, so I deleted its entire contents and entered zoneadm list -cv. Its state was still shown as down!
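Emptying the zone's /tmp from the global zone could be sketched as follows. clear_zone_tmp is a hypothetical helper, and it assumes the usual native-zone layout where the zone's root file system lives under <zonepath>/root, so build-machine's /tmp would be /export/home/zones/build-machine/root/tmp. Deleting files there removes them from the zone, so only do it once you are sure nothing in the zone still needs them.

```sh
#!/bin/sh
# Hypothetical helper: empty a zone's /tmp from the global zone.
# $1 is the zonepath (the PATH column of zoneadm list -cv).
# Assumes the native-zone layout where the zone's root is <zonepath>/root.
clear_zone_tmp() {
    # Refuse an empty argument, which would otherwise point at /root/tmp.
    [ -n "$1" ] || return 1
    dir="$1/root/tmp"
    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir" >&2
        return 1
    fi
    # Ordinary names, then dot-files; a glob that matches nothing stays
    # literal, and rm -f quietly ignores it.
    rm -rf "$dir"/* "$dir"/.[!.]* "$dir"/..?*
}

# Usage, as root in the global zone:
#   clear_zone_tmp /export/home/zones/build-machine
#   zoneadm -z build-machine halt
```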
I entered zoneadm -z build-machine halt, which returned without error. A zoneadm list -cv showed some good news.

root@server01 # zoneadm -z build-machine halt
root@server01 # zoneadm list -cv
  ID NAME             STATUS     PATH                               BRAND    IP
   0 global           running    /                                  native   shared
   1 licence3-zone    running    /export/home/zones/licence3-zone   native   shared
   2 licence2-zone    running    /export/home/zones/licence2-zone   native   shared
   5 licence-zone     running    /export/home/zones/licence-zone    native   shared
   7 licence1-zone    running    /export/home/zones/licence1-zone   native   shared
  21 la-zone          running    /export/home/zones/la-zone         native   shared
  24 build-2          running    /export/home/zones/build-2         native   shared
   - build-machine    installed  /export/home/zones/build-machine   native   shared
   - build-1          installed  /export/home/zones/build-1         native   shared
root@server01 #

And it booted successfully. Huzzah!

Well, that's that!


Funny as in peculiar!

This morning a user came up to me. He had tried to create a new view and received an error message.

I logged in as him and got the same error.
# cleartool mkview -tag mtest /net/server/views/my_view.vws
cleartool: Error: Unable to contact albd_server on host 'server'
cleartool: Error: Cannot bind an admin_server handle on "server": ClearCase object not found.
cleartool: Error: Unable to create view "/net/server/views/my_view.vws".
...

I googled the error message, and it turned up a whole bunch of pages with discussions similar to this IBM support document.

However, that wasn't the problem.

For various reasons, the user had set the environment variable ATRIAHOME in his shell to what amounted to garbage. The cleartool mkview command had then picked up that value and failed with the error above.
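A quick guard against this sort of thing could look like the sketch below. check_atriahome is a hypothetical helper, and it assumes that a valid ATRIAHOME names the ClearCase installation directory containing bin/cleartool (commonly /opt/rational/clearcase on Unix). Once a bad value is spotted, unsetting the variable in that shell is the obvious first thing to try.

```sh
#!/bin/sh
# Hypothetical helper: warn if ATRIAHOME is set but does not look like a
# ClearCase installation. Assumes a valid ATRIAHOME contains bin/cleartool.
check_atriahome() {
    if [ -z "${ATRIAHOME:-}" ]; then
        return 0    # unset: nothing to check
    fi
    if [ -x "$ATRIAHOME/bin/cleartool" ]; then
        return 0
    fi
    echo "ATRIAHOME='$ATRIAHOME' does not look like a ClearCase install; try: unset ATRIAHOME" >&2
    return 1
}

# Usage before creating a view:
#   check_atriahome && cleartool mkview -tag mtest /net/server/views/my_view.vws
```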

So it goes!


I've been running this blog with some of Blogger's tools for a while now.

Obviously there is AdWords, but anyone trawling through the HTML (but why would you these days?) will also spot that I'm using Google Analytics.

I turned on Analytics some months ago and the results have been interesting. People are actually reading my blog. Quite a lot of them, generally from the US, UK, Canada, Australia and Germany. I had wondered, because there were very few clicks on the AdWords ads. Perhaps I should not be surprised by that: I only rarely follow an AdWords link on another blog, and I've almost learnt to tune them out automatically. I think the only AdWords ads I ever really click are the ones in Gmail. I'm beginning to wonder whether anyone really clicks on AdWords links anymore.

Google Analytics has also shown which posts have attracted, and continue to attract, attention. It is the VMware posts which attract the most. In fact, for the last month the top five have all been VMware posts. The highest-placed Solaris post is the one about changing hostids, which is coming in at number 7.

So it goes!


This is the script I mentioned in my earlier post about replacing the hardware of a NIS+ master.

Having now had to use it, perhaps the only other thing I would add is the root crontab, but that would simply be:
crontab -l >> $log

Obviously, <server> is your server name, e.g. atuin or sunsvr01, and <off server storage location> is a storage location that is easily accessible when you need it!

# more scripts/nisbackup.sh
#!/bin/sh
# Back up the NIS+ master: copy the key files, run nisbackup, tar everything
# up to off-server storage, then mail the log to the admins.

log="/var/tmp/nisbackup.log"
date=`date '+%m/%d/%y at %H:%M:%S'`
backupdir="/var/nisplus_backup"
tarfile="/<off server storage location>/nisplus/backup/nisplusbackup.tar"

echo "Starting NIS+ backup on $date" > "$log"
cp -p /etc/.rootkey /var/nisplus.rootkey.copy
cp -p /var/nis/NIS_COLD_START /var/<server>-NIS_COLD_START
cp -p /etc/shadow /var/<server>-shadow
cp -p /etc/passwd /var/<server>-passwd

# nisbackup fails unless its target directory already exists
[ -d "$backupdir" ] || mkdir -p "$backupdir"

if /usr/sbin/nisbackup -a "$backupdir" >> "$log" 2>&1; then
    cd /var
    echo "Listing nisplus_backup" >> "$log"
    ls -l "$backupdir" >> "$log"
    echo "Creating tar" >> "$log"
    tar cvf "$tarfile" nisplus_backup nisplus.rootkey.copy <server>-NIS_COLD_START <server>-shadow <server>-passwd >> "$log" 2>&1
    echo "Checking validity of tar:" >> "$log"
    # list the archive just written (same path as above)
    tar tvf "$tarfile" >> "$log" 2>&1
else
    echo "nisbackup failed!!!" >> "$log"
fi

/usr/bin/mailx -s "NIS+ backup" server_admins < "$log"


Whilst searching for information on a specific feature of IP-Filter, I came across a new source of Solaris information.

My only worry, other than that Solaris will wither on the vine now that Oracle has taken over Sun, is that a lot of this information may be quite old, if I were to judge the site solely by its Solaris logo!

I'm going to add the site to my useful links. And that's that!


The file system on the NIS+ master had become corrupt.

It wasn't the disks. The root file system and swap were both mirrored with DiskSuite, and metastat reported everything to be in order!

But files and directories all over the shop were "missing" (the command line actually reported I/O errors), and a quick look in the messages file revealed the ugly truth.

There are a couple of good sites out on the internet which describe how to recover from this sort of problem:
SUN
Solaris FAQ

Luckily, I have a script which runs several times a day. It copies the NIS_COLD_START file and the passwd, shadow and .rootkey files from /etc, runs nisbackup -a, and tars the whole lot up into a file on a file server. How often the script needs to run depends on the TTL of the domain. My domain still has the default of 12 hours, so in theory the script only needs to run twice a day, but three or four times would be better. I'm not sure it is worthwhile keeping every one of these backups, but the last couple, maybe.
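If you do keep only the last couple of backups, each run needs a distinct tar file name, for example nisplusbackup.YYYYMMDDHHMM.tar generated with date '+%Y%m%d%H%M' (a hypothetical naming scheme, not the fixed name the backup script uses). Pruning the older copies could then be sketched as below; prune_nis_backups is a hypothetical helper.

```sh
#!/bin/sh
# Hypothetical helper: in directory $1, keep only the newest $2 backups named
# nisplusbackup.YYYYMMDDHHMM.tar and delete the older ones. With that naming,
# sorting the names in reverse puts the newest first. Other files are left alone.
prune_nis_backups() {
    dir=$1
    keep=$2
    # Need at least one backup kept; also rejects a non-numeric count.
    [ "$keep" -ge 1 ] 2>/dev/null || return 1
    ls -1 "$dir" |
        grep '^nisplusbackup\.[0-9]\{12\}\.tar$' |
        sort -r |
        sed "1,${keep}d" |
        while read f; do
            rm -f "$dir/$f"
        done
}

# Usage, e.g. at the end of the backup script:
#   prune_nis_backups "/<off server storage location>/nisplus/backup" 2
```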

As luck would have it, the server crashed while running this script, just after writing the tarfile to the file server. The script also tests the validity of the backup by extracting all the files to a temporary directory. I guess the corrupt file system just decided it couldn't handle that.

I grabbed an old Sun Blade 150 that had been kept in the store cupboard for just such an eventuality, and changed its identity to match the failed NIS+ master. I changed
/etc/hosts
/etc/hostname.eri0
/etc/nodename
/etc/net/ticlts/hosts
/etc/net/ticots/hosts
/etc/net/ticotsord/hosts
and entered hostname
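Editing those files by hand is fine for a one-off. As a sketch only, the renaming could be scripted like this: rename_host_in_files and the spare machine's name spare150 are hypothetical, it assumes the spare's own name appears literally in each file, and the IP address in /etc/hosts still has to be changed separately.

```sh
#!/bin/sh
# Hypothetical helper: replace hostname $1 with hostname $2 in every file
# listed after them, keeping a .orig copy of each file first.
# Plain substring replacement: check the results if one name contains the
# other, and note that a "." in $1 matches any character in sed.
rename_host_in_files() {
    old=$1
    new=$2
    shift 2
    for f in "$@"; do
        cp -p "$f" "$f.orig" || return 1
        # Redirecting into the existing file keeps its owner and permissions.
        sed "s/$old/$new/g" "$f.orig" > "$f" || return 1
    done
}

# Usage, as root on the spare machine:
#   rename_host_in_files spare150 atuin /etc/hosts /etc/hostname.eri0 \
#       /etc/nodename /etc/net/ticlts/hosts /etc/net/ticots/hosts \
#       /etc/net/ticotsord/hosts
#   hostname atuin
```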

I FTPed my backup tarfile into /tmp and untarred it.

I copied the passwd, shadow and .rootkey files into /etc overwriting any existing files.

Then I entered nisrestore -f -a /tmp/nisplus_backup, ignoring any output.
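The copy-and-restore steps could be collected into one function, as a sketch. restore_nis_files is hypothetical; it assumes the tarfile layout produced by the backup script (nisplus_backup, nisplus.rootkey.copy, <server>-passwd, <server>-shadow) and extracts into its own scratch directory under /tmp rather than into /tmp itself.

```sh
#!/bin/sh
# Hypothetical helper sketching the restore steps.
# $1 = absolute path of the backup tarfile
# $2 = the <server> name used in the backup file names
# $3 = root directory to restore into ("/" on the replacement machine)
# Prints the extracted nisplus_backup directory, ready for nisrestore.
restore_nis_files() {
    tarfile=$1
    server=$2
    root=$3
    work=/tmp/nisrestore.$$
    mkdir -p "$work" || return 1
    (cd "$work" && tar xf "$tarfile") || return 1
    # Overwrite the existing files, as in the manual procedure.
    cp -p "$work/$server-passwd" "$root/etc/passwd" || return 1
    cp -p "$work/$server-shadow" "$root/etc/shadow" || return 1
    cp -p "$work/nisplus.rootkey.copy" "$root/etc/.rootkey" || return 1
    echo "$work/nisplus_backup"
}

# Usage on the replacement machine, as root:
#   dir=`restore_nis_files /tmp/nisplusbackup.tar atuin /` &&
#       /usr/sbin/nisrestore -f -a "$dir"
```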

Then I shut down the server, moved it into the racks and restarted it.

When it was back up, I tested the new server by running nisping -C -a and then the script which backs up all the data.

The backup command failed!

Aarrgghh!

Luckily, the problem was clear from the messages file: the directory you tell nisbackup to write to must exist before you run the command.

Phew!

And that's that!


My company uses Symantec Endpoint Protection on all the Windows servers. I've known for some time that there was a Linux client, but over the last week Nessus security scans were run against some really old legacy Solaris servers, the Linux servers and the Windows servers.

Now the Windows servers were protected by EndPoint and received a clean bill of health.

The Linux servers all have iptables firewalls and SELinux in enforcing mode, so they generated a few false positives but were generally clean. The worst finding was that a few web servers hadn't had the Apache TraceEnable Off directive added to their configuration.
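Checking a web server's config for that directive could be sketched as follows. trace_disabled is a hypothetical helper; it looks only at the one file it is given (directives pulled in via Include are not followed), and the config path in the usage comment is just an example, since it varies by distribution.

```sh
#!/bin/sh
# Hypothetical helper: succeed if the Apache httpd config file $1 contains an
# active (uncommented) "TraceEnable Off" line. Matching is case-insensitive,
# since httpd treats directive names and this On/Off argument that way.
# Only the named file is checked; Include'd files are not followed.
trace_disabled() {
    grep -Eiq '^[[:space:]]*TraceEnable[[:space:]]+off([[:space:]]|$)' "$1"
}

# Usage (example path only):
#   trace_disabled /etc/httpd/conf/httpd.conf || echo "add TraceEnable Off"
```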

The Solaris servers fared worse, simply due to their age and the fact that they had been used in a development environment.

The thing about EndPoint which I hadn't previously realised was that it detects attempted intrusions and refuses further connections from the hosts originating the attacks. In this way it seemed to operate much like one of the modes that PortSentry could be configured to run in. (It is really surprising to think that the last release of PortSentry is almost seven years old now!) Consequently, I began lobbying for additional budget to purchase licences for the other platforms.
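To illustrate the idea only (this is not how EndPoint or PortSentry is implemented, and the helper name block_rules is hypothetical): given source addresses pulled out of a firewall or IDS log, one per line, print an iptables rule dropping each address seen at least a threshold number of times. Printing the rules rather than running them leaves a chance to review them first.

```sh
#!/bin/sh
# Illustrative sketch: read one source IP per line on stdin and print an
# iptables rule to drop every address seen at least $1 times.
# Rules are printed (sorted by address), not applied.
block_rules() {
    awk -v limit="$1" '
        { hits[$1]++ }
        END { for (ip in hits) if (hits[ip] >= limit) print ip }
    ' |
    sort |
    while read ip; do
        echo "iptables -I INPUT -s $ip -j DROP"
    done
}

# Usage (the input file is hypothetical):
#   block_rules 3 < suspect_sources.txt
```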

The ability to have a single "management station" control the security protection across a heterogeneous server environment is incredible.

That's that for now!

Blog Archive

Thursday, July 22, 2010: Solaris Zone down
Tuesday, July 20, 2010: ClearCase Funny!
Tuesday, April 13, 2010: Blogging Tools
Tuesday, April 6, 2010: NIS+ master Backup script
Monday, April 5, 2010: New Solaris Resource
Monday, March 22, 2010: Replacing the hardware of the NIS+ master
Sunday, March 21, 2010: Symantec EndPoint Protection