Saturday, May 31, 2008

Whitespace in Solaris 10 LDAP configuration

Spaces are allowed on the ldapclient command line if the attribute value is surrounded by double quotes, e.g.
-a "proxyDN=cn=admin,cn=emea users,dc=example,dc=com"
or
-a "defaultServerList=123.123.123.1 123.123.123.2"

Failing to quote either attribute statement will cause the ldapclient command line to fail with a parsing error.
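To see the difference, compare an unquoted and a quoted form (a sketch with placeholder DNs):

```shell
# Fails with a parsing error: the space in "emea users" splits the
# attribute into two separate arguments
ldapclient mod -a proxyDN=cn=admin,cn=emea users,dc=example,dc=com

# Succeeds: the quotes keep the whole attribute as a single argument
ldapclient mod -a "proxyDN=cn=admin,cn=emea users,dc=example,dc=com"
```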

However, there are instances where quoting the attribute definition lets the command line parse and the command succeed, yet LDAP lookups still fail:
May 22 18:09:44 server1 nscd[4012]: [id 293258 user.error] libsldap: Status: 49 Mesg: openConnection: simple bind failed - Invalid credentials

The problem is resolved when the proxy user is replaced with one from an OU whose name doesn't contain a space, e.g.
-a "proxyDN=cn=admin,cn=users,dc=example,dc=com"

Friday, May 30, 2008

Netmask settings for Solaris 10 Zones

When a zone is created, a large number of files are copied into it. However, the netmasks file seems to be generated as an empty file.

Consequently, when you enter ifconfig you end up seeing something like:
...
eri0:2: flags=1000843 mtu 1500 index 2
zone zone2
inet 123.123.123.11 netmask ffff0000 broadcast 123.123.123.255
...

Now obviously you can use ifconfig in the global zone to change the netmask setting interactively. That works, but a reboot wipes it out. So you have to log in to the console of the zone and amend the copied netmasks file (/etc/netmasks) so that it contains a line like:
123.123.123.0 255.255.255.0

Alternatively, you can access the netmasks file via the global zone filesystem.
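For example, from the global zone (the zonepath here is an assumption; check yours with zonecfg):

```shell
# Append the subnet/netmask pair to the non-global zone's netmasks
# file via the global zone's filesystem. The zonepath /zones/zone2
# is hypothetical -- confirm with: zonecfg -z zone2 info zonepath
echo "123.123.123.0 255.255.255.0" >> /zones/zone2/root/etc/netmasks
```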

Legato NetWorker Configuration Duplication

I made a mistake whilst installing Legato NetWorker on a Solaris 10 box. The mistake prevented the backup server from backing up the Solaris 10 client.

I support multiple DNS domains, and the backup server is multihomed, so it appears in many of those domains with different IP addresses.

The Solaris 10 client was in domain dom1.example.com, one of the few sub-domains the backup server actually isn't in!

When I installed NetWorker I specified the backup server as backupserver.dom2.example.com. The backup server was actually trying to communicate with the Solaris 10 client over the network where it was known as backupserver.dom3.example.com.

NetWorker didn't like the loop effect of this arrangement and the backup was failing.

In fact I should have just specified a simple server name, unqualified by a domain name, and added the backup server to the hosts file.

D'Oh!

However, it took a little longer to resolve than I had expected, because NetWorker not only records the backup server in the /nsr/res/servers file but also records it several times in the /nsr/res/nsrla.res file.
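To find every record of the wrong name before fixing it, something like this works (the /nsr/res paths are the NetWorker defaults on Solaris; the init script name is an assumption for your install):

```shell
# List every occurrence of the backup server's name in the client config
grep -n backupserver.dom2.example.com /nsr/res/servers /nsr/res/nsrla.res

# Stop the client daemons, correct both files, then restart
/etc/init.d/networker stop
# ...edit /nsr/res/servers and /nsr/res/nsrla.res...
/etc/init.d/networker start
```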

As I said above, I would have saved myself some confusion by entering the unqualified name of the backup server upon installation.

*nix & Windows integration software I'd like to use

Vintela have some pretty cool software for integrating Solaris & Linux systems into an AD environment.

Perhaps because it has to compete against Samba and other freeware solutions, their products aren't ridiculously expensive either. Quite a refreshing experience. It's a shame that the normal mode of operation for most software houses is to seek to soak their customers. No names! No pack drill! But we all know who I mean.

Perhaps the coolest feature is the ability to apply GPOs to *nix clients. Add in the inexpensive nature of the software, and it seems pretty compelling.

What's the downside?

For my company, it's the fact that our AD servers aren't running Windows 2003 R2, which is a requirement of the solution.

Thursday, May 29, 2008

TeamCity and ClearCase

TeamCity has a known problem: it requires sequential version numbering for the files and directories it tracks.

The problem was exposed here when a developer removed the latest version of a directory and created a new version. The developer removed version 4; the recreated version became version 5! TeamCity knew the latest version of the directory ought to be version 4 and was very upset that it was missing. The continuous integration suddenly wasn't continuously integrating.

Now, some might argue the problem was with ClearCase for not assigning the new version the same version number as the removed version. However, I do not agree. ClearCase uses the same mechanism whether the version being removed is the latest one or one in the middle of the version tree.
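For illustration, this is roughly what the removal and recreation look like (element names and version numbers are hypothetical):

```shell
# Remove the latest version of the directory element
cleartool rmver dirent1@@/main/4

# A subsequent checkout/checkin does NOT reuse the freed number:
cleartool checkout -nc dirent1
cleartool checkin -nc dirent1    # becomes /main/5, leaving a gap at 4
cleartool lsvtree dirent1        # the version tree shows the hole
```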

Anyway, the resolution was pretty straightforward, if not pretty.

  1. Check out the parent directory
  2. Check out the "broken" directory; we'll call it dirent1
  3. Move all files out of dirent1, i.e. cleartool mv * ..
  4. Check in dirent1
  5. cleartool rmname dirent1
  6. Create a new directory with the same name as dirent1
  7. Move the files needed for version 1 into dirent1
  8. Check in dirent1
  9. Check out dirent1
  10. Move the files needed for version 2 into dirent1
  11. Check in dirent1
  12. Check out dirent1
  13. Move the files needed for version 3 into dirent1
  14. Check in dirent1
  15. Check out dirent1
  16. Move all remaining files into dirent1
  17. Check in dirent1
  18. Check in the parent directory
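The steps above can be sketched as a cleartool session (names are illustrative; each "move files needed for version N" step is really a set of cleartool mv commands):

```shell
cleartool checkout -nc .                    # parent directory
cleartool checkout -nc dirent1
cd dirent1 && cleartool mv * .. && cd ..    # empty the broken directory
cleartool checkin -nc dirent1
cleartool rmname dirent1                    # unlink it from the parent
cleartool mkdir -nc dirent1                 # fresh element of the same name
cleartool mv fileA dirent1                  # files needed for version 1
cleartool checkin -nc dirent1
cleartool checkout -nc dirent1
cleartool mv fileB dirent1                  # files needed for version 2
cleartool checkin -nc dirent1
# ...repeat checkout / mv / checkin for each further version...
cleartool checkin -nc .                     # parent directory
```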

We ended up with a version 3 on the parent directory and a dirent1 directory with 4 versions.

Some build labels also had to be manually re-assigned to appropriate versions, but luckily not that many.

VMware ESX re-signaturing of the SAN Config

After a slightly strange power outage in the server room at work - the UPS stayed up, everything else in the server room went down!! - I came across a situation where an ESX server had lost the primary connection to its SAN through the multipath fibre channel switch fabric.

Cue: extreme nervousness. To put it mildly.

There were a number of messages on the ESX Server console of the form:
cpu2:1034)LVM: ProbeDeviceInt:4903: vmhba1:1:0:1 may be snapshot: disabling access. See resignaturing section in SAN config guide

Actually, the last part of the error message is very good advice. A good read of the SAN configuration guide is well worth the time and effort.

Somewhere along the line the ESX host lost this VMFS3 volume and picked it up on a different path, vmhba1:1:0:1. When the host came back up, it picked up the VMFS3 volume on the different path, but importantly kept information about this partition at its previous path. This is why it decided it was looking at a snapshot, and responded in this manner.

So go into the console, click on the Configuration tab and select "Advanced Settings"

Expand the LVM section and set LVM.EnableResignature to 1
Then click OK to apply settings.

Select the "storage adapters" link under the configuration tab and click the "rescan" button (upper right).
Right click the vmhba (under the controller adapter for your machine) and click "rescan".

Then when you go to the Summary tab, right-click and select "refresh", and you should see your storage volume.

At least, that is what the manual would have you believe. My experience was rather different.

ESX was perfectly happy to see the volume on its new path as a new volume. Consequently, I had to remove all my inaccessible VMs and re-register them from the "new" volume. I may have had other options. This seemed to be the quickest at the time.
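The re-registering can be done from the service console as well as the VI client (a sketch; vmware-cmd shipped with ESX 3.x, and the .vmx paths below are hypothetical - resignatured volumes show up with a snap- prefix on their label):

```shell
# List what is currently registered
vmware-cmd -l

# Drop the stale registration and re-add the VM from the "new" volume
vmware-cmd -s unregister /vmfs/volumes/old-vol/vm1/vm1.vmx
vmware-cmd -s register /vmfs/volumes/snap-00000001-old-vol/vm1/vm1.vmx
```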

After all that, all the VMs started up without error, and other than a delay restarting the VMs, the users were unaware of the problem.
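For completeness, the resignature toggle and rescan can also be driven from the service console rather than the VI client (I believe these esxcfg commands are correct for ESX 3.x, but verify the option path against your version):

```shell
esxcfg-advcfg -s 1 /LVM/EnableResignature   # enable resignaturing
esxcfg-rescan vmhba1                        # rescan the HBA
esxcfg-advcfg -s 0 /LVM/EnableResignature   # switch it back off afterwards
```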

Solaris 10 authenticating against Active Directory

There are a number of good blogs discussing this subject. I'd recommend Scott's and the OpenSolaris blogs.

I used Scott Lowe's blog for the instructions on how to get CentOS Linux machines to authenticate against Active Directory. It was right on the money. Especially as I needed my machines to run Samba to create an interoperability solution for a number of software development teams who use IBM Rational ClearCase.

However, his instructions for Solaris 10 servers never worked in my environment.

My environment is Windows 2003 Active Directory, with all the servers patched with Service Pack 2 and the latest monthly patches. The Server for NIS and Password Synchronization modules of Services For Unix v3.5 have also been installed, which obviously has extended the schema. As Service Pack 2 had been installed, the hotfix that fixes password sync after the "upgrade" has also been applied. N.B. the servers are not running Windows 2003 R2 - that would make a big difference, and from all accounts it would be much easier to interoperate with.

There is an article on BigAdmin on this subject. The method described almost worked for me. I'd say it went 90% of the way. The part that didn't work was the ldapclient command. Specifically, it was trying to use credentialLevel=self with authenticationMethod=sasl/gssapi, i.e.

ldapclient -v manual \
-a credentialLevel=self \
-a authenticationMethod=sasl/gssapi \
...


I was able to get around this by changing the ldapclient command to:

ldapclient -v manual \
-a credentialLevel=proxy \
-a authenticationMethod=simple \
-a proxyDN=cn=proxy_user,cn=users,dc=example,dc=com \
-a proxyPassword=password \
...


I also had to change the serviceSearchDescriptor attributes from

-a serviceSearchDescriptor=passwd:cn=users,dc=example,dc=com?one \
-a serviceSearchDescriptor=group:cn=users,dc=example,dc=com?one

to

-a serviceSearchDescriptor=passwd:dc=example,dc=com?sub \
-a serviceSearchDescriptor=group:dc=example,dc=com?sub
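Putting the working pieces together, the full command looked roughly like this (the proxy DN, password, and domain are placeholders, and other attributes such as defaultServerList are elided):

```shell
ldapclient -v manual \
    -a credentialLevel=proxy \
    -a authenticationMethod=simple \
    -a proxyDN=cn=proxy_user,cn=users,dc=example,dc=com \
    -a proxyPassword=password \
    -a serviceSearchDescriptor=passwd:dc=example,dc=com?sub \
    -a serviceSearchDescriptor=group:dc=example,dc=com?sub \
    -a defaultSearchBase=dc=example,dc=com
```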


That done and Bob was my parental Sibling of the usually male variety!

gcc 64bit compilation on Solaris 10

By default, if you were to build a shared library with gcc, you'd enter commands similar to
# gcc -fPIC -c file.c
# gcc -shared -o file.so file.o
The resultant shared library would be 32-bit, i.e.
# file file.so
file.so: ELF 32-bit MSB dynamic lib SPARC Version 1, dynamically linked, not stripped, no debugging information available
#

If you wish to build a 64-bit library, you should amend the commands above as follows:
# gcc -fPIC -m64 -mcpu=ultrasparc -c file.c
# gcc -shared -m64 -mcpu=ultrasparc -o file.so file.o

With the result:
# file file.so
file.so: ELF 64-bit MSB dynamic lib SPARCV9 Version 1, UltraSPARC1 Extensions Required, dynamically linked, not stripped, no debugging information available
#
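Before building, it's worth confirming the kernel can actually run 64-bit binaries; isainfo makes this quick:

```shell
isainfo -b    # prints 64 on a 64-bit kernel, 32 otherwise
isainfo -v    # lists supported instruction sets, e.g. sparcv9
file file.so  # confirms which flavour the library was built as
```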

Tuesday, May 27, 2008

Changing hostids for Solaris 10 Zones

I found 3 resources on the Internet which discuss changing the hostid of a Solaris instance.

Only two are specifically related to the case of Solaris Zones. The other is a more general "You can change the hostid..." type of resource.

In my experience with Solaris 10 Zones, only one of these methods succeeded.

First I attempted the "Dynamic Library Interposition" method described by Julien Gabel on his Blog'o thnet. Initially this held promise.

  1. I compiled the code
  2. I set the environment variable
  3. I ran the code
Success!!

I added the environment variable to an existing startup script. It failed!

The error indicated the library was the wrong type! As I had installed the 64-bit version of Solaris 10, I re-built the dynamic library as 64-bit. I received the same error!
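For reference, the interposition approach boils down to something like this from-memory sketch (not the original code; I'm assuming hostid gets its value from sysinfo(SI_HW_SERIAL), and the fake serial "12345678" is a placeholder). Note that Solaris's runtime linker also accepts LD_PRELOAD_32 and LD_PRELOAD_64, which sidesteps exactly the wrong-ELF-class error described above by letting each class of binary load its matching library:

```shell
cat > fakehostid.c <<'EOF'
/* Interpose on sysinfo() and answer SI_HW_SERIAL with a fake hostid.
 * All other requests fall through to the real libc function. */
#include <sys/systeminfo.h>
#include <dlfcn.h>
#include <string.h>

int sysinfo(int command, char *buf, long count)
{
    static int (*real_sysinfo)(int, char *, long);
    if (real_sysinfo == NULL)
        real_sysinfo = (int (*)(int, char *, long))dlsym(RTLD_NEXT, "sysinfo");
    if (command == SI_HW_SERIAL) {
        strlcpy(buf, "12345678", count);          /* fake serial */
        return (int)(strlen("12345678") + 1);
    }
    return (real_sysinfo(command, buf, count));
}
EOF
# Build both flavours so the bitness always matches the consumer binary
gcc -fPIC -shared -o fakehostid_32.so fakehostid.c -ldl
gcc -fPIC -m64 -shared -o fakehostid_64.so fakehostid.c -ldl
LD_PRELOAD_32=./fakehostid_32.so LD_PRELOAD_64=./fakehostid_64.so hostid
```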


At this point, I actually created my zones, following the outline provided by this Zones Tutorial. The tutorial on how to create a Solaris 8 Zone on a Solaris 10 Server describes how to set an attribute of the Zone as the hostid. Perhaps this works when you really do have a Solaris 8 Zone on your Solaris 10 Server, but it didn't work for my Solaris 10 zones on a Solaris 10 Server.


Finally, I resorted to the other mechanism for altering hostids described on Julien Gabel's Blog'o thnet - daemonizing a DTrace script. This did work. In fact it works very well. Much kudos should be directed towards Brendan Gregg.
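The idea, roughly, is a destructive DTrace script that rewrites the kernel's answer to the systeminfo syscall. This is a from-memory sketch rather than Brendan Gregg's original script (7 is the value of SI_HW_SERIAL, and "12345678" is a placeholder hostid):

```shell
# -w permits the destructive copyoutstr() action
dtrace -wqn '
syscall::systeminfo:entry
/arg0 == 7/                     /* SI_HW_SERIAL */
{ self->buf = arg1; }

syscall::systeminfo:return
/self->buf/
{ copyoutstr("12345678", self->buf, 9); self->buf = 0; }'
```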