I’ve been running an ntpd server as part of the UK pool since 2007, but since upgrading from OpenSolaris 2009.06 to snv_129 I’ve had a very poor score. So poor that for more than a few weeks I’ve been dropped from the uk.pool.ntp.org CNAME 🙁
The problem (after I fixed the missing ptys) manifested itself as a series of entries in /var/adm/messages with varying IP addresses but all of the form:
sendto(18.104.22.168) (fd=53): Not owner
and a random delay (or packet drop) to the time responses that meant I was deemed to be unreliable.
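To get a feel for how many of these errors there are and which addresses they involve, the log can be summarised per destination. This is only a rough sketch: the function name count_sendto_errors is made up for illustration, and it assumes every entry contains the "sendto(ADDRESS) ... Not owner" text shown above.

```shell
# Count "sendto(...): Not owner" entries per destination address.
# count_sendto_errors is a hypothetical helper name, not a system tool.
# Usage: count_sendto_errors [logfile]   (defaults to /var/adm/messages)
count_sendto_errors() {
    logfile="${1:-/var/adm/messages}"
    # Keep only the failing sendto lines, strip everything except the
    # address inside the parentheses, then count duplicates.
    grep 'sendto(.*): Not owner' "$logfile" |
        sed 's/.*sendto(\([^)]*\)).*/\1/' |
        sort | uniq -c | sort -rn
}
```

Each output line is a count followed by an address, busiest first.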
I spent a long time with truss and Google and didn’t come up with anything useful, but narrowed the behaviour down to something peculiar with my routing – I have three NICs in my OpenSolaris box: one of them with a public IP and one with a NATed one, although both end up at the same router (no – best you don’t ask why). To prevent NTP requests arriving on my public IP and then departing by the default route (via the NAT), I have used an odd-looking IPFilter rule for Transparent Routing, which enables packets matching a rule to be sent to a specific NIC – in this case, all packets with a source address matching my public IP were being forced back out of the public NIC regardless of the routing table entries. This had worked for months on 2009.06 and, after a lot of poking, appeared to be doing the right thing on snv_129 as well.
Most of the Google comments suggested that it’s perfectly acceptable to ignore sendto errors in most cases, but I couldn’t figure out where they were coming from until I started poking around with ndd (in a failed attempt to find source-based routing for UDP packets). Tucked away in the /dev/udp collection was exactly the setting I needed. After issuing:
pfexec ndd -set /dev/udp udp_sendto_ignerr 1
the time started to flow again, and so far, over 12 monitoring periods, the step has generally been under 0.005 – with a nice stable ADSL line overnight I should be back in the UK pool by morning 🙂 I’m not sure what changed in the network, as I haven’t gone back to my 2009.06 BE to take a look at the original ndd settings, but I was never as happy with my ntp score on OpenSolaris as I had been with my Qube 2, so this could have been the reason all along.
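Values set with ndd do not survive a reboot, so the setting has to be reapplied somewhere at startup. A minimal sketch of a check-then-set wrapper follows; the function name ensure_udp_sendto_ignerr is hypothetical, and where it gets called from (an init script, an SMF service) is left as an assumption.

```shell
# Set udp_sendto_ignerr to 1 only if it is not already 1.
# ensure_udp_sendto_ignerr is a made-up name for this sketch.
ensure_udp_sendto_ignerr() {
    # Read the current value; give up if ndd itself fails.
    current=$(ndd -get /dev/udp udp_sendto_ignerr) || return 1
    if [ "$current" = "1" ]; then
        echo "udp_sendto_ignerr already 1"
    else
        # Same command as used above, run with elevated privileges.
        pfexec ndd -set /dev/udp udp_sendto_ignerr 1 &&
            echo "udp_sendto_ignerr changed from $current to 1"
    fi
}
```

Reading the value first also records what the setting was before the change, which is the comparison I never made against the old boot environment.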
After getting so excited about figuring out what was up with the upgrade to 2009.06, I ran into another, stickier problem. I rushed into reattaching the zones I’d had to detach to get beadm working by using:
zoneadm -z zonename attach -F
Oh dear: that was a bad idea. The zone appeared to attach, but zoneadm -z zonename boot failed, and then I discovered it was impossible to delete, rename or reconfigure the zone.
After a few attempts to recover things, the correct answer turns out to be to manually edit /etc/zones/index to change the state of the zone to read configured, and then it’s trivial to reattach the zone with:
zoneadm -z zonename attach -u -d path/to/zonename/ROOT/zbe
at which point it automatically upgrades the zone to 2009.06.
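The index edit can also be done with a small filter instead of by hand. This is a sketch under assumptions: the helper name mark_zone_configured is invented here, and it assumes the index file uses colon-separated lines of the form zonename:state:zonepath (optionally followed by a UUID), with comment lines starting with #. It prints the modified copy instead of touching the real file.

```shell
# Print a copy of a zones index file with one zone's state set to
# "configured". mark_zone_configured is a hypothetical helper name.
# Assumed line format: zonename:state:zonepath[:uuid]
# Usage: mark_zone_configured /etc/zones/index zonename > /tmp/index.new
mark_zone_configured() {
    index_file="$1"
    zone="$2"
    awk -F: -v OFS=: -v zone="$zone" '
        /^#/ { print; next }              # keep comment lines untouched
        $1 == zone { $2 = "configured" }  # rewrite only the state field
        { print }
    ' "$index_file"
}
```

Comparing the output against the original with diff before copying it into place keeps a typo from making matters worse.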
New OpenSolaris release: Yay !
Updater fails on beadm create, and manual attempts also fail: Boo !
After a lot of grumbling and Googling down plenty of dead-ends, it appears that beadm in 2008.11 gets very upset when there are Zones attached. A set of zoneadm detach commands later and the updater completed without any problems at all.
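With more than a couple of zones, picking out the ones to detach can be scripted. A rough sketch, assuming the parseable output of zoneadm list -cp has the field order zoneid:zonename:state:zonepath:...; the function name installed_zones is made up for this example.

```shell
# Read `zoneadm list -cp` style lines on stdin and print the names of
# non-global zones in the "installed" state.
# installed_zones is a hypothetical helper name.
# Assumed field order: zoneid:zonename:state:zonepath:...
#
# Possible use (not run here):
#   zoneadm list -cp | installed_zones | while read -r z; do
#       pfexec zoneadm -z "$z" detach
#   done
installed_zones() {
    awk -F: '$2 != "global" && $3 == "installed" { print $2 }'
}
```

Running zones are skipped on purpose, since they would need halting first.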