opensolaris, ntpd, transparent routing and sendto problems
I’ve been running an ntpd server as part of the UK pool since 2007, but since upgrading from OpenSolaris 2009.06 to snv_129 my score has been very poor. So poor, in fact, that for more than a few weeks I’ve been dropped from the uk.pool.ntp.org CNAME 🙁
The problem (after I fixed the missing ptys) manifested itself as a series of entries in /var/adm/messages with varying IP addresses, but all of the form:
sendto(184.108.40.206) (fd=53): Not owner
along with a random delay (or packet drop) in the time responses, which meant I was deemed to be unreliable.
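To get a feel for how many of these errors there are and which clients they affect, the log entries can be tallied per destination address. The following is only a sketch: the function name sendto_errors_by_addr is made up for illustration, and it assumes every relevant line contains both "sendto(" and "Not owner" as in the example above.

```shell
#!/bin/sh
# Count "sendto(<addr>) ... Not owner" log entries per destination address.
# sendto_errors_by_addr is a hypothetical helper name, not an existing tool.
# Usage: sendto_errors_by_addr [logfile]   (defaults to /var/adm/messages)
sendto_errors_by_addr() {
    grep 'sendto(' "${1:-/var/adm/messages}" |
        grep 'Not owner' |
        # Keep only the address between the parentheses.
        sed 's/.*sendto(\([^)]*\)).*/\1/' |
        # Most frequent addresses first.
        sort | uniq -c | sort -rn
}
```

Running sendto_errors_by_addr with no argument reads /var/adm/messages; passing a path lets it be pointed at a rotated log instead.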
I spent a long time with truss and Google and didn’t come up with anything useful, but narrowed the behaviour down to something peculiar with my routing. I have three NICs in my OpenSolaris box: one of them with a public IP and one with a NATed one, although both end up at the same router (no – best you don’t ask why). To prevent NTP requests arriving on my public IP and then departing by the default route (via the NAT), I use an odd-looking IPFilter rule for transparent routing, which lets packets matching a rule be sent out of a specific NIC. In this case, all packets with a source address matching my public IP were being forced back out of the public NIC regardless of the routing table entries. This had worked for months on 2009.06 and, after a lot of poking, appeared to be doing the right thing on snv_129 as well.
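For illustration, a rule of the kind described above might look like the sketch below. The actual rule is not shown in this post, so everything here is an assumption: the interface names (e1000g0 for the NATed NIC, e1000g1 for the public NIC) and the 192.0.2.x documentation addresses are placeholders, and the "to ifname:ip" form should be checked against the ipf.conf man page for your build before loading anything.

```shell
#!/bin/sh
# Build an IPFilter rule that forces packets sourced from the public IP
# out of the public NIC, whatever the routing table says.
# All names and addresses below are hypothetical placeholders.
NAT_IF=e1000g0          # NIC the default route points at (NATed side)
PUBLIC_IF=e1000g1       # NIC holding the public IP
PUBLIC_IP=192.0.2.10    # public address ntpd answers from
GATEWAY=192.0.2.1       # next hop reachable via the public NIC

# public_nic_rule is an illustrative helper; it only prints the rule text.
public_nic_rule() {
    echo "pass out quick on $NAT_IF to $PUBLIC_IF:$GATEWAY from $PUBLIC_IP to any"
}

public_nic_rule
```

The printed line could then be added to the IPFilter rule file, or piped into pfexec ipf -f - to load it into the running filter for a quick test.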
Most of the Google comments suggested that it’s perfectly acceptable to ignore sendto errors in most cases, but I couldn’t figure out where they were coming from until I started poking around with ndd (in a failed attempt to find source-based routing for UDP packets). Tucked away in the /dev/udp collection was exactly the setting I needed, so after issuing:
pfexec ndd -set /dev/udp udp_sendto_ignerr 1
the time started to flow again, and so far over 12 monitoring periods the step has generally been under 0.005 – with a nice stable ADSL line overnight I should be back in the UK pool by morning 🙂 I’m not sure what changed in the network, as I haven’t gone back to my 2009.06 BE to take a look at the original ndd settings, but I was never as happy with my NTP score on OpenSolaris as I had been with my Qube 2, so this could have been the reason all along.
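Before and after changing the setting, it is worth reading the value back. The sketch below wraps ndd -get in a small helper; the name show_param is hypothetical, and the guard is only there so the script fails cleanly on a machine without ndd.

```shell
#!/bin/sh
# Read back the UDP tunable used in this post.
# show_param is an illustrative helper name, not part of any system tool.
DRIVER=/dev/udp
PARAM=udp_sendto_ignerr

show_param() {
    if command -v ndd >/dev/null 2>&1; then
        # Prints the current value of the parameter.
        ndd -get "$DRIVER" "$PARAM"
    else
        echo "ndd not found; nothing to inspect" >&2
        return 1
    fi
}
```

Calling show_param before and after the pfexec ndd -set command confirms that the change took effect. Values set with ndd do not survive a reboot, so the -set command has to be reissued at boot (for example from a startup script) if the fix is to stick.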