On Wed, 30 Nov 2005, Graeme Fowler wrote:
Hi there
On Wed 30 Nov 2005 15:04:19 GMT, John Line <jml4@xxxxxxxxxxxxxx> wrote:
The problem is that while access to the web cache (131.111.8.1) through the
new director "just worked", the WPAD requests almost all got stuck - shown
as SYN_RECV in /proc/net/ip_vs_conn and reported in /var/log/messages as
... ip_rt_bug: [clientIPhere] -> 131.111.8.68, eth1
and corresponding packets do not get forwarded to a real-server.
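For anyone wanting to quantify the stuck connections described above, here is a minimal sketch that counts /proc/net/ip_vs_conn entries in a given state per virtual IP. It assumes the usual column layout (Pro, FromIP, FPrt, ToIP, TPrt, DestIP, DPrt, State, Expires) with addresses and ports printed in hex; check that against your own kernel before relying on it. The function names and the sample entries (client addresses, real-server addresses, ports) are invented for illustration.

```python
import socket
import struct
from collections import Counter


def hex_to_ip(hex_addr):
    """Convert an 8-digit hex IPv4 address (e.g. '836F0844') to dotted quad."""
    return socket.inet_ntoa(struct.pack("!I", int(hex_addr, 16)))


def count_by_state(conn_text, state="SYN_RECV"):
    """Count connection entries in `state`, keyed by virtual (ToIP) address."""
    counts = Counter()
    for line in conn_text.splitlines():
        fields = line.split()
        # Skip the header line and anything too short to be an entry.
        if len(fields) < 8 or fields[0] == "Pro":
            continue
        if fields[7] == state:
            counts[hex_to_ip(fields[3])] += 1
    return counts


# Invented sample data in the assumed format; not taken from a real director.
SAMPLE = """\
Pro FromIP   FPrt ToIP     TPrt DestIP   DPrt State       Expires
TCP C0000201 0401 836F0844 0050 0A000001 0050 SYN_RECV    58
TCP C0000202 0402 836F0844 0050 0A000001 0050 SYN_RECV    57
TCP C0000203 0403 836F0801 1F90 0A000002 1F90 ESTABLISHED 899
"""

if __name__ == "__main__":
    for vip, n in count_by_state(SAMPLE).items():
        print(vip, n)
```

On a live director the same function could be fed open("/proc/net/ip_vs_conn").read() instead of the sample text.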
...the only time I've ever triggered anything like this was whilst running a
very early 2.6.x kernel on a gateway box at home which did both SNAT and DNAT
depending on direction of traffic. I managed to get myself into some weird
condition at one point whereby the packets entering the external interface
which were destined for the local machine fell through the netfilter tables
and ended up trying to go out via a DNAT rule. At this point the kernel spat
the dummy, logged an ip_rt_bug message and stopped forwarding those packets.
We're not using NAT, but the current SLES9 kernel is 2.6.5-7.201, which
could be old enough to include bugs for which SuSE has not back-ported
fixes.
A kernel update later and it all went away. I never did find out what caused
it, either.
The problem's persisted over several kernel updates, but those will all
have been 2.6.5 + an ever-increasing number of fixes applied by SuSE.
This thread might be of interest (although it's likely to be academic, if at
all):
http://www.ussg.iu.edu/hypermail/linux/kernel/0504.3/index.html#0239
Google found that one for me, but it didn't offer any real clues - my
conclusion was (rightly or wrongly) that if my problem had the same
underlying cause then it should apply equally to both virtual servers and
ought to have been seen by other people using LVS with 2.6 kernels.
Do you see *any* traffic hitting the realservers at all when these messages
are being logged? Also, is anything being sent back to the clients?
I've re-checked that, and as I believed from earlier tests, nothing is
passed on to the real-servers or sent back to the clients, in the cases
where ip_rt_bug is logged. It was still, bizarrely, allowing traffic from
my home PC over the VPDN (which was passed to a real-server as normal),
but reporting ip_rt_bug for literally everything else. Of course, I can't
say whether connections from anywhere else might also get through.
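One way to narrow down which clients are affected would be to tally the ip_rt_bug messages by source address and compare that list with the clients seen arriving at a real-server. The sketch below assumes the message format quoted earlier ("ip_rt_bug: source -> destination, interface"); the function name, the syslog prefix, the hostname and the client addresses in the sample are all invented for illustration.

```python
import re
from collections import Counter

# Matches e.g. "ip_rt_bug: 192.0.2.10 -> 131.111.8.68, eth1"
IP_RT_BUG = re.compile(
    r"ip_rt_bug: (\d+\.\d+\.\d+\.\d+) -> (\d+\.\d+\.\d+\.\d+), (\S+)"
)


def summarise_ip_rt_bug(log_lines):
    """Count ip_rt_bug messages per (source, destination, interface)."""
    counts = Counter()
    for line in log_lines:
        match = IP_RT_BUG.search(line)
        if match:
            counts[match.groups()] += 1
    return counts


# Invented sample lines; not copied from a real /var/log/messages.
SAMPLE_LOG = [
    "Nov 30 15:10:01 director kernel: ip_rt_bug: 192.0.2.10 -> 131.111.8.68, eth1",
    "Nov 30 15:10:02 director kernel: ip_rt_bug: 192.0.2.11 -> 131.111.8.68, eth1",
    "Nov 30 15:10:04 director kernel: ip_rt_bug: 192.0.2.10 -> 131.111.8.68, eth1",
    "Nov 30 15:10:05 director sshd[1234]: unrelated message",
]

if __name__ == "__main__":
    for (src, dst, dev), n in summarise_ip_rt_bug(SAMPLE_LOG).most_common():
        print(src, "->", dst, dev, n)
```

Clients that do get through will not appear in this tally at all, so it only helps when read alongside a packet capture on the real-server.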
Understandably, debugging this is going to be fairly hard, but if it were me
I'd park a whole heap'o'debug -j LOG statements into whatever netfilter
ruleset you have on the director, and run tcpdump writing files on the
interfaces involved (client, director in, director out, realserver in,
realserver out) and see what shows up.
The comments above are based on tcpdump on LVS director and real-server
(using a trimmed configuration with only one real-server for simplicity).
I didn't add netfilter logging (wasn't obvious *what* to try logging), but
the iptables logging set up by SuSE's fancy firewall script didn't log
anything in relation to the virtual IP address (so it wasn't due to
firewalling, unless due to an action for which logging was suppressed).
Other than that, flummoxed.
Well, thank you for the suggestions - and I'm glad it's not just me that
finds the symptoms mystifying!
John
--
John Line - web & news development, University of Cambridge Computing Service