
Re: Bizarre LVS oddity - one VIP handled fine, another gives ip_rt_bug errors

To: "LinuxVirtualServer.org users mailing list." <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: Bizarre LVS oddity - one VIP handled fine, another gives ip_rt_bug errors
From: John Line <jml4@xxxxxxxxxxxxxx>
Date: Thu, 1 Dec 2005 12:48:28 +0000 (GMT)
On Wed, 30 Nov 2005, Graeme Fowler wrote:

Hi there

On Wed 30 Nov 2005 15:04:19 GMT , John Line <jml4@xxxxxxxxxxxxxx> wrote:
The problem is that while access to the web cache (131.111.8.1) through the new director "just worked", the WPAD requests almost all got stuck - shown as SYN_RECV in /proc/net/ip_vs_conn and reported in /var/log/messages as

... ip_rt_bug: [clientIPhere] -> 131.111.8.68, eth1

and corresponding packets do not get forwarded to a real-server.
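To put numbers on how many connections are stuck per virtual IP, something like the following may help. It is only a sketch, assuming the classic IPv4 layout of /proc/net/ip_vs_conn (hex addresses and ports, state in the eighth column). The sample rows, client addresses, real-server addresses and ports are made up for illustration; only the two virtual IPs come from this thread.

```python
import ipaddress
from collections import Counter

# Hypothetical sample in the assumed /proc/net/ip_vs_conn layout.
# Client and real-server addresses and all ports are invented for illustration.
SAMPLE = """\
Pro FromIP   FPrt ToIP     TPrt DestIP   DPrt State       Expires
TCP C0000205 C350 836F0844 0050 0A000001 0050 SYN_RECV         57
TCP C0000206 C351 836F0844 0050 0A000001 0050 SYN_RECV         42
TCP C0000207 C352 836F0801 1F90 0A000002 1F90 ESTABLISHED     890
"""


def hex_to_ip(field):
    """Convert an 8-digit hex address (e.g. 836F0844) to dotted-quad form."""
    return str(ipaddress.IPv4Address(int(field, 16)))


def count_states(text):
    """Count connection entries per (virtual IP, state) pair."""
    counts = Counter()
    for line in text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 8:
            continue  # ignore short or malformed rows
        vip = hex_to_ip(fields[3])  # ToIP column is the virtual address
        counts[(vip, fields[7])] += 1
    return counts


if __name__ == "__main__":
    for (vip, state), n in sorted(count_states(SAMPLE).items()):
        print(vip, state, n)
```

On a director, `count_states(open("/proc/net/ip_vs_conn").read())` would take the place of the sample text, provided the file really has that layout on the kernel in question.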

...the only time I've ever triggered anything like this was whilst running a very early 2.6.x kernel on a gateway box at home which did both SNAT and DNAT depending on direction of traffic. I managed to get myself into some weird condition at one point whereby the packets entering the external interface which were destined for the local machine fell through the netfilter tables and ended up trying to go out via a DNAT rule. At this point the kernel spat the dummy, logged an ip_rt_bug message and stopped forwarding those packets.

We're not using NAT, but the current SLES9 kernel is 2.6.5-7.201, which could be old enough to include bugs for which SuSE has not back-ported fixes.

A kernel update later and it all went away. I never did find out what caused it, either.

The problem's persisted over several kernel updates, but those will all have been 2.6.5 + an ever-increasing number of fixes applied by SuSE.

This thread might be of interest (although it's likely to be academic, if at all):

http://www.ussg.iu.edu/hypermail/linux/kernel/0504.3/index.html#0239

Google found that one for me, but it didn't offer any real clues - my conclusion was (rightly or wrongly) that if my problem had the same underlying cause then it should apply equally to both virtual servers and ought to have been seen by other people using LVS with 2.6 kernels.

Do you see *any* traffic hitting the realservers at all when these messages are being logged? Also, is anything being sent back to the clients?

I've re-checked that, and as I believed from earlier tests, nothing is passed on to the real-servers or sent back to the clients, in the cases where ip_rt_bug is logged. It was still, bizarrely, allowing traffic from my home PC over the VPDN (which was passed to a real-server as normal), but reporting ip_rt_bug for literally everything else. Of course, I can't say whether connections from anywhere else might also get through.
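One way to check whether any other clients get through would be to tally the ip_rt_bug messages by source address and compare that against the clients seen in tcpdump. The sketch below assumes the log lines follow the "ip_rt_bug: source -> destination, interface" shape quoted earlier. The syslog prefix, host name and client addresses in the sample are invented for illustration.

```python
import re
from collections import Counter

# Matches the message shape quoted earlier: "ip_rt_bug: SRC -> DST, DEV"
RT_BUG = re.compile(r"ip_rt_bug: ([\d.]+) -> ([\d.]+), (\S+)")

# Hypothetical log excerpt; prefix, host name and client addresses are made up.
SAMPLE_LOG = """\
Dec  1 12:00:01 director kernel: ip_rt_bug: 192.0.2.5 -> 131.111.8.68, eth1
Dec  1 12:00:02 director kernel: ip_rt_bug: 192.0.2.5 -> 131.111.8.68, eth1
Dec  1 12:00:03 director kernel: ip_rt_bug: 198.51.100.7 -> 131.111.8.68, eth1
Dec  1 12:00:04 director kernel: some unrelated message
"""


def tally_rt_bug(text):
    """Count ip_rt_bug messages per (source, destination, interface)."""
    tally = Counter()
    for line in text.splitlines():
        match = RT_BUG.search(line)
        if match:
            tally[match.groups()] += 1
    return tally


if __name__ == "__main__":
    for (src, dst, dev), n in tally_rt_bug(SAMPLE_LOG).most_common():
        print(f"{n:5d}  {src} -> {dst} via {dev}")
```

Any client that appears in the tcpdump capture but never in this tally would be a candidate for "traffic that gets through".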

Understandably, debugging this is going to be fairly hard, but if it were me I'd park a whole heap'o'debug -j LOG statements into whatever netfilter ruleset you have on the director, and run tcpdump writing files on the interfaces involved (client, director in, director out, realserver in, realserver out) and see what shows up.
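As a rough illustration of that suggestion, the sketch below only prints candidate commands and runs nothing. The choice of tables and chains, the log prefixes, the interface names and the capture file names are all assumptions to adapt; the only value taken from this thread is the failing virtual IP.

```python
import shlex

VIP = "131.111.8.68"           # the failing virtual IP from this thread
INTERFACES = ["eth0", "eth1"]  # assumption: replace with the director's interfaces


def log_rule(table, chain, vip, prefix):
    """Build an iptables command that logs packets addressed to the VIP."""
    # The LOG target truncates prefixes longer than 29 characters.
    return ["iptables", "-t", table, "-I", chain, "-d", vip,
            "-j", "LOG", "--log-prefix", prefix[:29]]


def capture_cmd(interface, vip):
    """Build a tcpdump command writing the VIP's traffic to a per-interface file."""
    return ["tcpdump", "-n", "-i", interface,
            "-w", f"vip-{interface}.pcap", "host", vip]


def debug_commands(vip, interfaces):
    """Return shell-quoted command lines for logging and packet capture."""
    cmds = [
        log_rule("mangle", "PREROUTING", vip, "VIP pre: "),
        log_rule("filter", "INPUT", vip, "VIP in: "),
        log_rule("filter", "FORWARD", vip, "VIP fwd: "),
    ]
    cmds += [capture_cmd(name, vip) for name in interfaces]
    return [shlex.join(cmd) for cmd in cmds]


if __name__ == "__main__":
    for line in debug_commands(VIP, INTERFACES):
        print(line)
```

Printing rather than executing keeps the sketch safe to run anywhere; the rules would still need reviewing against the existing ruleset before being applied on a live director.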

The comments above are based on tcpdump on LVS director and real-server (using a trimmed configuration with only one real-server for simplicity). I didn't add netfilter logging (wasn't obvious *what* to try logging), but the iptables logging set up by SuSE's fancy firewall script didn't log anything in relation to the virtual IP address (so it wasn't due to firewalling, unless due to an action for which logging was suppressed).

Other than that, flummoxed.

Well, thank you for the suggestions - and I'm glad it's not just me that finds the symptoms mystifying!

                                John
--
John Line - web & news development, University of Cambridge Computing Service
