Fatal Attraction: a lesson in arptables

To: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Subject: Fatal Attraction: a lesson in arptables
From: Bandit Lazuli <banditlazuli@xxxxxxxxx>
Date: Thu, 13 Apr 2006 10:18:17 -0700 (PDT)
Our cluster of web frontends periodically exhibited a kind of Fatal
Attraction behavior, where one host would suddenly be the recipient of
all hits. Attempting to add new hosts to the existing cluster
triggered this behavior in a consistent way. With something clear to
fix, we installed the latest version of keepalived on the latest RHEL4
kernel.
 
And lo, nothing changed. Each new host we added became a Fatal Attractor
within 6 minutes of operation (note that this is NOT the Thundering
Herd problem; things were relatively well balanced for the first one to
six minutes).
 
Stranger yet, ipvsadm on the director revealed that the Attractor was
getting NO hits. So it wasn't that the LVS was sending all hits to one
machine. You guessed it. The new machine was arping for the shared ip,
and connections were coming directly to it.
 
We had arptables set up as follows:
*filter
:IN ACCEPT [0:0]
:OUT ACCEPT [0:0]
-A IN -d 192.168.0.12 -j DROP
COMMIT
 
In desperation, we started arptables at runlevel 1. This didn't help,
because the host wasn't responding to an inbound arp request; it was
instead generating its OWN arp request, and broadcasting the response
it made to itself.
 
This could be seen with:
 
tcpdump -i any arp > file
 
And then pawing through the file for the shared ip (or its name, since
tcpdump resolves addresses by default). There lay the smoking gun:
arptables was NOT working as advertised. So we added:
-A OUT -d 192.168.0.12 -j mangle --mangle-ip-s 192.168.0.104
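
A minimal sketch of the capture search described above, for anyone
repeating it. The helper names are hypothetical, and it assumes the
capture was taken with tcpdump's -n flag so that addresses stay numeric
and the lines carry tcpdump's usual "who-has ... tell ..." and
"... is-at ..." wording.

```shell
#!/bin/sh
# Sketch only: search a saved tcpdump text capture for the shared IP.
# Assumes a numeric capture, e.g.:  tcpdump -n -i any arp > arp.txt  (needs root)

# vip_arp_lines FILE VIP  (hypothetical helper name)
# Print every captured line that mentions the shared IP.
vip_arp_lines() {
    # -F: treat the IP as a fixed string, so dots are not regex wildcards
    # -w: whole-word match, so 192.168.0.12 does not also hit 192.168.0.120
    grep -F -w -- "$2" "$1"
}

# vip_arp_senders FILE VIP  (hypothetical helper name)
# Narrow to lines where the shared IP is the sender: requests asked on its
# behalf ("tell VIP") and replies announcing it ("VIP is-at ...").
vip_arp_senders() {
    grep -F -w -e "tell $2" -e "$2 is-at" "$1"
}
```

Running vip_arp_senders arp.txt 192.168.0.12 on a real server should
print nothing once arps are properly squelched; any output there is the
kind of leak described in this message.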
 
This still did not do the trick; apparently arptables implicitly
operates on the interface owning the ip (lo:1, in our case) if no
interface is specified. That left eth0 leaking arps.
 
Specifying the interface did the trick (note that this rule also matches
on the source address, -s, where the first attempt used -d):

-A OUT -s 192.168.0.12 -o eth0 -j mangle --mangle-ip-s 192.168.0.104
 
And here is the whole filter:
 
*filter
:IN ACCEPT [0:0]
:OUT ACCEPT [0:0]
-A IN -d 192.168.0.12 -j DROP
-A OUT -s 192.168.0.12 -o eth0 -j mangle --mangle-ip-s 192.168.0.104
COMMIT
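
For a cluster with several real servers, the same filter can be
generated per host rather than edited by hand. This is only a sketch:
the function name and its arguments are hypothetical, and it assumes
that 192.168.0.104 above is the real server's own address on eth0.

```shell
#!/bin/sh
# Sketch only: print the two-rule filter shown above for a given host.

# print_arp_filter VIP RIP IFACE  (hypothetical helper name)
#   VIP   = shared ip that must never be advertised by this host
#   RIP   = this host's own address (assumed meaning of 192.168.0.104)
#   IFACE = outbound interface that was leaking arps
print_arp_filter() {
    vip=$1
    rip=$2
    iface=$3
    cat <<EOF
*filter
:IN ACCEPT [0:0]
:OUT ACCEPT [0:0]
-A IN -d $vip -j DROP
-A OUT -s $vip -o $iface -j mangle --mangle-ip-s $rip
COMMIT
EOF
}
```

Calling print_arp_filter 192.168.0.12 192.168.0.104 eth0 reproduces the
filter above. Where the output has to be saved so that it is loaded at
boot depends on the distribution's arptables init script, so check that
before relying on it.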

Arps are now properly squelched, and the fatal attractor behavior has
vanished. I'm posting this because I longed for Google to return such a
message in response to many searches.




                