To:	"LinuxVirtualServer.org users mailing list." <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
Subject:	Solved... was Re: LVS-NAT: realserver as client (new thread, same subject!)
From:	Graeme Fowler <graeme@xxxxxxxxxxx>
Date:	Fri, 11 Mar 2005 20:36:54 +0000

Hi all

On Fri, 2005-02-11 at 19:49 +0000, Graeme Fowler wrote:
> 4. "Internal" VIPs.
> This one just came to me so please feel free to try it, I'm away from my
> development lab and it might prove to be a complete lemon anyway!
> Here's the idea: on the director, for every "external" VIP configuration which
> faces the clients (say VIP1) another VIP - iVIP1 - is also configured with
> identical realservers but attached to the _internal_ interface. The principal
> difference is that this VIP uses LVS-DR, because - for obvious reasons - the 
> realservers can respond directly to each other.
> The only complicated bit is setting up a netfilter rule to do DNAT as the 
> packets arrive - trap all packets destined for VIP1 and DNAT them to iVIP1. 
> Ensure iVIP1 is a loopback alias on your realservers as per normal DR 
> configuration, and in theory at least the realservers should then be able to 
> talk to each other as clients of a VIP.

Well... after a lot of time spent doing other things, I finally got a
few hours to throw myself at this problem, and have solved it, at least
as a proof of concept in testing. It has yet to be used under load, but
I can't see any specific problems ahead once I move it into production.

The solution involves a "classic" LVS-NAT cluster as follows.
Nomenclature: each DIP/RIP/VIP name takes a suffix of "e" for external
(i.e. public address space) or "i" for internal (i.e. RFC1918 address
space), and numbers to distinguish machines.

Director: External NIC eth0 - DIPe, VIP1e
          Internal NIC eth1 - DIPi

Realserver 1: Internal NIC eth1 - RIP1

Realserver 2: Internal NIC eth1 - RIP2

In normal (or "classic" as referred to above) LVS-NAT, the director has
a virtual server configured on VIP1e to NAT requests into RIP1 and RIP2.

Under these circumstances, as discussed at great length in several
threads in Jan/Feb (and many times before), a request from a realserver
to a VIP will not work, because:

src              dst
RIP1 SYN     ->  VIP1e
RIP1 SYN     ->  RIP2   (after NAT on the director; or RIP1, doesn't matter)
RIP2 SYN/ACK ->  RIP1   (sent directly, bypassing the director)

At this point the connection never completes, because the SYN/ACK comes
from an unexpected source (RIP2 rather than VIP1e), so RIP1 drops the
packet and continues sending SYN packets until the application times out.

We need a way to "catch" this part of the connection and make sure that
the packets don't get dropped.

As it turns out, the hypothesis I put forward a month ago works well
(rather to my surprise!), and involves both netfilter (iptables) to
mangle the "client" packets with an fwmark, and the use of LVS-DR to
process them.

What I now have (simplified somewhat, this assumes a single service is
being load balanced in a very small cluster):

Director: External NIC eth0 - DIPe, VIP1e
          Internal NIC eth1 - DIPi

Realserver 1: Internal NIC eth1 - RIP1
              Loopback adapter lo:0 - VIP1e

Realserver 2: Internal NIC eth1 - RIP2
              Loopback adapter lo:0 - VIP1e
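
For the realserver side, the loopback alias can be sketched roughly as
below. This is an illustration rather than the exact setup used here:
the address is a made-up value from the documentation range, the
function name lo_alias_cmd is hypothetical, and the command is only
printed (it needs root to apply). ARP handling is not covered.

```shell
#!/bin/sh
# Sketch only: prints the command instead of running it (it needs root).
VIP1e="203.0.113.92"   # hypothetical VIP, taken from the documentation range

lo_alias_cmd() {
    # /32 mask so the realserver claims only the single VIP address on lo:0
    printf 'ip addr add %s/32 dev lo label lo:0\n' "$1"
}

lo_alias_cmd "$VIP1e"
```

Run the printed command as root on each realserver once you are happy
with it.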

Then on the director:

/sbin/iptables -t mangle -I PREROUTING -p tcp -i eth1 \
   -s $RIP_NETWORK_PREFIX -d $VIP1e --dport $PORT \
   -j MARK --set-mark $MARKVALUE
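
To make the placeholders concrete, here is the same rule with the
variables filled in. All four values are assumptions chosen purely for
illustration (an RFC1918 realserver network, a documentation-range VIP,
port 80 and mark 92), and mark_rule is a hypothetical helper that prints
the rule rather than applying it.

```shell
#!/bin/sh
# Hypothetical values for illustration only; substitute your own.
RIP_NETWORK_PREFIX="192.168.1.0/24"   # internal realserver network (assumed)
VIP1e="203.0.113.92"                  # external VIP (documentation address)
PORT="80"                             # balanced service port (assumed)
MARKVALUE="92"                        # fwmark value (assumed)

# Print the mangle-table rule so it can be reviewed before it is applied.
mark_rule() {
    printf '%s\n' "/sbin/iptables -t mangle -I PREROUTING -p tcp -i eth1 -s $RIP_NETWORK_PREFIX -d $VIP1e --dport $PORT -j MARK --set-mark $MARKVALUE"
}

mark_rule
```

Piping the output to sh as root (mark_rule | sh) would install the rule
on the director.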

We also need a corresponding entry in the LVS tables for this. I'm using
keepalived to manage it; yours may be different, but in a nutshell you
need a virtual server keyed on the fwmark $MARKVALUE rather than an IP
address, using LVS-DR, pointing back to RIP1 and RIP2. Instead of me
spamming configs, here's the ipvsadm -Ln output:

Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port   Forward Weight ActiveConn InActConn

FWM  92 wlc
  -> $RIP1:$PORT           Route  100    0          0
  -> $RIP2:$PORT           Route  100    0          0

(empty connection table right now)
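
If you aren't using keepalived, something like the following should
build an equivalent table by hand with ipvsadm. It is a sketch under
assumptions: the realserver addresses and port are made-up examples,
fwm_service_cmds is a hypothetical helper, and it prints the commands
rather than running them. The -f option selects a fwmark service and -g
selects direct routing (shown as "Route" in the listing).

```shell
#!/bin/sh
# Sketch only: emits ipvsadm commands for a fwmark-based LVS-DR service.
# Usage: fwm_service_cmds MARK PORT RIP [RIP...]
fwm_service_cmds() {
    mark=$1
    port=$2
    shift 2
    # Virtual service keyed on the firewall mark, wlc scheduler as in the listing
    echo "ipvsadm -A -f $mark -s wlc"
    for rip in "$@"; do
        # -g = direct routing (LVS-DR), weight 100 as in the listing
        echo "ipvsadm -a -f $mark -r $rip:$port -g -w 100"
    done
}

# Hypothetical realserver addresses and port
fwm_service_cmds 92 80 192.168.1.11 192.168.1.12
```

The mark given to -f has to be the same value the iptables rule sets,
otherwise the marked packets won't match the virtual service.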

...and believe it or not, that's it. Obviously the more VIPs you have,
the more complex it gets, but it's all about repeating the appropriate
config with different RIP/VIP/mark values.

For ease of use I make the hexadecimal mark value match the last octet
of the IP address on the VIP; it makes for easier reading when tracking
stats and so on.
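
One way to script that convention is sketched below. It assumes the
mark is simply the VIP's last octet read as a plain number, which is
only one reading of the convention; the address and the helper name
mark_from_vip are made up for the example. The hex form is printed
alongside for comparison.

```shell
#!/bin/sh
# Derive a mark from the last octet of a dotted-quad VIP.
# Uses POSIX parameter expansion: strip everything up to the last dot.
mark_from_vip() {
    printf '%s\n' "${1##*.}"
}

VIP1e="203.0.113.92"                  # hypothetical address
MARKVALUE=$(mark_from_vip "$VIP1e")   # assumed convention: mark = last octet

printf 'decimal mark: %d, hex form: 0x%x\n' "$MARKVALUE" "$MARKVALUE"
```

With the example address this prints "decimal mark: 92, hex form: 0x5c".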

I've not addressed any random ARP problems yet, because none have
occurred in testing. One major bonus point is that if a connection is
attempted from (ooh, let's say, without giving too much away) a
server-side include on a virtual host on a realserver to another
virtual host on the same VIP, then it'll get handled locally, as long
as Apache (in my case) is configured appropriately.

I hope that's enough information for everyone to work with. It works for
me, at any rate!

Have a good weekend, all

Graeme
