[lvs-users] One-to-many dns load balancing and HA/HR questions

To: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Subject: [lvs-users] One-to-many dns load balancing and HA/HR questions
From: Erik Schorr <erik-lvs@xxxxxxxx>
Date: Mon, 21 Mar 2011 17:50:09 -0700
I plan to deploy a lab environment to start testing LVS as a load 
balancer in front of a group of what could be called nameservers.  These 
nameservers actually serve telephone call routing, filtering, and 
translation data using UDP DNS-style queries, and in our production 
environment each one normally serves 500-2000 queries per second.

The "clients" initiating these queries are ACME session border 
controllers and various other VoIP/SIP processing equipment.  A failure 
of the involved systems to pass a lookup to the servers, process the 
lookup, return a response, and route it back to the client is considered 
critical as it means a call gets dropped or is left with "dead air". 
Best case, the call gets delayed by a few seconds as a request times out 
and (hopefully) gets processed by a device that is able to respond to 
the retransmitted query.

I'm aware of the benefit of lowering the UDP session timeout to 15 
seconds for high-volume DNS load balancing and plan to do this, but I 
was wondering whether LVS/IPVS incorporates methods to guarantee 
delivery of a UDP request packet to a server that's able to respond to 
it, no matter what.

In other words, suppose a DNS request comes into the VIP on the load 
balancer and the load balancer forwards it (either via routing or NAT) 
to a "real server", but that real server is unable to correctly receive 
the packet or process the query it contains for any reason: a dropped 
packet on the wire, intermittent CPU saturation, a missed interrupt, 
etc.  In that case it would be desirable for the load balancer to detect 
that no response has been sent back to the client from the real server 
and re-send the same packet (same payload) to another real server in the 
cluster.  These servers usually respond in less than 50ms, though it may 
take as long as 100ms.  If 200ms has passed after a request and the 
chosen server hasn't responded yet, the load balancer would retransmit a 
copy of the original request packet to a new server without the 
requesting client realizing there was a timeout.
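
To make the desired behaviour concrete, here is a minimal userspace 
sketch in Python.  It is not LVS/IPVS code and does not use any IPVS 
interface; the function name forward_with_retry, the server list, and 
the buffer size are hypothetical names chosen for illustration.  It only 
shows the "try one real server, wait 200ms, then try the next" logic 
described above, and leaves out relaying the answer to the original 
client.

```python
import socket


def forward_with_retry(payload, servers, timeout=0.2, bufsize=4096):
    """Send payload to each server in turn until one answers in time.

    servers is a list of (host, port) tuples (hypothetical real servers).
    Returns (response, server), or (None, None) if nobody answered.
    """
    for server in servers:
        # A fresh socket per attempt means a late reply from an earlier
        # server can never be mistaken for the current server's answer.
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.settimeout(timeout)
            try:
                # connect() on a UDP socket restricts received datagrams
                # to those coming from this server's address and port.
                sock.connect(server)
                sock.send(payload)
                return sock.recv(bufsize), server
            except OSError:
                # Covers the timeout as well as errors such as an ICMP
                # "port unreachable"; move on to the next server.
                continue
    return None, None
```

With two servers in the list, a request that gets no answer from the 
first one within 200ms is re-sent unchanged to the second, so the caller 
sees only one (slower) response.  A real implementation would also have 
to cope with duplicate answers and with queries that are not safe to 
process twice.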

Is this possible?

When there are 10,000 requests being processed per second, dropping even 
one packet per 100,000 is disastrous for our stats.
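
To put numbers on that, here is a quick back-of-the-envelope 
calculation.  It assumes the 10,000 requests per second rate is 
sustained around the clock, which is a simplification; the variable 
names are only illustrative.

```python
requests_per_second = 10_000
requests_per_drop = 100_000   # one lost packet per 100,000 requests
seconds_per_day = 86_400

# Integer arithmetic keeps the results exact.
seconds_between_drops = requests_per_drop // requests_per_second
drops_per_day = requests_per_second * seconds_per_day // requests_per_drop

print(seconds_between_drops)  # 10
print(drops_per_day)          # 8640
```

Under that assumption, a loss rate of one in 100,000 means a failed 
lookup roughly every 10 seconds, or 8,640 per day.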
