Re: Failover Between 2 Datacenters

To:	"LinuxVirtualServer.org users mailing list." <lvs-users@xxxxxxxxxxxxxxxxxxxxxx>
Subject:	Re: Failover Between 2 Datacenters
From:	nick garratt <nick-lvs@xxxxxxxxxxxxxx>
Date:	Fri, 2 May 2003 09:19:31 +0200

Yup, the DNS cutover mechanism does seem like the best alternative in this case. 30s is extreme; I think, considering this is failover for a catastrophe occurring once or twice a year at most, 900s should be fine.
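
A rough back-of-envelope sketch of that trade-off, in Python. It compares the worst-case yearly downtime from a DNS cutover against a "three nines" budget. The 300s detection time and the two outages per year are illustrative assumptions, not measured figures, and the function names are made up for this example. It also assumes resolvers honour the TTL, which is not always the case.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000; leap years ignored


def downtime_budget_s(nines):
    """Yearly downtime allowed by a target of `nines` nines (3 -> 99.9%)."""
    return SECONDS_PER_YEAR / 10 ** nines


def yearly_cutover_cost_s(ttl_s, detect_s, outages_per_year):
    """Worst-case yearly downtime if each outage costs detection time plus one full TTL.

    Assumes resolvers honour the TTL; caches that ignore it are not modelled.
    """
    return outages_per_year * (detect_s + ttl_s)


if __name__ == "__main__":
    budget = downtime_budget_s(3)
    for ttl in (30, 900):
        # detect_s=300 and outages_per_year=2 are illustrative assumptions
        cost = yearly_cutover_cost_s(ttl, detect_s=300, outages_per_year=2)
        print(f"TTL {ttl:>3}s: {cost}s of {budget:.0f}s budget ({cost / budget:.1%})")
```

Under those assumptions a 900s TTL uses roughly 8% of a three-nines yearly budget (31,536s, about 8.8 hours) and a 30s TTL roughly 2%, so the gain from the very low TTL is small compared with the time spent detecting the outage and making the change.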


Consider this however:

DNS is essentially a distributed DB, but all mods are propagated from the master. Secondaries are available if a master fails, but not for zone transfers. This master DNS server still represents a single point of failure: should your master fall from the map (datacentre/network outage), how will you originate your zone change? In this instance you will require the cooperation of your registrar to change the IP of your primary DNS server.

I acknowledge the merit of this discussion and the potential need for failover between datacentres; however, it seems that some of the problems you seek to mitigate are issues that should be covered by SLAs with your providers. In this paradigm we must accept that there will always be variables beyond our control: script kiddies could DoS root nameservers, a long haul carrier could go bust, a worm could generate massive outages.

Your provider has an AS assigned to it; they should be able to route any IP range within it over their backbone infrastructure nationwide as they please. It would seem that some of these issues could be dealt with by hosting in two separate (regional) datacentres with the same provider, with the agreement that under such a failure condition they can reallocate and reroute your IP block to the alternative datacentre. This is entirely feasible for them to do provided the will is there.



Nick

I know my California example is a bit extreme, but I
wanted to be sure we were talking about a complete
datacenter outage.  If I had a dedicated cabinet with
one CAT5 cable running to it, and some 3rd-shift
network engineer with too much coffee in their blood
knocks my feed from the datacenter switch, I consider
that a "data center outage" for our discussion
purposes.  Or what if a core router starts spewing out
faulty route broadcasts that quickly spread and
corrupt the routes of member routers ... effectively
crippling a network (or internet backbone) for hours.
Human error is very real.  I'm sure hundreds of
*realistic* scenarios could be thought of to justify
off-site redundancy, so let's just move on.

I mentioned VRRP to stress the desire for
near-instantaneous failover ... minimizing the amount
of downtime a client accessing your site may
experience.  It should all be transparent as far as they're
concerned.  Obviously, without considerable expense,
this is not achievable.  It looks like the best
solution for this scenario is one using DNS with low
(30 sec?) TTL values.  It won't immediately fail over
your services, but it may reduce your downtime from
hours to [several] minutes should some major outage
occur at your primary datacenter.  Nick, I agree with
you that DNS solutions are less than ideal,
considering there are so many factors out of your
control like caching DNS servers that ignore your TTL
values, but it seems to be the only solution for
cost-conscious companies forced to provide three or
more 9's of service to their clients.

Those of you out there who have ever supplied HA
services (SOAP Web Services in our case) to
Fortune 500, 100, etc. level companies know the
importance of redundant facilities in your service
offering or RFP replies.  You won't make it through
their due-diligence process without it.

I want to thank all who have contributed to this
thread (on and off list) and acted as sounding boards
for my discussion.  I felt a "Global" or "Datacenter-
level" failover solution hadn't been discussed in
enough detail in any online forum I'd found, and the
LVS group seemed to be the perfect one.

Thanks!
-Ken

--- nick garratt <nick-lvs@xxxxxxxxxxxxxx> wrote:

 Well, a State falling off the map is hardly a failure
 situation that it makes sense to build a 60s minimum
 latency cutover for. What if the United States fell off
 the map? What if the map ceased to exist?

 Keeping your DNS TTLs really low can help you somewhat
 in this situation, although they certainly cannot be
 set to not cache at all. I have also encountered DNS
 servers that do not correctly observe these settings.
 You have no control over all the intermediate name
 servers that might be caching your DNS records, so DNS
 is not suited to low-latency failover.

 VRRP is basically IP failover through election and is
 not relevant to this discussion.

 It seems to me the only satisfactory solution is for
 you to apply for your own autonomous system (at
 considerable cost), which will allow you full control
 of your BGP data. It will be possible, with cooperation
 from other AS admins, to ensure substantial route
 redundancy and rapid cutovers should you lose a
 datacentre/state/continent :)



