Busted Cluster

To: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Subject: Busted Cluster
From: nigel@xxxxxxxxxxx
Date: Sun, 13 Mar 2005 04:41:07 -0600
Hi,

      I've implemented an LVS/TUN cluster to split web traffic amongst 26+ 
nodes with a replicated MySQL database on each node.

      First, the good news: LVS has been performing well. So has MySQL 
replication, which seems to be handling 26-30 nodes fine (although I wonder how 
far it can go).

      Periodically, though, we notice that an incoming request to the cluster 
hangs forever. Has anyone else experienced this? Is it LVS related? This is 
a relatively small problem, however.

      Now the bad news. This weekend the web service we run came under 
increased load (about an extra 10,000,000 queries per day), and we now 
have a busted cluster. Here is what IPVS looks like:

IP Virtual Server version 1.0.10 (size=65536)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  66.98.x.y:80 rr
  -> 66.98.x.y:80                 Tunnel  1      37         337
  -> 67.15.x.y:80                 Tunnel  1      14         382
  -> 66.98.x.y:80                 Tunnel  1      6          131
  -> 207.44.x.y:80                Tunnel  1      21         325
  -> 66.98.x.y:80                 Tunnel  1      57         422
  -> 207.44.x.y:80                Tunnel  1      12         354
  -> 69.57.x.y:80                 Tunnel  1      33         355
  -> 67.15.x.y:80                 Tunnel  1      71         274
  -> 67.15.x.y:80                 Tunnel  1      12         378
  -> 207.44.x.y:80                Tunnel  1      5          345
  -> 66.98.x.y:80                 Tunnel  1      59         301
  -> 67.15.x.y:80                 Tunnel  1      2          347
  -> 67.15.x.y:80                 Tunnel  1      19         375
  -> 69.57.x.y:80                 Tunnel  1      10         132
  -> 69.57.x.y:80                 Tunnel  1      3          128
  -> 67.15.x.y:80                 Tunnel  1      15         361
  -> 69.57.x.y:80                 Tunnel  1      8          128
  -> 67.15.x.y:80                 Tunnel  1      229        303
  -> 67.15.x.y:80                 Tunnel  1      16         372
  -> 67.15.x.y:80                 Tunnel  1      125        317
  -> 67.15.x.y:80                 Tunnel  1      12         367
  -> 207.44.x.y:80                Tunnel  1      13         333
  -> 207.44.x.y:80                Tunnel  0      144        5
  -> 66.98.x.y:80                 Tunnel  1      10         404
  -> 207.44.x.y:80                Tunnel  0      0          0
  -> 207.44.x.y:80                Tunnel  1      132        277

At this point the service works but is too slow. Within the next 60 seconds, 
however, the InActConn count grows to over 2000 per real server and the whole 
thing locks up.
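
To watch how these counters move in the run-up to a lock-up, the listing can be
parsed and totalled. Below is a minimal sketch in Python, assuming the text is
the output of `ipvsadm -L -n` in the layout shown above. The names
`parse_real_servers`, `summarize` and `SAMPLE` are illustrative only, and the
sample holds just four rows copied from the listing.

```python
import re

# Matches a real-server row such as:
#   -> 66.98.x.y:80   Tunnel  1  37  337
# The column-header row does not match because its last three fields
# are words, not numbers.
ROW = re.compile(r"^\s*->\s+(\S+)\s+(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s*$")

# Four rows copied from the listing above (illustrative subset).
SAMPLE = """\
IP Virtual Server version 1.0.10 (size=65536)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  66.98.x.y:80 rr
  -> 66.98.x.y:80                 Tunnel  1      37         337
  -> 67.15.x.y:80                 Tunnel  1      229        303
  -> 207.44.x.y:80                Tunnel  0      144        5
  -> 207.44.x.y:80                Tunnel  0      0          0
"""


def parse_real_servers(listing):
    """Return one dict per real-server row found in the listing text."""
    servers = []
    for line in listing.splitlines():
        m = ROW.match(line)
        if m is None:
            continue  # banner, header, or virtual-service line
        addr, forward, weight, active, inactive = m.groups()
        servers.append({
            "addr": addr,
            "forward": forward,
            "weight": int(weight),
            "active": int(active),
            "inactive": int(inactive),
        })
    return servers


def summarize(servers):
    """Total the counters and count servers whose weight is 0."""
    return {
        "servers": len(servers),
        "active": sum(s["active"] for s in servers),
        "inactive": sum(s["inactive"] for s in servers),
        "weight_zero": sum(1 for s in servers if s["weight"] == 0),
        "max_inactive": max((s["inactive"] for s in servers), default=0),
    }


if __name__ == "__main__":
    print(summarize(parse_real_servers(SAMPLE)))
```

Run against a fresh listing every few seconds, the `inactive` and
`max_inactive` totals would show how quickly InActConn climbs towards the 2000
mark.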

* What precisely do the InActConn figures show?

Is this symptomatic of simply an overloaded cluster, or could it be a DoS 
problem?

Any insights or similar experiences would be much appreciated.

Kind regards,


Nigel





