Hello,
even though I have read a lot of mails from the list, I haven't found a case
like ours.
We have some trouble with our LVS cluster. For a while now we have been
evaluating a 6-node cluster (3 realservers / 1 devel / 2 directors) as a mail
gateway.
Ok.
The setup is as follows:
The 2 directors (failover) are running direct routing.
Setup is done by ldirectord (1.77.2.41), which is
invoked by heartbeat.
Additionally there are two drbd devices for some
config files and data (no ldirectord data/config in there).
On the realservers some MTA software (postfix, amavisd-new etc.) is running.
When we start the setup everything runs fine; heartbeat
comes up and starts ldirectord with the following config:
-------------------------------------------------
# Global Directives
checktimeout=10
checkinterval=2
autoreload=yes
quiescent=yes
# Sample configuration for an smtp virtual service.
# Fallback setting overrides global
virtual=10.10.x.60:25
        real=10.10.x.63:25 gate 100
        real=10.10.x.64:25 gate 100
        real=10.10.x.65:25 gate 100
        service=smtp
        scheduler=lc
        checkport=25
        checktype=connect
        protocol=tcp
-------------------------------------------------
so ipvsadm -Ln gives us the following output:
-------------------------------------------------
IP Virtual Server version 1.2.0 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port      Forward Weight ActiveConn InActConn
TCP  10.10.x.60:25 lc
  -> 10.10.x.63:25           Route   100    0          0
  -> 10.10.x.64:25           Route   100    0          0
  -> 10.10.x.65:25           Route   100    0          0
-------------------------------------------------
Nice description.
Everything works fine and the director balances the incoming connections
across the 3 realservers.
But after a while (sorry, I don't know a specific time) the LB stops
scheduling and all incoming connections are sent to the last chosen
realserver.
Is the timeframe within hours or days? Did you record the output of
dmesg -s 1000000 during such an event?
What is really strange is that after this happens, you can unload
the modules (ip_vs/ip_vs_lc), and after the ip_vs module is reloaded the
traffic continues to be sent to the last realserver.
Hmm, you might need to enable IPVS debugging during this time. Also
check all your log files regarding your IPVS setup, the ldirectord
output, the heartbeat, the kernel logs, the mta logs, ...
For testing I've written the output of ipvsadm -Ln (and c) with timestamps
to a file. The last entry was this:
IP Virtual Server version 1.2.0 (size=4096)
Prot LocalAddress:Port      Conns   InPkts  OutPkts  InBytes OutBytes
  -> RemoteAddress:Port
TCP  10.10.x.60:25          17617  1872891        0    2541M        0
  -> 10.10.x.63:25           5927   614901        0  832431K        0
  -> 10.10.x.64:25           5964   663196        0  902503K        0
  -> 10.10.x.65:25           5726   594794        0  806841K        0
(No outgoing packets, because the mails are dropped into a sink)
The reason why you don't see the outgoing packet counter increasing is
because in LVS-DR (route) the return packets do not pass through the
director.
Restarting ldirectord has the same effect; everything keeps going
wrong.
From the looks it's almost perfectly balanced. I'm a bit astonished
concerning your last statement regarding the output above.
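To put a number on "almost perfectly balanced", here is a minimal Python sketch that computes each realserver's share of the Conns counter. The sample text is copied from the statistics listing quoted above; the function names conns_per_realserver and shares are made up for this illustration and are not part of ipvsadm or ldirectord.

```python
import re

# Rows copied from the ipvsadm statistics listing quoted in this thread.
STATS = """\
TCP 10.10.x.60:25 17617 1872891 0 2541M 0
-> 10.10.x.63:25 5927 614901 0 832431K 0
-> 10.10.x.64:25 5964 663196 0 902503K 0
-> 10.10.x.65:25 5726 594794 0 806841K 0
"""


def conns_per_realserver(text):
    """Return {realserver address: Conns counter} taken from the '->' rows."""
    conns = {}
    for line in text.splitlines():
        # The header row '-> RemoteAddress:Port' does not match (no numeric port).
        m = re.match(r"\s*->\s+(\S+:\d+)\s+(\d+)\s", line)
        if m:
            conns[m.group(1)] = int(m.group(2))
    return conns


def shares(conns):
    """Return each realserver's fraction of all connections scheduled so far."""
    total = sum(conns.values())
    return {addr: n / total for addr, n in conns.items()}


if __name__ == "__main__":
    for addr, share in shares(conns_per_realserver(STATS)).items():
        print(f"{addr}  {share:.1%}")
```

With three realservers of equal weight, each share should sit close to one third; a single cumulative listing cannot show whether the distribution changed recently, though.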
I can't draw a conclusion from the symptoms.
Could you tcpdump on the real servers that do not get any requests
anymore along with the surveillance of the various log files?
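Alongside tcpdump, two timestamped snapshots of the Conns counters taken while the problem is occurring would show directly which realservers stopped receiving new connections. The following Python sketch assumes the counters have already been extracted into dictionaries; the "before" values are the ones posted in this thread, the "after" values are invented purely for illustration, and the function names are hypothetical.

```python
def new_connections(before, after):
    """Growth of the Conns counter per realserver between two snapshots."""
    delta = {addr: after[addr] - before.get(addr, 0) for addr in after}
    if any(d < 0 for d in delta.values()):
        # Counters start again from zero when ip_vs is reloaded, so such a
        # pair of snapshots cannot be compared.
        raise ValueError("counter went backwards; snapshots span a reset")
    return delta


def starved(before, after):
    """Realservers that got no new connections while at least one other did."""
    delta = new_connections(before, after)
    if not any(d > 0 for d in delta.values()):
        return []  # no traffic at all in the interval, nothing to conclude
    return sorted(addr for addr, d in delta.items() if d == 0)


if __name__ == "__main__":
    # Conns counters as posted in this thread.
    before = {"10.10.x.63:25": 5927, "10.10.x.64:25": 5964, "10.10.x.65:25": 5726}
    # Hypothetical later snapshot (invented numbers, not measured data).
    after = {"10.10.x.63:25": 5927, "10.10.x.64:25": 5964, "10.10.x.65:25": 5900}
    print("new connections:", new_connections(before, after))
    print("starved:", starved(before, after))
```

If the scheduler really pins everything to one realserver, the starved list contains the other two; an empty list over a busy interval would suggest the balancing is still working.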
Any help is appreciated, and thanks for your time reading this.
I'm afraid your output does not support your statement so far.
Best regards,
Roberto Nibali, ratz
--
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq' | dc