Apparently this is related to some sort of race condition (possibly a problem
with my ldirectord start script, which edits the ipvsadm configuration after
ldirectord has started). If the virtual services start receiving traffic on
ports 67/68 before the following commands are run:
ipvsadm -E -u 10.10.10.10:67 -o -s rr
ipvsadm -E -u 10.10.10.10:68 -o -s rr
then traffic gets stuck going to the first server in the list.
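One plausible mechanism, offered as a hypothesis rather than a diagnosis: without
OPS, IPVS tracks each UDP "connection" keyed on source and destination address
and port. A DHCP relay always sends from the same address and from port 67, so
its first packet creates one entry and every later packet matches that same
entry (assuming, as in mainline IPVS, that each matching packet also restarts
the entry's timeout, 300 s by default for UDP). If that entry exists before
"ipvsadm -E ... -o" runs, the edit may never dislodge it. The toy model below is
plain Python, not IPVS code; the relay address 192.0.2.1 is hypothetical.

```python
"""Toy model (NOT IPVS source) of a UDP entry created before OPS is enabled.

Assumptions, for illustration only:
  * entries are keyed on (client_ip, client_port, vip, vport);
  * a packet matching an existing entry reuses its real server, even after
    the service has been switched to one-packet scheduling;
  * with OPS, no entry is kept, so every packet is scheduled round-robin.
"""
from itertools import cycle


class ToyVirtualService:
    def __init__(self, reals, ops=False):
        self._rr = cycle(reals)  # round-robin scheduler ("-s rr")
        self.ops = ops           # one-packet scheduling flag ("-o")
        self.entries = {}        # (cip, cport, vip, vport) -> real server

    def route(self, key):
        # An existing entry always wins: the assumed "stuck" path.
        if key in self.entries:
            return self.entries[key]
        real = next(self._rr)
        if not self.ops:
            # Without OPS the first packet creates a lasting entry.
            self.entries[key] = real
        return real


# Hypothetical relay: same source address and port 67 for every packet.
relay = ("192.0.2.1", 67, "10.10.10.10", 67)
reals = ["backend_server01", "backend_server02"]

svc = ToyVirtualService(reals)
first = svc.route(relay)   # traffic arrives before "ipvsadm -E ... -o"
svc.ops = True             # the start script enables OPS afterwards
stuck = [svc.route(relay) for _ in range(4)]

fresh = ToyVirtualService(reals, ops=True)  # OPS set before any traffic
balanced = [fresh.route(relay) for _ in range(4)]

print(first, stuck)  # backend_server01 every time
print(balanced)      # alternates between the two servers
```

If this model holds, one design consequence is that OPS would need to be in
place before the first packet arrives, rather than applied as an edit afterwards.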
Brian Carpio
Senior Systems Engineer
Office: +1.303.962.7242
Mobile: +1.720.319.8617
Email: bcarpio@xxxxxxxxxxxx
-----Original Message-----
From: linux-ha-bounces@xxxxxxxxxxxxxxxxxx
[mailto:linux-ha-bounces@xxxxxxxxxxxxxxxxxx] On Behalf Of Brian Carpio
Sent: Thursday, February 24, 2011 3:47 PM
To: 'Simon Horman'
Cc: 'lvs-devel'; 'Julian Anastasov'; 'linux-ha@xxxxxxxxxxxxxxxxxx'
Subject: Re: [Linux-HA] UDP / DHCP / LDIRECTORD
All,
So this patch has been working flawlessly for us for the last 5 months or so.
Our infrastructure is 100% virtualized. The other day our loadbalancer01 had a
memory leak and crashed; since we use ldirectord with heartbeat, loadbalancer02
took over. Ever since then, however, one-packet UDP scheduling seems to have
stopped working. Even if I fail back over to the loadbalancer01 VM, I still see
all the DHCP traffic going to only one backend server.
If I run ipvsadm -L -n, I can see that ipvsadm thinks both backend servers are
up, since the weight is set to 1 for each server. If I reboot the second backend
server (the one which is not receiving any traffic) and then run ipvsadm -L -n,
I can see its weight go to 0, and the ldirectord log shows that it is marked
dead.
I have exported one of the load balancers and one of the backend servers (using
VMware) and imported them into another ESXi server. Once I boot up the load
balancer there, it works perfectly. I'm very stumped as to why this would
happen. Is there any additional logging you can think of that I could enable to
pinpoint exactly where the problem is?
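One place to look, besides logging, is the IPVS connection table itself, which
"ipvsadm -L -n -c" lists. A minimal sketch that groups those entries by virtual
service is shown here; it assumes the usual six-column layout (pro, expire,
state, source, virtual, destination), needs root when it calls ipvsadm, and the
sample line (relay 192.0.2.1) is hypothetical. A single UDP entry from the relay
whose expiry keeps resetting would be consistent with, though not proof of,
traffic pinned to one real server.

```python
"""Sketch: group "ipvsadm -L -n -c" connection entries by virtual service."""
import subprocess
from collections import defaultdict


def connection_table(text=None):
    """Return {virtual: [(source, destination, expire), ...]}.

    If text is None, run ipvsadm (requires root); otherwise parse the text.
    """
    if text is None:
        text = subprocess.run(["ipvsadm", "-L", "-n", "-c"], check=True,
                              capture_output=True, text=True).stdout
    by_vip = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        # Skip the two header lines; keep only TCP/UDP entry rows.
        if len(fields) == 6 and fields[0] in ("TCP", "UDP"):
            _proto, expire, _state, source, virtual, dest = fields
            by_vip[virtual].append((source, dest, expire))
    return dict(by_vip)


# Hypothetical sample in the assumed layout.
SAMPLE = """\
IPVS connection entries
pro expire state       source             virtual            destination
UDP 04:59  UDP         192.0.2.1:67       10.10.10.10:67     backend_server01:67
"""

for vip, entries in connection_table(SAMPLE).items():
    print(vip, len(entries), entries)
```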
Here are my configs:
/etc/ha.d/ldirectord.conf
checktimeout=10
checkinterval=2
autoreload=yes
logfile="/var/log/ldirectord.log"
quiescent=yes
virtual=10.10.10.10:67
        real=backend_server01:67 masq
        real=backend_server02:67 masq
        protocol=udp
        checktype=ping
        scheduler=rr
virtual=10.10.10.10:68
        real=backend_server01:68 masq
        real=backend_server02:68 masq
        protocol=udp
        checktype=ping
        scheduler=rr
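In ldirectord.conf, each virtual service's settings are the indented lines under
its "virtual=" line. A minimal sketch that pulls out the UDP services (for
example, to generate the ipvsadm edits run by the start script) follows; it
assumes that layout, treats tcp as the default protocol, handles only the keys
used in this config, and is not a full ldirectord parser.

```python
"""Sketch: list UDP virtual services from an ldirectord.conf-style text."""


def udp_services(conf_text):
    services, current = [], None
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].rstrip()  # drop comments
        if not line.strip():
            continue
        key, _, value = line.strip().partition("=")
        if key == "virtual":
            current = {"virtual": value, "real": [],
                       "protocol": "tcp",  # assumed default
                       "scheduler": None}
            services.append(current)
        elif raw[:1].isspace() and current is not None:
            if key == "real":
                host, *rest = value.split()
                current["real"].append((host, rest[0] if rest else None))
            else:
                current[key] = value
        else:
            current = None  # unindented line: back at global scope
    return [s for s in services if s["protocol"] == "udp"]


SAMPLE = """\
checktimeout=10
quiescent=yes
virtual=10.10.10.10:67
        real=backend_server01:67 masq
        real=backend_server02:67 masq
        protocol=udp
        scheduler=rr
virtual=10.10.10.10:68
        real=backend_server01:68 masq
        real=backend_server02:68 masq
        protocol=udp
        scheduler=rr
"""

for svc in udp_services(SAMPLE):
    print(svc["virtual"], svc["scheduler"], svc["real"])
```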
I had to rewrite the ldirectord start script and add the following lines to the
start and restart sections:
ipvsadm -E -u 10.10.10.10:67 -o -s rr
ipvsadm -E -u 10.10.10.10:68 -o -s rr
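The start-script workaround above boils down to one "ipvsadm -E -u VIP:PORT -o
-s rr" per UDP virtual service. A minimal Python sketch of the same step is
below; the function names are hypothetical, the -o flag exists only in ipvsadm
builds carrying the OPS patch from this thread, and real runs need root. It
defaults to a dry run that only prints the commands.

```python
"""Sketch: switch UDP virtual services to one-packet scheduling."""
import subprocess


def ops_edit_commands(vips, scheduler="rr"):
    """Return one ipvsadm argv list per "ip:port" UDP virtual service."""
    return [["ipvsadm", "-E", "-u", vip, "-o", "-s", scheduler]
            for vip in vips]


def run_commands(commands, dry_run=True):
    for argv in commands:
        if dry_run:
            print(" ".join(argv))
        else:
            # check=True raises CalledProcessError if ipvsadm fails.
            subprocess.run(argv, check=True)


run_commands(ops_edit_commands(["10.10.10.10:67", "10.10.10.10:68"]))
```

Because this runs only after ldirectord has created the services, it still
leaves the window described at the top of this thread, where traffic can arrive
before the edit is applied.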
Here is the output of ipvsadm -L -n when both backend servers are up (working
environment):
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
UDP  10.10.10.10:67 rr ops
  -> backend_server01:67          Masq    1      0          16731
  -> backend_server02:67          Masq    1      0          17447
UDP  192.168.181.67:68 rr ops
  -> backend_server01:68          Masq    1      0          0
  -> backend_server02:68          Masq    1      0          0
Here is the output of ipvsadm -L -n when both backend servers are up
(non-working environment):
[root@lb01 log]# ipvsadm -L -n
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
UDP  10.10.10.10:67 rr ops
  -> backend_server01:67          Masq    1      0          1
  -> backend_server02:67          Masq    1      0          0
UDP  10.10.10.10:68 rr ops
  -> backend_server01:68          Masq    1      0          0
  -> backend_server02:68          Masq    1      0          0
The main difference I see is that in my "working" environment the InActConn
number increases as I send load through it, while in my "non-working"
environment InActConn stays at 1 the entire time. Another difference is that in
the "working" environment I am using a DHCP load-testing tool one of my
developers wrote, whereas in the "non-working" environment we are receiving
real DHCP traffic from another network device.
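The InActConn comparison above can be made mechanical by parsing two "ipvsadm
-L -n" snapshots taken a few seconds apart under live traffic. The sketch below
assumes the column layout shown in the outputs above; the function names are
hypothetical, and the sample is the non-working output from this message, in
which InActConn does not move.

```python
"""Sketch: parse "ipvsadm -L -n" output and compare InActConn snapshots."""


def parse_ipvsadm(text):
    services, current = [], None
    for line in text.splitlines():
        fields = line.split()
        if fields[:1] in (["TCP"], ["UDP"], ["FWM"]):
            current = {"proto": fields[0], "vip": fields[1],
                       "scheduler": fields[2], "flags": fields[3:],
                       "reals": {}}
            services.append(current)
        elif (fields[:1] == ["->"] and current is not None
              and fields[1] != "RemoteAddress:Port"):  # skip column header
            rip, fwd, weight, active, inact = fields[1:6]
            current["reals"][rip] = {"forward": fwd, "weight": int(weight),
                                     "active": int(active),
                                     "inact": int(inact)}
    return services


def missing_ops(text):
    """UDP virtual services that do not show the "ops" flag."""
    return [s["vip"] for s in parse_ipvsadm(text)
            if s["proto"] == "UDP" and "ops" not in s["flags"]]


def inact_growth(before, after):
    """Change in InActConn per (virtual, real) between two snapshots."""
    old = {(s["vip"], rip): r["inact"]
           for s in parse_ipvsadm(before) for rip, r in s["reals"].items()}
    return {(s["vip"], rip): r["inact"] - old.get((s["vip"], rip), 0)
            for s in parse_ipvsadm(after) for rip, r in s["reals"].items()}


NON_WORKING = """\
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
UDP  10.10.10.10:67 rr ops
  -> backend_server01:67          Masq    1      0          1
  -> backend_server02:67          Masq    1      0          0
"""

print(missing_ops(NON_WORKING))
print(inact_growth(NON_WORKING, NON_WORKING))
```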
Brian Carpio
-----Original Message-----
From: Brian Carpio
Sent: Thursday, April 15, 2010 1:57 PM
To: Simon Horman
Cc: linux-ha@xxxxxxxxxxxxxxxxxx; lvs-devel; Julian Anastasov
Subject: RE: [Linux-HA] UDP / DHCP / LDIRECTORD
Simon,
Thanks again for all of your hard work. I have sent over a million UDP DHCP
packets at the new kernel/ipvsadm with the patches applied, and currently the
only issue (which you already know about) is that ldirectord doesn't know about
the -o option, which causes a slight issue with heartbeat. (I just put a cheap
fix in my ldirectord start script to edit the services created by ldirectord.)
So not only have I sent over 1,000,000 packets to this setup, I have also sent
them as fast as 10 packets every 3 milliseconds. I plan to do a long-term,
week-long test, but I don't foresee any issues.
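For scale, 10 packets every 3 ms is about 3,333 packets/s, so 1,000,000 packets
at that rate take roughly 300 s. A hypothetical paced UDP sender that produces
this pattern is sketched below; it is not the load tool used in this thread,
and the payload is a placeholder, not a real DHCP message. Sending everything
from one socket keeps a single source address and port, which is how a relay's
traffic looks; binding to port 67 via source_port would need root.

```python
"""Hypothetical paced UDP sender: `burst` datagrams every `interval` seconds."""
import socket
import time


def paced_send(host, port, total, burst=10, interval=0.003,
               payload=b"x", source_port=None):
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        if source_port is not None:
            sock.bind(("", source_port))  # fixed source port, like a relay
        next_tick = time.monotonic()
        while sent < total:
            for _ in range(min(burst, total - sent)):
                sock.sendto(payload, (host, port))
                sent += 1
            # Sleep until the next burst is due; skip if we are behind.
            next_tick += interval
            delay = next_tick - time.monotonic()
            if delay > 0:
                time.sleep(delay)
    return sent


# Example (hypothetical target, not run here):
# paced_send("10.10.10.10", 67, 1_000_000)
```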
Let me know if there is any other testing you would like us to do, or if you
would like me to send out the kernel-2.6.18-128 and ipvsadm-1.24-10 RPMs with
the patches applied.
Thanks again, Simon, you are the man!!
Brian Carpio
-----Original Message-----
From: Simon Horman [mailto:horms@xxxxxxxxxxxx]
Sent: Monday, April 12, 2010 8:56 PM
To: Brian Carpio
Cc: linux-ha@xxxxxxxxxxxxxxxxxx; lvs-devel; Julian Anastasov
Subject: Re: [Linux-HA] UDP / DHCP / LDIRECTORD
Hi Brian,
here are some patches to test.
I have only lightly tested them, to the extent that they compile and appear to
configure a valid service.
You can enable one packet scheduling (OPS) by passing the -o option to ipvsadm
when creating a virtual service.
e.g.
# ipvsadm -A -u 172.17.60.211:80 -o
# ipvsadm -L -n
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
UDP 172.17.60.211:80 wlc ops
There are three patches:
ops-kernel-2.6.18-128.el5.patch: Patch against CentOS-5.3's 2.6.18-128 kernel.
ops-ipvsadm-1.24-10: Patch against CentOS-5.3's ipvsadm 1.24-10.
ops-ipvsadm-1.24: Patch against upstream ipvsadm 1.24
I have not up-ported the code to the 2.6.33 kernel and ipvsadm 1.25 yet.
_______________________________________________
Linux-HA mailing list
Linux-HA@xxxxxxxxxxxxxxxxxx
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems