Hi all,
I've run into a strange problem when using the dh (destination hashing) algorithm.
Let me describe my network:
I have 3 real servers, one of which also acts as the director.
All of them run kernel 2.4.20, IPVS 1.0.7 and ipvsadm v1.21.
I'm using the DR (direct routing) method for LVS.
Squid is running on all servers and is set up as a virtual service via
fwmarks. The load-balancing algorithm is dh.
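For readers who haven't looked at dh: the ipvsadm man page describes it as assigning jobs to servers by looking up a statically assigned hash table keyed on the destination IP address. Below is a minimal Python model of that idea. It is a sketch, not a copy of the kernel module: the 256-bucket table, the multiplicative hash, and the rule that dh does not fall back to another server when the hashed one is unavailable are all assumptions, and the addresses are hypothetical.

```python
import ipaddress

TAB_BITS = 8                 # assumed table size: 2**8 buckets
TAB_SIZE = 1 << TAB_BITS
TAB_MASK = TAB_SIZE - 1


def build_table(servers):
    """Statically assign the real servers to buckets, round-robin."""
    return [servers[i % len(servers)] for i in range(TAB_SIZE)]


def dh_hash(dst):
    """Multiplicative hash of a destination IPv4 address into a bucket index."""
    ip = int(ipaddress.IPv4Address(dst))
    return (ip * 2654435761) & TAB_MASK


def dh_schedule(table, dst, available):
    """Return the server for a new connection to dst, or None.

    Assumed behavior: if the hashed server is unavailable, no other
    server is tried, so the new connection is refused.
    """
    server = table[dh_hash(dst)]
    return server if server in available else None


if __name__ == "__main__":
    servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # hypothetical real servers
    table = build_table(servers)
    vip = "192.168.1.100"                          # hypothetical destination
    # Every new connection to the same destination maps to the same server.
    print(dh_schedule(table, vip, set(servers)))
    # If that one server is marked unavailable, no server is chosen at all.
    chosen = table[dh_hash(vip)]
    print(dh_schedule(table, vip, set(servers) - {chosen}))
```

The point of the model is that the choice depends only on the destination address, so all clients talking to the same destination share one bucket and one server.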
Now, when only Unix-like hosts use the web cache, everything is fine.
Problems start when Internet Explorer clients start to use it.
(I wrote about this a while ago, but now I've pinpointed the cause.)
After some time (several seconds) of Internet Explorer using the cache, no
other machine can connect to the service.
This is what I get when trying to telnet to Squid's port:
telnet: Unable to connect to remote host: Connection refused
Running tcpdump, I can see that the answer comes back as an ICMP host
unreachable message, and that it originates from the director.
However, the Internet Explorer client can still open new connections!
This behavior wears off after a couple of seconds.
When I remove the web cache service from LVS (so that the director serves
all Squid requests itself), this problem does not occur.
So I set up Squid to listen on another port, fwmarked it with a different
number, and added it as another virtual service.
Now, when the first service got "stuck", the second one still worked.
From this I concluded that the problem is with LVS, but I had no idea what
caused this strange behavior.
I tried changing the load-balancing algorithm from dh to rr and voila ;)
everything works fine, and it never gets stuck.
This is why I think the problem is with the dh algorithm.
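By contrast, round-robin does not tie a destination to one server. The Python model below is a simplified sketch of assumed behavior, not the kernel's rr module: it walks the server list in order and skips any server it considers unavailable, so a single unavailable server only drops out of the rotation, and no server is chosen only when every server is unavailable. The addresses are hypothetical.

```python
class RoundRobin:
    """Simplified rr model: cycle through servers, skipping unavailable ones."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.pos = -1  # index of the server chosen last time

    def schedule(self, available):
        """Return the next available server in rotation, or None if none is."""
        n = len(self.servers)
        for step in range(1, n + 1):
            i = (self.pos + step) % n
            if self.servers[i] in available:
                self.pos = i
                return self.servers[i]
        return None


if __name__ == "__main__":
    rr = RoundRobin(["10.0.0.1", "10.0.0.2", "10.0.0.3"])  # hypothetical real servers
    up = {"10.0.0.1", "10.0.0.3"}  # pretend 10.0.0.2 is unavailable
    # prints ['10.0.0.1', '10.0.0.3', '10.0.0.1', '10.0.0.3']
    print([rr.schedule(up) for _ in range(4)])
```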
If I were familiar with the netfilter and LVS code, I'd go over the dh
module myself, but I guess you guys will be better at it...
Anyway, we've been using LVS for quite some time and are very happy with its
performance.
We'd like to thank you all for the work you put in.
Cheers.
--
========================================================================
nir.