[lvs-users] 2 LVS problems

To: lvs-users@xxxxxxxxxxxxxxxxxxxxxx
Subject: [lvs-users] 2 LVS problems
From: Craig Sanders <cas@xxxxxxxxxxxxx>
Date: Wed, 26 Jan 2000 11:21:20 +1100
hi, i've just set up my first LVS - i was very pleasantly surprised by how
straightforward and easy it was (once i'd read the docs & HOWTO, that
is).

i'm building a squid proxy farm, using 3 machines (P3-450Mhz, 1GB RAM,
IDE boot/OS/logs disk, AHA294x controller with 4x9GB == 36GB IBM LVD
scsi drives, and Lite-On Communications Inc LNE100TX NICs). all are
running Debian GNU/Linux (latest "unstable" aka "woody"). squid is
version 2.2.5

one of the machines (proxy1) is acting as both a director and a
realserver. it is running kernel 2.2.14, patched for reiserfs (latest
version 3.5.16) and LVS 0.9.7. it has the VIP on eth0:0 so it will
respond to arp requests. LVS is compiled into the kernel, and the
scheduling algorithms are compiled as modules. i used 2^15 for the
masquerading table size.
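
for completeness, the VIP alias on the director is brought up with
something like this (reconstructed rather than copied from the actual
boot script, so treat the netmask/broadcast values as approximate):

        # proxy1 (director): put the VIP on an eth0 alias so it answers
        # arp for x.x.x.8 on the LAN
        ifconfig eth0:0 x.x.x.8 netmask 255.255.255.255 broadcast x.x.x.8 up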

the other two machines (proxy2 & proxy3) are running 2.2.14 patched only
for reiserfs. they have been given the VIP on dummy0, and i've enabled
the hidden arp feature for dummy0 with the following script fragment:

        ifconfig dummy0 x.x.x.8 netmask 255.255.255.0 broadcast x.x.x.255
        echo 1 > /proc/sys/net/ipv4/conf/all/hidden
        echo 1 > /proc/sys/net/ipv4/conf/dummy0/hidden
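
as a quick sanity check (nothing clever, just reading back the same proc
entries), the hidden flags can be double-checked on each realserver with:

        # both should print 1; the box should then hold the VIP on dummy0
        # but not answer arp for it on the wire
        cat /proc/sys/net/ipv4/conf/all/hidden
        cat /proc/sys/net/ipv4/conf/dummy0/hidden
        ifconfig dummy0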

IP addresses are as follows:

VIP    - x.x.x.8
proxy1 - x.x.x.215  (director & realserver)
proxy2 - x.x.x.216  (realserver)
proxy3 - x.x.x.217  (realserver)

all 3 proxies are configured to use each other as siblings (using their
real IPs, not the VIP).
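
the relevant squid.conf lines look roughly like this (quoting from
memory, so the exact options may differ; this is proxy1's version, the
other two are the same with the IPs swapped around):

        # squid.conf on proxy1 (x.x.x.215): peer with the other two by real IP
        cache_peer x.x.x.216 sibling 8080 3130 proxy-only
        cache_peer x.x.x.217 sibling 8080 3130 proxy-only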

i haven't tried anything fancy like mon or heartbeat yet; i'll do that
after i've got the basic stuff working to my satisfaction and i'm sure i
understand everything that's going on. the director has the following
ipvsadm rules (they may not be exact - proxy1 is down at the moment, so
i'm reconstructing them from memory):

        # HTTP requests
        ipvsadm -A -t x.x.x.8:8080 -s wlc
        ipvsadm -a -t x.x.x.8:8080 -r x.x.x.215
        ipvsadm -a -t x.x.x.8:8080 -r x.x.x.216
        ipvsadm -a -t x.x.x.8:8080 -r x.x.x.217
        # ICP requests
        ipvsadm -A -u x.x.x.8:3130 -s wlc
        ipvsadm -a -u x.x.x.8:3130 -r x.x.x.215
        ipvsadm -a -u x.x.x.8:3130 -r x.x.x.216
        ipvsadm -a -u x.x.x.8:3130 -r x.x.x.217
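
since i'm reconstructing these from memory anyway: the forwarding method
and weights aren't spelled out above, but with the VIP on dummy0/eth0:0
this is direct routing, so the add-server lines would look like this if
written out in full (my assumption - i'd have to check the actual script):

        # explicit direct routing (-g) and equal weights (-w 1), e.g.:
        ipvsadm -a -t x.x.x.8:8080 -r x.x.x.215 -g -w 1
        # the udp/3130 entries take the same flags
        # list the current table and connection counts (numeric):
        ipvsadm -L -n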

this seemed to work. i did a simple test of setting $http_proxy to point
to "http://x.x.x.8:8080/" and then ran wget to mirror our main web site.
all requests were smoothly load balanced over all 3 realservers. very
impressive.
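
for concreteness, the test was just something along these lines (exact
flags from memory, with our site's URL left out):

        # mirror our main web site through the proxy farm via the VIP
        http_proxy=http://x.x.x.8:8080/ wget -m http://<our-main-site>/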


i then tried configuring the squid on my workstation (which i use so
that i can filter banner ads with my redirector script) to use the VIP
as a parent. it is normally configured to use our main proxy server as a
parent - the 3 new boxes are intended to replace the current proxy (the
old proxy will probably become proxy4 with a lower weighting than the
other 3, unless i find a better use for it).

here's where i encountered the first problem: some requests would
just fail. i'd get a message from the squid on my WS saying "unable
to forward request to a parent". it seemed like about one out of every 3
requests failed, and clicking reload or shift-reload in netscape didn't
help unless i waited a while.

the squid log on my WS looked like this:

first failure:
948786582.237      4 203.16.167.2 TCP_MISS/503 1232 GET *URL* - TIMEOUT_NONE/- -

multiple reloads and shift-reloads in netscape (this was probably due to
squid briefly caching the TCP_MISS/503 result):

948786585.489     17 203.16.167.2 TCP_MISS/503 1232 GET *URL* - NONE/- -
948786589.663     11 203.16.167.2 TCP_MISS/503 1232 GET *URL* - NONE/- -
948786590.509     17 203.16.167.2 TCP_MISS/503 1232 GET *URL* - NONE/- -
948786591.199      8 203.16.167.2 TCP_MISS/503 1232 GET *URL* - NONE/- -
948786594.076      4 203.16.167.2 TCP_MISS/503 1232 GET *URL* - NONE/- -
948786594.784     12 203.16.167.2 TCP_MISS/503 1232 GET *URL* - NONE/- -
948786595.463     11 203.16.167.2 TCP_MISS/503 1232 GET *URL* - NONE/- -
948786596.160      8 203.16.167.2 TCP_MISS/503 1232 GET *URL* - NONE/- -
948786596.831      9 203.16.167.2 TCP_MISS/503 1232 GET *URL* - NONE/- -
948786597.471      9 203.16.167.2 TCP_MISS/503 1232 GET *URL* - NONE/- -
948786598.103     11 203.16.167.2 TCP_MISS/503 1232 GET *URL* - NONE/- -
948786598.734     13 203.16.167.2 TCP_MISS/503 1232 GET *URL* - NONE/- -
948786599.439      7 203.16.167.2 TCP_MISS/503 1232 GET *URL* - NONE/- -
948786600.039      7 203.16.167.2 TCP_MISS/503 1232 GET *URL* - NONE/- -
948786600.631      9 203.16.167.2 TCP_MISS/503 1232 GET *URL* - NONE/- -
948786601.231      9 203.16.167.2 TCP_MISS/503 1232 GET *URL* - NONE/- -
948786601.867     15 203.16.167.2 TCP_MISS/503 1232 GET *URL* - NONE/- -
948786602.768     17 203.16.167.2 TCP_MISS/503 1232 GET *URL* - NONE/- -

finally, the page is fetched:

948786629.368   2228 203.16.167.2 TCP_MISS/200 6055 GET *URL* - 
TIMEOUT_DEFAULT_PARENT/x.x.x.8 text/html

to make the logs fit on one 80-column line, i've replaced the real URL
with "*URL*", as it's not important which URL i was fetching... the same
thing happened on several different URLs.

btw, 203.16.167.2 is my workstation at home, which uses my workstation
at work as a parent squid.


as a completely wild guess (unsubstantiated by any facts), maybe there's
a problem with the director also being one of the realservers. or i
might have made a typo on one of the lines for port 3130. or it might be
something to do with squid's CACHE_DIGEST feature (i didn't think of
that until just now - i'll disable it and try again tomorrow, as it's
useless with an LVS setup anyway).
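
if i disable it via the peer lines rather than recompiling, it'd be
something like the following (i haven't checked yet whether 2.2.5's
cache_peer actually accepts no-digest, so treat this as a guess):

        # stop fetching/using cache digests from the siblings
        cache_peer x.x.x.216 sibling 8080 3130 proxy-only no-digest
        cache_peer x.x.x.217 sibling 8080 3130 proxy-only no-digest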


i then reconfigured squid on my WS to use the 3 real IP addresses (.215,
.216, and .217) as parents. everything worked perfectly. this at least
established that the realservers were all functioning correctly when
used without the director.
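
for the record, the workstation config that worked was roughly (again
from memory, not copied from the actual file):

        # squid.conf on my WS: the three realservers as parents, VIP not used
        cache_peer x.x.x.215 parent 8080 3130
        cache_peer x.x.x.216 parent 8080 3130
        cache_peer x.x.x.217 parent 8080 3130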


at that point i had to go out for dinner and left everything as it was.
when i got home a few hours later, i logged in and discovered that the
director had locked up approximately half an hour after i left. it's
a public holiday today ("Australia Day") and i haven't yet been in to
work to restart it and examine the logs. so there's the second problem:
a mysterious lockup of a mostly idle director machine.


any comments, suggestions, ideas would be very welcome...


one question: would i be better off using an old pentium box as a
dedicated director, rather than using proxy1 as both director and
realserver? i could dig one up if needed - i could even get two, so
that i can have a failover director.

another question, about the LNE100TX NICs: i ordered real DEC 21x4x
cards, but they're almost impossible to get here, so my hardware
supplier substituted these PNIC clones - "Lite-On 82c168 PNIC rev 32"
according to the tulip driver. in the HOWTO, someone else had a problem
with tulip cards which he fixed by replacing them with eepro100 cards.
would switching to eepro100 or 3c59x cards be a good idea?


thanks,

craig

--
Craig Sanders
Systems Administrator
VICNET - Victoria's Network             http://www.vicnet.net.au/

