Re: [lvs-users] 2 LVS problems

To:	Craig Sanders <cas@xxxxxxxxxxxxx>
Subject:	Re: [lvs-users] 2 LVS problems
Cc:	lvs-users@xxxxxxxxxxxxxxxxxxxxxx
From:	Michael Sparks <zathras@xxxxxxxxxxxxxxxxxx>
Date:	Wed, 26 Jan 2000 01:03:43 +0000 (GMT)

On Wed, 26 Jan 2000, Craig Sanders wrote:
[snip]

> all 3 proxies are configured to use each other as siblings (using their
> real IPs, not the VIP)

>       # HTTP requests
>       ipvsadm -A -t x.x.x.8:8080 -s wlc
>       ipvsadm -a -t x.x.x.8:8080 -r x.x.x.215
>       ipvsadm -a -t x.x.x.8:8080 -r x.x.x.216
>       ipvsadm -a -t x.x.x.8:8080 -r x.x.x.217

Forwarding mechanism? (I presume from the above it's VS-DR, since no -m or
-i flag is given and ipvsadm defaults to direct routing, -g.)

>       # ICP requests
>       ipvsadm -A -u x.x.x.8:3130 -s wlc
>       ipvsadm -a -u x.x.x.8:3130 -r x.x.x.215
>       ipvsadm -a -u x.x.x.8:3130 -r x.x.x.216
>       ipvsadm -a -u x.x.x.8:3130 -r x.x.x.217

The real servers need the following line in their squid config unless
you're using NAT:

udp_incoming_address x.x.x.8
i.e.
udp_incoming_address VIP

Or else client caches that talk ICP will get confused, and run really
slowly.

Take a look in your cache.log rather than your access.log - if you see
info along the lines of "unexpected ICP reply from IP x.x.x.215" then
that's possibly the root cause of your problem.

> this seemed to work. i did a simple test of setting $http_proxy to point
> to "http://x.x.x.8:8080/" and then ran wget to mirror our main web site.
> all requests were smoothly load balanced over all 3 real-servers. very
> impressive.

That's what we thought when we started using LVS :-)

> here's where i encountered the first problem. some requests would
> just fail. i'd get a message from the squid on my WS saying "unable
> to forward request to a parent". seemed like about one out of every 3
> requests failed. clicking reload or shift-reload in netscape didn't help
> unless i waited a while.

My guess is that this is just a squid thing rather than anything else.
UDP-based services like ICP that are balanced using VS-DR or VS-TUN need
to be bound to the virtual service address, or else everything goes a
bit screwy. E.g. with BIND 8 you need a line like

    listen-on { VIP; };

to get things to load balance properly.

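The reason for this comes down to the fact that UDP is a connectionless
protocol, so unless the server is bright enough to notice which IP it
received the packet on, it'll just reply from a default local IP (with
VS-DR, typically the real server's own address, e.g. x.x.x.215), which
probably won't be the one you want: the client sent its query to the VIP
and won't recognise a reply coming from anywhere else.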
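To make the connectionless point concrete, here is a small Python sketch
using loopback addresses as stand-ins: 127.0.0.2 plays the VIP and
127.0.0.1 plays the real server's own address. The function names are
hypothetical and this is not how squid itself is written; it only shows
that a UDP socket bound to the wildcard address answers from whatever
source address the kernel's route lookup picks, while a socket bound to
the VIP (which is what udp_incoming_address or listen-on arrange) answers
from the VIP. The demo assumes Linux, where all of 127.0.0.0/8 is routed
to the loopback interface.

```python
import socket
import sys


def reply_source(server_bind_ip: str, queried_ip: str) -> str:
    """Send one UDP 'query' to queried_ip and return the source IP of the reply.

    server_bind_ip is the address a hypothetical ICP-like server binds to:
    "0.0.0.0" (wildcard) or a specific address such as the VIP.
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        server.settimeout(2.0)
        client.settimeout(2.0)
        server.bind((server_bind_ip, 0))  # let the kernel pick a free port
        port = server.getsockname()[1]
        client.bind(("127.0.0.1", 0))
        client.sendto(b"query", (queried_ip, port))
        _, client_addr = server.recvfrom(1024)
        # The server does not say which local address to reply from, so a
        # wildcard-bound socket gets a source address chosen by route lookup.
        server.sendto(b"reply", client_addr)
        _, reply_addr = client.recvfrom(1024)
        return reply_addr[0]
    finally:
        server.close()
        client.close()


def reply_is_trusted(reply_ip: str, queried_ip: str) -> bool:
    """Assumed client behaviour: only accept replies from the address queried."""
    return reply_ip == queried_ip


if __name__ == "__main__" and sys.platform.startswith("linux"):
    vip = "127.0.0.2"
    for bind_ip in ("0.0.0.0", vip):
        try:
            src = reply_source(bind_ip, vip)
        except OSError as exc:  # host may not route 127.0.0.2 to loopback
            print(f"skipped {bind_ip}: {exc}")
            continue
        print(f"bound to {bind_ip:<9} -> reply from {src}, "
              f"trusted={reply_is_trusted(src, vip)}")
```

On a typical Linux box the wildcard case should report a reply from
127.0.0.1 that the client does not trust, which is the loopback analogue
of a real server answering from x.x.x.215 instead of the VIP.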

> the squid log on my WS looked like this:
> first failure:
> 948786582.237      4 203.16.167.2 TCP_MISS/503 1232 GET *URL* - TIMEOUT_NONE/- -
> 
> multiple reloads and shift-reloads in netscape (this was probably due to
> squid briefly caching the TCP_MISS/503 result):
> 
> 948786585.489     17 203.16.167.2 TCP_MISS/503 1232 GET *URL* - NONE/- -
> 948786602.768     17 203.16.167.2 TCP_MISS/503 1232 GET *URL* - NONE/- -
> 
> finally, the page is fetched:
> 
> 948786629.368   2228 203.16.167.2 TCP_MISS/200 6055 GET *URL* - TIMEOUT_DEFAULT_PARENT/x.x.x.8 text/html

This is the sort of symptom I'd expect for the above problem.
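If you want to check a larger log for the same pattern: the lines quoted
above are in Squid's native access.log format (timestamp, elapsed time in
milliseconds, client address, result code/HTTP status, bytes, method,
URL, ident, hierarchy code/peer, content type). Squid's documentation
says a hierarchy code may carry a TIMEOUT_ prefix when the timeout expired
while waiting for ICP replies from neighbours, which is what TIMEOUT_NONE
and TIMEOUT_DEFAULT_PARENT show here. The Python sketch below (helper
names are hypothetical) splits such lines into fields and flags the ICP
timeouts; it assumes URLs contain no whitespace.

```python
from typing import NamedTuple


class AccessLogEntry(NamedTuple):
    timestamp: float     # Unix time, seconds.milliseconds
    elapsed_ms: int      # time taken to service the request
    client: str
    result_code: str     # e.g. TCP_MISS
    http_status: int     # e.g. 503
    size: int            # bytes delivered to the client
    method: str
    url: str
    ident: str
    hierarchy_code: str  # how the next hop was chosen, e.g. DEFAULT_PARENT
    peer: str            # next-hop host, or "-" if none was used
    content_type: str


def parse_native(line: str) -> AccessLogEntry:
    """Parse one line of Squid's native access.log format (10 fields)."""
    fields = line.split()
    if len(fields) != 10:
        raise ValueError(f"expected 10 fields, got {len(fields)}: {line!r}")
    result_code, status = fields[3].split("/", 1)
    hierarchy_code, peer = fields[8].split("/", 1)
    return AccessLogEntry(
        timestamp=float(fields[0]),
        elapsed_ms=int(fields[1]),
        client=fields[2],
        result_code=result_code,
        http_status=int(status),
        size=int(fields[4]),
        method=fields[5],
        url=fields[6],
        ident=fields[7],
        hierarchy_code=hierarchy_code,
        peer=peer,
        content_type=fields[9],
    )


def icp_timed_out(entry: AccessLogEntry) -> bool:
    """True when Squid gave up waiting for ICP replies (TIMEOUT_ prefix)."""
    return entry.hierarchy_code.startswith("TIMEOUT_")


if __name__ == "__main__":
    sample = [
        "948786582.237      4 203.16.167.2 TCP_MISS/503 1232 GET *URL* - TIMEOUT_NONE/- -",
        "948786629.368   2228 203.16.167.2 TCP_MISS/200 6055 GET *URL* - "
        "TIMEOUT_DEFAULT_PARENT/x.x.x.8 text/html",
    ]
    for line in sample:
        e = parse_native(line)
        print(e.http_status, e.hierarchy_code, e.peer, icp_timed_out(e))
```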

> to make the logs fit on one 80-column line, i've replaced the real url
> with "*URL*" as it's not important which URL i was fetching...the same
> thing happened on several different URLs
> 
> btw, 203.16.167.2 is my workstation at home which uses my workstation at
> work as a parent squid.
> 
> as a completely wild guess (unsubstantiated by any facts), maybe there's
> a problem with the director being one of the realservers. or i might
> have made a typo on one of the lines for port 3130. or it might be
> something to do with squid's CACHE_DIGEST (i didn't think of that until
> just now - i'll disable it and try again tomorrow as it is useless with
> an LVS setup)

See above. Digests are transferred using normal HTTP, so that's not a
problem here. There are some wider issues in squid clustering which we're
working to address as part of our work on the JWCS, which we can discuss
if you like. Essentially it boils down to this: the accuracy of ICP and
digests is (normal accuracy)/N for N servers in a layer-4-balanced
situation, because each query or digest fetch sent to the VIP is answered
by just one of the N caches.
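A back-of-the-envelope model of the 1/N figure, as a Python sketch:
assume (hypothetically) that a given object is held by `copies` of the N
caches and that the director is equally likely to send an ICP query, or a
digest fetch, to any of them. Only a query that lands on a holder can
report a HIT, so accuracy scales as copies/N, i.e. 1/N when each object
lives on one cache. This ignores how wlc actually spreads load and how
often objects get duplicated across caches.

```python
from fractions import Fraction


def l4_icp_accuracy(n_servers: int, copies: int = 1,
                    base_accuracy: Fraction = Fraction(1)) -> Fraction:
    """Expected ICP/digest accuracy behind a layer-4 balancer (simplified model).

    n_servers:     caches behind the VIP
    copies:        how many of them hold the object (hypothetical parameter)
    base_accuracy: accuracy ICP/digests would have against a single cache
    """
    if not 1 <= copies <= n_servers:
        raise ValueError("copies must be between 1 and n_servers")
    holders = set(range(copies))  # by symmetry, which servers hold it doesn't matter
    # The director may send the query to any server with equal likelihood;
    # only a server that actually holds the object can answer HIT.
    hits = sum(1 for chosen in range(n_servers) if chosen in holders)
    return base_accuracy * Fraction(hits, n_servers)


if __name__ == "__main__":
    for n in (1, 2, 3, 4):
        print(n, l4_icp_accuracy(n))  # prints 1, 1/2, 1/3, 1/4 in turn
```

With three proxies behind x.x.x.8, as in your setup, this model says a
client cache's ICP answers are right only about a third as often as they
would be against a single cache.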

For an excerpt of a detailed discussion I had with someone on this,
please feel free to take a look at
http://epsilon3.wwwcache.ja.net/~zathras/ICP-service.txt
(It won't be there permanently, but I don't want to clutter up the list.)

> i then reconfigured squid on my WS to use the 3 real IP addresses (.215,
> .216, and .217) as parents. everything worked perfectly. this at least
> established that the realservers were all functioning correctly when
> used without the director.
[snip]

This'd track with the ICP problem I mentioned.

> at that point, i had to go out for dinner and left everything as it was.
> when i got home a few hours later i logged in and discovered that the
> director had locked up approximately half an hour after i left. it's
> a public holiday today ("Australia Day") and i haven't yet been in to
> work to restart it and examine the logs. so there's the second problem:
> mysterious lockup of a mostly idle director machine.
> 
> 
> any comments, suggestions, ideas would be very welcome...
> 
> one question: would i be better off using an old pentium box as a

Probably. Given that squid can be a bit of a beast under high load, and
can fail just when you don't want it to, while the LVS code seems to be
as stable as a very stable thing indeed, putting the director on a
machine that's unlikely to fail is a Very Good Thing (tm).

That said, having one of your proxies configured to be able to take over
as a director in an emergency is also a good thing.

Michael.
--
National & Local Web Cache Support        R: G117
Manchester Computing                      T: 0161 275 7195
University of Manchester                  F: 0161 275 6040
Manchester UK M13 9PL                     M: Michael.Sparks@xxxxxxxxxxxxxxx

----------------------------------------------------------------------
LinuxVirtualServer.org mailing list - lvs-users@xxxxxxxxxxxxxxxxxxxxxx
To unsubscribe, e-mail: lvs-users-unsubscribe@xxxxxxxxxxxxxxxxxxxxxx
For additional commands, e-mail: lvs-users-help@xxxxxxxxxxxxxxxxxxxxxx
