Hi Joseph,
On Wednesday 02 February 2005 06:09, Joseph Mack wrote:
> It seems you've found a bug here. Hopefully someone will look into it.
It seems a bug to me too; I did some additional debugging.
The weighted round robin algorithm decides whether it should use a server
based on the 'current weight' (cw). But if you edit the weight of a
server /or/ delete a server, the current weight is not changed.
An example:
I have two servers, one with a weight of 100 and one with an initial
weight of 25 (let's say one is a quad-CPU system and the other a
single-CPU system). Unfortunately the quad-CPU system fails after the
first request and is taken out of service, so we are stuck with just one
server with a weight of 25.
The next request comes in; the current weight is lowered by 25 and is now
75. But the server has a weight of 25, so no destination is available and
NULL is returned, resulting in a connection refused for that request. The
next request lowers the current weight to 50 (again connection refused);
the request after that finally succeeds.
The same thing happens when deleting or editing a server (changing its weight).
Two out of four requests fail; Mozilla will give the user an error
('page contains no data'), which is highly annoying. In a real situation
it won't happen very often, but with millions of visitors (and, in my
case, weights changing every 10-15 seconds) it will happen a couple of
times a day, and people are going to complain.
With the patch below for ip_vs_wrr.c, the current weight is reset to zero
whenever a server is updated. If a server is deleted or updated, the
current weight is zero and will be set to the maximum weight (which /is/
updated on each change) on the first iteration, by the following code:
ip_vs_wrr.c, lines 163-164:
	if (mark->cw <= 0) {
		mark->cw = mark->mw;
The impact on the round robin is that, as soon as a server is
deleted/edited, the process of selecting a new destination starts over.
That isn't very bad, because it happens all the time anyway when you have
two servers whose weights aren't the same.
The attached patch will set the current weight to 0 in case of an update.
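For what it's worth, here is a minimal user-space sketch of the
current-weight loop. The names cw/mw/di follow ip_vs_wrr.c, but the pool
structure and functions are my own simplification for illustration, not
the kernel code. It reproduces the failure sequence above and shows how
resetting cw to zero avoids it:

```c
#include <assert.h>

#define MAXSRV 4

/* Simplified model of the per-service wrr scheduler state. */
struct pool {
	int weight[MAXSRV];
	int n;		/* number of servers */
	int i;		/* index of the last server used */
	int cw;		/* current weight */
	int mw;		/* maximum weight in the pool */
	int di;		/* gcd of the weights (decrement step) */
};

static int gcd2(int a, int b)
{
	while (b) {
		int t = a % b;
		a = b;
		b = t;
	}
	return a;
}

/* Recompute mw and di after an add/edit/delete -- note that cw is
 * deliberately NOT touched here, which is exactly the bug. */
static void recalc(struct pool *p)
{
	int k;
	p->mw = 0;
	p->di = 0;
	for (k = 0; k < p->n; k++) {
		if (p->weight[k] > p->mw)
			p->mw = p->weight[k];
		p->di = gcd2(p->di, p->weight[k]);
	}
}

/* One scheduling decision: walk the list at most once, decrementing cw
 * by the gcd each time we wrap past the head; return the server index,
 * or -1 ("connection refused") if no server qualified this pass. */
static int schedule(struct pool *p)
{
	int start;

	if (p->n == 0)
		return -1;
	start = p->i;
	for (;;) {
		p->i = (p->i + 1) % p->n;
		if (p->i == 0) {	/* wrapped past the head */
			p->cw -= p->di;
			if (p->cw <= 0)
				p->cw = p->mw;
		}
		if (p->weight[p->i] >= p->cw)
			return p->i;
		if (p->i == start)
			return -1;	/* full pass, nothing found */
	}
}
```

Starting from the scenario above (the quad-CPU server gone, one server of
weight 25 left, but cw still at the stale value 100), schedule() returns
-1, -1 and then 0 for the first three requests; after setting cw to 0, as
the patch does, the very first call resets cw to mw and succeeds.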
> I'm one of the people who think you shouldn't dynamically change the
> weights of your realservers unless you've got a real good reason.
The main reason for this is to swap out servers that are getting
overloaded with requests, or, for instance, when a server administrator
compiles a new kernel locally ;-). Or, like we had recently, a server
with some defective hardware that still served requests but with a very
nasty response time. And of course to swap out servers that are totally
dead.
> I know Jeremy Kerr did an honours project on this topic, so he's probably
> looked into the control and feedback theory on the matter and knows more
> about it than I do.
Any chance of Jeremy reading this and telling us whether that project
(or its conclusions) can be found online somewhere?
> If you're going to dynamically reweight your machines, then you should
> do it on a timescale that is long compared to the events that they are
> handling, ie if you're handling http hits, then the calculation of the
> new weight should sample the load no more than every few secs. If the
> load comes from https, then every few minutes at the most.
At the moment I'm reweighting the servers every 10 seconds; in production
that interval will be slightly higher, somewhere around 30 seconds
between updates.
> No-one (except perhaps Jeremy) has done a study of the benefits of
> dynamic weighting, so my statements are just theory at the moment.
I'm going to try both (dynamic and static) to see what works best in my
situation. At least the server won't refuse my connections now and then
when my weights are changed or when a server dies and is taken out of the
pool by a monitor script.
>
> Joe
-kees