Hello Roberto,
Thanks for your reply. I can see that you know how it is when a tsunami
of requests hits your servers, and that I'm not alone with this problem :)
I'll study your solution and try it in a mini lab here. If I have
difficulties I may bother you again privately.
Thanks again
[]'s
Gustavo Mateus
Roberto Nibali wrote:
Hello Gustavo,
I suspect the clients scheduled for the sorry server never return
to the cluster, right (only if you use persistence, of course)?
That's right.
That's why I first wrote the hprio scheduler (search the list archives).
I'm working on a project for an airline company.
Sometimes they post promotional tickets for a short period of time
(only passengers who buy on the website can get them) and the load on
the servers shoots up.
I wrote the server pool implementation for a ticket reseller company
that probably had the same problems as your airline company: normal
selling activity that does not need high-end web servers, and then from
time to time (in your case promotional tickets, in my case Christina
Aguilera, U2, Robbie Williams or World Soccer Championship tickets) a
selling peak where the tickets need to be sold within the first 15
minutes, with tens of thousands of requests per second, plus the illicit
traffic generated by scripters trying to game the event. These peaks,
however, do not warrant the acquisition of high-end servers, and
on-demand servers cannot be organized/prepared quickly enough.
I need to manually limit each server's capacity, and the remaining
connections need to go to this sorry server.
That's exactly the purpose of my patch, plus you get to see how many
connections (persistent ones, as in sessions, as well as active/inactive
connections) are forwarded to either the normal web servers (as long as
they stay within u_thresh and l_thresh) or the overflow (sorry server)
pool. As soon as one of the RS in the serving pool drops below l_thresh,
future connection requests are immediately sent to the serving pool
again.
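Just to make the threshold behaviour concrete, here is a rough user-space
sketch of the per-connection decision. This is only an illustration, not
the kernel code from the patch; the names (RealServer, pick_pool) and
fields are made up:

    # Illustrative sketch only -- the real implementation lives inside
    # IPVS in kernel space.
    from dataclasses import dataclass

    @dataclass
    class RealServer:
        name: str
        conns: int       # currently assigned connections (active + inactive)
        u_thresh: int    # upper threshold: stop taking new connections above this
        l_thresh: int    # lower threshold: take new connections again below this
        overloaded: bool = False

    def pick_pool(serving_pool, overflow_pool):
        # Hysteresis: an RS is marked overloaded once it reaches u_thresh
        # and is only cleared again once its connection count drops below
        # l_thresh.
        for rs in serving_pool:
            if rs.conns >= rs.u_thresh:
                rs.overloaded = True
            elif rs.conns < rs.l_thresh:
                rs.overloaded = False
        # As long as at least one RS can still take traffic, new requests
        # go to the serving pool; only when all of them are overloaded do
        # they spill over to the sorry server pool.
        if any(not rs.overloaded for rs in serving_pool):
            return serving_pool
        return overflow_pool

    # Example: rs1 is still under u_thresh, so the serving pool wins.
    web = [RealServer("rs1", 950, 1000, 800), RealServer("rs2", 1200, 1000, 800)]
    sorry = [RealServer("sorry", 0, 10**9, 0)]
    pool = pick_pool(web, sorry)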
I personally believe that the sorry-server feature is a big missing
piece of the IPVS framework, one that is implemented in all commercial
HW load balancers.
We tried F5 Big-IP for a while and it worked perfectly, but it is
very expensive for us :(
Yep, about USD 20k-30k to have them in an HA pair.
So for the 2.4 kernel, I have a patch that has been tested extensively
and has been running in production for a year now, having survived some
hype events. I don't know if I'll find the time to sit down for a 2.6
version. Anyway, as has been suggested, you can also try the sorry
server of keepalived; however, I'm quite sure that this is not done
atomically (since keepalived is user space) and works more like:
while true {
    for all RS {
        if RS.conns > u_thresh then quiesce RS
        if RS.isQuiesced and RS.conns < l_thresh then {
            if sorry server active then remove sorry server
            set RS.weight to old RS.weight
        }
    }
    if sum_weight of all RS == 0 then invoke sorry server with weight > 0
}
If this is the case, it will not work for our use cases with high peak
request rates, since sessions are not switched to either service pool
atomically. This means people get sent to the overflow pool even though
they would have had a legitimate session, and others get broken pages
back, because in the middle of a page view the LB's user-space process
gets a scheduler call to update its FSM, so further requests (for
HTTP 1.0, for example) will be broken. The browser hangs on your
customers' side and your management gets the angry phone calls from the
business users to whom you had promised B2B access.
This is roughly how I came to implement the server overflow
(spillover server, sorry server) functionality for IPVS.
HTH and best regards,
Roberto Nibali, ratz