And now _I_ am not sure if I understand this...
One day one of our brains will flourish and pure wisdom will spread over
the world, taking it over.
Sure I would - if any of the following is true: 1) the client's IP address
is new; 2) the realserver that would be picked otherwise (e.g. when using true
persistence) is overloaded compared to the others; 3) the realserver that
would be picked otherwise is down.
Case 1 is already handled with the normal persistence setting and WLC. Case 3
can be handled using the sysctl setting in the new LVS.
No, case 3 can only be solved if you have an intelligent user space
daemon running which takes the service out. Otherwise the load balancer
doesn't really know about the state of a RS.
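Roughly what I mean, as a sketch (the addresses are invented, and I'm
assuming the sysctl referred to above is expire_nodest_conn):

# expire a connection as soon as a packet arrives for a destination
# that has been removed, instead of silently dropping the packet
echo 1 > /proc/sys/net/ipv4/vs/expire_nodest_conn
# the user space health checker then removes the dead RS itself, e.g.
ipvsadm -d -t 192.168.0.1:80 -r 10.0.0.1:80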
Case 2 remains then. Suppose you have 9 clients and 3 realservers (all clients
have a different IP).
For me case 2 is still covered by the WLC scheduler, but I might
just be dumb.
In case of true persistency each realserver will serve 3 of these clients, ad
infinitum.
This is not a realistic case. New clients will come and if RS1 is
overloaded with Client1, Client4 and Client7, then the new client is
_not_ going to RS1 but to RS2.
In case of non-persistency the HTTP requests are spread more or less randomly
over the realservers. This is extremely bad for session state, since in the
end each realserver has to track (and fetch) session data from all 9 clients,
instead of having a balanced set of 3 clients per realserver.
Yes, that's why you have persistence. That was what I tried to explain
with my ASCII sketch.
As long as this status quo holds, persistency works ok and performs better
than non-persistent connections.
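For reference, the kind of setup I'm talking about is nothing fancier than
this (VIP and RIPs invented, and I'm assuming NAT forwarding here):

# virtual service with wlc scheduling and 30 minutes of persistency
ipvsadm -A -t 192.168.0.1:80 -s wlc -p 1800
# three realservers with equal weight
ipvsadm -a -t 192.168.0.1:80 -r 10.0.0.1:80 -m -w 1
ipvsadm -a -t 192.168.0.1:80 -r 10.0.0.2:80 -m -w 1
ipvsadm -a -t 192.168.0.1:80 -r 10.0.0.3:80 -m -w 1

With 9 distinct client IPs this gives the 3 clients per realserver described
above, for as long as the persistency templates live.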
Ok.
Persistency however is not a 'hard' requirement as it is for e.g. HTTPS, it's
only a 'soft' requirement because it performs better.
Ok. You call it a soft requirement, or non-true persistency, if you set
persistency where you wouldn't really need it, but where you gain from it
by not having to load session IDs for new requests, right?
Thus, if client 1 turns out to be a masquerading gateway for a NAT-ed network,
it will drive realserver 1's load much higher than that of realservers 2 and 3
if we use the 'normal' persistence.
Yes, and if this load is high, response times go up, resulting in longer
active connections. Result: once in a while the wlc scheduler will not
choose this RS.
Therefore it would be nice if LVS could detect that we're only using 'soft
persistency' and reassign two clients to the other realservers.
Ohhhhhhhhhhhhhhhhhh. Now I understand. You mean that the load balancer
should realize that the active connections are blasting away the CPU
power on one RS and that he should be fair and take those bloody suckers
away and put some of them onto another RS? How do you plan on maintaining
TCP state information over other nodes?
Did I get it this time? Please, please, please?
Now RS1 has the one NAT-ed network, and RS2 and RS3 both serve four other
clients, resulting in both a good balance (almost as good as non-persistency)
and ALSO a strong preference for a single realserver per IP to avoid hitting
the penalty for re-fetching session data.
But in your case established connections would need to reconnect,
wouldn't they?
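Both the templates and the established connections live in the director's
table; you can look at them with (output format may differ slightly between
versions):

# -c lists the connection table, including the persistency templates
ipvsadm -L -c -n

Reassigning a client would mean rewriting its template entry, but the TCP
connections that already point at RS1 cannot simply be moved to another node.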
Another advantage is that changing the weight to 0 for maintenance would
almost instantly reassign the clients, since reassigning them is not a
technical problem.
Only new clients. Old clients will stay on the quiesced server.
Since the reassignment is done only once per IP, the performance hit is rather
minimal.
Wait a minute, what is being reassigned? Old connections are being
assigned according to the existing template to their appropriate RS.
New connections that do not have an entry in the table get assigned to
a new RS.
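In other words, quiescing is nothing more than (address invented):

# weight 0: no new templates or connections are assigned to this RS;
# existing templates and connections keep pointing at it until they expire
ipvsadm -e -t 192.168.0.1:80 -r 10.0.0.1:80 -w 0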
Actually it only confused me a bit :/
It describes a possible example of a few incoming varying srcIPs and
their distribution among 3 RS. But never mind.
That's what we're using now and it works fine during normal operation, but
when pulling a machine down for maintenance it's a PITA, it takes at least
half an hour before all clients are gone from the web sites, and often
longer.
Aehm, so clearly people stay at least for (30 minutes -
persistency_timeout) on your website. This is tough luck. You could of
course use the described procedure with the sysctl to take a server out,
but then you lose them. And that's your point, I think. You would like
to (this time) reassign them so you can take the RS out more quickly, right?
If I'm right then let me tell you that this is not possible :).
No, you misunderstood me. It's working fine over longer periods of time. But
not as well as non-persistency does, and especially for maintenance it's
sheer overkill for us.
Ok. Independently of the issue that your clients seem to like your
website enough to stay there for almost half an hour, what are your
persistence timeout settings?
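In case it helps, the timeout shows up in the normal service listing and can
be changed on the fly, e.g. (addresses invented):

# persistent services print their timeout in the service line
ipvsadm -L -n
# change the persistency timeout of an existing service to 5 minutes
ipvsadm -E -t 192.168.0.1:80 -s wlc -p 300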
Three realservers, and currently about 13 Gb/day traffic, but until 2 months
ago we had another site that pulled 20 Gb/day alone, and we expect it to
return shortly. So I'd better anticipate that now, while I still have the time
:-)
Oh, I thought you had high volume traffic. If I understand this
correctly, you have between 1.2 and 1.5 Mbit/s of traffic:
ratz@laphish:~ > echo "20*1024*8/(12*3600)/3" | bc -l
1.26419753086419753086
ratz@laphish:~ >
The current 13 Gb are made almost entirely between 11:00 am and 11:00 pm, with
the peak in the ~4 hours after dinner. The realservers can handle the load;
it's the database that's running into trouble, and moving the session state
storage there doesn't sound like a good idea...
Ok. I think I can assume that you have appropriate hardware for the DB.
Best regards,
Roberto Nibali, ratz
--
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq'|dc