On Wednesday 09 October 2002 00:44, Roberto Nibali wrote:
> One day one of our brains will flourish and pure wisdom will spread over
> the world, taking it over.
;-)
> > Sure I would - if any of the following is true:
> > 1) the client's IP address is new
> > 2) the realserver that would be picked otherwise (e.g. when using true
> >    persistence) is overloaded compared to the others
> > 3) the realserver that would be picked otherwise is down.
> >
> > Case 1 is already handled with the normal persistence setting and WLC.
> > Case 3 can be enabled using the sysctl setting in the new LVS.
>
> No, case 3 can only be solved, if you have an intelligent user space
> daemon running which takes out the service. Otherwise the load balancer
> doesn't really know about the state of a RS.
Obviously. But does anyone here run LVS without ldirectord or keepalived at
all?
> > Case 2 remains then. Suppose you have 9 clients and 3 realservers (all
> > clients have a different IP).
>
> For me case 2 is still represented with the WLC scheduler but I might
> just be dumb.
For new client IP addresses, yes, but not for clients that have already connected.
In the case of non-persistent connections each HTTP request will be spread
over all available servers and the balance will always be close to optimal.
More importantly, changes in the weighting will take effect almost instantly.
In the case of fully persistent connections it will take quite a while before
the clients are gone, so the weight shift is not even remotely instant.
During normal operation persistency performs best because it avoids throwing
session state around. When the weights are changed, however, it's fine to move
all clients to another server as if they were not persistent at all. All
subsequent requests are then treated as 'persistent' again.
This way there is only a very limited amount of reassigning done when there
truly is an imbalance (accidentally, because a single IP turns out to be a
NAT-ed LAN, or implied, because we changed weighting). Non-persistent
reassigns for each and every request, i.e. waaaaaay too often, and fully
persistent never reassigns at all, which works, but isn't really required.
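The policy I'm describing can be sketched in a few lines of Python. This is a
hypothetical model of the idea, not LVS code; the class, the scheduler stand-in
and all names are invented for illustration:

```python
# Hypothetical sketch of 'soft persistence': clients stick to a realserver
# via a per-IP template, but a weight change flushes the templates exactly
# once, so each client is re-scheduled on its next request and sticks again.

def pick_realserver(weights):
    # Stand-in for a real scheduler like WLC: pick the highest weight.
    return max(weights, key=weights.get)

class SoftPersistence:
    def __init__(self, weights):
        self.weights = dict(weights)
        self.templates = {}          # client IP -> realserver

    def handle_request(self, client_ip):
        # Persistent path: reuse the existing template if there is one.
        if client_ip in self.templates:
            return self.templates[client_ip]
        # Otherwise schedule once and remember the choice.
        rs = pick_realserver(self.weights)
        self.templates[client_ip] = rs
        return rs

    def set_weight(self, rs, weight):
        # The interesting part: a weight change invalidates the templates,
        # so every client is reassigned once, not on every request.
        self.weights[rs] = weight
        self.templates.clear()

lb = SoftPersistence({"rs1": 1, "rs2": 1})
first = lb.handle_request("10.0.0.1")
assert lb.handle_request("10.0.0.1") == first  # sticks, like persistence
lb.set_weight(first, 0)                        # quiesce for maintenance
moved = lb.handle_request("10.0.0.1")          # reassigned exactly once
assert moved != first
assert lb.handle_request("10.0.0.1") == moved  # sticks again afterwards
```

Note that only the template table changes; established TCP connections are
untouched, exactly as described above.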
> > Persistency however is not a 'hard' requirement as it is for e.g. HTTPS,
> > it's only a 'soft' requirement because it performs better.
>
> Ok. You call it soft requirement or non-true persistency if you set
> persistency where you wouldn't really need it but where you gain from it
> by not needing to load session IDs for new requests, right?
Yup.
> > Therefore it would be nice if LVS could detect that we're only using
> > 'soft persistency' and reassign two clients to the other realservers. Now
> > RS1 has
>
> Ohhhhhhhhhhhhhhhhhh. Now I understand. You mean that the load balancer
> should realize that the active connections are blasting away the CPU
> power on one RS and that he should be fair and take those bloody suckers
> away and put some of them on the next RS? How do you plan on maintaining
> TCP state information across the other nodes?
>
> Did I get it this time? Please, please, please?
LOL :-)
You're _very_ close now... I'm not intending to reassign active TCP socket
connections, only _subsequent_ incoming connections from the same IP.
Remember that we're talking about something similar to persistency here, so
we're not working on a connection-per-connection basis but on an IP-per-IP
basis, stretching over much more than a single socket connection.
> But in your case established connection would need to reconnect,
> wouldn't they?
No, each subsequent request would get to another RS, incurring the session
retrieval penalty once and only once. The active requests will obviously
still be served by the 'old' RS. After all a single request can't take that
long...
> > Another advantage is that changing the weighting to 0 for maintenance
> > will almost instantly reassign the clients because it's not a technical
> > problem.
>
> Only new clients. Old client will stay on the quiesced server.
Quiescence and persistency together are a pain, as I discovered to my disgrace. One of our
realservers crashed due to a broken motherboard and some clients (amongst
which the website's very company itself...) got connected to the broken
server. I had to turn on quiescence in ldirectord.
Or do you mean my above example, where the quiescence is not caused by
ldirectord but enforced manually before maintenance? Yes, in that case the
old clients will stay on the quiesced RS, which is exactly what I _don't_
want. If the server is down or weighted down, it's fine to reassign clients
once; it's reassigning them on every request that's not fine.
> Wait a minute, what is being reassigned? Old connections are being
> assigned according to the existing template to their appropriate RS.
And that's what I wanted to influence ;-)
For best performance and reliability it would be very nice if LVS could change
the template when needed, but not on every HTTP request when there is no need
for it. See where I'm heading?
> New connections that do not have an entry in the table get assigned to
> new RS.
That's fine.
> Aehm, so clearly people stay at least for (30 Minutes -
> persistency_timeout) on your website. This is tough luck. You could of
> course use the described procedure with the sysctl to take a server out
> but then you lose them. And that's your point I think. You would like
> to (this time) reassign them so you can take the RS out more quickly,
> right?
Yes. Only a very small number of people stay longer than 15 minutes, but
leaving aside the monitoring software run by us and the web site owner, those
few clients are the ones actually ordering products and hence the most
valuable...
> If I'm right then let me tell you that this is not possible :).
I know that current LVS cannot do this :-)
I was curious if it was feasible for future versions...
> Ok. Independently of the issue that your clients seem to like your
> website a lot to stay there almost half an hour, what are your
> persistence timeout settings?
Same as the ASP session timeout, 15 minutes. The few people staying that long
are really requesting multiple pages.
> Oh, I thought you were having high volume traffic. If I understand this
> correctly, you have between 1.2 and 1.5 MBit/s traffic.
>
> ratz@laphish:~ > echo "20*1024*8/(12*3600)/3" | bc -l
> 1.26419753086419753086
Hmm, the calculation seems right, although I'm unsure why you divide by 3 at
the end. Do you want the bandwidth usage per realserver? I took
a quick look at the bandwidth usage graphs for our leased line and it looks
like we have close to 0 MBit/s during the night and well over 2.5 MBit/s
during the evenings. The 1.2-1.5 is reasonable for the rest of the day. All
measured for the entire cluster, not for single realservers.
It's not exactly what you'd call 'high volume' though, indeed.
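For what it's worth, here's the quoted bc one-liner redone in Python. The 20
GB / 12 hours / 3 realservers interpretation is my reading of the numbers in
that line, not something stated explicitly:

```python
# Rough bandwidth estimate, mirroring the quoted bc one-liner:
# assumed ~20 GB of traffic spread over a 12-hour day, 3 realservers.
gigabytes = 20
hours = 12
realservers = 3

megabits = gigabytes * 1024 * 8               # total traffic in Mbit
cluster_mbit_s = megabits / (hours * 3600)    # average for the whole cluster
per_rs_mbit_s = cluster_mbit_s / realservers  # the '/3' in the bc line

print(f"cluster: {cluster_mbit_s:.2f} Mbit/s, per RS: {per_rs_mbit_s:.2f} Mbit/s")
```

So the 1.26 figure is per realserver; dropping the final division gives a
cluster-wide average of roughly 3.8 Mbit/s.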
> Ok. I think I can assume that you have appropriate hardware for the DB.
Reasonable is a better word here, but there's not too much room for
improvement without partitioning the database or playing other advanced
tricks. Either way it's a tradeoff of budget vs performance and as long as
the site still performs well enough management wants to delay buying a bigger
database. And frankly, the database can still handle the load, it's only that
there's no growth space available.
--
Martijn